
Memory Integrity

New in v0.9

Track AI agent memory modifications and assign trust scores based on behavior patterns. Detect unauthorized changes and memory drift.

Overview

AI agents accumulate "memories" that affect their behavior: conversation history, learned preferences, facts, and context. Memory Integrity provides tools to monitor and validate these memories over time.

Memory Fingerprinting

Hash memory state at checkpoints for later verification

Drift Detection

Detect unauthorized or unexpected memory modifications

Trust Scoring

Calculate trust levels based on memory patterns

Trust Score Formula

The trust score is a weighted sum of four factors. Because the weights add up to 1.0, the score stays on a 0.0 to 1.0 scale whenever each factor does:

text
TRUST = (consistency × 0.3) + (compliance × 0.3) + (transparency × 0.2) + (predictability × 0.2)
| Score | Level | Description |
| --- | --- | --- |
| > 0.8 | Trusted | Agent behaves consistently; memory is stable |
| 0.5 to 0.8 | Caution | Some anomalies detected; requires monitoring |
| < 0.5 | Untrusted | Significant drift or policy violations detected |
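To make the formula and thresholds concrete, here is a minimal Python sketch that computes the weighted score from four factor values and maps it to a level. The helpers `compute_trust_score` and `trust_level` are hypothetical and are not part of the Tork SDK. The sketch assumes each factor is already on a 0.0 to 1.0 scale. This page does not document how Tork derives the factors themselves.

```python
# Hypothetical helpers illustrating the trust score formula; not part of the Tork SDK.
WEIGHTS = {
    "consistency": 0.3,
    "compliance": 0.3,
    "transparency": 0.2,
    "predictability": 0.2,
}


def compute_trust_score(factors: dict) -> float:
    """Return the weighted sum of the four factors (each assumed to be in [0.0, 1.0])."""
    for name in WEIGHTS:
        if name not in factors:
            raise ValueError(f"missing factor: {name}")
        if not 0.0 <= factors[name] <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return sum(weight * factors[name] for name, weight in WEIGHTS.items())


def trust_level(score: float) -> str:
    """Map a score to the table's levels: > 0.8, 0.5 to 0.8, < 0.5."""
    if score > 0.8:
        return "trusted"
    if score >= 0.5:
        return "caution"
    return "untrusted"


factors = {"consistency": 0.9, "compliance": 1.0, "transparency": 0.7, "predictability": 0.6}
score = compute_trust_score(factors)
print(round(score, 2), trust_level(score))  # 0.83 trusted
```

The score boundaries 0.8 and 0.5 both fall into "caution", following the "0.5 to 0.8" range in the table.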

Usage

Create a Memory Snapshot

Create a snapshot after significant memory operations to establish a baseline:

python
from tork import TorkClient, MemoryIntegrity

client = TorkClient(api_key="your_key")
memory = MemoryIntegrity(client)

# Create snapshot after significant memory operations
snapshot = memory.snapshot(
    agent_id="agent-1",
    memory_state={
        "user_preferences": {"theme": "dark", "language": "en"},
        "conversation_history": [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there!"}
        ],
        "learned_facts": [
            "User prefers concise responses",
            "User is in PST timezone"
        ]
    },
    metadata={
        "checkpoint": "session_start",
        "version": "1.0"
    }
)

print(f"Snapshot ID: {snapshot['id']}")
print(f"Fingerprint: {snapshot['fingerprint']}")
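This page does not specify how Tork computes the `fingerprint` value. The sketch below shows one common way to fingerprint structured state, purely as an illustration. It serializes the state to canonical JSON (sorted keys, fixed separators) and hashes the bytes with SHA-256, so identical states always produce identical digests regardless of key order. The `fingerprint` function is hypothetical and assumes the memory state is JSON-serializable.

```python
import hashlib
import json


def fingerprint(memory_state: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of memory_state."""
    # Sorted keys and fixed separators make the encoding independent of
    # dict insertion order and whitespace, so equal states hash identically.
    canonical = json.dumps(memory_state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


baseline = {"user_preferences": {"theme": "dark", "language": "en"}}
reordered = {"user_preferences": {"language": "en", "theme": "dark"}}
changed = {"user_preferences": {"theme": "light", "language": "en"}}

print(fingerprint(baseline) == fingerprint(reordered))  # True
print(fingerprint(baseline) == fingerprint(changed))    # False
```

Key order is ignored but list order still matters, which suits ordered data such as conversation history.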

Verify Memory Against Snapshot

Compare the current memory state against the last snapshot to detect drift:

python
# Get current memory state (get_agent_memory is a placeholder for however
# your application loads the agent's memory)
current_state = get_agent_memory("agent-1")

# Verify against last snapshot
result = memory.verify(
    agent_id="agent-1",
    current_state=current_state
)

if result['verified']:
    print("Memory integrity verified!")
else:
    print(f"Memory drift detected: {result['drift']}%")
    print(f"Changed fields: {result['changedFields']}")

    for change in result['changes']:
        print(f"  - {change['field']}: {change['type']}")
        if change['type'] == 'modified':
            print(f"    Old: {change['oldValue']}")
            print(f"    New: {change['newValue']}")
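This page does not document how Tork calculates `drift` or classifies changes. As a rough mental model, the hypothetical `diff_memory` function below compares two memory states field by field and returns a dictionary shaped like the `verify` result above. It makes two illustrative assumptions that are not the service's actual rules: drift is the percentage of top-level fields that changed, and change types include `added` and `removed` alongside `modified`.

```python
def diff_memory(baseline: dict, current: dict) -> dict:
    """Compare top-level fields of two memory states (illustrative only)."""
    fields = sorted(baseline.keys() | current.keys())
    changes = []
    for field in fields:
        if field not in current:
            changes.append({"field": field, "type": "removed", "oldValue": baseline[field]})
        elif field not in baseline:
            changes.append({"field": field, "type": "added", "newValue": current[field]})
        elif baseline[field] != current[field]:
            changes.append({
                "field": field,
                "type": "modified",
                "oldValue": baseline[field],
                "newValue": current[field],
            })
    # Assumed definition: drift = share of top-level fields that changed, in percent.
    drift = 100.0 * len(changes) / len(fields) if fields else 0.0
    return {
        "verified": not changes,
        "drift": round(drift, 1),
        "changedFields": [change["field"] for change in changes],
        "changes": changes,
    }


baseline = {
    "user_preferences": {"theme": "dark", "language": "en"},
    "conversation_history": [],
    "learned_facts": ["User prefers concise responses"],
}
current = {
    **baseline,
    "learned_facts": ["User prefers concise responses", "User is in PST timezone"],
}
result = diff_memory(baseline, current)
print(result["drift"], result["changedFields"])  # 33.3 ['learned_facts']
```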

Get Trust Score

Retrieve the current trust score and level for an agent:

python
# Get trust score
score = memory.get_trust_score("agent-1")

print(f"Trust Level: {score['trustLevel']}")  # 'trusted', 'caution', or 'untrusted'
print(f"Trust Score: {score['trustScore']}")  # 0.0 to 1.0
print("Factors:")
print(f"  Consistency: {score['factors']['consistency']}")
print(f"  Compliance: {score['factors']['compliance']}")
print(f"  Transparency: {score['factors']['transparency']}")
print(f"  Predictability: {score['factors']['predictability']}")

# Take action based on trust level
# (restrict_agent and notify_admin are placeholders for your own handlers)
if score['trustLevel'] == 'untrusted':
    # Restrict agent capabilities
    restrict_agent("agent-1")
    notify_admin("Agent trust score dropped below threshold")

Track Memory Modifications

Record memory changes for audit and analysis:

python
# Track a memory modification
memory.track_modification(
    agent_id="agent-1",
    modification_type="add",
    field="learned_facts",
    value="User works in healthcare industry",
    source="conversation",
    reason="User mentioned their profession"
)

# Get modification history
history = memory.get_modifications(
    agent_id="agent-1",
    limit=10
)

for mod in history:
    print(f"{mod['timestamp']}: {mod['type']} on {mod['field']}")
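On the analysis side, you can aggregate the modification history to see which fields change most often and how. The sketch below assumes each history entry is a dictionary with the `type` and `field` keys used in the loop above. `summarize_modifications` is a hypothetical helper. The sample entries, including the `modify` type, are made-up illustration data rather than real API output.

```python
from collections import Counter


def summarize_modifications(history: list) -> Counter:
    """Count modifications per (field, type) pair."""
    return Counter((mod["field"], mod["type"]) for mod in history)


# Made-up entries shaped like the get_modifications() results above.
history = [
    {"timestamp": "2024-01-01T10:00:00Z", "type": "add", "field": "learned_facts"},
    {"timestamp": "2024-01-01T10:05:00Z", "type": "add", "field": "learned_facts"},
    {"timestamp": "2024-01-01T11:00:00Z", "type": "modify", "field": "user_preferences"},
]

# Most frequently changed (field, type) pairs first
for (field, mod_type), count in summarize_modifications(history).most_common():
    print(f"{field}: {mod_type} x{count}")
```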

MCP Tools

Memory Integrity is available as MCP tools for Claude and Cursor:

| Tool | Description |
| --- | --- |
| tork_memory_snapshot | Create a memory fingerprint at a checkpoint |
| tork_memory_verify | Verify current state against last snapshot |
| tork_memory_trust_score | Get or calculate trust score for an agent |
| tork_memory_track_modification | Record a memory change for auditing |

Best Practices

Snapshot at key checkpoints
Don't snapshot on every change. Create snapshots at meaningful checkpoints like session start, after major decisions, or at regular intervals.
Monitor drift trends
Small amounts of drift are normal. Monitor trends over time: gradual increases in drift may indicate issues before they become critical.
Set up alerts
Configure alerts that fire when an agent's trust score drops below your threshold. This enables proactive intervention before problems affect users.
Review modifications
Periodically review the modification log to understand what's changing and why. This can reveal patterns that inform policy updates.
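The drift-trend and alerting practices above can be combined in a small monitor. This minimal sketch assumes you collect a drift percentage (for example from `memory.verify`) and a trust score (from `memory.get_trust_score`) at each checkpoint. `DriftTrendMonitor`, its window size, and its thresholds are hypothetical choices that you would tune for your own agents. The monitor compares the average drift of the most recent window of readings with the window before it, so a single spike does not trigger a trend alert.

```python
from collections import deque
from statistics import mean


class DriftTrendMonitor:
    """Hypothetical helper: flags rising drift trends and low trust scores."""

    def __init__(self, window: int = 5, max_rise: float = 5.0, min_trust: float = 0.5):
        self.window = window        # readings per comparison window
        self.max_rise = max_rise    # allowed rise in average drift (percentage points)
        self.min_trust = min_trust  # alert below this trust score
        self.history = deque(maxlen=2 * window)

    def record(self, drift_percent: float, trust_score: float) -> list:
        """Store one checkpoint reading and return any alert messages."""
        self.history.append(drift_percent)
        alerts = []
        if trust_score < self.min_trust:
            alerts.append(f"trust score {trust_score:.2f} is below {self.min_trust:.2f}")
        if len(self.history) == 2 * self.window:
            readings = list(self.history)
            older = mean(readings[: self.window])
            newer = mean(readings[self.window :])
            if newer - older > self.max_rise:
                alerts.append(f"average drift rose from {older:.1f}% to {newer:.1f}%")
        return alerts


monitor = DriftTrendMonitor(window=3)
for drift in [1.0, 2.0, 1.5, 8.0, 9.0, 10.0]:
    alerts = monitor.record(drift, trust_score=0.85)
print(alerts)  # ['average drift rose from 1.5% to 9.0%']
```

The default `min_trust` of 0.5 matches the boundary of the Untrusted level. Raise it if you also want alerts at the Caution level.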