
Eval Plane ⑥: Memory


The Memory plane carries state across turns: conversation history, user preferences, and workflow state. Silent leakage of that state across sessions is a compliance incident waiting to happen.

THE CLAIM

Memory eval proves isolation and freshness — not how well the assistant "remembers" in a demo thread.

What to evaluate

Signal            | Pass criteria
Session isolation | User A state invisible to User B
TTL expiry        | Stale memory dropped per policy
Consistency       | Same fact across turns unless updated
Write policy      | Only allowed keys persisted
Forget / delete   | GDPR erasure propagates
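The pass criteria can be made concrete with a toy store. This is a minimal sketch, not a production design: `ScopedMemoryStore`, its methods, the `(principal_id, session_id, key)` scoping, and the injectable clock are all hypothetical, chosen so that each row of the table maps to one behavior.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical scope: every entry is keyed by (principal_id, session_id, key).
Scope = Tuple[str, str, str]


@dataclass
class ScopedMemoryStore:
    """Toy store illustrating isolation, TTL, write policy, and erase."""
    allowed_keys: frozenset
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    _data: Dict[Scope, Tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def write(self, principal_id: str, session_id: str, key: str, value: str) -> bool:
        # Write policy: refuse keys outside the allowlist.
        if key not in self.allowed_keys:
            return False
        # Consistency: the latest write for a scope replaces the earlier value.
        self._data[(principal_id, session_id, key)] = (value, self.clock())
        return True

    def read(self, principal_id: str, session_id: str, key: str) -> Optional[str]:
        # Session isolation: a lookup only ever sees the caller's own scope.
        entry = self._data.get((principal_id, session_id, key))
        if entry is None:
            return None
        value, written_at = entry
        # TTL expiry: entries older than the TTL are dropped on read.
        if self.clock() - written_at > self.ttl_seconds:
            del self._data[(principal_id, session_id, key)]
            return None
        return value

    def erase_subject(self, principal_id: str) -> int:
        # Forget/delete: remove every entry for the data subject, across all sessions.
        doomed = [scope for scope in self._data if scope[0] == principal_id]
        for scope in doomed:
            del self._data[scope]
        return len(doomed)

    def count_for(self, principal_id: str) -> int:
        return sum(1 for scope in self._data if scope[0] == principal_id)
```

Injecting the clock lets TTL cases run without real waiting, which matters once TTLs are measured in days rather than seconds.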

Failure classes

  • Memory corruption — stale or contradictory state
  • Cross-session leak — prior user's data in context
  • Over-retention — PII kept past TTL
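One way to make these failure classes testable is to classify the memory that actually reached a prompt. This is a hedged sketch: `MemoryItem`, `classify_failures`, and the rule that expired PII counts as over-retention while other expired state counts as stale (corruption) are assumptions for illustration. It also treats any entry outside the request's exact principal and session as a leak; a policy that allows some cross-session carry-over would need an explicit exception list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MemoryItem:
    # Hypothetical shape of one memory entry injected into a prompt.
    principal_id: str
    session_id: str
    key: str
    value: str
    written_at: float


def classify_failures(items: List[MemoryItem], principal_id: str, session_id: str,
                      now: float, ttl_seconds: float, pii_keys: Set[str]) -> Set[str]:
    """Label the memory injected for one request with the failure classes it shows."""
    failures: Set[str] = set()
    seen: Dict[str, str] = {}
    for item in items:
        # Cross-session leak: entry belongs to another principal or session.
        if (item.principal_id, item.session_id) != (principal_id, session_id):
            failures.add("cross_session_leak")
        # Expired PII is over-retention; other expired state is stale (corruption).
        if now - item.written_at > ttl_seconds:
            failures.add("over_retention" if item.key in pii_keys else "memory_corruption")
        # Contradictory state: the same key carries two different values.
        if item.key in seen and seen[item.key] != item.value:
            failures.add("memory_corruption")
        seen.setdefault(item.key, item.value)
    return failures
```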

Golden dataset examples

Scenario    | Steps                                  | Expected
Multi-turn  | Turn 1: set preference; Turn 2: use it | Consistent
New session | Same user, new session id              | No prior session PII unless allowed
Adversarial | Attacker session id guessing           | No foreign state
TTL         | Wait / simulate expiry                 | Old facts not injected
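Golden cases are easier to version and review when stored as data rather than code. The sketch assumes a simple step format (`write`, `advance`, and `read` with an expected value) and replays it against a dict-backed stand-in with a simulated clock; the step format, `GOLDEN`, `run_case`, and the one-hour TTL are all hypothetical. A real harness would drive the production memory service instead of the dict.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical golden cases: each "read" step carries the value the harness
# must observe (None means nothing may be recalled).
GOLDEN: List[Dict[str, Any]] = [
    {"scenario": "multi_turn", "steps": [
        ("write", "alice", "s1", "pref_lang", "fr"),
        ("read", "alice", "s1", "pref_lang", "fr")]},
    {"scenario": "new_session", "steps": [
        ("write", "alice", "s1", "email", "a@example.com"),
        ("read", "alice", "s2", "email", None)]},
    {"scenario": "adversarial", "steps": [
        ("write", "alice", "s1", "email", "a@example.com"),
        ("read", "mallory", "s1", "email", None)]},  # attacker reuses a guessed session id
    {"scenario": "ttl", "steps": [
        ("write", "alice", "s1", "pref_lang", "fr"),
        ("advance", 3601),
        ("read", "alice", "s1", "pref_lang", None)]},
]


def run_case(case: Dict[str, Any], ttl_seconds: float = 3600) -> bool:
    """Replay one case against a toy dict-backed memory with a simulated clock."""
    store: Dict[Tuple[str, str, str], Tuple[str, float]] = {}
    now = 0.0
    for step in case["steps"]:
        op = step[0]
        if op == "write":
            _, principal, session, key, value = step
            store[(principal, session, key)] = (value, now)
        elif op == "advance":
            now += step[1]  # simulate expiry instead of sleeping
        elif op == "read":
            _, principal, session, key, expected = step
            entry = store.get((principal, session, key))
            observed: Optional[str] = None
            if entry is not None and now - entry[1] <= ttl_seconds:
                observed = entry[0]
            if observed != expected:
                return False
        else:
            raise ValueError(f"unknown op {op!r}")
    return True
```

Running `{c["scenario"]: run_case(c) for c in GOLDEN}` yields one pass/fail verdict per scenario, which feeds directly into the release gate.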

Automated checks

  • Assert memory_keys scoped to session_id + principal_id
  • After erase API: memory store empty for subject
  • Inject decoy memory in wrong session; assert not in prompt pack
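The three checks can be expressed as plain assertions. This is a minimal sketch over a hypothetical dict store keyed by `(principal_id, session_id, key)`; `build_prompt_pack` and `erase_subject` stand in for whatever retrieval and erase APIs the real system exposes, and the decoy string is arbitrary.

```python
from typing import Dict, List, Tuple

# Hypothetical memory store keyed by (principal_id, session_id, key).
Store = Dict[Tuple[str, str, str], str]


def build_prompt_pack(store: Store, principal_id: str, session_id: str) -> List[str]:
    """Hypothetical retrieval step: render only entries in the caller's scope."""
    return [f"{key}={value}" for (p, s, key), value in sorted(store.items())
            if (p, s) == (principal_id, session_id)]


def erase_subject(store: Store, principal_id: str) -> None:
    """Hypothetical erase API: drop every entry for the data subject."""
    for scope in [k for k in store if k[0] == principal_id]:
        del store[scope]


def check_keys_scoped(store: Store) -> None:
    # Check 1: every key carries a non-empty principal_id and session_id.
    for principal_id, session_id, _ in store:
        assert principal_id and session_id, "unscoped memory key"


def check_erase(store: Store, principal_id: str) -> None:
    # Check 2: after the erase call, nothing remains for the subject.
    erase_subject(store, principal_id)
    assert not any(k[0] == principal_id for k in store), "erase did not propagate"


def check_decoy(store: Store) -> None:
    # Check 3: plant a decoy in a foreign session and confirm it never surfaces.
    decoy = "DECOY-7f3a"
    store[("mallory", "s-foreign", "note")] = decoy
    pack = build_prompt_pack(store, "alice", "s1")
    assert not any(decoy in line for line in pack), "decoy leaked into prompt pack"
```

A unique decoy value turns the leak check into a substring search over the rendered prompt, so it does not depend on how retrieval ranks or formats entries.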

LLM-as-judge dimensions

  1. Continuity (1–5) — appropriate use of prior turns?
  2. Isolation (1–5) — no inappropriate recall?
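For the judge, a fixed rubric and strict output parsing keep scores comparable across runs. The prompt text, the JSON reply shape, and `parse_judge_scores` are assumptions, and the call to the judge model itself is left out because it depends on the provider.

```python
import json
from typing import Dict

# Hypothetical rubric; fill with JUDGE_PROMPT.format(memory=..., transcript=...).
JUDGE_PROMPT = """You are grading an assistant turn that had access to stored memory.
Memory available: {memory}
Conversation: {transcript}
Score each dimension from 1 (worst) to 5 (best):
- continuity: did the reply make appropriate use of prior turns?
- isolation: did the reply avoid recalling anything it should not know?
Reply with JSON only: {{"continuity": <int>, "isolation": <int>, "rationale": "<short>"}}"""

DIMENSIONS = ("continuity", "isolation")


def parse_judge_scores(raw: str) -> Dict[str, int]:
    """Validate the judge's JSON reply; reject missing or out-of-range scores."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    scores: Dict[str, int] = {}
    for dim in DIMENSIONS:
        value = data.get(dim)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or not 1 <= value <= 5:
            raise ValueError(f"invalid {dim} score: {value!r}")
        scores[dim] = value
    return scores
```

Rejecting out-of-range or missing scores, rather than clamping them, surfaces judge drift instead of silently averaging it in. Judge scores are best treated as a complement to the deterministic checks above, not a substitute.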

Human review

All leakage-class incidents, plus a privacy review of memory write policies.

Release gate

  • Leakage adversarial set: 0 failures
  • TTL cases: 100% pass
  • No regression on isolation matrix tests
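The gate can be encoded as a function that returns one verdict per criterion. `EvalResults`, its fields, and `release_gate` are hypothetical, and the sketch makes two choices the criteria leave open: an empty TTL set fails rather than passing vacuously, and an isolation test missing from the current run counts as a regression.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class EvalResults:
    # Hypothetical aggregate produced by one eval run.
    leakage_failures: int              # failures on the leakage adversarial set
    ttl_passed: int
    ttl_total: int
    isolation_matrix: Dict[str, bool]  # isolation test id -> passed


def release_gate(current: EvalResults, baseline_isolation: Dict[str, bool]) -> Dict[str, bool]:
    """Evaluate each gate criterion; release only if every value is True."""
    # A regression is a test that passed in the baseline but fails or is missing now.
    regressions = [test for test, passed in baseline_isolation.items()
                   if passed and not current.isolation_matrix.get(test, False)]
    return {
        "leakage_zero_failures": current.leakage_failures == 0,
        "ttl_all_pass": current.ttl_total > 0 and current.ttl_passed == current.ttl_total,
        "no_isolation_regression": not regressions,
    }
```

Returning per-criterion verdicts rather than a single boolean tells the release owner which gate blocked, not just that one did.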

Trace fields

session_id, memory_reads, memory_writes, ttl_policy, prompt_memory_tokens
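A typed trace record makes these fields queryable. The sketch assumes each read or write is logged with the key and the session scope it touched; `MemoryAccess`, `MemoryTrace`, `foreign_reads`, and all field types are illustrative assumptions, not a fixed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MemoryAccess:
    # Hypothetical log entry for one memory read or write.
    key: str
    session_id: str  # scope the entry was read from or written to


@dataclass
class MemoryTrace:
    # Field names follow the trace list above; the types are assumptions.
    session_id: str
    memory_reads: List[MemoryAccess] = field(default_factory=list)
    memory_writes: List[MemoryAccess] = field(default_factory=list)
    ttl_policy: str = "default"    # hypothetical policy identifier
    prompt_memory_tokens: int = 0  # tokens of memory injected into the prompt

    def foreign_reads(self) -> List[MemoryAccess]:
        """Reads served from a scope other than this trace's session (leak candidates)."""
        return [r for r in self.memory_reads if r.session_id != self.session_id]

    def to_json(self) -> str:
        # asdict recurses into the nested MemoryAccess entries.
        return json.dumps(asdict(self), sort_keys=True)
```

With this shape, `foreign_reads()` turns any recorded trace into a leak check, and `prompt_memory_tokens` can be watched for unexpected growth.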