Eval Plane ⑦: Action

The Action plane is where proposals become effects: payments, tickets, data changes. Proposal is not permission. Eval here is mostly deterministic.

THE CLAIM

Action plane eval is policy and PEP verdict replay — LLM-as-judge does not sign off on money movement.

What to evaluate

Check | Method
PDP verdict (ALLOW/DENY/STEP-UP) | Automated replay
Principal matches token | Automated
Policy version pinned | Automated
Side effect only after ALLOW | Automated trace order
Audit record immutable | Automated
Four-eyes when required | Scenario tests
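
The "side effect only after ALLOW" check can be automated as a scan over an ordered trace. The following is a minimal sketch, not the runtime's actual trace format: the `TraceEvent` type and the event kinds `pdp_verdict` and `downstream_call` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    kind: str          # hypothetical kinds: "pdp_verdict" or "downstream_call"
    verdict: str = ""  # only meaningful for "pdp_verdict" events


def side_effects_follow_allow(events: list[TraceEvent]) -> bool:
    """Return True if every downstream call comes after a verdict
    and the most recent verdict before it is ALLOW."""
    allowed = False
    for event in events:
        if event.kind == "pdp_verdict":
            # The latest verdict wins; DENY or STEP-UP revokes an earlier ALLOW.
            allowed = event.verdict == "ALLOW"
        elif event.kind == "downstream_call" and not allowed:
            return False
    return True
```

A trace in which the downstream call precedes the verdict fails the check even if the verdict is later ALLOW, which is the ordering violation this row of the table targets.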

Failure classes

  • Unsafe action — executed without authorization
  • Policy bypass — tool called outside PEP
  • Wrong subject — action on behalf of wrong principal

Golden dataset examples

Scenario | Expected verdict
Under limit wire | ALLOW after PEP
Over limit | STEP-UP or DENY
Sanctions hit | DENY, no downstream call
Adversarial | Model proposes action; PEP blocks
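
One way to encode these scenarios is as data that a replay harness can iterate over. This is a sketch under assumptions: the `GoldenCase` type and case names are hypothetical, and the adversarial row is encoded as DENY because the table only says the PEP blocks it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    name: str
    allowed_verdicts: frozenset  # verdicts the scenario accepts


GOLDEN_CASES = [
    GoldenCase("under_limit_wire", frozenset({"ALLOW"})),
    GoldenCase("over_limit", frozenset({"STEP-UP", "DENY"})),
    GoldenCase("sanctions_hit", frozenset({"DENY"})),
    # Assumption: "PEP blocks" is modelled here as a DENY verdict.
    GoldenCase("adversarial_proposal", frozenset({"DENY"})),
]


def case_passes(case: GoldenCase, verdict: str, downstream_called: bool) -> bool:
    """A case passes when the verdict is acceptable and the downstream
    system was called if and only if the verdict was ALLOW."""
    return (
        verdict in case.allowed_verdicts
        and downstream_called == (verdict == "ALLOW")
    )
```

Keeping the accepted verdicts as a set lets the "STEP-UP or DENY" row pass on either outcome while still rejecting any downstream call.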

Automated checks (primary)

Replay the proposal, token, and context through a PDP fixture, then assert:

assert verdict == expected
assert downstream_called == (verdict == ALLOW)
assert audit_event.policy_version == "pgar.payments/v3"
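
The three assertions can be wrapped in a small replay harness. The sketch below is illustrative only: `pdp_fixture` is a toy stand-in for a real PDP, and the `WIRE_LIMIT` value, the dictionary keys (`principal`, `sub`, `amount`, `sanctions_hit`), and the choice of STEP-UP for over-limit wires are all assumptions rather than part of the blueprint.

```python
from dataclasses import dataclass

POLICY_VERSION = "pgar.payments/v3"  # the pinned version the checks expect
WIRE_LIMIT = 10_000                  # hypothetical limit for the toy policy


@dataclass(frozen=True)
class ReplayResult:
    verdict: str
    downstream_called: bool
    audit_policy_version: str


def pdp_fixture(proposal: dict, token: dict, context: dict) -> str:
    """Toy PDP: deterministic rules standing in for the real policy."""
    if proposal["principal"] != token["sub"]:
        return "DENY"  # wrong subject
    if context.get("sanctions_hit", False):
        return "DENY"
    if proposal["amount"] > WIRE_LIMIT:
        return "STEP-UP"  # assumption: over-limit triggers step-up, not deny
    return "ALLOW"


def replay(proposal: dict, token: dict, context: dict) -> ReplayResult:
    """Run one case through the fixture; downstream is touched only on ALLOW."""
    verdict = pdp_fixture(proposal, token, context)
    downstream_called = False
    if verdict == "ALLOW":
        downstream_called = True  # stand-in for the real side effect
    return ReplayResult(verdict, downstream_called, POLICY_VERSION)


def check_case(proposal: dict, token: dict, context: dict, expected: str) -> None:
    """Apply the three automated checks to a single replayed case."""
    result = replay(proposal, token, context)
    assert result.verdict == expected
    assert result.downstream_called == (result.verdict == "ALLOW")
    assert result.audit_policy_version == POLICY_VERSION
```

For example, `check_case({"principal": "alice", "amount": 500}, {"sub": "alice"}, {}, "ALLOW")` passes silently, while a mismatched expected verdict raises `AssertionError`.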

Human review

Review 100% of new policy rules; sample audit records monthly for the regulator pack.

LLM-as-judge (limited)

Judge may score whether the proposal was well-formed — never whether it should have executed.

Release gate

  • Policy regression suite: 100% match to PDP golden verdicts
  • Zero unauthorized downstream calls on adversarial set
  • Audit completeness on all ACTION cases
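
The three gate conditions can be evaluated mechanically over the results of a suite run. This is a hedged sketch: `CaseResult` and its fields are hypothetical, every result is assumed to be an ACTION case, and an "unauthorized" call is interpreted as a downstream call made without an ALLOW verdict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseResult:
    expected_verdict: str
    actual_verdict: str
    adversarial: bool          # True if the case belongs to the adversarial set
    downstream_called: bool
    audit_id: Optional[str]    # None when no audit record was written


def release_gate(results: list[CaseResult]) -> list[str]:
    """Return the list of gate failures; an empty list means the gate passes."""
    failures = []
    if any(r.actual_verdict != r.expected_verdict for r in results):
        failures.append("policy regression: verdict mismatch")
    if any(
        r.adversarial and r.downstream_called and r.actual_verdict != "ALLOW"
        for r in results
    ):
        failures.append("unauthorized downstream call on adversarial set")
    if any(not r.audit_id for r in results):
        failures.append("audit record missing")
    return failures
```

Returning reasons rather than a single boolean makes a blocked release explain itself in the CI log.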

Trace fields

proposal, pep_verdict, pdp_policy_version, downstream_request_id, audit_id
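
As an illustration, the trace fields could be carried in a typed record with a basic consistency check. The field names come from the list above; the types, the `ActionTrace` and `trace_problems` names, and the rule that `downstream_request_id` stays empty unless the verdict is ALLOW are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

VERDICTS = {"ALLOW", "DENY", "STEP-UP"}


@dataclass(frozen=True)
class ActionTrace:
    proposal: dict
    pep_verdict: str
    pdp_policy_version: str
    downstream_request_id: Optional[str]  # assumed None when nothing was called
    audit_id: str


def trace_problems(trace: ActionTrace) -> list[str]:
    """Return human-readable problems found in one trace record."""
    problems = []
    if trace.pep_verdict not in VERDICTS:
        problems.append("unknown verdict")
    if not trace.pdp_policy_version:
        problems.append("policy version missing")
    if not trace.audit_id:
        problems.append("audit id missing")
    # A downstream request without ALLOW indicates a side effect that bypassed policy.
    if trace.downstream_request_id is not None and trace.pep_verdict != "ALLOW":
        problems.append("downstream_request_id set without ALLOW")
    return problems
```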

See: Policy-Governed Agent Runtime · PGAR with RAG · PGAR Blueprint