Eval Plane ⑧: Outcome

The Outcome plane is what the user sees: final text, UI, or workflow result. It is the integration test across all planes — never the only test.

THE CLAIM

Outcome eval measures task success and trust — after every upstream plane has been scored independently.

What to evaluate

Dimension	Description
Task success	User goal achieved (domain-defined)
Completeness	All parts of question addressed
Clarity	Actionable, unambiguous language
Usefulness	Would a practitioner act on this?
Trust	Appropriate confidence and citations
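
One way to make these dimensions checkable is to score each one and require every score to clear its own minimum. The sketch below is illustrative only: the 0.0 to 1.0 scale, the OutcomeScores class, the THRESHOLDS values, and both helper functions are assumptions, not part of the framework described here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OutcomeScores:
    """One score per dimension, each assumed to lie in [0.0, 1.0]."""
    task_success: float
    completeness: float
    clarity: float
    usefulness: float
    trust: float


# Hypothetical per-dimension minimums; tune these per domain.
THRESHOLDS = {
    "task_success": 0.8,
    "completeness": 0.7,
    "clarity": 0.7,
    "usefulness": 0.7,
    "trust": 0.8,
}


def failing_dimensions(scores: OutcomeScores) -> list[str]:
    """Return the dimensions whose score falls below its threshold."""
    return [
        f.name
        for f in fields(scores)
        if getattr(scores, f.name) < THRESHOLDS[f.name]
    ]


def outcome_scores_pass(scores: OutcomeScores) -> bool:
    """Pass only if every dimension meets its minimum (no averaging)."""
    return not failing_dimensions(scores)
```

Checking each dimension separately, rather than averaging, keeps a high Clarity score from masking a low Trust score.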

Scenario	Success criteria
Representative	Expert labels: task_complete = true
Edge	Partial info → clear next steps
Adversarial	User pressures for a wrong action → system refuses
E2E replay	Full trace replayed; outcome matches the prod incident fix
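
The scenario table can be encoded as a lookup from scenario type to a success predicate. This is a minimal sketch under stated assumptions: OutcomeResult, its boolean fields, the CRITERIA keys, and scenario_passes are hypothetical names, and in practice each flag would come from an expert label or a grader rather than being set by hand.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OutcomeResult:
    """Graded facts about one final output (all fields are assumptions)."""
    task_complete: bool = False          # expert label
    gave_next_steps: bool = False        # clear next steps despite partial info
    refused_wrong_action: bool = False   # held firm under user pressure
    matches_incident_fix: bool = False   # outcome equals the prod incident fix


# One success criterion per scenario type, mirroring the table above.
CRITERIA: dict[str, Callable[[OutcomeResult], bool]] = {
    "representative": lambda r: r.task_complete,
    "edge": lambda r: r.gave_next_steps,
    "adversarial": lambda r: r.refused_wrong_action,
    "e2e_replay": lambda r: r.matches_incident_fix,
}


def scenario_passes(scenario: str, result: OutcomeResult) -> bool:
    """Apply the criterion for the given scenario type."""
    return CRITERIA[scenario](result)
```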

outcome_pass only if:
  outcome_scores pass
  AND no upstream plane failed critical checks

This gate prevents a fluent answer from passing when an upstream plane, such as Context or Action, failed a critical check.
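
The gate rule can be sketched in a few lines of Python. The PlaneResult type, its critical_failures field, and the function signature are assumptions made for illustration; only the rule itself (outcome scores pass AND no upstream critical failure) comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class PlaneResult:
    """Result of one upstream plane (hypothetical shape)."""
    name: str
    critical_failures: list[str] = field(default_factory=list)


def outcome_pass(
    outcome_scores_pass: bool,
    upstream: list[PlaneResult],
) -> tuple[bool, list[str]]:
    """Return (passed, blockers).

    passed is True only if the outcome scores pass AND no upstream
    plane reported a critical failure. blockers lists every upstream
    critical failure so a blocked run can be traced to its plane.
    """
    blockers = [
        f"{plane.name}: {failure}"
        for plane in upstream
        for failure in plane.critical_failures
    ]
    return (outcome_scores_pass and not blockers, blockers)


# Usage: a fluent answer whose Action plane failed is still blocked.
passed, blockers = outcome_pass(
    True,
    [PlaneResult("Context"), PlaneResult("Action", ["wrong tool called"])],
)
```

Returning the blockers alongside the boolean is a design choice in this sketch: it records which plane caused the rejection instead of reporting a bare failure.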

Inputs: final_output, upstream_plane_scores, user_feedback (if available)