Adversarial Testing
Prompt injection, PEP bypass, manifest violations, and entitlement escalation tests for PGAR runtimes.
Immutable verdict logs, examiner questions, and replaying authorization without chat transcripts.
Retrieval as a PEP-gated tool, context pack logging, validation handoff, and PGAR applied to RAG.
Tool manifests, schema compliance, PEP gating per tool, and blocking proposals outside the registry.
How to evaluate the Tool plane — selection, arguments, idempotency, error handling, and schema compliance for agent tool calls.
How to evaluate the Memory plane — session scope, TTL, consistency, and cross-session leakage in agent and copilot systems.
How to evaluate the Action plane — policy enforcement, authorization, side effects, and auditability before irreversible operations execute.
Curated third-party articles, guides, and tool docs on LLM and agent evaluation — mapped to the Eval Framework Blueprint series.
Curated third-party resources on PDP/PEP, OAuth, policy engines, and agent authorization, mapped to the PGAR playbook series.
Where to maintain tool manifests, how agentic apps load them, versioning and rollback, and pros and cons of repo files vs registry APIs.
ALLOW, DENY, and STEP_UP only — policy versioning, rule authoring, and deterministic authorization.
The four steps every Policy Enforcement Point runs on every proposal: receive, ask PDP, audit, act.
API gateway, Identity Provider, token validation, and claims issuance at the trust boundary.
Session custody, orchestration, proposal routing, and receiving results before validation or synthesis.
Tool schemas only, proposal-not-permission, and keeping authority out of the model boundary.
The policy layer — enforcement point, decision point, verdict handling, and deny-before-downstream.
Re-authorization, side-effect execution, and returning results to the agentic app, not the LLM directly.
The five PGAR trust boundaries in request order (ingress, agentic app, LLM proposal, PEP + PDP, downstream), including multi-agent workflows, with links to each implementation playbook.
Core PGAR building blocks in implementation order — SARAC contracts, token custody, PEP/PDP enforcement, step-up, and audit replay.
Hub for Policy-Governed Agent Runtime playbooks (foundation, assurance, boundary, and domain groups in recommended implementation order).
Subject, action, resource, and context schemas for PEP-to-PDP calls — the contracts that make verdict chains replayable.
Golden scenario libraries for PDP/PEP regression testing, covering representative, edge, adversarial, and incident replay cases.
STEP_UP verdict handling, four-eyes approval, re-evaluation with context.approval, and UX ownership in the agentic app.
What stays in the agentic app, what the LLM sees, and the PGAR test for credential isolation.
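
The summaries above describe a Policy Enforcement Point that runs four steps on every proposal (receive, ask PDP, audit, act) and a PDP that returns only ALLOW, DENY, or STEP_UP. A minimal Python sketch of that loop follows. Every name in it (`Proposal`, `toy_pdp`, `enforce`, the rule set, the audit record shape) is a hypothetical illustration and not the playbooks' actual API. Only the subject/action/resource/context fields, the three verdicts, and `context.approval` come from the summaries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class Verdict(Enum):
    # The only three verdicts named in the summaries above.
    ALLOW = "ALLOW"
    DENY = "DENY"
    STEP_UP = "STEP_UP"


@dataclass
class Proposal:
    # Subject, action, resource, and context fields for the PEP-to-PDP call.
    subject: str
    action: str
    resource: str
    context: dict = field(default_factory=dict)


def toy_pdp(p: Proposal) -> Verdict:
    """Hypothetical deterministic rules, for illustration only."""
    if p.action == "read":
        return Verdict.ALLOW
    if p.action == "transfer":
        # Re-evaluation with context.approval turns STEP_UP into ALLOW.
        return Verdict.ALLOW if p.context.get("approval") else Verdict.STEP_UP
    return Verdict.DENY  # anything not explicitly allowed is denied


def enforce(
    proposal: Proposal,                    # step 1: receive
    pdp: Callable[[Proposal], Verdict],
    audit_log: list,
    execute: Callable[[Proposal], Any],
) -> dict:
    verdict = pdp(proposal)                # step 2: ask PDP
    audit_log.append({                     # step 3: audit before acting
        "subject": proposal.subject,
        "action": proposal.action,
        "resource": proposal.resource,
        "verdict": verdict.value,
    })
    # step 4: act. Only ALLOW reaches the downstream side effect.
    result = execute(proposal) if verdict is Verdict.ALLOW else None
    return {"verdict": verdict.value, "result": result}
```

In this sketch the audit record is written before any side effect runs, so a denied or stepped-up proposal still leaves a verdict entry that can be replayed without the chat transcript.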