The agent said done. It wasn’t.
1 · The work nobody pays for
On a regulated delivery, you became the agent’s missing parts — its memory, its verifier, its auditor. A suite went green, a PR merged, and a component that was never wired into a live path surfaced in production weeks later, owned by you. The proving was unpaid, invisible, and entirely yours.
Each of these problems already has a proven solution: Claude Code, Neon, Temporal, Langfuse, Playwright, Semgrep, and a dozen other Tier-1 tools. What no one has done is wire them all together, gate them correctly, and ship the result as a product.
2 · The closed loop
The only thing that can halt or redirect a running agent is a deterministic Claude Code command hook (PreToolUse, PostToolUse, Stop, SubagentStop, PreCompact, SessionStart). A prompt cannot, and neither can CLAUDE.md. Command hooks are written to fail closed: when a hook errors, the action is blocked rather than waved through. Four bounded subagents (initializer, implementer, verifier, research) do the work; the verifier is independent, so no agent grades its own output.
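A minimal sketch of what a fail-closed PreToolUse command hook could look like. It assumes the documented Claude Code convention that a command hook receives the event as JSON on stdin and that exit code 2 signals a blocking decision. To our understanding, other non-zero exit codes are reported as non-blocking errors, which is why the sketch maps every internal failure to exit code 2. The protected-path policy and the function names are hypothetical, chosen only for illustration.

```python
import json
import sys

ALLOW = 0
BLOCK = 2  # assumed blocking exit code for Claude Code command hooks

# Hypothetical policy: path segments the agent may never edit.
PROTECTED_SEGMENTS = ("/migrations/", "/.github/")


def decide(raw_event: str) -> tuple[int, str]:
    """Return (exit_code, message) for one hook event payload."""
    try:
        event = json.loads(raw_event)
        path = str(event["tool_input"].get("file_path", ""))
        if any(segment in path for segment in PROTECTED_SEGMENTS):
            return BLOCK, f"edit to protected path blocked: {path}"
        return ALLOW, ""
    except Exception as exc:
        # Fail closed: anything the hook cannot parse or evaluate is blocked.
        return BLOCK, f"hook could not evaluate event: {exc!r}"


def main() -> int:
    """Hook entry point: read the event from stdin, explain on stderr."""
    code, message = decide(sys.stdin.read())
    if message:
        print(message, file=sys.stderr)
    return code
```

To run it as a hook script, end the file with `sys.exit(main())`. The decision logic lives in `decide` so it can be tested without a live agent session.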
Deterministic gates decide completeness from verifiable facts only; model self-assessment informs, never gates.
When work exceeds its bounds it routes to HANDOFF — terminal, and distinct from COMPLETE. We found and fixed our own HANDOFF infinite-block defect; a proof company audits itself.
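The gating rule and the two terminal states can be sketched as a small state function. This is an illustration under stated assumptions, not the platform's implementation: the `Facts` fields, the attempt budget, and the state names other than COMPLETE and HANDOFF are hypothetical. The model's own claim is carried in the record but never read by the gate, and both terminal states are absorbing.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    COMPLETE = "complete"  # terminal: every gate passed
    HANDOFF = "handoff"    # terminal: bounds exceeded, a human takes over


TERMINAL = frozenset({State.COMPLETE, State.HANDOFF})


@dataclass(frozen=True)
class Facts:
    tests_passed: bool          # e.g. the test runner's exit status
    wired_into_live_path: bool  # e.g. a reachability check, not a claim
    attempts: int
    max_attempts: int           # hypothetical bound on the work
    model_says_done: bool       # informs a reader; the gate ignores it


def next_state(current: State, facts: Facts) -> State:
    """Decide the next state from verifiable facts only."""
    if current in TERMINAL:
        return current  # terminal states never re-enter the loop
    if facts.tests_passed and facts.wired_into_live_path:
        return State.COMPLETE
    if facts.attempts >= facts.max_attempts:
        return State.HANDOFF
    return State.RUNNING
```

With this shape, an agent that reports "done" while a component is still unwired stays in RUNNING until its budget runs out, then lands in HANDOFF rather than COMPLETE.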
3 · Why now
Deterministic per-edit enforcement only recently became possible, with command-type hooks that fail closed. The individual problems were already solved; the solutions had never been assembled. The product is the inevitable assembly of tools you already trust.
Vendor-neutral by construction
Opinionated defaults, not a cage.
The platform takes strong positions on how work gets proven. None of those positions trap your record of it. Everything the engine produces leaves in formats you already own and tools you already run — so the proof of what was done outlives any decision to keep using us.
feature_list.json
The capability surface as a plain manifest.
It is a file you own — readable, diffable, and committable to your own repository.
EvidenceRecord
What was checked, by which independent verifier, with what verdict.
Structured JSON you can store, query, and re-verify outside this platform.
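As a rough sketch of what such a record might look like as plain data: the field names, the verdict values, and the helper functions below are all hypothetical, since the actual schema is not given here. The sketch shows the record round-tripping through ordinary JSON and rejecting a record in which the verifier and the producer are the same party.

```python
import json
from dataclasses import asdict, dataclass

VERDICTS = ("pass", "fail")  # hypothetical verdict vocabulary


@dataclass(frozen=True)
class EvidenceRecord:
    requirement_id: str  # which requirement the check supports
    check: str           # what was checked
    producer: str        # who produced the work
    verifier: str        # who checked it
    verdict: str

    def __post_init__(self) -> None:
        if self.verifier == self.producer:
            raise ValueError("verifier must be independent of the producer")
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")


def to_json(record: EvidenceRecord) -> str:
    """Serialize with sorted keys so records diff cleanly."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(serialized: str) -> EvidenceRecord:
    """Rebuild a record, re-running the same validity checks."""
    return EvidenceRecord(**json.loads(serialized))
```

Because the output is ordinary JSON, any store or query tool that reads JSON can hold these records without a platform-specific client.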
requirement-ID Baggage
The requirement that a unit of work satisfies, carried alongside it.
Standard distributed-tracing headers — the same propagation your existing tracing already reads. No proprietary format to adopt.
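A minimal sketch of carrying a requirement ID in a tracing header, assuming "Baggage" here means the W3C Baggage header (`baggage: key=value,key=value`, with percent-encoded values). The key name `requirement.id` and both helper functions are hypothetical, and header names are assumed to be lowercased already. A production system would more likely use its tracing library's baggage API than hand-rolled parsing.

```python
from typing import Optional
from urllib.parse import quote, unquote

BAGGAGE_HEADER = "baggage"          # header name from the W3C Baggage spec
REQUIREMENT_KEY = "requirement.id"  # hypothetical key name


def inject(headers: dict[str, str], requirement_id: str) -> dict[str, str]:
    """Return a copy of headers with the requirement ID set in baggage."""
    member = f"{REQUIREMENT_KEY}={quote(requirement_id, safe='')}"
    existing = headers.get(BAGGAGE_HEADER, "")
    # Keep other members, drop any earlier value for our key.
    members = [
        m.strip()
        for m in existing.split(",")
        if m.strip() and not m.strip().startswith(REQUIREMENT_KEY + "=")
    ]
    members.append(member)
    return {**headers, BAGGAGE_HEADER: ",".join(members)}


def extract(headers: dict[str, str]) -> Optional[str]:
    """Read the requirement ID back out of a baggage header, if present."""
    for member in headers.get(BAGGAGE_HEADER, "").split(","):
        key_value = member.split(";", 1)[0]  # ignore optional properties
        key, sep, value = key_value.partition("=")
        if sep and key.strip() == REQUIREMENT_KEY:
            return unquote(value.strip())
    return None
```

A service downstream can call `extract` on incoming headers and attach the ID to whatever it logs or records, so the requirement travels with the work across process boundaries.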