The agent said done. It wasn’t.

1 · The work nobody pays for

On a regulated delivery, you became the agent’s missing parts — its memory, its verifier, its auditor. A suite went green, a PR merged, and a component that was never wired into a live path surfaced in production weeks later, owned by you. The proving was unpaid, invisible, and entirely yours.

Every one of these problems is already solved — by Claude Code, Neon, Temporal, Langfuse, Playwright, Semgrep, and a dozen other proven Tier-1 tools. What no one has done is wire them all together, gate them correctly, and ship it as a product.

2 · The closed loop

The only thing that can halt or redirect a running agent is a deterministic Claude Code command hook (PreToolUse, PostToolUse, Stop, SubagentStop, PreCompact, SessionStart), not a prompt and not CLAUDE.md, and every command hook is written to fail closed. Four bounded subagents (initializer, implementer, verifier, research) do the work; the verifier never grades its own output.
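A fail-closed command hook can be sketched roughly as follows. The sketch relies on the Claude Code convention that a command hook receives the event as JSON on stdin and blocks the tool call by exiting with code 2. The protected-directory policy and the names `PROTECTED_DIRS`, `decide`, and `main` are hypothetical, chosen only for illustration. Every failure path is converted into the blocking exit code, so a hook that cannot read its input refuses the edit instead of letting it through.

```python
import json
import sys
from pathlib import PurePosixPath

ALLOW = 0
BLOCK = 2  # exit code 2 is the blocking exit code for a Claude Code command hook

# Hypothetical policy: directories the agent must never edit.
PROTECTED_DIRS = {"migrations", ".github"}


def decide(raw_stdin: str) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse invocation."""
    try:
        event = json.loads(raw_stdin)
        path = str(event["tool_input"].get("file_path", ""))
        touched = set(PurePosixPath(path).parts) & PROTECTED_DIRS
    except Exception as exc:
        # Fail closed: anything unexpected blocks rather than allows.
        return BLOCK, f"hook could not evaluate its input: {exc!r}"
    if touched:
        return BLOCK, f"edit to protected path refused: {path}"
    return ALLOW, ""


def main() -> int:
    """Read the event from stdin, report on stderr, return the exit code."""
    code, message = decide(sys.stdin.read())
    if message:
        print(message, file=sys.stderr)
    return code
```

Saved as a script that ends with `raise SystemExit(main())`, something like this could be registered as a PreToolUse command hook for the edit tools.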

Deterministic gates decide completeness from verifiable facts only; model self-assessment informs, never gates.
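As a minimal sketch of that rule, a gate can take the model's self-assessment as an argument and still never read it when deciding. The three facts below (`tests_passed`, `wired_into_live_path`, `verified_independently`) and the type names are assumptions for illustration, not the engine's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Facts:
    """Hypothetical verifiable facts, each produced by a deterministic check."""
    tests_passed: bool
    wired_into_live_path: bool
    verified_independently: bool


@dataclass(frozen=True)
class GateResult:
    complete: bool
    failed_checks: tuple[str, ...]
    model_note: str  # recorded for the reader, never consulted by the gate


def gate(facts: Facts, model_self_assessment: str = "") -> GateResult:
    """Decide completeness from facts only; the model's note is carried along."""
    failed = tuple(name for name, ok in asdict(facts).items() if not ok)
    return GateResult(
        complete=not failed,
        failed_checks=failed,
        model_note=model_self_assessment,
    )


if __name__ == "__main__":
    # The agent says "done", but the component was never wired in.
    print(gate(Facts(True, False, True), "done, all good"))
```

The self-assessment ends up in the result so a human can read it, but no branch of `gate` depends on it.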

When work exceeds its bounds it routes to HANDOFF — terminal, and distinct from COMPLETE. We found and fixed our own HANDOFF infinite-block defect; a proof company audits itself.
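The distinction between the two terminal states can be sketched as a small state machine. The text above does not describe the defect in detail, so the reading here is an assumption: a stop gate that treats only COMPLETE as terminal would keep blocking a run that has already handed off. All names (`State`, `next_state`, `stop_should_block`) are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    COMPLETE = "complete"
    HANDOFF = "handoff"


# Both states are terminal, but they mean different things:
# COMPLETE = gates passed; HANDOFF = work exceeded its bounds.
TERMINAL = frozenset({State.COMPLETE, State.HANDOFF})


def next_state(state: State, gate_passed: bool, within_bounds: bool) -> State:
    """Advance one step; terminal states never change."""
    if state in TERMINAL:
        return state
    if not within_bounds:
        return State.HANDOFF
    return State.COMPLETE if gate_passed else State.RUNNING


def stop_should_block(state: State) -> bool:
    """A stop gate blocks only while work is still running.

    Checking `state is not State.COMPLETE` instead would block HANDOFF
    forever, which is the kind of loop this sketch is meant to avoid.
    """
    return state not in TERMINAL
```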

3 · Why now

Deterministic per-edit enforcement became possible only recently, with command-type hooks that fail closed. Already solved, never assembled. The product is the inevitable assembly of tools you already trust.

Vendor-neutral by construction

Opinionated defaults, not a cage.

The platform takes strong positions on how work gets proven. None of those positions trap your record of it. Everything the engine produces leaves in formats you already own and tools you already run — so the proof of what was done outlives any decision to keep using us.

feature_list.json
The capability surface as a plain manifest. It is a file you own: readable, diffable, and committable to your own repository.

EvidenceRecord
What was checked, by which independent verifier, with what verdict. Structured JSON you can store, query, and re-verify outside this platform.

requirement-ID Baggage
The requirement that a unit of work satisfies, carried alongside it. Standard distributed-tracing headers, the same propagation your existing tracing already reads. No proprietary format to adopt.
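To make the propagation concrete, here is a small sketch that writes and reads a requirement id in a baggage header. It assumes the "standard distributed-tracing headers" are the W3C Baggage format (comma-separated `key=value` members, percent-encoded values, optional `;property` suffixes); the page does not name the spec, so treat that as an assumption. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def format_baggage(entries: dict[str, str]) -> str:
    """Serialize entries as a baggage header value, percent-encoding values."""
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in entries.items()
    )


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value, ignoring per-member properties."""
    entries: dict[str, str] = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0].strip()  # drop ";property" parts
        if "=" not in key_value:
            continue  # skip empty or malformed members
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


if __name__ == "__main__":
    header = format_baggage({"requirement-id": "REQ-PAYMENTS-REFUND"})
    print(header)
    print(parse_baggage(header))
```

In practice an existing tracing library would usually handle this; the point is only that the value is plain text any tool can read.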

One evidence record, as it leaves the engine

{
  "requirement_id": "REQ-PAYMENTS-REFUND",
  "verified_by": "independent-verifier",
  "gate_verdict": "satisfied",  // gate satisfied, independently verified
  "baggage": "requirement-id=REQ-PAYMENTS-REFUND"
}

In plain terms: a named requirement was checked by an independent verifier, the gate recorded a satisfied verdict, and the requirement’s id rides along in the baggage field as a standard distributed-tracing header. Export it, re-check it, or walk away with it.
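Re-checking an exported record outside the platform could look something like this. The field names come from the sample record above. The `recheck` helper and its rules are assumptions for illustration: that `"satisfied"` is the passing verdict, and that the baggage value should carry the same requirement id as the record.

```python
import json

REQUIRED_FIELDS = ("requirement_id", "verified_by", "gate_verdict", "baggage")


def recheck(record_json: str) -> list[str]:
    """Return a list of problems; an empty list means the record is consistent."""
    record = json.loads(record_json)
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record
    ]
    if problems:
        return problems
    if record["gate_verdict"] != "satisfied":
        problems.append(f"gate verdict is {record['gate_verdict']!r}")
    if record["baggage"] != f"requirement-id={record['requirement_id']}":
        problems.append("baggage does not carry the record's requirement_id")
    return problems


SAMPLE = """{
  "requirement_id": "REQ-PAYMENTS-REFUND",
  "verified_by": "independent-verifier",
  "gate_verdict": "satisfied",
  "baggage": "requirement-id=REQ-PAYMENTS-REFUND"
}"""

if __name__ == "__main__":
    print(recheck(SAMPLE))
```

Because the record is plain JSON, the same check can run in a CI job, a notebook, or an auditor's own tooling.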
