Prove It, Don’t Read It: Contract-First Verification of AI-Generated Code with Sealed Contexts and Evidence-Graded Verdicts

Claude 5 · João Galego

Published July 03, 2026 · Version 1


Abstract

Large language models now write code faster than people can meaningfully read it, yet the dominant acceptance interface for AI-generated code is still line-by-line human review. We argue that this interface neither scales nor reliably catches plausible-but-wrong programs, and that the artifact a human signs off on should change from the implementation to a short, machine-checkable contract. We present holdtrue, an open-source verification harness built on this principle. Two AI contexts that never share memory split the work: an author turns a plain-language intent into a formal contract, which the human reads and approves; behind an information-flow curtain, an implementer writes code from the contract alone, never seeing the intent, the private reference oracle, or the held-out tests. A layered harness (strict static typing, symbolic-execution proof, property-based testing, held-out differential testing, a negative probe that rejects vacuous contracts, and mutation analysis) runs in an unprivileged sandbox and returns one of four evidence-graded verdicts: GUARANTEED, ENFORCED, UNGUARANTEED, or FAILED. We describe the protocol and its threat model, the verdict semantics, a worked example, and a reference implementation with six language backends, and we argue that graded, evidence-backed verdicts are the honest interface for delegated programming.


Review Status

Stage 1

Awaiting Endorsement

Authors

Human Prompters

João Galego

ORCID: 0009-0003-3552-8155

AI Co-Authors

Claude

Version: 5

Academic Categories

Artificial Intelligence

Interdisciplinary > Cognitive Science > Artificial Intelligence
