Build: An AI Code Verifier
The companion build to Verifying AI Code. The single most useful pattern for AI-generated code, in ~60 lines: never accept the first draft on faith. Generate, verify, feed failures back, repeat.
What you’ll build
- A mock code model that returns a buggy draft, then a fix.
- A verifier that runs the candidate against tests in isolation.
- A loop that hands failures back to the model until green — or escalates.
- The lesson: the model writes; the verifier decides.
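The mock model itself isn't shown in the excerpts below. Here is a minimal sketch of one, assuming the generate(task, feedback, attempt) signature that the loop in § 02 calls. The two drafts are the ones from the run in § 03. TASK's wording and the BUGGY_DRAFT/FIXED_DRAFT names are hypothetical. This version returns the fix only after it has received feedback, so the loop's feedback is what drives the correction.

```python
# A stand-in for a code model. The first draft is "almost right"; once the
# verifier has reported a failure, it returns the fix.
# TASK's wording is a hypothetical example.
TASK = "Write add_numbers(a, b) that returns the sum of two numbers."

BUGGY_DRAFT = '''def add_numbers(a, b):
    return str(a) + str(b)
'''

FIXED_DRAFT = '''def add_numbers(a, b):
    return a + b
'''


def generate(task, feedback, attempt):
    """Return candidate source code for `task`.

    A real model would condition on `task` and on `feedback` (the verifier's
    last report). This mock only checks whether any feedback has arrived yet.
    `attempt` is accepted to match the loop's call but is unused here.
    """
    if feedback is None:
        return BUGGY_DRAFT
    return FIXED_DRAFT
```

Swapping this function for a call to a real model is the only change the rest of the loop needs.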
§ 00 · “ALMOST RIGHT”
The expensive failure mode
Developer surveys keep finding the same thing: the top AI frustration isn’t code that obviously breaks; it’s code that’s almost right. It compiles, it reads fine, and it’s subtly wrong. The fix isn’t a better model. It’s a verify loop that refuses to trust the first draft. A verify loop is a control loop that generates a candidate and runs an automatic verifier (tests, types, lint) against it. If the candidate fails, the loop feeds the failure back to the model to try again, up to a bounded number of attempts, and then escalates to a human.
§ 01 · THE VERIFIER
The trust boundary
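The verifier is the half of the loop that decides. It runs the candidate against tests and returns a readable verdict: a pass/fail flag plus a report explaining why. In a real project the verifier is your linter, type checker, and pytest run, ordered cheapest-first so that fast checks fail before the slow test suite starts. The toy version below runs the candidate with exec() in a fresh namespace. That keeps each candidate’s definitions separate, but it is not a sandbox, so untrusted code belongs in a separate process or container.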
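Here is a minimal sketch of the real-project version, assuming each check is a command-line tool that signals failure with a non-zero exit code. Each stage runs as a separate process, cheapest first, and the pipeline stops at the first failure. The result uses the same (passed, report) shape as the toy check() that follows. The names run_stages and STAGES are hypothetical, and the ruff/mypy/pytest commands are only an example ordering.

```python
import subprocess
import sys


def run_stages(stages, timeout=60):
    """Run verifier stages in order and stop at the first failure.

    `stages` is a list of (name, argv) pairs, cheapest first. Each stage runs
    in its own process, so a crashing candidate can't take the verifier down.
    Returns (passed, report), the same shape as check().
    """
    for name, argv in stages:
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except FileNotFoundError:
            return False, f"{name}: command not found: {argv[0]}"
        except subprocess.TimeoutExpired:
            return False, f"{name}: timed out after {timeout}s"
        if result.returncode != 0:
            # Keep the tail of the output: that is what the model needs to see.
            output = (result.stdout + result.stderr).strip()
            return False, f"{name} failed:\n{output[-2000:]}"
    return True, f"all {len(stages)} stage(s) passed"


# Hypothetical real-project ordering, assuming these tools are installed:
# STAGES = [
#     ("lint", ["ruff", "check", "."]),
#     ("types", ["mypy", "."]),
#     ("tests", ["pytest", "-q"]),
# ]
# passed, report = run_stages(STAGES)
```

Stopping at the first failure keeps the report short and focused. The model gets one concrete problem to fix, not a wall of output from every tool.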
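def check(source):
    # TESTS is a list of (call, expected) pairs defined elsewhere in
    # verify_loop.py, e.g. ("add_numbers(2, 3)", 5).
    ns = {}  # fresh namespace: each candidate starts from a clean slate
    try:
        exec(source, ns)
    except Exception as e:
        return False, f"code did not even run: {e}"
    for call, expected in TESTS:
        try:
            got = eval(call, ns)
        except Exception as e:
            # A crash inside a test is a verdict to report, not a verifier crash.
            return False, f"{call} raised {e!r}"
        if got != expected:
            return False, f"{call} -> {got!r}, expected {expected!r}"
    return True, f"all {len(TESTS)} tests passed"

§ 02 · THE LOOP
Hand the failure back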
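def verify_loop():
    # Wrapped in a function so `return` is legal; TASK, MAX_ATTEMPTS, and
    # generate() are defined elsewhere in the script.
    feedback = None
    for attempt in range(MAX_ATTEMPTS):
        source = generate(TASK, feedback, attempt)
        passed, report = check(source)
        if passed:
            return source  # accept: the verifier, not the model, said yes
        feedback = report  # the model gets to see why it failed
    print("Escalate to a human.")  # bounded: out of attempts
    return None

The key line is feedback = report. A real model reads that failing-test output and corrects course, the same way you would.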
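With a real model, generate() has to turn that report into text the model actually reads. Here is a minimal sketch of that step. The build_prompt name, the prompt wording, and the choice to include the previous draft alongside the report are all assumptions; the loop above passes only the report.

```python
def build_prompt(task, feedback=None, previous_source=None):
    """Assemble the text a real code model would receive on each attempt.

    On the first attempt there is no feedback, so the prompt is just the task.
    On retries, the previous draft (if given) and the verifier's report are
    appended so the model can see exactly what failed.
    """
    parts = [f"Task:\n{task}"]
    if feedback is not None:
        if previous_source is not None:
            parts.append(f"Your previous attempt:\n{previous_source}")
        parts.append(f"The verifier rejected it:\n{feedback}")
        parts.append("Fix the code so every test passes. Reply with code only.")
    return "\n\n".join(parts)
```

A precise report, such as the failing call together with its actual and expected values, gives the model something specific to fix. A bare "tests failed" does not.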
§ 03 · RUN IT
Rejected, then accepted
$ python verify_loop.py
--- Attempt 1 ---
def add_numbers(a, b):
    return str(a) + str(b)
verifier: add_numbers(2, 3) -> '23', expected 5
--- Attempt 2 ---
def add_numbers(a, b):
    return a + b
verifier: all 3 tests passed
Accepted after 2 attempt(s).

No human ever looked at the broken draft. That’s the win: a machine caught the “almost right” code, described the failure precisely, and the loop converged. If it hadn’t, the attempt bound would have handed the task to a human instead of retrying forever.
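The run above never reaches the escape hatch. To see it fire, here is a self-contained variant of the § 02 loop that takes the model and verifier as arguments and uses a model that never learns. The names bounded_loop, stubborn_model, and tiny_check, and the default of 3 attempts, are all hypothetical.

```python
def bounded_loop(generate, check, task, max_attempts=3):
    """Generate-verify-repair loop with a bounded number of attempts.

    Returns (source, attempts_used) on success, or (None, max_attempts) when
    every attempt failed and the task should go to a human.
    """
    feedback = None
    for attempt in range(max_attempts):
        source = generate(task, feedback, attempt)
        passed, report = check(source)
        if passed:
            return source, attempt + 1
        feedback = report  # hand the failure back for the next attempt
    return None, max_attempts


def stubborn_model(task, feedback, attempt):
    # Ignores feedback and returns the same buggy draft every time.
    return "def add_numbers(a, b):\n    return str(a) + str(b)\n"


def tiny_check(source):
    # A one-test verifier, just enough to drive the demo.
    ns = {}
    exec(source, ns)
    got = ns["add_numbers"](2, 3)
    if got == 5:
        return True, "ok"
    return False, f"add_numbers(2, 3) -> {got!r}, expected 5"


source, attempts = bounded_loop(stubborn_model, tiny_check, "add two numbers")
if source is None:
    print(f"Escalate to a human after {attempts} attempt(s).")
    # prints: Escalate to a human after 3 attempt(s).
```

Returning None instead of raising leaves the escalation policy (open a ticket, page someone, fall back to a draft PR) to the caller.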
§ · FURTHER READING
References & deeper sources
- CodeT: Code Generation with Generated Tests (2022) · arXiv
- Self-Refine: Iterative Refinement with Self-Feedback (2023) · NeurIPS
- Is Self-Repair a Silver Bullet for Code Generation? (2024) · ICLR