What You Get When You Let AI Work Through the Night

What AI Builds While You Sleep

While you sleep, the AI writes code, and by morning the results have piled up. With a tool like Claude Code, that's a real workflow now. The trouble starts right after. You have no good way to tell whether any of it actually works.

Abhishek Ray, who has run Claude Code workshops for more than 100 engineers, took on exactly this problem. After adopting AI, his team's weekly code merges jumped from 10 to 40–50 — yet review time went up, not down. The building got faster while the checking stayed exactly as slow as before.

The Work to Do Before You Open the Editor

The answer Ray landed on turned out to be an old principle: before you write a line of code, write down what it's supposed to do. That's the heart of TDD (test-driven development).

The Trap of the "Self-Congratulation Machine"

The AI wrote the code, so why not just have the AI test it too? It sounds reasonable, but there's a catch.

When the AI writes the code and the AI writes the tests, what you end up verifying isn't "what the user wanted" — it's "what the AI thought the user wanted." If it misread the request, the tests still pass, because both the code and the tests share the same misunderstanding. Ray calls this a "self-congratulation machine."
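A toy Python sketch makes the shared blind spot concrete. Everything in it is hypothetical and invented for illustration (the requirement wording, the function names, and the particular misreading); it is not taken from Ray's material.

```python
# Requirement, as a human wrote it: lock the account after 5 failed attempts.


def is_locked(failed_attempts: int) -> bool:
    """Hypothetical AI-written code that misread the rule as 'more than 5'."""
    return failed_attempts > 5


def ai_written_test() -> bool:
    """Test generated from the same misreading: it probes 4 and 6, never 5."""
    return is_locked(6) is True and is_locked(4) is False


def human_criterion_test() -> bool:
    """Check derived from the written requirement: the 5th failure must lock."""
    return is_locked(5) is True


print("AI-written test passes:", ai_written_test())
print("Human criterion passes:", human_criterion_test())
```

The AI-written test passes because it inherits the same off-by-one reading as the code. Only the check derived from the original sentence exposes the gap.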

Hiring more people doesn't solve it either. No team can keep up with the sheer volume the AI churns out.

The reason TDD never caught on before was that it was slow. You were already scrambling just to ship the feature; there was no room to design tests first. But AI has solved the speed problem. Now the only slow part left is confirming that the thing is correct.

Take a login feature. Instead of "build me a login," you write the criteria first.

— A correct email and password sends the user to /dashboard.
— A wrong password shows "Your email or password is incorrect."
— If any field is empty, the submit button is disabled.
— After 5 failed attempts in a row, access is blocked for 60 seconds and the wait time is shown.

You don't need a developer's background to write these. "When the user does this, this is what should happen" is enough.
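As a rough sketch of where those sentences end up, here are the four criteria above turned into executable checks in Python. The LoginForm class is a toy in-memory stand-in invented for this example, and the test account, the wording of the lockout message, and the decision to start the lockout on the fifth failure are all assumptions. A real setup would drive an actual browser instead.

```python
from dataclasses import dataclass

MAX_FAILURES = 5
LOCKOUT_SECONDS = 60
ERROR_MESSAGE = "Your email or password is incorrect."


@dataclass
class LoginForm:
    """Toy stand-in for a login page (hypothetical, for illustration only)."""
    users: dict
    failures: int = 0
    locked_until: float = 0.0

    def submit_enabled(self, email: str, password: str) -> bool:
        # The button is enabled only when both fields are filled in.
        return bool(email) and bool(password)

    def submit(self, email: str, password: str, now: float) -> dict:
        if now < self.locked_until:
            wait = int(self.locked_until - now)
            return {"redirect": None, "message": f"Try again in {wait} seconds."}
        if self.users.get(email) == password:
            self.failures = 0
            return {"redirect": "/dashboard", "message": None}
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            # Assumption: the lockout starts on the 5th consecutive failure.
            self.locked_until = now + LOCKOUT_SECONDS
            self.failures = 0
            return {"redirect": None,
                    "message": f"Try again in {LOCKOUT_SECONDS} seconds."}
        return {"redirect": None, "message": ERROR_MESSAGE}


def fresh_form() -> LoginForm:
    return LoginForm(users={"a@example.com": "s3cret"})


def criterion_1() -> bool:
    """A correct email and password sends the user to /dashboard."""
    result = fresh_form().submit("a@example.com", "s3cret", now=0)
    return result["redirect"] == "/dashboard"


def criterion_2() -> bool:
    """A wrong password shows the error message."""
    result = fresh_form().submit("a@example.com", "wrong", now=0)
    return result["message"] == ERROR_MESSAGE


def criterion_3() -> bool:
    """If any field is empty, the submit button is disabled."""
    form = fresh_form()
    return (not form.submit_enabled("", "s3cret")
            and not form.submit_enabled("a@example.com", ""))


def criterion_4() -> bool:
    """After 5 failures in a row, access is blocked for 60 s and the wait is shown."""
    form = fresh_form()
    for _ in range(5):
        form.submit("a@example.com", "wrong", now=0)
    blocked = form.submit("a@example.com", "s3cret", now=1)
    after = form.submit("a@example.com", "s3cret", now=60)
    return (blocked["redirect"] is None
            and "59 seconds" in blocked["message"]
            and after["redirect"] == "/dashboard")


for check in (criterion_1, criterion_2, criterion_3, criterion_4):
    print(check.__name__, "passed" if check() else "FAILED")
```

Each check's docstring is the plain-language criterion itself, so a person who wrote the sentence can still recognize it in the test.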

You Only Have to Look at What Failed

Once the criteria are written, four steps follow.

Pre-check → Design the checks → Verify in the browser → Final verdict

(Figure: AI-driven automated verification process)

What changes for the PM? Instead of reading through a long list of code changes, you just look at a report like "Criterion 3 failed — submit button is not disabled when a field is empty." The point of failure is the thing you review.
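A failure-only report of that kind could be produced along these lines. This is a minimal Python sketch: the Criterion structure, the two stand-in functions, and the deliberately planted bug are all hypothetical, and the report wording is modeled on the example sentence above.

```python
from typing import Callable, List, NamedTuple

ERROR_MESSAGE = "Your email or password is incorrect."


class Criterion(NamedTuple):
    number: int
    failure_note: str            # what to tell the reviewer if the check fails
    check: Callable[[], bool]    # returns True when the criterion holds


def wrong_password_message() -> str:
    """Hypothetical stand-in for the app's response to a wrong password."""
    return ERROR_MESSAGE


def buggy_submit_enabled(email: str, password: str) -> bool:
    """Hypothetical bug: the button state ignores the password field."""
    return bool(email)


CRITERIA = [
    Criterion(2, "wrong password does not show the expected message",
              lambda: wrong_password_message() == ERROR_MESSAGE),
    Criterion(3, "submit button is not disabled when a field is empty",
              lambda: (not buggy_submit_enabled("", "pw")
                       and not buggy_submit_enabled("a@example.com", ""))),
]


def failure_report(criteria: List[Criterion]) -> List[str]:
    """Return one line per failed criterion; passing criteria are omitted."""
    return [f"Criterion {c.number} failed: {c.failure_note}"
            for c in criteria if not c.check()]


for line in failure_report(CRITERIA):
    print(line)
```

Passing criteria never reach the reviewer, which is the design choice that keeps the report short: only the line for criterion 3 is printed here.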

One caveat is worth flagging: if the criteria themselves are wrong, code that satisfies them will still pass. But this approach is good at catching what code review tends to miss: logic that's technically correct yet breaks in the browser, or conflicts between features.

Someone Still Has to Set the Criteria

What matters here isn't the tool. It's the principle: before you hand work to an AI, define what "finished" looks like. Thinking through the edge cases of a feature you haven't even started feels slow.

But hand it off with no criteria, and you'll have to pore over the results from scratch once they come back. Whatever you skipped at the front, you pay for at the back.

It doesn't matter that you can't write code. What matters is that you can write the sentence "if you do this, this is what should come out." In the age of AI agents, quality control is exactly that — the ability to write those sentences.