At what point do we stop reading the code?
Boris Cherny built Claude Code.
He said they built it not for the models we have now but for the models six months out.
February 3rd made that future arrive early.
Opus 4.6 and GPT-5.3-Codex dropped the same day. Something shifted. Before that date, vibe coding meant staying in the loop. You'd describe a feature, watch what the agent did, nudge it, correct the thinking. After February 3rd you describe intent in loose natural language and the thing gets built. The back and forth collapsed. A cardiologist at a hackathon built a medical guidance tool. A road worker in Uganda built an infrastructure assessment pipeline from dashcam footage. Neither of them reviewed a diff. Neither of them knows what a diff is.
There are people right now texting from their phones while a Claude Code instance spawns on the same VPS their production SaaS lives on. The agent reads the codebase, makes decisions, touches files. OpenClaw blew up in January. Root access, no guardrails, security researchers lost their minds. The cowboys didn't care. They were shipping. Peter Steinberger built it, it hit 100,000 GitHub stars in a week, he joined OpenAI two months later. That's the world we're in.
Greptile published the research. The thing that predicts PR review time isn't lines of code. It's files changed. PRs touching 1-3 files merge in under 3 hours. PRs touching 25+ files take 68+ hours. The curve doesn't climb linearly. It goes vertical.
The fast lane looks like this:

fix: update validation logic in auth handler
fix: handle edge case in stripe billing webhook
That's today. With the worst models we'll ever have.
An unconstrained agent given a feature request touches however many files it thinks are relevant. It optimizes. It cleans up things it noticed while it was there. It refactors the helper function that was technically related to what it was doing. Nobody told it not to. There was no contract.
Greptile is making bank because teams are drowning in this. Code velocity exploded. Review didn't scale with it. So now there's a market for AI that reviews AI code. A bandaid on a bandaid. The models get better, the diffs get bigger, the bandaid gets more expensive. At what point is it economically infeasible to keep reading diffs at all?
The Theater of Code Review
There's a ritual happening at engineering teams everywhere right now: an agent writes the code, a PR lands with hundreds of changed lines, a reviewer skims it, the approval goes through.
Everyone involved knows this is theater. Nobody says it out loud.
This is credential laundering. Converting "an AI did this" into "a human approved this" without actually understanding what happened or why. The reviewer can't interrogate the author because the author is a context window that no longer exists. The PR description was written by the same agent that wrote the code, summarizing what it did rather than what it was supposed to do.
Dario Amodei said most of what software engineers do will be done by AI within six to twelve months.
The models aren't stopping. The diffs aren't getting shorter. And nobody's talking about what happens when the agent knows the codebase better than you do.
The Blind Audit Trap
Then Entire launched.
Thomas Dohmke, the former GitHub CEO, raised $60 million, the largest seed round in developer-tools history. Their product records what your agent did: the transcript, the reasoning, the prompts. Stored alongside the code so you can trace what happened after.
That's the audit trail, and it matters. But here's the problem with solving for after: the agent already made every consequential decision before it touched a file. It decided what was in scope. It decided what to leave alone. It decided what done looked like. By the time there's a transcript to read, the damage (the drift, the misunderstanding) is already in the codebase.
A log of what happened is not a constraint on what could happen. You can't audit your way to trust.
The Evaporating Context
The top-tier users have already figured out the right instinct.
Plan mode. The agent reads your codebase, surfaces what it thinks needs to change, you read it, you say go. That's intent first. Everyone shipping seriously is already living here.
The problem is the plan evaporates.
Context closes. The chat scrolls away. What's left is the diff and a PR description the agent wrote about itself. The intent that produced the code is just gone.
That's the actual problem. Not the diff. Not the PR. The plan, the decision, the thing that determined everything that followed, has no persistent form. It lived in a context window and then it didn't.
Intent that persists.
Not a better diff tool. Not a smarter PR description. A Change Request. Scope declared before a file gets touched, invariants written down, acceptance criteria defined, what's off limits stated explicitly.
Written by the agent itself after it has read your codebase. Still there when something breaks six months from now and you need to know what the agent understood when it made the call.
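What that artifact could look like is simple. Here is a minimal sketch, assuming a YAML format with illustrative field names (this is not Sophia's actual schema):

```yaml
# Hypothetical Change Request, written by the agent before implementation.
# Every field name here is illustrative, not Sophia's real format.
id: cr-042
intent: Reject expired sessions in the auth handler
scope:
  - src/auth/handler.py        # the only files the agent may touch
  - tests/test_auth.py
off_limits:
  - src/billing/**             # explicitly out of bounds, even if "related"
invariants:
  - Existing valid sessions keep working
  - No public API signatures change
acceptance:
  - An expired session returns 401, not 500
```

The point isn't the format. It's that scope, invariants, and acceptance criteria exist as a reviewable object before the first file is touched.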
The review changes shape entirely. You're not reading a diff anymore. You're reading a decision. You're evaluating whether the agent's understanding of the problem matches yours. Whether the scope it declared makes sense. Whether the invariants it identified are the right ones. That takes ten minutes.
And when the implementation is done, you're not asking does this code look right. You're asking did it stay within what was agreed. Either it touched files outside the declared scope or it didn't.
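That check can be mechanical. A sketch, assuming the Change Request declares scope as glob patterns; the file lists below are invented, and in practice the touched list would come from `git diff --name-only` on the agent's branch:

```python
from fnmatch import fnmatch

# Scope patterns as declared in the (hypothetical) Change Request.
declared_scope = ["src/auth/*.py", "tests/test_auth.py"]

# Files the agent actually touched (invented example data).
touched = ["src/auth/handler.py", "src/billing/webhook.py"]

# A file is in scope if it matches at least one declared pattern.
out_of_scope = [f for f in touched
                if not any(fnmatch(f, pattern) for pattern in declared_scope)]

if out_of_scope:
    print("scope violation:", out_of_scope)
else:
    print("all changes within declared scope")
```

Run against these lists, the check flags `src/billing/webhook.py` as a violation. No judgment call, no reverse-engineering the agent's reasoning: the contract either held or it didn't.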
When the intent was right, the diff review gets faster and more focused. You're not reverse-engineering what the agent was thinking when it made the change because you already agreed on that before it started. You're just checking the work. When the intent was wrong, you catch it at the contract, before a file gets touched, not after 600 lines are already in the codebase.
The plan was always the review. The diff was always just evidence the plan happened.
Boris is building Claude Code for the models we'll have later. The models we'll have later don't need a human reading their output looking for mistakes. They need a human agreeing on the job before they start.
That's the shift. From reviewing code to reviewing intent.
How it plays out
This plays out differently depending on where you sit.
If you're solo
The agent makes a plan, you read it, Sophia writes the Change Request, the agent implements inside it. You're not adding a step. The intent is in the repo now. You probably won't look at it again. That's fine.
If you're solo but on a team
Nothing about your workflow changes. But when you open a PR, your teammate isn't reading a diff against nothing. They're reading a diff against a declared contract. Scope, invariants, acceptance criteria, what was explicitly off limits. The reviewer finally has something real to push back on instead of guessing at intent from 400 changed lines.
If your whole team is using it
The CR becomes the discussion artifact. The thing that gets debated in Slack and review comments is the contract before a line gets written. This already happens everywhere. In Granola transcripts, Linear threads, email chains nobody can find three weeks later. Sophia just formalizes it and attaches it to the change. The intent that used to live in a Slack thread lives in the repo next to the code it produced.
If your org cares about governance
When you need to know who approved what, what the declared scope was, what invariants were supposed to hold, it's in the repo, attached to the commit. Not because someone wrote good documentation after the fact. Because the agent wrote the contract before it started.
I built Sophia last week, using Sophia itself. It has 59 merged change requests, each one a structured record of a decision made before any code was written. Not because I'm disciplined about documentation, but because the agent wrote the contract before it started, and the contract is just there now, in the repo, attached to every change.
Drop the skill file into your repo.
Get Started
github.com/ithena-one/sophia