A while back, I wrote about a pattern I'd been seeing in AI-assisted development — something I called grep 'n guess. AI agents scanning codebases, inferring intent from patterns, filling gaps with what usually works. Confident output. Often wrong in ways difficult to catch.
The argument was simple: the problem isn't AI capability. It's knowledge architecture. When business rules live in people's heads, in wikis nobody maintains, and in code that carries decisions without recording the reasons, any system acting on that codebase is guessing. Humans included. AI just guesses faster, at scale, without raising its hand to ask.
This week, ETH Zurich published the data.
The study
The developer community has been converging on a fix for the context problem: shared context files. CLAUDE.md, AGENTS.md, meta repos with hierarchical instructions — prose files giving AI agents visibility into project conventions, architecture, and rules. Over 60,000 open-source repositories now contain them.
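For readers who haven't opened one, a context file is ordinary prose sitting at the repository root. A minimal hypothetical example (the conventions, commands, and paths here are invented for illustration, not taken from any real project):

```markdown
# AGENTS.md

## Conventions
- Python 3.11, formatted with black; run `make lint` before committing.
- All database access goes through `db/repository.py`; never query tables directly.

## Architecture
- `billing/` owns all discount logic. Do not duplicate pricing rules elsewhere.
```

Note what this is: descriptive prose. Nothing in the file can verify that an agent (or a human) actually honored it.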
It's real progress. Agents finally have visibility across codebases. But visibility is not compliance.
ETH Zurich tested whether these context files actually improve agent performance. The results should give pause to anyone relying on prose instructions as a governance mechanism:
LLM-generated context files degraded agent performance — reducing task success rates by an average of 3% compared to providing no context file at all, while increasing inference costs by over 20%.
Human-written files fared slightly better, producing a marginal 4% improvement in success rate. But they also increased costs by up to 19%, and the agents' trace data revealed why: agents dutifully followed the instructions. They ran more tests, read more files, executed more grep searches, performed more code-quality checks. Thorough behavior — often unnecessary for the task at hand. Extra context forced the reasoning models to "think" harder without yielding better patches.
As Addy Osmani noted in his analysis: "The auto-generated content isn't useless. It's redundant. The agent could find all of it anyway by reading the repo."
This is grep 'n guess validated by controlled experiment. The agent read the rules. The agent followed the rules. The outcomes didn't meaningfully improve — because prose instructions consumed through pattern matching don't change the fundamental dynamic. The agent saw the rule. It did not provably satisfy it.
Why this matters more than it seems
The context file debate is about code conventions and build commands. But the deeper version of this problem lives in business logic — the rules nobody wrote into any file, context or otherwise.
Every mature codebase has layers of decisions embedded in it. Why does this function check for null before proceeding? Because of a production incident in 2019 in which a null reference brought down the billing system. Why does the discount calculation have a hardcoded exception for accounts older than five years? Because the VP of Sales negotiated it in 2017 and someone coded it directly into the logic.
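Here is a minimal sketch of what that looks like in practice. Every name, rate, and threshold below is hypothetical; the point is that the code records the what while the why lives nowhere:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Account:
    created: date

LEGACY_DISCOUNT = 0.15  # Why 0.15? The code alone cannot say.

def calculate_discount(account, base_rate):
    # Looks like routine defensiveness. Nothing here records the
    # production incident that made this check load-bearing.
    if account is None:
        return 0.0
    # Hardcoded exception for accounts older than five years. Nothing
    # here records the negotiated agreement behind it, so a refactor
    # could "clean it up" without anyone noticing what was lost.
    if (date.today() - account.created).days > 5 * 365:
        return LEGACY_DISCOUNT
    return base_rate
```

To an AI assistant reading this file, the null check and the five-year branch are indistinguishable from incidental implementation choices.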
These decisions are invisible to an AI assistant. No comment. No document. Just a human who was there, and a codebase carrying the scar tissue of the decision without the reason.
When an AI assistant encounters this code, it sees the what but not the why. When it modifies, extends, or refactors, it has no way to know which patterns are load-bearing business decisions and which are incidental implementation choices.
The result: confident code that quietly violates business rules nobody wrote down.
The real problem is structural
This is not an AI problem. It is a knowledge architecture problem AI makes urgent.
When humans maintained the codebase, tribal knowledge was a tolerable risk. The people carrying the context were the same people making changes. Knowledge and action lived in the same head.
AI breaks that coupling. Knowledge stays in Sarah's head. Action moves to the AI assistant. Nobody notices the gap until generated code does something business rules don't allow — rules nobody realized were unwritten.
The question is not "how do we make AI smarter about our business rules?" It is "how do we make our business rules accessible to anything — human or machine — that needs to act on them?"
What structured knowledge looks like
The fix is not better AI. It is better knowledge organization.
Business rules living in prose documents, wikis, and people's heads are advisory at best. They describe intent. They don't constrain behavior. An AI assistant — like a new hire — can read them and still get the implementation wrong, because the description is ambiguous, incomplete, or contradicted by the actual code.
Business rules that are structured — queryable, explicit, connected to code and systems they govern — are a different category of knowledge. They can be consumed by humans and machines alike. They don't depend on someone remembering the context from 2019.
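One possible shape for such a rule, sketched here with entirely hypothetical identifiers and values: each rule carries its statement, its provenance, the code it governs, and a machine-checkable predicate. This is a sketch of the category, not a prescription for any particular tool:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class BusinessRule:
    rule_id: str
    statement: str                  # what the rule requires
    rationale: str                  # why it exists, with provenance
    applies_to: str                 # the code path the rule governs
    check: Callable[[dict], bool]   # machine-checkable predicate

RULES = [
    BusinessRule(
        rule_id="BR-2017-04",
        statement="Accounts older than five years receive the legacy flat discount.",
        rationale="Negotiated by the VP of Sales in 2017; see contract addendum.",
        applies_to="billing/discounts.py::calculate_discount",
        check=lambda order: order["age_years"] <= 5 or order["discount"] == 0.15,
    ),
]

def violations(order: dict) -> list[str]:
    """Return the IDs of every rule the given order fails."""
    return [r.rule_id for r in RULES if not r.check(order)]
```

The difference from a prose context file is that this rule can be queried, linked to the code it constrains, and evaluated against actual behavior — by a CI job, a human reviewer, or an AI agent alike.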
The difference between "Sarah knows that rule" and "that rule is in the system" is the difference between knowledge that works for a team of ten and knowledge that works for a team augmented by AI.
The question worth asking
If you're adopting AI coding tools — or any AI tooling acting on your business logic — ask yourself:
Could an AI find and correctly apply every business rule in your system without asking a human?
If the answer is no, the AI will do what any reasonable system does when it lacks information: it will guess. Confidently. At scale.
We're exploring this territory further in the coming weeks — the gap between what our systems know and what they can enforce, and what it means for organizations moving fast with AI. There is a structural problem here worth understanding before the codebase gets too far ahead of the governance.
Sources
Gloaguen, T., Mündler, N., Müller, M., Raychev, V., Vechev, M. "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?" ETH Zurich, February 2026.
InfoQ. "New Research Reassesses the Value of AGENTS.md Files for AI Coding." March 6, 2026.
Osmani, A. "Stop Using /init for AGENTS.md." AddyOsmani.com, 2026.
METR. "Measuring the Impact of AI Coding Tools on Developer Productivity." 2025.
GitClear. "AI-Assisted Code Duplication Analysis." 2025.