RESEARCH · MARCH 2026

Why auto-generated CLAUDE.md files make your AI agents worse

And why the fix isn't better generation — it's knowing what to leave out.

Every AI coding tool now has a context file. Claude Code reads CLAUDE.md. Cursor reads .cursorrules. GitHub Copilot reads copilot-instructions.md. The idea is simple: give the agent a brief about your project so it writes better code.

Most developers either skip it entirely or write something generic: "This is a Next.js project using TypeScript and Tailwind." Then a wave of tools appeared to auto-generate these files — scan the repo, detect the stack, dump the directory tree, output a context file.

Sounds helpful. Turns out it makes things worse.

The research that changed how we think about context

In February 2025, researchers at ETH Zurich ran a systematic study on context files across 138 real repositories. They tested how auto-generated context affected AI coding agent performance on actual development tasks.

The finding was counterintuitive: agents given auto-generated context files performed worse, not better.

The reason: 100% of the auto-generated files contained information the agent could already discover by reading the code. Tech stack, directory structure, file organization — the agent already knows this. Telling it again doesn't help. It actively hurts by consuming context window space that could hold something useful.

What actually works: non-discoverable information

The same study found that human-written context files documenting non-obvious conventions showed a 28% runtime reduction and 17% fewer tokens used. The key distinction: the information had to be something the agent couldn't figure out by reading the code.

Things agents can discover on their own (don't include): the tech stack, the directory tree, file organization, which frameworks are in use.

Things agents keep missing (include these): unwritten conventions, constraints on what must not change, hidden coupling between files that don't import each other, and approaches that were already tried and failed.

The pattern is clear: conventions, constraints, hidden dependencies, and historical decisions — the tribal knowledge that lives in people's heads, not in the code.

The Karpathy connection

Andrej Karpathy's autoresearch project independently proved the same principle. His system uses three files: program.md (curated context), train.py (code the agent modifies), and prepare.py (read-only evaluation).

The agent ran 700 experiments in 2 days, producing 20 genuine improvements. It worked because program.md contained only what the agent couldn't figure out alone: the objective metric, what can vs. can't be changed, failure recovery rules, and the autonomy boundary.

"You don't program train.py. You program program.md. You are programming the programmer."

Three techniques that extract what agents miss

If the goal is to surface non-discoverable information, you need techniques that go beyond reading the current codebase. Here are three that work:

1. Import graph PageRank

Build a directed graph where files are nodes and imports are edges. Run PageRank. The highest-ranked files are your architectural hubs — the ones that break the most things when modified. An agent reading files one at a time can't see this structural importance. It treats a utility function and a core dispatcher the same.

This technique comes from aider's repo-map. When you tell an agent "this file is imported by 183 others — changes have wide blast radius," it writes more careful code.
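The idea fits in a few lines. A minimal sketch in Python, using power iteration over a toy import graph (the module names are hypothetical; aider's repo-map applies the same idea at repository scale):

```python
# Rank files by PageRank over the import graph: an edge (a, b) means
# "a imports b", so heavily-imported files accumulate rank.

def pagerank(edges, damping=0.85, iters=50):
    """edges: list of (importer, imported) pairs. Returns {file: score}."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)  # dangling nodes spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank

edges = [
    ("app.py", "types.py"), ("api.py", "types.py"),
    ("cli.py", "types.py"), ("api.py", "dispatcher.py"),
    ("dispatcher.py", "types.py"),
]
scores = pagerank(edges)
print(max(scores, key=scores.get))  # the hub: types.py
```

The hub file wins not just because it has the most inbound edges, but because rank flows through intermediaries like dispatcher.py, which is the structural importance a file-at-a-time reader can't see.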

2. Git forensics

The commit history contains signals no static analysis can find: files that repeatedly change in the same commits (co-change coupling), hotspots that keep showing up in fix commits, and approaches that were tried and reverted.
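One such signal, co-change coupling, can be mined straight from git log output. A minimal sketch with a sample log inlined; a real run would feed it `git log --name-only --pretty=format:@%h` via subprocess:

```python
# Count how often pairs of files appear in the same commit.
from collections import Counter
from itertools import combinations

def co_change_pairs(log_text):
    """log_text: output of `git log --name-only --pretty=format:@%h`.
    Each commit starts with @<hash>, followed by its file list."""
    pairs = Counter()
    for commit in log_text.split("@")[1:]:
        lines = commit.strip().splitlines()
        files = sorted(set(lines[1:]))  # drop the hash line, dedupe
        pairs.update(combinations(files, 2))
    return pairs

sample = """@a1b2c3
auth/provider.ts
middleware/session.ts
@d4e5f6
auth/provider.ts
middleware/session.ts
@g7h8i9
README.md
"""
print(co_change_pairs(sample).most_common(1))
# [(('auth/provider.ts', 'middleware/session.ts'), 2)]
```

A pair that co-changes in a high fraction of commits but has no import edge between the two files is exactly the hidden dependency an agent can't discover from the code.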

3. Context-rot-aware formatting

Chroma Research (2026) showed that LLM accuracy drops by 30% or more for information placed in the middle of long contexts. Models retain the beginning and end best — a U-shaped attention curve.

This means the structure of your context file matters. Critical constraints should go at the top. Reference information goes in the middle (where it's least likely to be retained anyway). Actionable reminders go at the bottom.
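As a sketch, ordering for the U-shaped curve is just a sort by section priority (the category names here are hypothetical, not sourcebook's actual schema):

```python
# Order findings for the U-shaped attention curve: constraints at the
# top, reference material in the middle, reminders at the bottom.

def order_for_attention(findings):
    """findings: list of (category, text). Returns texts reordered."""
    priority = {"constraint": 0, "reference": 1, "reminder": 2}
    return [text for _, text in sorted(findings, key=lambda f: priority[f[0]])]

findings = [
    ("reference", "Module map: see docs/architecture.md"),
    ("constraint", "Never edit generated/ by hand"),
    ("reminder", "Run npm test before committing"),
]
print(order_for_attention(findings)[0])  # the constraint goes first
```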

The discoverability test

Before including any finding in an agent context file, ask one question:

Can the agent figure this out by reading the code?

This single filter is the difference between context that helps and context that hurts. Every fact about your tech stack, every directory listing, every "this is a React project" — drop it all. The agent already knows.

What survives the filter: conventions you'd explain to a new team member on their first day. Hidden dependencies between files that don't import each other. Approaches that were tried and failed. The one file that breaks everything if you touch it wrong.
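Applied mechanically, the test is a single predicate. A toy sketch, assuming each extracted finding carries a `discoverable` flag set by the extraction step (stack detection would set it true, co-change mining false):

```python
# The discoverability filter: keep only what the agent can't recover
# by reading the code itself.

def filter_context(findings):
    return [f["text"] for f in findings if not f["discoverable"]]

findings = [
    {"text": "This is a Next.js project using TypeScript",
     "discoverable": True},
    {"text": "auth/provider.ts and middleware/session.ts must change together",
     "discoverable": False},
]
print(filter_context(findings))
# ['auth/provider.ts and middleware/session.ts must change together']
```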

Try it yourself

We built sourcebook to automate this. One command, runs locally, no API keys:

$ npx sourcebook init

✓ Scanned 10,456 files, 3 frameworks detected
✓ Extracted 11 findings
    Core modules: types.ts imported by 183 files
    Co-change: auth/provider.ts ↔ middleware/session.ts (88%)
    Circular deps: bookingScenario.ts ↔ getMockRequestData.ts
    Dead code: 1,907 orphan files detected
✓ Wrote CLAUDE.md — only non-discoverable information.

It runs PageRank on your import graph, mines git history for anti-patterns and co-change coupling, detects conventions from code patterns, and formats the output for the U-shaped attention curve. Everything the agent can already see gets filtered out.

Generates CLAUDE.md, .cursorrules, and copilot-instructions.md. Works on TypeScript, Python, and Go projects. Source-available, BSL-1.1 licensed.

Made by agents, for agents.

npx sourcebook init

No API keys. No LLM. Everything runs locally. BSL-1.1 licensed.