BLOG

Research, techniques, and field notes on building for agents.

ENGINEERING · APRIL 2026

AI Agents Don't Fail. They Stop Too Early.

I spent two weeks building context tools for AI coding agents. The data killed every hypothesis — then revealed the actual problem. Here's the full story of how sourcebook check was born.

BENCHMARK · APRIL 2026

Fast Agents Are Just Agents That Don't Wander

We ran 19 bug-fix tasks with repeat runs. The biggest finding wasn't speed; it was variance. Same task, same bug: handwritten context produced completion times ranging from 21s to 252s, while sourcebook stayed within 34s to 57s. A map, not a boost.

ENGINEERING · APRIL 2026

We Scanned 30+ Repos. Here's What Broke (And What We Fixed).

esbuild, vLLM, Polars, Biome, and 25 more. 12 scanner bugs found, 69 QA assertions added, and an engine that stopped hallucinating about codebases.

ANALYSIS · APRIL 2026

What We Found Scanning 15 Open-Source Repos With 100,000+ Files

Next.js, Cal.com, Supabase, Django, and 10 more — hub files, invisible coupling, and the traps agents fall into.

BENCHMARK · MARCH 2026

We Benchmarked AI Context Files on Real GitHub Issues. Handwritten Briefs Won. Then We Caught Up.

We tested four context strategies on real codebases — no context, handwritten briefs, repo dumps, and sourcebook. Here's what happened when we measured real patches on real issues.

RESEARCH · MARCH 2026

Why Auto-Generated CLAUDE.md Files Make Your AI Agents Worse

ETH Zurich research shows auto-generated context files hurt performance by 2-3%. The only context that helps is what agents can't figure out alone. Three techniques that extract what they miss.
