sourcebook generates project knowledge files for coding agents. It reads your repo's structure, git history, and conventions, then outputs an AGENTS.md that tells the agent what actually matters before it starts editing.
We'd already scanned 15 major repos and written about what we found. But 15 repos is a controlled environment. We wanted to know what happens when you push the scanner into hostile territory — repos that look nothing like the typical Next.js or Django app.
So we scanned 30+ more. And the scanner broke. Repeatedly. Each break taught us something.
## The false positive problem
The first thing that broke was framework detection. We scanned Schemathesis — an API testing tool written in Python — and the output said it was a FastAPI application. It's not. Schemathesis tests FastAPI apps. It imports FastAPI in its test fixtures, and our scanner saw the import and made an assumption.
Same thing with SQLModel. It's an ORM library that integrates with FastAPI — so it imports FastAPI, has FastAPI in its docs, and uses FastAPI in examples. But it's not a FastAPI project. The scanner said it was.
This is the worst kind of bug in a context file: a confident lie. The agent reads "this is a FastAPI project" and starts generating FastAPI-style code in a library that has nothing to do with web serving. The fix was adding a stack-vs-dependency distinction: checking whether the framework appears in source code or only in test and doc files.
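The idea can be sketched in a few lines. This is a minimal illustration, not the scanner's actual implementation; the function names, directory list, and regex are assumptions made for the example.

```python
# Sketch of a stack-vs-dependency check: a framework counts as the
# project's stack only if it is imported from source files, not just
# from tests, docs, examples, or fixtures.
import re

# Directories that signal "peripheral" usage (illustrative list).
TEST_OR_DOC = re.compile(r"(^|/)(tests?|docs?|examples?|fixtures?)(/|$)")

def classify_import(path: str) -> str:
    """Label a file so framework imports can be weighted by origin."""
    return "peripheral" if TEST_OR_DOC.search(path) else "source"

def is_project_stack(framework_import_paths: list[str]) -> bool:
    """True only if at least one import comes from real source code."""
    return any(classify_import(p) == "source" for p in framework_import_paths)

# Schemathesis-style case: FastAPI appears only in fixtures and docs.
fixture_only = ["tests/fixtures/fastapi_app.py", "docs/examples/app.py"]
print(is_project_stack(fixture_only))   # False -> dependency, not stack

# A repo that actually serves HTTP with the framework.
real_app = ["src/server/main.py", "tests/test_routes.py"]
print(is_project_stack(real_app))       # True -> stack
```

The same check resolves the SQLModel case: every FastAPI reference sits in docs and examples, so the framework is reported as an integration, not the project's stack.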
## Ghost frameworks
Scanning Hono (a TypeScript web framework) produced an AGENTS.md that mentioned Zod validation. Hono doesn't use Zod. But one test file — a single test — imported a Zod adapter to test compatibility. That was enough for the scanner to declare Zod as a project convention.
We found the same pattern with DRF (Django REST Framework) appearing in Go codebases. The scanner was matching regex patterns too broadly — a Go file with permission and auth keywords was enough to trigger the DRF detector. We had to scope auth detection to Python files only and require multiple signals before flagging a framework.
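A sketch of what "multiple signals, scoped by language" means in practice. The signal strings and helper are illustrative assumptions, not the scanner's real detector:

```python
# Hedged sketch of multi-signal framework detection: a framework is
# flagged only when the file is in the right language AND at least two
# independent signals fire.
DRF_SIGNALS = (
    "from rest_framework",   # explicit import
    "APIView",               # base-class usage
    "permission_classes",    # DRF-specific class attribute
)

def detect_drf(path: str, content: str, min_signals: int = 2) -> bool:
    if not path.endswith(".py"):   # scope DRF detection to Python files
        return False
    hits = sum(1 for sig in DRF_SIGNALS if sig in content)
    return hits >= min_signals

# A Go file with auth-ish keywords no longer triggers the detector.
go_file = "func checkPermission(u User) bool { return u.IsAdmin }"
print(detect_drf("auth/middleware.go", go_file))   # False

# A real DRF view fires two signals and is flagged.
py_view = "from rest_framework.views import APIView\nclass Foo(APIView): ..."
print(detect_drf("api/views.py", py_view))         # True
```

Requiring two signals also kills the single-test-file problem: one compatibility import in a fixture never clears the threshold on its own.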
## The adversarial batch
After fixing the obvious bugs, we deliberately chose repos designed to confuse a scanner:
- esbuild — a Go binary with JS/TS wrapper code. Is it a Go project or a Node project?
- Biome — a Rust linter with Vue, Svelte, and React test fixtures. Should the scanner detect these as project dependencies?
- SponsorBlock — a browser extension, not a web app. Express routes? No. React components? Sort of.
- vLLM — an ML engine that uses FastAPI for its serving layer, but isn't a FastAPI project.
- Polars — a Rust engine with Python bindings. Two languages, two ecosystems, one repo.
Each of these repos exposed a different category of assumption in the scanner. esbuild taught us that the primary language isn't always the one with the most files. Biome taught us that test fixtures aren't dependencies. vLLM taught us the difference between "uses FastAPI" and "is a FastAPI project."
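One heuristic that falls out of the esbuild lesson can be sketched as follows. This is an assumption-laden illustration (the excluded-directory list and weighting are made up for the example), not the shipped logic:

```python
# Sketch: pick the primary language by code volume *after* excluding
# fixtures, vendored code, and wrapper/packaging directories -- raw
# file counts alone would mislabel repos like esbuild.
from collections import Counter

EXCLUDED_DIRS = ("fixtures", "testdata", "vendor", "npm", "node_modules")
EXT_TO_LANG = {".go": "Go", ".ts": "TypeScript", ".js": "JavaScript"}

def primary_language(files: dict[str, int]) -> str:
    """files maps path -> byte count; returns the dominant language."""
    totals: Counter = Counter()
    for path, size in files.items():
        if any(f"/{d}/" in f"/{path}/" for d in EXCLUDED_DIRS):
            continue   # wrapper/fixture code doesn't vote
        for ext, lang in EXT_TO_LANG.items():
            if path.endswith(ext):
                totals[lang] += size
    return totals.most_common(1)[0][0]

# esbuild-like shape: JS wrapper code exists, but Go dominates the core.
repo = {
    "pkg/api/api_impl.go": 90_000,
    "internal/bundler/bundler.go": 120_000,
    "npm/esbuild/lib/main.js": 40_000,    # excluded: packaging wrapper
    "scripts/test262.js": 30_000,
}
print(primary_language(repo))   # Go
```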
## 12 bugs, 69 assertions
Every bug became a regression test. We now have a QA suite that clones 15 repos and runs 69 assertions against the scanner output. Each assertion checks for a specific known issue:
```
$ bash test/qa-repos.sh

honojs/hono (TS framework, library):
  PASS  Should detect: Hono routes (found)
  PASS  Should detect: Vitest (found)
  PASS  Should NOT detect: Zod (correctly absent)
  PASS  Should NOT detect: FastAPI (correctly absent)

vllm-project/vllm (ML engine, NOT FastAPI app):
  PASS  Should detect: library (found)
  PASS  Should NOT detect: 'FastAPI project' (correctly absent)

biomejs/biome (Rust linter, NOT a web framework):
  PASS  Should NOT detect: React routes (correctly absent)
  PASS  Should NOT detect: Vue (correctly absent)
  PASS  Should NOT detect: Svelte (correctly absent)

========================================
RESULTS: 69 passed, 0 failed, 0 skipped
========================================
```
The pattern that emerged: scanning more repos doesn't just produce more findings, it uncovers more categories of error. Repos 1-15 found the happy-path bugs. Repos 16-30 found the adversarial bugs. After repo 25, we stopped finding new bug classes entirely. The engine converged.
## What the data actually shows
Across 30+ repos, here's what we consistently found:
- Every repo has hub files — files imported by 10+ others that an agent should treat carefully. vLLM's logger.py is imported by 599 files. Fern's API index is imported by 732.
- Circular imports are everywhere. We found them in 80%+ of repos with over 1,000 files. Agents need to know about these before adding new imports.
- Reverted commits are a goldmine. Convex had 34 reverted commits. Fern had 23. FiftyOne had 19. These are approaches the team explicitly rejected — an agent shouldn't re-attempt them.
- Generated files are traps. BAML, Fern, and FiftyOne all had generated files that look hand-written. An agent that edits them produces a diff that gets overwritten on the next build.
- Multi-language repos need different rules per layer. esbuild (Go+JS), Polars (Rust+Python), Hatchet (Go+TS+Python) — each layer has its own conventions. A single set of instructions doesn't work.
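Hub-file detection is just fan-in counting over the import graph. Here is a minimal sketch under assumed names (the 10-import threshold comes from the text; everything else is illustrative):

```python
# Sketch: a "hub" is any file imported by 10+ other files. Computing
# this is a reverse-edge count over the repo's import graph.
from collections import Counter

def find_hubs(imports: dict[str, list[str]], threshold: int = 10) -> dict[str, int]:
    """imports maps each file to the files it imports; returns hub -> fan-in."""
    fan_in: Counter = Counter()
    for _, targets in imports.items():
        for target in set(targets):   # de-dupe repeated imports per file
            fan_in[target] += 1
    return {f: n for f, n in fan_in.items() if n >= threshold}

# Toy graph: logger.py is imported by 12 modules, config.py by only 2.
graph = {f"mod_{i}.py": ["logger.py"] for i in range(12)}
graph["mod_0.py"].append("config.py")
graph["mod_1.py"].append("config.py")
print(find_hubs(graph))   # {'logger.py': 12}
```

The same graph, walked for cycles instead of fan-in, yields the circular-import findings from the second bullet.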
## Why this matters for context files
A context file that lies is worse than no context file at all. If AGENTS.md says "this is a FastAPI project" and the agent follows that instruction, it will generate FastAPI-style code in a library that doesn't serve HTTP requests. The agent trusted the file and produced confidently wrong output.
That's why we scan adversarially. Not just "does it detect the right things?" but "does it avoid saying wrong things?" The 69 assertions in our QA suite are split roughly 50/50 between should-detect and should-NOT-detect checks. Precision matters as much as recall.
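To make the split concrete, here are the two assertion shapes in miniature. The helper names are hypothetical (the real suite is a bash script); only the presence/absence duality is from the text:

```python
# Sketch of the two QA assertion shapes: half the suite asserts a
# string IS in the generated AGENTS.md, half asserts it is NOT.
def should_detect(agents_md: str, needle: str) -> tuple[bool, str]:
    ok = needle in agents_md
    return ok, f"{'PASS' if ok else 'FAIL'}  Should detect: {needle}"

def should_not_detect(agents_md: str, needle: str) -> tuple[bool, str]:
    ok = needle not in agents_md
    return ok, f"{'PASS' if ok else 'FAIL'}  Should NOT detect: {needle}"

# Hono-style output: routes and Vitest present, Zod correctly absent.
output = "## Stack\n- Hono routes\n- Vitest for tests\n"
print(should_detect(output, "Hono routes")[1])
print(should_not_detect(output, "Zod")[1])
```

The should-NOT-detect half is what catches confident lies: a regression in precision fails loudly instead of silently poisoning the context file.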
You can explore all 32 repos at sourcebook.run/for, or run it on your own repo in about 3 seconds:
```
npx sourcebook init
```