COMPARISON · APRIL 2026

sourcebook vs hand-written context files

Hand-written CLAUDE.md files are the gold standard. Can auto-generation match them? We tested both on the same tasks and tracked improvement across three versions.

What hand-written context looks like

A hand-written CLAUDE.md is written by someone who knows the project. For our benchmark, we wrote context files for three repos: Cal.com (42 lines), Hono (39 lines), and Pydantic (39 lines). They include the stack, key directories, naming conventions, build commands, and domain-specific rules an agent needs to follow.

The Cal.com handwritten file, for example, specifies: use useLocale() for all user-facing strings, translation keys go in a specific JSON file, path aliases map @calcom/ to packages/, and app integrations live in a specific directory structure. These are the things an agent can't discover by reading a few files.
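An illustrative excerpt in that spirit (paraphrased from the rules above; not the benchmark's actual file, and real paths are elided):

```markdown
## i18n
- Wrap every user-facing string in useLocale(); never hardcode English text.
- New translation keys go in the designated locale JSON file.

## Paths
- @calcom/* aliases resolve into packages/; import via the alias, not relative paths.

## Apps
- App integrations follow the designated directory layout.
```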

What sourcebook generates

sourcebook produces a ~70-line brief for the same Cal.com repo. It includes build commands (extracted from package.json/turbo.json), critical constraints (testing patterns, git history anomalies, circular dependencies), core modules ranked by import count, dominant coding conventions, environment variables, and active development areas from git history.
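For shape, a brief covering those categories might be organized like this (an illustrative skeleton, not sourcebook's actual output format):

```markdown
# Cal.com — generated brief (illustrative excerpt)

## Build commands              (from package.json / turbo.json)
## Critical constraints        (testing patterns, git history anomalies, circular deps)
## Core modules                (ranked by import count)
## Conventions                 (dominant patterns in the codebase)
## Environment variables
## Active development areas    (from recent git history)
```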

The key difference: a human writes what they know matters. sourcebook writes what the data shows matters. The human captures domain rules (always use the i18n system). sourcebook captures structural facts (these 5 files are imported by everything else, these files change together).

Head-to-head benchmark

Same tasks, same model (claude-sonnet-4, pinned), same harness. Cell format: time (s) / files touched (f) / patch lines (L)

Task                                  Handwritten        sb v0.3            sb v0.5            v0.5 vs HW
cal.com #27298 (OAuth i18n strings)   120s / 3f / 321L   140s / 3f / 350L   136s / 6f / 469L   +13% time, +46% patch
cal.com #27907 (PayPal i18n strings)  115s / 5f / 362L   103s / 5f / 390L   113s / 6f / 458L   −2% time, +27% patch
hono #4806 (request body caching)     323s / 2f / 86L    274s / 2f / 82L    267s / 2f / 101L   −17% time, +17% patch
pydantic #12715 (JSON schema fix)     180s / 2f / 83L    315s / 1f / 28L    308s / 2f / 117L   +71% time, +41% patch

Model: claude-sonnet-4-20250514. Single run per condition. Full methodology →
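The v0.5-vs-handwritten column is a plain percent change; a minimal sketch of that arithmetic, with the values copied from the table above:

```python
# Percent deltas from the benchmark table: sourcebook v0.5 vs handwritten,
# on wall-clock time (seconds) and patch size (lines).
def pct(v05: float, hw: float) -> int:
    """Percent change of v05 relative to the handwritten baseline."""
    return round(100 * (v05 / hw - 1))

# (v0.5 time, HW time), (v0.5 patch lines, HW patch lines) per task
tasks = {
    "cal.com #27298":  ((136, 120), (469, 321)),
    "cal.com #27907":  ((113, 115), (458, 362)),
    "hono #4806":      ((267, 323), (101, 86)),
    "pydantic #12715": ((308, 180), (117, 83)),
}

for name, (time_pair, patch_pair) in tasks.items():
    print(f"{name}: {pct(*time_pair):+d}% time, {pct(*patch_pair):+d}% patch")
```

Running it reproduces the deltas in the last column: +13%/+46%, −2%/+27%, −17%/+17%, +71%/+41%.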

Version progression

sourcebook v0.3 averaged roughly 17% slower than handwritten on time. Two features narrowed the gap on three of the four tasks:

Task              HW time   v0.3 vs HW   v0.5 vs HW   Trend
cal.com #27298    120s      +17%         +13%         closing
cal.com #27907    115s      −10%         −2%          beat HW
hono #4806        323s      −15%         −17%         beat HW
pydantic #12715   180s      +75%         +71%         still behind

Average across 4 tasks: v0.5 is ~16% slower than handwritten on time (v0.3 was ~17%). On patch lines, v0.5 averages ~33% more lines than handwritten.

What humans capture that sourcebook misses

Domain rules (always use the i18n system), deprecated patterns, and the reasons behind decisions: knowledge that never shows up in the repo's structure.

What sourcebook catches that humans miss

Structural facts: which files are imported by everything else, which files change together, circular dependencies, and git history anomalies.

The practical answer

The best context combines both. sourcebook generates the structural baseline — hub files, conventions, build commands, coupling patterns. Then you add the human knowledge that only you have: domain rules, deprecated patterns, and the reasons behind decisions. npx sourcebook init gives you a starting point. What you add on top is what makes it yours.

$ npx sourcebook init
