COMPARISON · APRIL 2026

sourcebook vs hand-written context files

Hand-written CLAUDE.md files are the gold standard. Can auto-generation match them? We tested both on the same tasks and tracked improvement across three versions.

What hand-written context looks like

A hand-written CLAUDE.md is written by someone who knows the project. For our benchmark, we wrote context files for three repos: Cal.com (42 lines), Hono (39 lines), and Pydantic (39 lines). They include the stack, key directories, naming conventions, build commands, and domain-specific rules an agent needs to follow.

The Cal.com handwritten file, for example, specifies: use useLocale() for all user-facing strings, translation keys go in a specific JSON file, path aliases map @calcom/ to packages/, and app integrations live in a specific directory structure. These are the things an agent can't discover by reading a few files.
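An illustrative excerpt in that spirit (paraphrased from the rules above; not the benchmark's actual file, and real paths are elided):

```markdown
## i18n
- Wrap every user-facing string in useLocale(); never hardcode English text.
- New translation keys go in the designated locale JSON file.

## Paths
- @calcom/* aliases resolve into packages/; import via the alias, not relative paths.

## Apps
- App integrations follow the designated directory layout.
```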

What sourcebook generates

sourcebook produces a ~70-line brief for the same Cal.com repo. It includes build commands (extracted from package.json/turbo.json), critical constraints (testing patterns, git history anomalies, circular dependencies), core modules ranked by import count, dominant coding conventions, environment variables, and active development areas from git history.
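For shape, a brief covering those categories might be organized like this (an illustrative skeleton, not sourcebook's actual output format):

```markdown
# Cal.com — generated brief (illustrative excerpt)

## Build commands              (from package.json / turbo.json)
## Critical constraints        (testing patterns, git history anomalies, circular deps)
## Core modules                (ranked by import count)
## Conventions                 (dominant patterns in the codebase)
## Environment variables
## Active development areas    (from recent git history)
```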

The key difference: a human writes what they know matters. sourcebook writes what the data shows matters. The human captures domain rules (always use the i18n system). sourcebook captures structural facts (these 5 files are imported by everything else, these files change together).

Head-to-head benchmark

Same tasks, same model (claude-sonnet-4, pinned), same harness. Cell format: time (s) / files touched (f) / patch lines (L)

Task                                  Handwritten        sb v0.3            sb v0.5            v0.5 vs HW
cal.com #27298 (OAuth i18n strings)   120s / 3f / 321L   140s / 3f / 350L   136s / 6f / 469L   +13% time, +46% patch
cal.com #27907 (PayPal i18n strings)  115s / 5f / 362L   103s / 5f / 390L   113s / 6f / 458L   −2% time, +27% patch
hono #4806 (request body caching)     323s / 2f / 86L    274s / 2f / 82L    267s / 2f / 101L   −17% time, +17% patch
pydantic #12715 (JSON schema fix)     180s / 2f / 83L    315s / 1f / 28L    308s / 2f / 117L   +71% time, +41% patch

Model: claude-sonnet-4-20250514. Single run per condition. Full methodology →
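The v0.5-vs-handwritten column is a plain percent change; a minimal sketch of that arithmetic, with the values copied from the table above:

```python
# Percent deltas from the benchmark table: sourcebook v0.5 vs handwritten,
# on wall-clock time (seconds) and patch size (lines).
def pct(v05: float, hw: float) -> int:
    """Percent change of v05 relative to the handwritten baseline."""
    return round(100 * (v05 / hw - 1))

# (v0.5 time, HW time), (v0.5 patch lines, HW patch lines) per task
tasks = {
    "cal.com #27298":  ((136, 120), (469, 321)),
    "cal.com #27907":  ((113, 115), (458, 362)),
    "hono #4806":      ((267, 323), (101, 86)),
    "pydantic #12715": ((308, 180), (117, 83)),
}

for name, (time_pair, patch_pair) in tasks.items():
    print(f"{name}: {pct(*time_pair):+d}% time, {pct(*patch_pair):+d}% patch")
```

Running it reproduces the deltas in the last column: +13%/+46%, −2%/+27%, −17%/+17%, +71%/+41%.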

Version progression

sourcebook v0.3 averaged roughly 17% slower than handwritten on time. Two features narrowed the gap on three of the four tasks:

Task              HW time   v0.3 vs HW   v0.5 vs HW   Trend
cal.com #27298    120s      +17%         +13%         closing
cal.com #27907    115s      −10%         −2%          beat HW
hono #4806        323s      −15%         −17%         beat HW
pydantic #12715   180s      +75%         +71%         still behind

Average across 4 tasks: v0.5 is ~16% slower than handwritten on time (v0.3 was ~17%). On patch lines, v0.5 averages ~33% more lines than handwritten.

What humans capture that sourcebook misses

Domain rules (always use the i18n system), deprecated patterns, and the reasons behind decisions: knowledge that never shows up in the repo's structure.

What sourcebook catches that humans miss

Structural facts: which files are imported by everything else, which files change together, circular dependencies, and git history anomalies.

The practical answer

The best context combines both. sourcebook generates the structural baseline — hub files, conventions, build commands, coupling patterns. Then you add the human knowledge that only you have: domain rules, deprecated patterns, and the reasons behind decisions. npx sourcebook init gives you a starting point. What you add on top is what makes it yours.

$ npx sourcebook init
