COMPARISON · APRIL 2026

sourcebook vs Repomix

Two tools that give AI agents project context. Different approaches, different tradeoffs. Here's the data.

What each tool does

Repomix concatenates your repository into a single file. It walks the file tree, includes each file's contents, and produces one large text document that you paste into an LLM's context window. The output is a repo dump — a comprehensive snapshot of every file the tool decides to include.
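The concatenation approach can be sketched in a few lines. This is an illustrative sketch, not Repomix's actual implementation — the skip list, header format, and function name are assumptions:

```python
from pathlib import Path

# Sketch of the repo-dump approach: walk the tree, skip common noise
# directories, and emit each file's contents under a header so an LLM
# can see the whole repository in one paste.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}

def dump_repo(root: str) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if any(d in SKIP_DIRS for d in path.parts) or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        parts.append(f"===== {path.relative_to(root)} =====\n{text}")
    return "\n".join(parts)
```

The output grows linearly with the repository, which is exactly the token-cost tradeoff discussed below.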

sourcebook scans your repository and extracts structural knowledge — hub files, import graphs, co-change coupling, build commands, conventions, and anti-patterns. The output is a brief (typically 50-80 lines) that tells an agent how the project works without including the code itself. It generates CLAUDE.md, .cursorrules, or copilot-instructions.md.
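Co-change coupling, one of the signals listed above, is worth unpacking: files that repeatedly appear in the same commits are likely coupled even when no import connects them. A minimal sketch of the idea — the function name is hypothetical, and in practice the commit file-lists would be parsed from `git log --name-only`:

```python
from collections import Counter
from itertools import combinations

# Count how often each pair of files changes in the same commit.
# Pairs that recur past a threshold are reported as coupled.
def co_change_pairs(commits: list[list[str]], min_count: int = 2) -> dict[tuple[str, str], int]:
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return {p: n for p, n in pairs.items() if n >= min_count}
```

With hypothetical history `[["api.py", "schema.py"], ["schema.py", "api.py", "docs.md"], ["docs.md"]]`, this returns `{("api.py", "schema.py"): 2}`: the two files changed together twice, so an agent editing one should probably check the other.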

Approach comparison

| Dimension | Repomix | sourcebook |
|---|---|---|
| Method | File concatenation | Static analysis + git history + import graph |
| Output | Full repo dump (one text file) | Structured brief (CLAUDE.md / .cursorrules) |
| Output size | Thousands to hundreds of thousands of lines | 50–80 lines typically |
| Token cost | High — scales with repo size | Low — fixed-size brief regardless of repo |
| Privacy | Full source code in output | No source code in output — only structural facts |
| What it captures | Code contents, file tree | Hub files, coupling, conventions, anti-patterns, build commands |
| What it misses | Structural relationships, conventions, git history patterns | Actual code contents (agent reads files on demand) |
| Runtime | Seconds (file walk) | Seconds (analysis + git) |
| LLM required | No | No — pure static analysis |
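Hub-file detection, another row in the "what it captures" column, reduces to a degree count on the import graph: a file imported by many others is a hub an agent should read first. A sketch under stated assumptions — the function name is hypothetical, and real edges would come from parsing import statements:

```python
from collections import Counter

# Given (importer, imported) edges, rank files by how many other
# files import them. High in-degree marks a structural hub.
def hub_files(import_edges: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    in_degree = Counter(imported for _, imported in import_edges)
    return in_degree.most_common(top)
```

For example, with edges `[("a.py", "utils.py"), ("b.py", "utils.py"), ("c.py", "utils.py"), ("a.py", "b.py")]`, `hub_files(edges, top=1)` returns `[("utils.py", 3)]`.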

Benchmark results

We tested both tools on real GitHub issues using claude-sonnet-4 (pinned). Each condition was given the same task: fix the issue, produce a patch. We measured time to completion, tokens consumed, and patch thoroughness (files changed, lines in diff).

Format below: time / files changed / patch lines

| Task | No context | Repomix | sourcebook v0.5 |
|---|---|---|---|
| cal.com #27298 (OAuth i18n strings) | 241s / 5f / 381L | 176s / 3f / 283L | 136s / 6f / 469L |
| cal.com #27907 (PayPal i18n strings) | 94s / 5f / 314L | 121s / 5f / 314L | 113s / 6f / 458L |
| hono #4806 (Request body caching) | 253s / 1f / 31L | 245s / 2f / 82L | 267s / 2f / 101L |
| pydantic #12715 (JSON schema fix) | 312s / 1f / 99L | 208s / 1f / 35L | 308s / 2f / 117L |

Model: claude-sonnet-4-20250514 (pinned). Single run per condition. Full methodology →

What the numbers show

On patch thoroughness, sourcebook v0.5 produced the most comprehensive patch on all four tasks: more files changed and more lines in the diff than either other condition. On cal.com #27298, sourcebook's patch was 66% larger than Repomix's (469 vs 283 lines) while completing 23% faster (136s vs 176s).
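The two percentages quoted above follow directly from the benchmark table; a quick recomputation:

```python
# Deltas for cal.com #27298, taken from the benchmark table above.
repomix_lines, sourcebook_lines = 283, 469
repomix_time, sourcebook_time = 176, 136

patch_growth = sourcebook_lines / repomix_lines - 1  # ~0.66, "66% larger"
speedup = 1 - sourcebook_time / repomix_time         # ~0.23, "23% faster"
print(f"{patch_growth:.0%} larger patch, {speedup:.0%} faster")
```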

On token efficiency, sourcebook's context file is orders of magnitude smaller. A 70-line brief vs a full repo dump means the agent spends more of its context window on reasoning, not reading.
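To make that gap concrete, here is back-of-envelope arithmetic. The ~10 tokens-per-line figure and the 100,000-line dump size are assumptions for illustration; real tokenizer counts vary by language and formatting:

```python
# Rough token budget comparison (all constants are assumptions).
TOKENS_PER_LINE = 10
CONTEXT_WINDOW = 200_000   # a typical large-model context window

brief_tokens = 70 * TOKENS_PER_LINE       # ~70-line sourcebook brief
dump_tokens = 100_000 * TOKENS_PER_LINE   # a mid-size full repo dump

print(f"brief: {brief_tokens / CONTEXT_WINDOW:.2%} of the window")
print(f"dump: {dump_tokens / CONTEXT_WINDOW:.1f}x the window")
```

Under these assumptions the brief consumes well under 1% of the window, while the dump would need five windows' worth of tokens before the agent writes a single line of reasoning.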

Speed was less one-sided. On pydantic #12715, Repomix finished faster (208s vs 308s), though with a much smaller patch (35 lines vs 117 lines). Library repos with deep module hierarchies remain harder for structural analysis than for brute-force inclusion.


When to use which

Use Repomix when you need to paste an entire small-to-medium repo into a single LLM prompt — for one-shot questions, code reviews, or when the LLM won't have file access. Best for repos under a few thousand files where token cost isn't a concern.

Use sourcebook when your agent has file access (Claude Code, Cursor, Copilot) and needs to understand project structure before making changes. Best for large repos, ongoing development, and when you want the agent to reason about architecture rather than just see code.

They're not mutually exclusive. Repomix gives the code. sourcebook explains how the project works. Different questions, different tools.

$ npx sourcebook init
