COMPARISON · APRIL 2026

sourcebook vs Repomix

Two tools that give AI agents project context. Different approaches, different tradeoffs. Here's the data.

What each tool does

Repomix concatenates your repository into a single file. It walks the file tree, includes each file's contents, and produces one large text document that you paste into an LLM's context window. The output is a repo dump — a comprehensive snapshot of every file the tool decides to include.
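The concatenation approach can be sketched in a few lines. This is a minimal illustration of the idea, not Repomix's actual implementation: the function name `concat_repo`, the `====` header format, and the `SKIP_DIRS` ignore list are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical ignore list; a real tool would also honor ignore files and config.
SKIP_DIRS = {".git", "node_modules"}


def concat_repo(root: str) -> str:
    """Walk `root` and join every readable text file into one document."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip directories and anything that lives under an ignored directory.
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        # Prefix each file with its relative path so the reader can tell files apart.
        parts.append(f"==== {rel.as_posix()} ====\n{text}")
    return "\n\n".join(parts)
```

The output grows with every file included, which is why the size of a dump tracks the size of the repository.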

sourcebook scans your repository and extracts structural knowledge — hub files, import graphs, co-change coupling, build commands, conventions, and anti-patterns. The output is a brief (typically 50-80 lines) that tells an agent how the project works without including the code itself. It generates CLAUDE.md, .cursorrules, or copilot-instructions.md.

Approach comparison

Dimension	Repomix	sourcebook
Method	File concatenation	Static analysis + git history + import graph
Output	Full repo dump (one text file)	Structured brief (CLAUDE.md / .cursorrules)
Output size	Thousands to hundreds of thousands of lines	50–80 lines typically
Token cost	High, scales with repo size	Low, brief stays roughly the same size regardless of repo
Privacy	Full source code in output	No source code in output — only structural facts
What it captures	Code contents, file tree	Hub files, coupling, conventions, anti-patterns, build commands
What it misses	Structural relationships, conventions, git history patterns	Actual code contents (agent reads files on demand)
Runtime	Seconds (file walk)	Seconds (analysis + git)
LLM required	No	No — pure static analysis

Benchmark results

We tested both tools, plus a no-context baseline, on four real GitHub issues using claude-sonnet-4 (pinned). In each condition the agent was given the same task: fix the issue and produce a patch. We measured time to completion, tokens consumed, and patch thoroughness (files changed, lines in diff).

Format below: time / files changed / patch lines. Token counts are not shown in this table.

Task	No context	Repomix	sourcebook v0.5
cal.com #27298 OAuth i18n strings	241s / 5f / 381L	176s / 3f / 283L	136s / 6f / 469L
cal.com #27907 PayPal i18n strings	94s / 5f / 314L	121s / 5f / 314L	113s / 6f / 458L
hono #4806 Request body caching	253s / 1f / 31L	245s / 2f / 82L	267s / 2f / 101L
pydantic #12715 JSON schema fix	312s / 1f / 99L	208s / 1f / 35L	308s / 2f / 117L

Model: claude-sonnet-4-20250514 (pinned). Single run per condition.

What the numbers show

On patch thoroughness, sourcebook v0.5 produced the largest diff on all 4 tasks and changed the most files on 3 of 4 (on hono #4806 it tied Repomix at 2 files). On cal.com #27298, sourcebook's patch was 66% larger than Repomix's (469 vs 283 lines) and was finished in 23% less time (136s vs 176s).
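The two percentages for cal.com #27298 follow directly from the table. A short check, with helper names (`pct_larger`, `pct_less_time`) chosen for this example only:

```python
def pct_larger(a: float, b: float) -> float:
    """How much larger `a` is than `b`, as a percentage of `b`."""
    return (a - b) / b * 100


def pct_less_time(a_seconds: float, b_seconds: float) -> float:
    """How much less time `a` took than `b`, as a percentage of `b`."""
    return (b_seconds - a_seconds) / b_seconds * 100


# cal.com #27298: sourcebook 136s / 469L, Repomix 176s / 283L
print(round(pct_larger(469, 283)))     # 66
print(round(pct_less_time(136, 176)))  # 23
```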

On token efficiency, sourcebook's context file is far smaller: a brief of roughly 70 lines versus a repo dump that runs from thousands to hundreds of thousands of lines. The agent spends more of its context window on reasoning, not reading.

On speed the picture is mixed. Repomix finished faster on hono #4806 (245s vs 267s) and on pydantic #12715 (208s vs 308s), in the latter case with a much smaller patch (35 lines vs 117 lines). Library repos with deep module hierarchies remain harder for structural analysis than for brute-force inclusion.

What Repomix includes that sourcebook doesn't

Actual code. Repomix gives the LLM the file contents. sourcebook doesn't — it expects the agent to read files on demand.
Complete file tree. Every file path is visible in the dump. sourcebook surfaces only the structurally important ones.
No analysis required. Repomix's output is immediate — no import graph, no git history, no pattern detection.

What sourcebook captures that Repomix doesn't

Hub files. Which files are imported by the most other files — the ones where a change has the widest blast radius.
Co-change coupling. Files that change together in git history but live in different directories — invisible dependencies.
Conventions. Dominant naming patterns, import styles, test frameworks.
Anti-patterns. Reverted commits, fragile files (many rapid edits), circular dependencies.
Build commands. Extracted from package.json, Makefile, pyproject.toml — ready-to-run dev/build/test commands.
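Two of the signals above, hub files and co-change coupling, reduce to simple counting once the inputs exist. The sketch below is an illustration under stated assumptions, not sourcebook's implementation: it assumes the import graph is already available as a mapping from each file to the files it imports, and that git history has been reduced to one set of changed paths per commit (for instance from `git log --name-only`). The function names, the `top` and `min_count` parameters, and the sample paths are hypothetical.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePosixPath


def hub_files(imports: dict[str, set[str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank files by how many other files import them (in-degree)."""
    counts: Counter = Counter()
    for importer, targets in imports.items():
        for target in targets:
            if target != importer:  # ignore self-imports
                counts[target] += 1
    return counts.most_common(top)


def co_change_pairs(
    commits: list[set[str]], min_count: int = 2
) -> list[tuple[tuple[str, str], int]]:
    """Count pairs of files in different directories that change in the same commit."""
    pairs: Counter = Counter()
    for files in commits:
        for a, b in combinations(sorted(files), 2):
            if PurePosixPath(a).parent != PurePosixPath(b).parent:
                pairs[(a, b)] += 1
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


# Hypothetical inputs for illustration only.
imports = {
    "src/api.ts": {"src/db.ts", "src/util.ts"},
    "src/cli.ts": {"src/util.ts"},
    "test/api.test.ts": {"src/api.ts", "src/util.ts"},
}
commits = [
    {"src/api.ts", "docs/api.md"},
    {"src/api.ts", "docs/api.md", "src/db.ts"},
    {"src/db.ts"},
]

print(hub_files(imports, top=1))  # [('src/util.ts', 3)]
print(co_change_pairs(commits))   # [(('docs/api.md', 'src/api.ts'), 2)]
```

Neither result contains any source code, only paths and counts, which is what keeps a brief built from such facts small.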

When to use which

Use Repomix when you need to paste an entire small-to-medium repo into a single LLM prompt — for one-shot questions, code reviews, or when the LLM won't have file access. Best for repos under a few thousand files where token cost isn't a concern.

Use sourcebook when your agent has file access (Claude Code, Cursor, Copilot) and needs to understand project structure before making changes. Best for large repos, ongoing development, and when you want the agent to reason about architecture rather than just see code.

They're not mutually exclusive. Repomix gives the code. sourcebook explains how the project works. Different questions, different tools.

$ npx sourcebook init
