What each tool does
Repomix concatenates your repository into a single file. It walks the file tree, includes each file's contents, and produces one large text document that you paste into an LLM's context window. The output is a repo dump — a comprehensive snapshot of every file the tool decides to include.
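To make the "repo dump" idea concrete, here is a minimal sketch of the walk-and-concatenate approach described above. It is not Repomix's actual implementation: the `dump_repo` function, the `SKIP_DIRS` set, and the `=== path ===` header format are all hypothetical choices made for illustration.

```python
import os
from pathlib import Path

# Hypothetical skip list for this sketch; not Repomix's real ignore rules.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def dump_repo(root: str) -> str:
    """Concatenate every readable UTF-8 text file under root into one string."""
    root_path = Path(root)
    parts = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            rel = path.relative_to(root_path).as_posix()
            # Hypothetical header format: one marker line per file.
            parts.append(f"=== {rel} ===\n{text}")
    return "\n".join(parts)
```

Calling `dump_repo(".")` on a checkout returns one string with every included file under its own header. The size of that string grows with the repository, which is the trade-off the rest of this comparison is about.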
sourcebook scans your repository and extracts structural knowledge — hub files, import graphs, co-change coupling, build commands, conventions, and anti-patterns. The output is a brief (typically 50-80 lines) that tells an agent how the project works without including the code itself. It generates CLAUDE.md, .cursorrules, or copilot-instructions.md.
Approach comparison
Benchmark results
We tested both tools on real GitHub issues using claude-sonnet-4 (pinned). Each condition was given the same task: fix the issue, produce a patch. We measured time to completion, tokens consumed, and patch thoroughness (files changed, lines in diff).
Format below: time / files changed / patch lines
What the numbers show
On patch thoroughness, sourcebook v0.5 produced more comprehensive patches on 3 of 4 tasks, with more files changed and more lines in the diff. On cal.com #27298, sourcebook's patch was 66% larger than Repomix's (469 vs 283 lines), and the run finished in 23% less time (136s vs 176s).
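For readers who want to check the percentages, the arithmetic behind the cal.com #27298 figures can be reproduced as follows. The helper names `percent_larger` and `percent_faster` are ours, introduced only for this example. "Faster" is read here as time saved relative to the slower run.

```python
def percent_larger(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


def percent_faster(fast: float, slow: float) -> float:
    """Time saved by the faster run, as a percentage of the slower run."""
    return (slow - fast) / slow * 100


# Figures quoted above for cal.com #27298.
print(round(percent_larger(469, 283)))  # 66  (patch lines: sourcebook vs Repomix)
print(round(percent_faster(136, 176)))  # 23  (seconds: sourcebook vs Repomix)
```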
On token efficiency, sourcebook's context file is orders of magnitude smaller. With a 70-line brief in place of a full repo dump, the agent spends more of its context window on reasoning and less on reading.
The one exception on speed was pydantic #12715, where Repomix finished faster (208s vs 308s), though with a much smaller patch (35 lines vs 117 lines). Library repos with deep module hierarchies remain harder for structural analysis than for brute-force inclusion.
What Repomix includes that sourcebook doesn't
- Actual code. Repomix gives the LLM the file contents. sourcebook doesn't — it expects the agent to read files on demand.
- Complete file tree. Every file path is visible in the dump. sourcebook surfaces only the structurally important ones.
- No analysis required. Repomix's output is immediate — no import graph, no git history, no pattern detection.
What sourcebook captures that Repomix doesn't
- Hub files. Which files are imported by the most other files — the ones where a change has the widest blast radius.
- Co-change coupling. Files that change together in git history but live in different directories — invisible dependencies.
- Conventions. Dominant naming patterns, import styles, and test frameworks.
- Anti-patterns. Reverted commits, fragile files (many rapid edits), circular dependencies.
- Build commands. Extracted from package.json, Makefile, pyproject.toml — ready-to-run dev/build/test commands.
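Two of the signals in this list, hub files and co-change coupling, are easy to illustrate on toy data. The sketch below is not sourcebook's implementation. The function names, the input shapes (a dict of imports per file, a list of file sets per commit), and the sample paths are all assumptions made for the example. In practice the commit sets might be parsed from something like `git log --name-only`.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePosixPath


def hub_files(imports: dict[str, set[str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank files by how many other files import them (in-degree)."""
    counts: Counter = Counter()
    for importer, targets in imports.items():
        for target in targets:
            if target != importer:  # ignore self-imports
                counts[target] += 1
    # Sort by count descending, then by path, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def co_change_pairs(commits: list[set[str]], min_count: int = 2):
    """Count file pairs from different directories that change in the same commit."""
    counts: Counter = Counter()
    for files in commits:
        for a, b in combinations(sorted(files), 2):
            if PurePosixPath(a).parent != PurePosixPath(b).parent:
                counts[(a, b)] += 1
    pairs = [(pair, n) for pair, n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


# Hypothetical project data: which files each file imports.
imports = {
    "api/routes.py": {"core/db.py", "core/config.py"},
    "jobs/sync.py": {"core/db.py"},
    "core/db.py": {"core/config.py"},
    "web/views.py": {"core/db.py"},
}
# Hypothetical history: the set of files touched by each commit.
commits = [
    {"api/routes.py", "web/views.py"},
    {"api/routes.py", "web/views.py", "core/db.py"},
    {"core/db.py", "core/config.py"},
]

print(hub_files(imports, top=2))
# [('core/db.py', 3), ('core/config.py', 2)]
print(co_change_pairs(commits))
# [(('api/routes.py', 'web/views.py'), 2)]
```

In this toy data, `core/db.py` is the hub because three other files import it, while `api/routes.py` and `web/views.py` are coupled through history even though neither imports the other. That second kind of relationship cannot be read off a dump of file contents alone.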
When to use which
Use Repomix when you need to paste an entire small-to-medium repo into a single LLM prompt — for one-shot questions, code reviews, or when the LLM won't have file access. Best for repos under a few thousand files where token cost isn't a concern.
Use sourcebook when your agent has file access (Claude Code, Cursor, Copilot) and needs to understand project structure before making changes. Best for large repos, ongoing development, and when you want the agent to reason about architecture rather than just see code.
They're not mutually exclusive. Repomix gives the code. sourcebook explains how the project works. Different questions, different tools.