sourcebook

Your codebase has conventions your AI agents keep missing.

$ npx sourcebook init --format all
✓ Scanned 10,456 files, 3 frameworks detected
✓ Extracted 11 findings
Core modules: types.ts imported by 183 files
Circular deps: bookingScenario.ts ↔ getMockRequestData.ts
Co-change: auth/provider.ts ↔ middleware/session.ts (88%)
Dead code: 1,907 orphan files detected
DONE Wrote CLAUDE.md, .cursorrules, copilot-instructions.md
3.1s — only non-discoverable information.
ALGORITHM_01

Import Graph PageRank

Ranks every file by structural importance and finds the hubs that break everything when you touch them.

Hub: types.ts (183 importers)
Hub: fixtures.ts (86 importers)
Hub: http.ts (72 importers)
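
The hub ranking above can be sketched as a plain PageRank over the import graph. This is a minimal illustration, not sourcebook's actual implementation: the `pagerank` function, its damping factor and iteration count, and the toy file names are all assumptions.

```python
def pagerank(imports, damping=0.85, iterations=50):
    """Rank files by structural importance.

    imports maps each file to the list of files it imports, so rank
    flows from an importer to the files it depends on.
    """
    nodes = set(imports)
    for targets in imports.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Files that import nothing spread their rank evenly (dangling nodes).
        dangling = sum(rank[node] for node in nodes if not imports.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in imports.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three files import types.ts, one also imports http.ts.
graph = {
    "booking.ts": ["types.ts", "http.ts"],
    "auth.ts": ["types.ts"],
    "http.ts": ["types.ts"],
    "types.ts": [],
}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
print(top)  # types.ts
```

Unlike a raw importer count, PageRank also rewards files that are imported by other important files, which is why it tends to surface true hubs rather than merely popular utilities.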
ALGORITHM_02

Git Forensics

Reverted commits are "don't do this" signals. Co-change coupling reveals invisible dependencies.

CO-CHANGE: auth/provider.ts ↔ middleware/session.ts
CORRELATION: 88%
REVERTS: 2 found (anti-patterns)
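
A co-change score like the one above can be approximated from commit history alone. The sketch below is a rough illustration under stated assumptions: coupling is defined here as a Jaccard ratio (commits touching both files divided by commits touching either), which may differ from how sourcebook computes its percentage, and `co_change`, `min_commits`, and the sample history are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change(commits, min_commits=2):
    """Return {(file_a, file_b): coupling} for file pairs changed together.

    commits is a list of sets, each holding the files touched by one commit.
    Coupling is shared commits / commits touching either file (a Jaccard
    ratio); this definition is an assumption, not sourcebook's.
    """
    touched = Counter()
    together = Counter()
    for files in commits:
        touched.update(files)
        for pair in combinations(sorted(files), 2):
            together[pair] += 1

    coupling = {}
    for (a, b), shared in together.items():
        if shared >= min_commits:
            either = touched[a] + touched[b] - shared
            coupling[(a, b)] = shared / either
    return coupling


# Hypothetical history: four commits, file names borrowed from the example above.
history = [
    {"auth/provider.ts", "middleware/session.ts"},
    {"auth/provider.ts", "middleware/session.ts", "README.md"},
    {"auth/provider.ts", "middleware/session.ts"},
    {"middleware/session.ts"},
]
pairs = co_change(history)
print(pairs[("auth/provider.ts", "middleware/session.ts")])  # 0.75
```

In practice the per-commit file sets could be read from `git log --name-only`; the `min_commits` threshold keeps one-off coincidences out of the result.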
ALGORITHM_03

Convention Detection

Naming patterns, export style, barrel exports, path aliases — the tribal knowledge no README captures.

EXPORTS: named (75:18 ratio)
IMPORTS: path alias @/ preferred
BARREL: 40 index.ts files
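
One way to picture convention detection is as simple counting over source text. The sketch below is an assumption-laden illustration rather than sourcebook's method: it uses regular expressions instead of a real parser, and `export_style` and the sample files are hypothetical.

```python
import re

# Matches common named-export forms at the start of a line.
NAMED = re.compile(
    r"^\s*export\s+(?:(?:async\s+)?function|const|let|class|interface|type|enum|\{)",
    re.M,
)
DEFAULT = re.compile(r"^\s*export\s+default\b", re.M)


def export_style(sources):
    """Summarize export conventions across {path: source_text}."""
    named = sum(len(NAMED.findall(text)) for text in sources.values())
    default = sum(len(DEFAULT.findall(text)) for text in sources.values())
    # A barrel file is assumed here to be any file named index.ts.
    barrels = sum(1 for path in sources if path.rsplit("/", 1)[-1] == "index.ts")
    preferred = "named" if named >= default else "default"
    return {"named": named, "default": default, "barrels": barrels, "preferred": preferred}


# Hypothetical sample files.
sample = {
    "components/button.tsx": "export const Button = () => null;\nexport type Props = {};\n",
    "app/page.tsx": "export default function Page() { return null; }\n",
    "components/index.ts": "export { Button } from './button';\n",
}
result = export_style(sample)
print(result)
```

A regex pass like this misses edge cases (for example exports inside comments or template strings), so a production tool would more likely walk a syntax tree.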

ANALYSIS_COMPARISON

WHAT_OTHERS_GENERATE
📁 src
  📁 components
    📄 button.tsx
    📄 input.tsx
  📁 utils
    📄 helpers.ts
"This is a React project using Tailwind CSS."
WHAT_SOURCEBOOK_GENERATES [PRECISION_MODE]
CRITICAL_HUB: lib/dispatcher.ts
Imported by 42 files. Modification risk: HIGH.
CO-CHANGE_COUPLING
Changing auth/provider.ts usually requires updates in middleware/session.ts (88% correlation).
CIRCULAR_DEPENDENCY
models/user.ts → services/auth.ts → models/user.ts
Context injected. Agent ready.
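
A circular dependency like the one reported above can be found with a depth-first search over the import graph. This is a minimal sketch, assuming the graph is already available as a dictionary; `find_cycle` is a hypothetical helper and the file names are taken from the example for illustration only.

```python
def find_cycle(imports):
    """Return one import cycle as a list of files, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for target in imports.get(node, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the current path from target onward.
                return stack[stack.index(target):] + [target]
            if state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(imports):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


graph = {
    "models/user.ts": ["services/auth.ts"],
    "services/auth.ts": ["models/user.ts"],
    "lib/dispatcher.ts": ["models/user.ts"],
}
print(" → ".join(find_cycle(graph)))
# models/user.ts → services/auth.ts → models/user.ts
```

The recursive form keeps the sketch short; on a very deep import chain an explicit stack would avoid Python's recursion limit.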

RESEARCH_FOUNDATION

ETH ZURICH, 2025

Auto-generated obvious context makes agents worse

LLM-generated context files reduced task success by 2-3% and increased inference costs by 20%+. Only non-discoverable information improves performance.

KARPATHY, 2026

program.md is the #1 lever for agent effectiveness

Autoresearch ran 700 experiments in 2 days because the curated context file contained only what the agent couldn't figure out alone.

AIDER

PageRank on import graphs for structural importance

Aider's repo-map technique ranks files by how heavily other files depend on them. sourcebook applies the same idea to identify architectural hubs.

CHROMA RESEARCH, 2026

LLMs lose 30%+ accuracy in the middle of long contexts

sourcebook places critical constraints at the top and bottom of output files — where LLMs pay the most attention.
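
The top-and-bottom placement described above can be sketched as a simple reordering step. The alternating strategy below is an assumption chosen for illustration, not sourcebook's documented layout; `arrange` and the sample findings are hypothetical.

```python
def arrange(findings):
    """Order findings so the highest-priority ones sit at both ends.

    findings is a list of (priority, text) tuples; a higher priority
    means a more critical constraint.
    """
    ranked = sorted(findings, key=lambda item: item[0], reverse=True)
    head, tail = [], []
    for index, (_, text) in enumerate(ranked):
        # Alternate: 1st goes to the top, 2nd to the bottom, 3rd to the top, ...
        (head if index % 2 == 0 else tail).append(text)
    return head + tail[::-1]


# Hypothetical findings with made-up priorities.
findings = [
    (3, "Prefer named exports"),
    (5, "types.ts is a hub: change with care"),
    (1, "40 barrel files"),
    (4, "auth/provider.ts co-changes with middleware/session.ts"),
    (2, "Use the @/ path alias"),
]
ordered = arrange(findings)
print(ordered[0])   # types.ts is a hub: change with care
print(ordered[-1])  # auth/provider.ts co-changes with middleware/session.ts
```

The least critical items end up in the middle of the file, where the research cited above suggests attention is weakest.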

READY_TO_INDEX?

npx sourcebook init
VIEW_SOURCE

No API keys. No LLM. Everything runs locally. MIT licensed.