Methodology
Each repository was cloned to a local directory and scanned with sourcebook v0.8.3. The scan runs entirely locally with no API calls. It performs:
- File tree walk — inventories all files, detects languages and frameworks from package.json, pyproject.toml, go.mod, etc.
- Import graph construction — parses imports/requires across files, builds a dependency graph, ranks files by PageRank score to identify hub files.
- Git history analysis — examines recent commits for co-change coupling (files that change together), reverted commits, and fragile files (many rapid edits).
- Pattern matching — 50+ regex patterns detect naming conventions, test frameworks, auth patterns, routing approaches, and build tools.
- Hybrid sampling — for large repos, samples representative files rather than parsing everything, so scans complete in seconds rather than minutes.
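Here is a minimal sketch of manifest-based stack detection, which illustrates the file tree walk. It is not sourcebook's code. The manifest-to-ecosystem table, the npm framework hints, and the `detect_stack` name are illustrative assumptions. To stay self-contained, the function takes an in-memory map of paths to contents instead of reading from disk.

```python
import json
from pathlib import PurePosixPath

# Hypothetical manifest -> ecosystem table (illustrative, not sourcebook's rules).
MANIFESTS = {
    "package.json": "JavaScript/TypeScript",
    "pyproject.toml": "Python",
    "go.mod": "Go",
}

# Hypothetical framework hints keyed on npm dependency names.
NPM_FRAMEWORKS = {"next": "Next.js", "express": "Express", "fastify": "Fastify", "hono": "Hono"}


def detect_stack(files: dict[str, str]) -> dict[str, list[str]]:
    """Infer ecosystems and frameworks from manifest files.

    `files` maps repo-relative POSIX paths to file contents, as a tree walk
    would collect them.
    """
    ecosystems, frameworks = set(), set()
    for path, text in files.items():
        parts = PurePosixPath(path).parts
        if "node_modules" in parts:
            continue  # vendored dependencies say nothing about the repo itself
        name = parts[-1]
        if name in MANIFESTS:
            ecosystems.add(MANIFESTS[name])
        if name == "package.json":
            try:
                pkg = json.loads(text)
            except json.JSONDecodeError:
                continue
            deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
            frameworks.update(label for dep, label in NPM_FRAMEWORKS.items() if dep in deps)
    return {"ecosystems": sorted(ecosystems), "frameworks": sorted(frameworks)}
```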
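The import-graph step ranks files by PageRank. The sketch below is a plain power-iteration PageRank. It assumes the graph has already been parsed and resolved to file paths. The damping factor of 0.85 and the 50 iterations are conventional defaults, not values taken from sourcebook. Each edge points from the importer to the imported file, so rank flows toward widely imported files.

```python
def pagerank(graph: dict[str, set[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an import graph.

    `graph` maps each file to the set of files it imports. Rank flows along
    importer -> imported edges, so widely imported (hub) files score highest.
    """
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Files that import nothing would leak rank; redistribute it uniformly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank
```

If networkx is available, `networkx.pagerank` on a `DiGraph` with the same edges computes the same quantity.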
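Co-change coupling can be approximated by counting how often each pair of files appears in the same commit. This sketch assumes one set of paths per commit as input, for example parsed from `git log --name-only` output. The `min_count` and `max_files` thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_pairs(commits: list[set[str]], min_count: int = 3, max_files: int = 50) -> list[tuple[tuple[str, str], int]]:
    """Return file pairs that changed together at least `min_count` times.

    `commits` holds the set of files touched by each commit.
    """
    pairs: Counter = Counter()
    for files in commits:
        if len(files) > max_files:
            continue  # skip sweeping commits (mass renames, reformatting) that couple everything
        for pair in combinations(sorted(files), 2):
            pairs[pair] += 1
    return [(pair, count) for pair, count in pairs.most_common() if count >= min_count]
```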
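A regex pass for naming conventions might look like the sketch below. The three patterns and the declaration regex are hypothetical stand-ins for the scan's 50+ patterns, which are not listed here. The declaration regex is deliberately crude: it only catches names introduced by `def`, `function`, `const`, or `let`.

```python
import re
from collections import Counter

# Hypothetical convention patterns; checked in order, first match wins.
CONVENTIONS = {
    "camelCase": re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$"),
    "snake_case": re.compile(r"^[a-z]+(?:_[a-z0-9]+)+$"),
    "PascalCase": re.compile(r"^(?:[A-Z][a-z0-9]*)+$"),
}
# Crude declaration matcher for Python `def` and JS/TS `function`/`const`/`let`.
DECL = re.compile(r"\b(?:def|function|const|let)\s+([A-Za-z_$][\w$]*)")


def naming_profile(source: str) -> Counter:
    """Count how many declared names follow each convention."""
    counts: Counter = Counter()
    for name in DECL.findall(source):
        for label, pattern in CONVENTIONS.items():
            if pattern.match(name):
                counts[label] += 1
                break  # single lowercase words like `foo` match no pattern and are skipped
    return counts
```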
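Hybrid sampling could be implemented in many ways. One plausible approach, assumed here rather than documented, is to always keep a few key files and then fill the remaining budget round-robin across top-level directories. That way, a single large folder cannot dominate the sample. `sample_files`, `budget`, and `always_keep` are illustrative names.

```python
import random
from collections import defaultdict
from collections.abc import Iterable
from pathlib import PurePosixPath


def sample_files(paths: list[str], budget: int, always_keep: Iterable[str] = (), seed: int = 0) -> list[str]:
    """Pick up to `budget` files, stratified by top-level directory."""
    keep = set(always_keep)
    chosen = [p for p in paths if p in keep][:budget]  # key files (e.g. manifests) first
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        if p not in keep:
            parts = PurePosixPath(p).parts
            groups[parts[0] if len(parts) > 1 else "."].append(p)
    rng = random.Random(seed)  # fixed seed keeps repeated scans reproducible
    buckets = [groups[k] for k in sorted(groups)]
    for files in buckets:
        rng.shuffle(files)
    # Round-robin across directories until the budget is spent or files run out.
    while len(chosen) < budget and any(buckets):
        for files in buckets:
            if files and len(chosen) < budget:
                chosen.append(files.pop())
    return chosen
```

With 100 files under `big/` and 2 under `small/`, a budget of 6 that keeps `package.json` yields 3 files from `big/` and both files from `small/`, rather than 5 from `big/`.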
Each finding is assigned a confidence level (high or medium) and a "discoverable" flag indicating whether an agent could reasonably find this information on its own. Only non-discoverable findings appear in the output brief.
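The discoverability filter reduces to a simple predicate over findings. The `Finding` shape below is hypothetical, since the source does not publish a schema. Sorting high-confidence findings first is also an assumed presentation choice.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Finding:
    # Hypothetical fields; not sourcebook's actual schema.
    category: str
    message: str
    confidence: Literal["high", "medium"]
    discoverable: bool


def brief(findings: list[Finding]) -> list[Finding]:
    """Keep only findings an agent is unlikely to find on its own, high confidence first."""
    order = {"high": 0, "medium": 1}
    kept = [f for f in findings if not f.discoverable]
    return sorted(kept, key=lambda f: order[f.confidence])  # stable sort keeps input order within a level
```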
Repos scanned
| Repo | Language | Type |
|---|---|---|
| Next.js | TypeScript | Monorepo / Framework |
| Cal.com | TypeScript | Monorepo / App |
| Supabase | TypeScript | Monorepo / Platform |
| Django | Python | Framework |
| FastAPI | Python | Library |
| Express | JavaScript | Library |
| Flask | Python | Library |
| Hono | TypeScript | Library |
| Fastify | JavaScript | Library |
| Gin | Go | Library |
| Hugo | Go | Framework |
| Pydantic | Python | Library |
| shadcn/ui | TypeScript | Component library |
| Create T3 App | TypeScript | Scaffold / CLI |
| SQLModel | Python | Library |
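Language breakdown: 7 TypeScript, 5 Python, 2 Go, 2 JavaScript. These counts sum to 16 because detection can tag a repo as both TypeScript and JavaScript. Repo types, per the table: 3 monorepos (a framework, an app, and a platform), 8 libraries, 2 other frameworks, 1 component library, and 1 scaffold.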
Aggregate findings
135 total findings across 15 repos. The most common categories:
- Hub files: Every repo has them. These are files imported by 10+ other modules, and they are critical chokepoints whose importance most developers underestimate. Changing a hub file without understanding its blast radius can cause cascading breakage.
- Co-change coupling: Files that consistently change together in git history but live in different directories. These invisible dependencies are the #1 source of missed changes when agents modify one file without touching its partner.
- Convention variance: Most repos have a dominant coding style (naming, imports, exports), but 60%+ also have pockets of legacy code that follow a different convention. Agents that learn the dominant pattern may produce inconsistent code in legacy areas.
- Generated file traps: Auto-generated files (lock files, compiled output, type declarations) that sit alongside source code and are easy to mistake for it. Agents that try to edit these waste tokens and produce broken patches.
- Fragile code: Files that required many rapid edits in recent history — usually indicating areas that are hard to get right. Agents should approach these with extra caution.
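For hub files, blast radius is the set of files that import a hub directly or transitively. Here is a minimal sketch that uses the same file-to-imports mapping as a parsed import graph. It inverts the edges and then runs a breadth-first search from the changed file. The 10-importer default in `hub_files` mirrors the figure above. The function names are hypothetical.

```python
from collections import defaultdict, deque


def importers_index(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert 'file -> files it imports' into 'file -> files that import it'."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)
    return reverse


def blast_radius(graph: dict[str, set[str]], changed: str) -> set[str]:
    """All files that import `changed` directly or transitively (BFS over reversed edges)."""
    reverse = importers_index(graph)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for importer in reverse.get(queue.popleft(), ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    seen.discard(changed)  # an import cycle can lead back to the changed file itself
    return seen


def hub_files(graph: dict[str, set[str]], min_importers: int = 10) -> list[str]:
    """Files with at least `min_importers` direct importers."""
    return sorted(f for f, srcs in importers_index(graph).items() if len(srcs) >= min_importers)
```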
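Convention variance can be surfaced by comparing each directory's majority convention with the repo-wide one. This sketch assumes each file has already been labeled with its dominant convention. The `min_files` cutoff is a hypothetical guard against flagging tiny directories.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def convention_pockets(labels: dict[str, str], min_files: int = 3) -> dict[str, str]:
    """Return directories whose majority convention differs from the repo-wide one.

    `labels` maps each file path to the dominant convention detected in it.
    """
    overall = Counter(labels.values()).most_common(1)[0][0]
    by_dir: dict[str, Counter] = defaultdict(Counter)
    for path, label in labels.items():
        by_dir[str(PurePosixPath(path).parent)][label] += 1
    pockets = {}
    for directory, counts in by_dir.items():
        if sum(counts.values()) < min_files:
            continue  # too few files to call it a pocket
        local = counts.most_common(1)[0][0]
        if local != overall:
            pockets[directory] = local
    return pockets
```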
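Generated-file traps are often caught with path globs plus header markers. The globs and marker phrases below are common conventions chosen for illustration, not sourcebook's rule set. Note that `*.d.ts` will also match hand-written declaration files, so a real tool would need more context.

```python
import fnmatch
import re

# Illustrative heuristics: common generated-file paths plus header markers.
GENERATED_GLOBS = ["*.lock", "package-lock.json", "*.min.js", "dist/*", "*/dist/*", "build/*", "*.d.ts", "*_pb2.py"]
# `@generated` starts with a non-word char, so it gets no leading \b.
GENERATED_MARKER = re.compile(r"(?i)(?:@generated\b|\bauto-generated\b|\bdo not edit\b)")


def looks_generated(path: str, head: str = "") -> bool:
    """Guess whether `path` is generated, using its name and the first lines of the file (`head`)."""
    name = path.rsplit("/", 1)[-1]
    if any(fnmatch.fnmatch(path, g) or fnmatch.fnmatch(name, g) for g in GENERATED_GLOBS):
        return True
    return bool(GENERATED_MARKER.search(head))
```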
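Fragile files can be flagged with a sliding window over each file's commit timestamps. The seven-day window and four-edit threshold are assumptions made for this sketch, because the source says only "many rapid edits."

```python
from collections import defaultdict
from datetime import datetime, timedelta


def fragile_files(edits: list[tuple[str, datetime]], window: timedelta = timedelta(days=7), min_edits: int = 4) -> list[str]:
    """Flag files with at least `min_edits` commits inside any sliding `window`.

    `edits` holds (path, commit time) pairs, one per file touched per commit.
    """
    times: dict[str, list[datetime]] = defaultdict(list)
    for path, ts in edits:
        times[path].append(ts)
    flagged = []
    for path, stamps in times.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= min_edits:
                flagged.append(path)
                break
    return sorted(flagged)
```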
What the scan doesn't capture
sourcebook's scan is structural, not semantic. It doesn't understand what the code does — only how it's organized, what depends on what, and how it's changed over time. It won't tell you that a function has a subtle off-by-one error, or that a particular API endpoint is deprecated. For that, you still need human judgment or a code-understanding LLM.
The scan also doesn't capture runtime behavior, test coverage percentages, or deployment configuration. It works with what's in the repo — source files and git history.