DATA · APRIL 2026

15 open-source repos: raw scan data

85,000+ files scanned, 135 findings extracted. Per-repo statistics, methodology, and what the data reveals about real codebase structure.

Methodology

Each repository was cloned to a local directory and scanned with sourcebook v0.8.3. The scan runs entirely locally with no API calls. It performs:

Each finding is assigned a confidence level (high or medium) and a "discoverable" flag indicating whether an agent could reasonably find this information on its own. Only non-discoverable findings appear in the output brief.
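
The filtering rule above can be sketched as a small data model. This is only an illustration, not sourcebook's actual schema: the Finding class, its field names, the build_brief helper, and the two sample findings are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one scan finding (not sourcebook's real schema)."""

    summary: str
    confidence: Literal["high", "medium"]
    discoverable: bool  # True if an agent could reasonably find this on its own


def build_brief(findings: list[Finding]) -> list[Finding]:
    """Keep only the findings an agent could not easily discover by itself."""
    return [f for f in findings if not f.discoverable]


# Made-up sample findings, for illustration only.
findings = [
    Finding("Uses a package.json at the repo root", "high", discoverable=True),
    Finding("Two modules almost always change together", "medium", discoverable=False),
]
brief = build_brief(findings)
```

Under these assumptions, brief holds only the second finding, since the first is something an agent would likely notice without help.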

Repos scanned

Repo            Language     Type
Next.js         TypeScript   Monorepo / Framework
Cal.com         TypeScript   Monorepo / App
Supabase        TypeScript   Monorepo / Platform
Django          Python       Framework
FastAPI         Python       Library
Express         JavaScript   Library
Flask           Python       Library
Hono            TypeScript   Library
Fastify         JavaScript   Library
Gin             Go           Library
Hugo            Go           Framework
Pydantic        Python       Library
shadcn/ui       TypeScript   Component library
Create T3 App   TypeScript   Scaffold / CLI
SQLModel        Python       Library

Language breakdown: 7 TypeScript, 5 Python, 2 Go, 2 JavaScript (some repos detected as both TS and JS). Repo types, counted from the table above: 3 monorepos (one each of framework, app, and platform), 8 libraries, 2 standalone frameworks, 1 component library, 1 scaffold/CLI.
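
The language counts add up to more than the 15 repos scanned. A short arithmetic check, using only the numbers reported above, shows how many extra detections the overlap accounts for; the variable names are illustrative.

```python
# Language counts as reported in the breakdown above.
language_counts = {"TypeScript": 7, "Python": 5, "Go": 2, "JavaScript": 2}
total_repos = 15

# A repo contributes one count per detected language, so the sum of the
# per-language counts can exceed the number of repos.
extra_detections = sum(language_counts.values()) - total_repos
```

Here extra_detections is 1, which is consistent with a single repo being counted under both TypeScript and JavaScript.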

Aggregate findings

135 total findings across 15 repos. The most common categories:

What the scan doesn't capture

sourcebook's scan is structural, not semantic. It doesn't understand what the code does — only how it's organized, what depends on what, and how it's changed over time. It won't tell you that a function has a subtle off-by-one error, or that a particular API endpoint is deprecated. For that, you still need human judgment or a code-understanding LLM.

The scan also doesn't capture runtime behavior, test coverage percentages, or deployment configuration. It works with what's in the repo — source files and git history.
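
To make "source files and git history" concrete, here is a minimal sketch of one structural signal that can be derived from git history alone: how often each file changes. This is not sourcebook's implementation. The function names and the sample log are assumptions for illustration; the only real tool invoked is git log with its --name-only and --pretty=format: options.

```python
import subprocess
from collections import Counter


def count_file_changes(log_text: str) -> Counter:
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def read_git_log(repo_dir: str) -> str:
    """Return the changed paths of every commit in repo_dir, one per line."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hand-written sample in the same shape as the git output (blank lines
# separate commits), so the example runs without a repository.
sample_log = "src/app.py\nsrc/db.py\n\nsrc/app.py\n\nREADME.md\nsrc/app.py\n"
changes = count_file_changes(sample_log)
most_changed = changes.most_common(1)[0]
```

On a real checkout, count_file_changes(read_git_log("path/to/repo")) would give the same kind of tally. Like the scan described above, it says nothing about what the code does, only how often each file has been touched.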

$ npx sourcebook init
