HOW VLLM ACTUALLY WORKS
A high-throughput and memory-efficient inference and serving engine for LLMs. Conventions, patterns, and architecture extracted from the vllm-project/vllm repository by sourcebook.
WHAT_MATTERS
This is a publishable library, not an application. Focus changes on the public API surface.
KEY_FINDINGS
[HIGH] Hub files: vllm/logger.py (imported by 599 files), vllm/config/__init__.py (imported by 467 files). Changes here have the widest blast radius.
[HIGH] The package uses __init__.py files as barrel exports. Import from the package, not from internal modules.
[HIGH] Use @dataclass for data structures. This is the project's standard validation approach.
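The barrel-export finding above can be sketched with a throwaway package. This is a hypothetical illustration (the package name mini_vllm and module _engine are invented, not vLLM's actual layout): internal modules define symbols, and __init__.py re-exports them so consumers never touch the internals directly.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the barrel-export pattern.
# mini_vllm and _engine are illustrative names, not real vLLM modules.
pkg_dir = Path(tempfile.mkdtemp()) / "mini_vllm"
pkg_dir.mkdir()

# Internal module: the "private" implementation.
(pkg_dir / "_engine.py").write_text("class Engine:\n    name = 'engine'\n")

# Barrel: __init__.py re-exports the public surface and pins it with __all__.
(pkg_dir / "__init__.py").write_text(
    "from mini_vllm._engine import Engine\n"
    "__all__ = ['Engine']\n"
)

sys.path.insert(0, str(pkg_dir.parent))

# Preferred: import from the package barrel, not mini_vllm._engine.
from mini_vllm import Engine

print(Engine.name)
```

The payoff is that internal modules like `_engine` can be moved or renamed without breaking callers, as long as the barrel keeps re-exporting the same names.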
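The @dataclass convention pairs naturally with `__post_init__` validation. A minimal sketch, assuming a config-style class; `CacheConfig` and its fields here are illustrative stand-ins, not the actual vLLM class:

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    # Illustrative fields only; not vLLM's real CacheConfig definition.
    block_size: int = 16
    gpu_memory_utilization: float = 0.9
    swap_space_gb: int = 4

    def __post_init__(self) -> None:
        # Validation lives in __post_init__, the standard dataclass idiom:
        # it runs automatically after the generated __init__.
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        if self.block_size <= 0:
            raise ValueError("block_size must be positive")


cfg = CacheConfig(block_size=32)
print(cfg.block_size)
```

Keeping validation in `__post_init__` means every construction path, including replace() and deserialization helpers that call the generated `__init__`, passes through the same checks.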
[MED] Third-party integrations live under vllm/plugins/; each integration has its own directory. Plugin dirs: lora_resolvers, io_processors