# Committers

This document lists the current committers of the vLLM project and the core areas they maintain. Committers have write access to the vLLM repository and are responsible for reviewing and merging PRs. You can also refer to the CODEOWNERS file for concrete file-level ownership and reviewers. Both this document and the CODEOWNERS file are living documents that complement each other.
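For reference, CODEOWNERS uses GitHub's standard pattern-to-owner syntax, mapping path patterns to reviewer handles. The entries below are a hypothetical excerpt for illustration only, not the actual contents of vLLM's `.github/CODEOWNERS`:

```
# GitHub CODEOWNERS syntax: a path pattern followed by one or more owners.
# Hypothetical entries for illustration -- see .github/CODEOWNERS for the real ones.
/vllm/lora/   @jeejeelee
/docs/        @hmellor @DarkLight1337
```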
## Active Committers

We try to summarize each committer's role in vLLM in a few words. In general, vLLM committers cover a wide range of areas and help each other in the maintenance process. Please refer to the Area Owners section below for exact component ownership details. Sorted alphabetically by GitHub handle:
- @22quinn: RL API
- @aarnphm: Structured output
- @alexm-redhat: Performance
- @ApostaC: Connectors, offloading
- @benchislett: Engine core and spec decode
- @bigPYJ1151: Intel CPU/XPU integration
- @chaunceyjiang: Tool use and reasoning parser
- @DarkLight1337: Multimodality, API server
- @esmeetu: Developer marketing, community
- @gshtras: AMD integration
- @heheda12345: Hybrid memory allocator
- @hmellor: Hugging Face integration, documentation
- @houseroad: Engine core and Llama models
- @Isotr0py: Multimodality, new model support
- @jeejeelee: LoRA, new model support
- @jikunshang: Intel CPU/XPU integration
- @khluu: CI infrastructure
- @KuntaiDu: KV Connector
- @LucasWilkinson: Kernels and performance
- @luccafong: Llama models, speculative decoding, distributed
- @markmc: Observability
- @mgoin: Quantization and performance
- @NickLucche: KV connector
- @njhill: Distributed, API server, engine core
- @noooop: Pooling models
- @patrickvonplaten: Mistral models, new model support
- @pavanimajety: NVIDIA GPU integration
- @ProExpertProg: Compilation, startup UX
- @robertgshaw2-redhat: Core, distributed, disagg
- @ruisearch42: Pipeline parallelism, Ray Support
- @russellb: Structured output, engine core, security
- @sighingnow: Qwen models, new model support
- @simon-mo: Project lead, API entrypoints, community
- @tdoublep: State space models
- @tjtanaa: AMD GPU integration
- @tlrmchlsmth: Kernels and performance, distributed, disagg
- @WoosukKwon: Project lead, engine core
- @yaochengji: TPU integration
- @yeqcharlotte: Benchmark, Llama models
- @yewentao256: Kernels and performance
- @Yikun: Pluggable hardware interface
- @youkaichao: Project lead, distributed, compile, community
- @ywang96: Multimodality, benchmarks
- @zhuohan123: Project lead, RL integration, numerics
- @zou3519: Compilation
## Emeritus Committers

Committers who contributed significantly to vLLM in the past (thank you!) but are no longer active:
- @andoorve: Pipeline parallelism
- @cadedaniel: Speculative decoding
- @comaniac: KV cache management, pipeline parallelism
- @LiuXiaoxuanPKU: Speculative decoding
- @pcmoritz: MoE
- @rkooo567: Chunked prefill
- @sroy745: Speculative decoding
- @Yard1: Kernels and performance
- @zhisbug: Arctic models, distributed
## Area Owners

This section breaks down the active committers by vLLM component and lists the area owners. If your PR touches one of these areas, feel free to ping the area owners for review.
### Engine Core

- Scheduler: the core vLLM engine loop that schedules requests into the next batch
    - @WoosukKwon, @robertgshaw2-redhat, @njhill, @heheda12345
- KV Cache Manager: the memory management layer within the scheduler that maintains KV cache logical block data
    - @heheda12345, @WoosukKwon
- AsyncLLM: the ZeroMQ-based protocol hosting the engine core and making it accessible to entrypoints
    - @robertgshaw2-redhat, @njhill, @russellb
- ModelRunner, Executor, Worker: the engine abstractions wrapping the model implementation
    - @WoosukKwon, @tlrmchlsmth, @heheda12345, @LucasWilkinson, @ProExpertProg
- KV Connector: the connector interface and implementations for KV cache offload and transfer
    - @robertgshaw2-redhat, @njhill, @KuntaiDu, @NickLucche, @ApostaC
- Distributed, Parallelism, Process Management: the process launchers that manage each worker and assign them to the right DP/TP/PP/EP ranks
    - @youkaichao, @njhill, @WoosukKwon, @ruisearch42
- Collectives: the usage of NCCL and other communication libraries/kernels
    - @tlrmchlsmth, @youkaichao
- Multimodality engine and memory management: core scheduling and memory management for vision, audio, and video inputs
    - @ywang96, @DarkLight1337
### Model Implementations

- Model Interface: the `nn.Module` interface and implementations for various models
    - @zhuohan123, @mgoin, @simon-mo, @houseroad, @ywang96 (multimodality), @jeejeelee (LoRA)
- Logits Processors / Sampler: the provided sampler class and pluggable logits processors
    - @njhill, @houseroad, @22quinn
- Custom Layers: utility layers in vLLM such as rotary embeddings and RMS norms
    - @ProExpertProg
- Attention: the attention interface for paged attention
    - @WoosukKwon, @LucasWilkinson, @heheda12345
- FusedMoE: the FusedMoE kernel, the modular kernel framework, and EPLB
    - @tlrmchlsmth
- Quantization: the various quantization configs, weight loading, and kernels
    - @mgoin, @Isotr0py, @yewentao256
- Custom quantized GEMM kernels (`cutlass_scaled_mm`, `marlin`, `machete`)
    - @tlrmchlsmth, @LucasWilkinson
- Multi-modal Input Processing: components that load and process image/video/audio data into feature tensors
    - @DarkLight1337, @ywang96, @Isotr0py
- torch.compile: the torch.compile integration in vLLM, custom passes, and transformations
    - @ProExpertProg, @zou3519, @youkaichao
- State space models: the state space model implementations in vLLM
    - @tdoublep, @tlrmchlsmth
- Reasoning and tool calling parsers
    - @chaunceyjiang, @aarnphm
### Entrypoints

- LLM Class: the `LLM` class for offline inference (see the sketch after this list)
    - @DarkLight1337
- API Server: the OpenAI-compatible API server
    - @DarkLight1337, @njhill, @aarnphm, @simon-mo, @heheda12345 (Responses API)
- Batch Runner: the OpenAI-compatible batch runner
    - @simon-mo
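As a quick orientation to what the `LLM` class entrypoint covers, here is a minimal offline-inference sketch; the model name is an arbitrary example:

```python
from vllm import LLM, SamplingParams

# Load a model for offline batch inference.
llm = LLM(model="facebook/opt-125m")

# Generate completions for a batch of prompts.
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)

for output in outputs:
    print(output.outputs[0].text)
```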
### Features

- Spec Decode: covers the model definitions, attention, sampler, and scheduler related to n-grams, EAGLE, and MTP
    - @WoosukKwon, @benchislett, @luccafong
- Structured Output: the structured output implementation
    - @russellb, @aarnphm
- RL: RL-related features such as collective RPC, sleep mode, etc. (see the sketch after this list)
    - @youkaichao, @zhuohan123, @22quinn
- LoRA: @jeejeelee
- Observability: metrics and logging
    - @markmc, @robertgshaw2-redhat, @simon-mo
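To make the RL item above concrete, here is a minimal sketch of sleep mode, which RL trainers use to free GPU memory between rollouts. This assumes a CUDA setup, and the model name is an arbitrary example:

```python
from vllm import LLM

# enable_sleep_mode lets the engine release GPU memory on demand.
llm = LLM(model="facebook/opt-125m", enable_sleep_mode=True)

outputs = llm.generate(["Rollout prompt"])

# Level 1 offloads weights to CPU RAM and discards the KV cache,
# freeing GPU memory for the training step.
llm.sleep(level=1)

# ... run a training step using the freed GPU memory ...

llm.wake_up()  # restore weights and resume generation
```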
### Code Base

- Config: configuration registration and parsing
    - @hmellor
- Documentation: @hmellor, @DarkLight1337, @simon-mo
- Benchmarks: @ywang96, @simon-mo
- CI, Build, Release Process: @khluu, @njhill, @simon-mo
- Security: @russellb
### External Kernels Integration
- FlashAttention: @LucasWilkinson
- FlashInfer: @LucasWilkinson, @mgoin, @WoosukKwon
- Blackwell Kernels: @mgoin, @yewentao256
- DeepEP/DeepGEMM/pplx: @mgoin, @yewentao256
### Integrations
- Hugging Face: @hmellor, @Isotr0py
- Ray: @ruisearch42
- NIXL: @robertgshaw2-redhat, @NickLucche
### Collaboration with Model Vendors
- gpt-oss: @heheda12345, @simon-mo, @zhuohan123
- Llama: @luccafong
- Qwen: @sighingnow
- Mistral: @patrickvonplaten
### Hardware
- Plugin Interface: @youkaichao, @Yikun
- NVIDIA GPU: @pavanimajety
- AMD GPU: @gshtras, @tjtanaa
- Intel CPU/GPU: @jikunshang, @bigPYJ1151
- Google TPU: @yaochengji
### Ecosystem Projects
- Ascend NPU: @wangxiyuan (see the vllm-ascend plugin project for more details)
- Intel Gaudi HPU: @xuechendi, @kzawora-intel