# Committers

This document lists the current committers of the vLLM project and the core areas they maintain. Committers have write access to the vLLM repository and are responsible for reviewing and merging PRs. You can also refer to the CODEOWNERS file for concrete file-level ownership and reviewers. Both this document and the CODEOWNERS file are living documents that complement each other.
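For reference, CODEOWNERS uses GitHub's standard pattern-to-owner syntax, mapping path patterns to reviewer handles. The entries below are a hypothetical excerpt for illustration only, not the actual contents of vLLM's `.github/CODEOWNERS`:

```
# GitHub CODEOWNERS syntax: a path pattern followed by one or more owners.
# Hypothetical entries for illustration -- see .github/CODEOWNERS for the real ones.
/vllm/lora/   @jeejeelee
/docs/        @hmellor @DarkLight1337
```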
## Active Committers

We try to summarize each committer's role in vLLM in a few words. In general, vLLM committers cover a wide range of areas and help each other in the maintenance process. Please refer to the Area Owners section below for exact component ownership details. Sorted alphabetically by GitHub handle:
- @22quinn: RL API
- @aarnphm: Structured output
- @alexm-redhat: Performance
- @ApostaC: Connectors, offloading
- @benchislett: Engine core and spec decode
- @bigPYJ1151: Intel CPU/XPU integration
- @chaunceyjiang: Tool use and reasoning parser
- @DarkLight1337: Multimodality, API server
- @esmeetu: Developer marketing, community
- @gshtras: AMD integration
- @heheda12345: Hybrid memory allocator
- @hmellor: Hugging Face integration, documentation
- @houseroad: Engine core and Llama models
- @Isotr0py: Multimodality, new model support
- @jeejeelee: LoRA, new model support
- @jikunshang: Intel CPU/XPU integration
- @khluu: CI infrastructure
- @KuntaiDu: KV Connector
- @LucasWilkinson: Kernels and performance
- @luccafong: Llama models, speculative decoding, distributed
- @markmc: Observability
- @mgoin: Quantization and performance
- @NickLucche: KV connector
- @njhill: Distributed, API server, engine core
- @noooop: Pooling models
- @patrickvonplaten: Mistral models, new model support
- @pavanimajety: NVIDIA GPU integration
- @ProExpertProg: Compilation, startup UX
- @robertgshaw2-redhat: Core, distributed, disagg
- @ruisearch42: Pipeline parallelism, Ray Support
- @russellb: Structured output, engine core, security
- @sighingnow: Qwen models, new model support
- @simon-mo: Project lead, API entrypoints, community
- @tdoublep: State space models
- @tjtanaa: AMD GPU integration
- @tlrmchlsmth: Kernels and performance, distributed, disagg
- @WoosukKwon: Project lead, engine core
- @yaochengji: TPU integration
- @yeqcharlotte: Benchmark, Llama models
- @yewentao256: Kernels and performance
- @Yikun: Pluggable hardware interface
- @youkaichao: Project lead, distributed, compile, community
- @ywang96: Multimodality, benchmarks
- @zhuohan123: Project lead, RL integration, numerics
- @zou3519: Compilation
## Emeritus Committers

Committers who contributed significantly to vLLM in the past (thank you!) but are no longer active:
- @andoorve: Pipeline parallelism
- @cadedaniel: Speculative decoding
- @comaniac: KV cache management, pipeline parallelism
- @LiuXiaoxuanPKU: Speculative decoding
- @pcmoritz: MoE
- @rkooo567: Chunked prefill
- @sroy745: Speculative decoding
- @Yard1: Kernels and performance
- @zhisbug: Arctic models, distributed
## Area Owners

This section breaks down the active committers by vLLM component and lists the area owners. If your PR touches one of these areas, feel free to ping the area owners for review.
### Engine Core

- Scheduler: the core vLLM engine loop that schedules requests into the next batch
    - @WoosukKwon, @robertgshaw2-redhat, @njhill, @heheda12345
- KV Cache Manager: the memory management layer within the scheduler that maintains KV cache logical block data
    - @heheda12345, @WoosukKwon
- AsyncLLM: the ZeroMQ-based protocol hosting the engine core and making it accessible to entrypoints
    - @robertgshaw2-redhat, @njhill, @russellb
- ModelRunner, Executor, Worker: the engine abstractions wrapping the model implementation
    - @WoosukKwon, @tlrmchlsmth, @heheda12345, @LucasWilkinson, @ProExpertProg
- KV Connector: the connector interface and implementations for KV cache offload and transfer
    - @robertgshaw2-redhat, @njhill, @KuntaiDu, @NickLucche, @ApostaC
- Distributed, Parallelism, Process Management: the process launchers that manage each worker and assign them to the right DP/TP/PP/EP ranks
    - @youkaichao, @njhill, @WoosukKwon, @ruisearch42
- Collectives: the usage of NCCL and other communication libraries/kernels
    - @tlrmchlsmth, @youkaichao
- Multimodality engine and memory management: core scheduling and memory management for vision, audio, and video inputs
    - @ywang96, @DarkLight1337
### Model Implementations

- Model Interface: the `nn.Module` interface and implementations for various models
    - @zhuohan123, @mgoin, @simon-mo, @houseroad, @ywang96 (multimodality), @jeejeelee (LoRA)
- Logits Processors / Sampler: the provided sampler class and pluggable logits processors
    - @njhill, @houseroad, @22quinn
- Custom Layers: utility layers in vLLM such as rotary embeddings and RMS norms
    - @ProExpertProg
- Attention: the attention interface for paged attention
    - @WoosukKwon, @LucasWilkinson, @heheda12345
- FusedMoE: the FusedMoE kernel, the modular kernel framework, and EPLB
    - @tlrmchlsmth
- Quantization: the various quantization configs, weight loading, and kernels
    - @mgoin, @Isotr0py, @yewentao256
- Custom quantized GEMM kernels (`cutlass_scaled_mm`, `marlin`, `machete`)
    - @tlrmchlsmth, @LucasWilkinson
- Multi-modal Input Processing: components that load and process image/video/audio data into feature tensors
    - @DarkLight1337, @ywang96, @Isotr0py
- torch.compile: the torch.compile integration in vLLM, custom passes, and transformations
    - @ProExpertProg, @zou3519, @youkaichao
- State space models: the state space model implementations in vLLM
    - @tdoublep, @tlrmchlsmth
- Reasoning and tool calling parsers
    - @chaunceyjiang, @aarnphm
### Entrypoints

- LLM Class: the `LLM` class for offline inference (see the sketch after this list)
    - @DarkLight1337
- API Server: the OpenAI-compatible API server
    - @DarkLight1337, @njhill, @aarnphm, @simon-mo, @heheda12345 (Responses API)
- Batch Runner: the OpenAI-compatible batch runner
    - @simon-mo
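As a quick orientation to what the `LLM` class entrypoint covers, here is a minimal offline-inference sketch; the model name is an arbitrary example:

```python
from vllm import LLM, SamplingParams

# Load a model for offline batch inference.
llm = LLM(model="facebook/opt-125m")

# Generate completions for a batch of prompts.
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)

for output in outputs:
    print(output.outputs[0].text)
```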
### Features

- Spec Decode: covers the model definitions, attention, sampler, and scheduler related to n-grams, EAGLE, and MTP
    - @WoosukKwon, @benchislett, @luccafong
- Structured Output: the structured output implementation
    - @russellb, @aarnphm
- RL: RL-related features such as collective RPC, sleep mode, etc. (see the sketch after this list)
    - @youkaichao, @zhuohan123, @22quinn
- LoRA: @jeejeelee
- Observability: metrics and logging
    - @markmc, @robertgshaw2-redhat, @simon-mo
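To make the RL item above concrete, here is a minimal sketch of sleep mode, which RL trainers use to free GPU memory between rollouts. This assumes a CUDA setup, and the model name is an arbitrary example:

```python
from vllm import LLM

# enable_sleep_mode lets the engine release GPU memory on demand.
llm = LLM(model="facebook/opt-125m", enable_sleep_mode=True)

outputs = llm.generate(["Rollout prompt"])

# Level 1 offloads weights to CPU RAM and discards the KV cache,
# freeing GPU memory for the training step.
llm.sleep(level=1)

# ... run a training step using the freed GPU memory ...

llm.wake_up()  # restore weights and resume generation
```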
### Code Base

- Config: configuration registration and parsing
    - @hmellor
- Documentation: @hmellor, @DarkLight1337, @simon-mo
- Benchmarks: @ywang96, @simon-mo
- CI, Build, Release Process: @khluu, @njhill, @simon-mo
- Security: @russellb
### External Kernels Integration
- FlashAttention: @LucasWilkinson
- FlashInfer: @LucasWilkinson, @mgoin, @WoosukKwon
- Blackwell Kernels: @mgoin, @yewentao256
- DeepEP/DeepGEMM/pplx: @mgoin, @yewentao256
### Integrations
- Hugging Face: @hmellor, @Isotr0py
- Ray: @ruisearch42
- NIXL: @robertgshaw2-redhat, @NickLucche
### Collaboration with Model Vendors
- gpt-oss: @heheda12345, @simon-mo, @zhuohan123
- Llama: @luccafong
- Qwen: @sighingnow
- Mistral: @patrickvonplaten
### Hardware
- Plugin Interface: @youkaichao, @Yikun
- NVIDIA GPU: @pavanimajety
- AMD GPU: @gshtras, @tjtanaa
- Intel CPU/GPU: @jikunshang, @bigPYJ1151
- Google TPU: @yaochengji
### Ecosystem Projects
- Ascend NPU: @wangxiyuan (see the vllm-ascend plugin project for more details)
- Intel Gaudi HPU: @xuechendi, @kzawora-intel