
Pinned repositories

  1. vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs; a minimal usage sketch follows this list.

    Python · 54.8k stars · 9.3k forks

  2. llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python · 1.8k stars · 197 forks

  3. recipes Public

    Common recipes to run vLLM

    94 stars · 16 forks
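
Since vLLM is driven from a small Python API for offline batch inference, a minimal sketch is shown below; the model name and sampling settings are illustrative placeholders, not a recommended configuration.

    from vllm import LLM, SamplingParams

    # Load a (placeholder) model and run offline batch generation.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() takes a list of prompts and returns one RequestOutput per prompt.
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)

The same engine can also be started as an OpenAI-compatible HTTP server with vllm serve <model>.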

Repositories

Showing 10 of 20 repositories
  • vllm-ascend Public

    Community maintained hardware plugin for vLLM on Ascend

    Python · 979 stars · Apache-2.0 · 321 forks · 294 open issues (5 need help) · 153 open PRs · Updated Aug 11, 2025
  • vllm-spyre Public

    Community maintained hardware plugin for vLLM on Spyre

    Python · 30 stars · Apache-2.0 · 20 forks · 12 open issues · 12 open PRs · Updated Aug 11, 2025
  • vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 54,819 stars · Apache-2.0 · 9,286 forks · 1,808 open issues (15 need help) · 923 open PRs · Updated Aug 11, 2025
  • aibrix Public

    Cost-efficient and pluggable infrastructure components for GenAI inference

    Go · 4,007 stars · Apache-2.0 · 422 forks · 202 open issues (21 need help) · 21 open PRs · Updated Aug 11, 2025
  • llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM; a one-shot quantization sketch follows the repository list.

    Python · 1,766 stars · Apache-2.0 · 197 forks · 36 open issues (7 need help) · 33 open PRs · Updated Aug 11, 2025
  • recipes Public

    Common recipes to run vLLM

    94 stars · Apache-2.0 · 16 forks · 2 open issues · 2 open PRs · Updated Aug 10, 2025
  • guidellm Public

    Evaluate and enhance your LLM deployments for real-world inference needs

    Python · 480 stars · Apache-2.0 · 66 forks · 56 open issues (4 need help) · 18 open PRs · Updated Aug 9, 2025
  • flash-attention Public Forked from Dao-AILab/flash-attention

    Fast and memory-efficient exact attention

    Python · 86 stars · BSD-3-Clause · 1,890 forks · 0 open issues · 11 open PRs · Updated Aug 9, 2025
  • production-stack Public

    vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

    Python · 1,649 stars · Apache-2.0 · 254 forks · 64 open issues (3 need help) · 40 open PRs · Updated Aug 8, 2025
  • vllm-xpu-kernels Public

    vLLM XPU kernels for Intel GPUs

    Python · 4 stars · Apache-2.0 · 7 forks · 0 open issues · 4 open PRs · Updated Aug 8, 2025
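
To show how llm-compressor (listed above) feeds a vLLM deployment, here is a hedged sketch of a one-shot weight-quantization run; the model name, dataset, and recipe options follow the project's published examples, but exact import paths and option names may differ between releases.

    from llmcompressor.transformers import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    # One-shot W4A16 (GPTQ) quantization; the model and dataset names are
    # placeholders, and options may vary across llm-compressor releases.
    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        dataset="open_platypus",
        recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
        output_dir="TinyLlama-1.1B-W4A16",
        num_calibration_samples=512,
    )

The compressed checkpoint written to output_dir can then be loaded and served directly by vLLM.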