
Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.10.0 #2106


Open: wants to merge 7 commits into main


louie-tsai (Collaborator) commented Jul 1, 2025

Description

Added an additional compose.perf.yaml file. Users who want the extra vLLM optimizations only need to pass this one extra YAML file to docker compose:
docker compose -f compose.yaml -f compose.perf.yaml up
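As a sketch of how the override is layered (not code from the PR), the hypothetical helper `compose_files` below builds the `-f` argument list and appends compose.perf.yaml only when asked. Docker Compose merges files in the order they are given, so keys in compose.perf.yaml override or extend the matching keys in compose.yaml, and `docker compose ... config` prints the merged result without starting any containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the -f arguments for
# docker compose, adding the compose.perf.yaml override when asked.
set -euo pipefail

compose_files() {
  local perf="${1:-0}"
  local args=(-f compose.yaml)
  if [ "$perf" = "1" ]; then
    # Later files override or extend matching keys from earlier ones.
    args+=(-f compose.perf.yaml)
  fi
  printf '%s\n' "${args[*]}"
}

# Example usage (requires Docker and both files in the current directory):
#   docker compose $(compose_files 1) config   # inspect the merged config
#   docker compose $(compose_files 1) up -d    # start the optimized stack
#   docker compose $(compose_files 0) up -d    # start the original stack
compose_files "${PERF:-0}"
```

Keeping the perf settings in a separate override file means the default compose.yaml path is untouched unless the second `-f` is passed.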

It includes most of the Xeon optimizations from the public vLLM 0.9.2 release, which is planned for this week.

The configuration assumes a system with 2 NUMA nodes and AMX support.
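Both host assumptions can be checked on a Linux machine with a short script. This is a sketch, not part of the PR: it looks for the `amx_tile` flag in /proc/cpuinfo and reads each node's `cpulist` under /sys/devices/system/node. The helper names `has_amx` and `bind_string` are hypothetical. `bind_string` joins one CPU list per NUMA node with `|`, the separator the vLLM CPU backend documents for `VLLM_CPU_OMP_THREADS_BIND` when each tensor-parallel rank gets its own cores (e.g. `0-31|32-63` with `--tensor-parallel-size 2`). Whether compose.perf.yaml sets the binding this way is an assumption; also, on hosts with Hyper-Threading the node cpulist includes sibling threads, which you may want to exclude, so check the accepted syntax against the vLLM CPU docs.

```shell
#!/usr/bin/env bash
# Sketch (not from the PR): check the host for AMX, list NUMA nodes, and
# derive a candidate VLLM_CPU_OMP_THREADS_BIND value with one CPU range
# per node, i.e. one tensor-parallel rank per NUMA node.
set -eo pipefail
shopt -s nullglob

# Succeeds if the given CPU flag list contains the Linux AMX tile flag.
has_amx() {
  grep -qw 'amx_tile' <<< "$1"
}

# Joins its arguments with '|', the per-rank separator documented for
# VLLM_CPU_OMP_THREADS_BIND (e.g. 0-31|32-63).
bind_string() {
  local IFS='|'
  printf '%s\n' "$*"
}

flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null || true)
if has_amx "$flags"; then
  echo "AMX: yes"
else
  echo "AMX: no"
fi

# Glob results are sorted as strings (node10 before node2); fine for 2 nodes.
cpulists=()
for node in /sys/devices/system/node/node[0-9]*; do
  cpulists+=("$(cat "$node/cpulist")")
done
echo "NUMA nodes: ${#cpulists[@]}"

if [ "${#cpulists[@]}" -gt 1 ]; then
  # Candidate value when running one TP rank per NUMA node.
  echo "VLLM_CPU_OMP_THREADS_BIND=$(bind_string "${cpulists[@]}")"
fi
```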

Issues

#2045
#2044

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

vLLM 0.9.2 release
https://github.com/vllm-project/vllm/releases/tag/v0.9.2
https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo

Tests

Throughput speedup is ~1.3x, and TTFT and TPOT are reduced to 75% of their values with the original OPEA vLLM setup.

(benchmark comparison screenshots, 3 images)

Attachment: perf_comparison.html
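The reported figures are roughly consistent with each other. As a back-of-the-envelope check (an assumption, not part of the PR's measurements): if TPOT drops to 75% of the baseline, each request decodes tokens 1 / 0.75 ≈ 1.33 times faster, close to the ~1.3x throughput when decode time dominates. Real throughput also depends on batch size and concurrency. The helper name below is hypothetical.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope check (not a benchmark): the rate speedup implied
# by a latency ratio r (new latency / old latency) is 1 / r.
set -euo pipefail

speedup_from_latency_ratio() {
  awk -v r="$1" 'BEGIN { printf "%.2f\n", 1 / r }'
}

# TPOT reduced to 75% of the original OPEA vLLM baseline:
speedup_from_latency_ratio 0.75   # prints 1.33
```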


github-actions bot commented Jul 1, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

CICD-at-OPEA (Collaborator) commented:

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from 8e0c6d5 to b19733c on August 13, 2025 01:22
louie-tsai changed the title from "Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.9.2" to "Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.10.0" on Aug 14, 2025
louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from b5d3362 to f484e04 on August 15, 2025 06:32
louie-tsai (Collaborator, Author) commented Aug 15, 2025

All Docker containers are up and healthy in my local environment.
(screenshot)
Benchmark testing is in progress.

The original compose.yaml also still works.
(screenshot)

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from 42b33da to bb1c060 on August 15, 2025 06:48
louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from b90bd59 to 83668ea on August 15, 2025 06:49
louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from 9b82ce3 to 368c98a on August 15, 2025 06:53
louie-tsai force-pushed the vllm-optimize branch 6 times, most recently from e87a6d9 to fab7efa on August 16, 2025 18:09
louie-tsai (Collaborator, Author) commented Aug 16, 2025

@chensuyue
I don't understand what is causing the test failure; vLLM started normally, as shown in the screenshot below.
Even after I rolled back all changes to compose.yaml and the test scripts, the failure persists.
It doesn't look like an issue caused by this PR, since the changes only take effect when the additional compose.perf.yaml is passed to docker compose up.
Could you help with this?

(screenshot)

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Development

Successfully merging this pull request may close these issues.

  • [Feature] enable AMX support for vLLM on GNR/EMR/SPR
  • [Feature] Enable vLLM V1 feature and Tensor/Pipeline Parallel to improve Performance
5 participants