Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.10.0 #2106

louie-tsai · 2025-07-01T01:27:18Z

Description

Added an additional compose.perf.yaml file. Users who want more vLLM optimizations apply it as one extra YAML file during docker compose:
docker compose -f compose.yaml -f compose.perf.yaml up

The file includes most of the Xeon optimizations from the public vLLM 0.9.2 release, which is planned for this week.
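For scripted deployments, the choice between the baseline and the optimized stack can be wrapped in a small helper. The following is a minimal sketch, assuming both files sit in the current directory; `compose_command` is a hypothetical helper for illustration and is not part of this PR.

```python
import shlex


def compose_command(use_perf: bool = False) -> list[str]:
    """Build the docker compose argv for the baseline or optimized stack."""
    cmd = ["docker", "compose", "-f", "compose.yaml"]
    if use_perf:
        # The perf file is passed second, so it is layered on top of compose.yaml.
        cmd += ["-f", "compose.perf.yaml"]
    cmd.append("up")
    return cmd


if __name__ == "__main__":
    # Prints: docker compose -f compose.yaml -f compose.perf.yaml up
    print(shlex.join(compose_command(use_perf=True)))
```

Leaving `use_perf` at its default produces the original single-file command, so the baseline deployment is unchanged.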

The settings assume a system with 2 NUMA nodes and AMX support.
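Whether a host matches that assumption can be checked before applying the extra file. The sketch below is one possible check on Linux, not something shipped in this PR: it looks for the `amx_tile`, `amx_bf16`, and `amx_int8` CPU flags in `/proc/cpuinfo` and counts the `node<N>` entries under `/sys/devices/system/node`. The function names are hypothetical.

```python
import re
from pathlib import Path

# AMX feature flags as reported by the Linux kernel in /proc/cpuinfo.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def amx_flags(cpuinfo_text: str) -> set[str]:
    """Return the AMX flags found in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return AMX_FLAGS & set(value.split())
    return set()


def numa_node_count(entries: list[str]) -> int:
    """Count names of the form node<N>, as listed under /sys/devices/system/node."""
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    node_dir = Path("/sys/devices/system/node")
    flags = amx_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    nodes = numa_node_count([p.name for p in node_dir.iterdir()]) if node_dir.is_dir() else 0
    print(f"AMX flags: {sorted(flags) or 'none'}; NUMA nodes: {nodes}")
```

Both helpers take plain text or name lists as input, so they can be exercised on a machine without AMX.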

Issues

#2045
#2044

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

vLLM 0.9.2 release
https://github.com/vllm-project/vllm/releases/tag/v0.9.2
https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo

Tests

Throughput improves by about 1.3x. TTFT and TPOT drop to 75% of the values measured with the original OPEA vLLM setup.
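The two figures are roughly consistent with each other: a per-token latency at 75% of the baseline implies a rate gain of 1 / 0.75, or about 1.33x. The sketch below only illustrates that arithmetic; the 100 ms baseline is a made-up value, not a measurement from this PR, and this is not how the throughput number was obtained.

```python
def latency_after(baseline_ms: float, ratio: float = 0.75) -> float:
    """Latency after optimization, given the reported latency ratio."""
    return baseline_ms * ratio


def implied_speedup(ratio: float = 0.75) -> float:
    """Rate gain implied by a latency ratio (the reciprocal of the ratio)."""
    return 1.0 / ratio


if __name__ == "__main__":
    # 100 ms is a hypothetical baseline TPOT used only for illustration.
    print(latency_after(100.0))          # 75.0
    print(round(implied_speedup(), 2))   # 1.33
```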

perf_comparison.html

github-actions · 2025-07-01T01:27:29Z

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

ChatQnA/kubernetes/helm/cpu-values.yaml

ChatQnA/docker_compose/intel/cpu/xeon/compose.perf.yaml

CICD-at-OPEA · 2025-08-06T22:46:48Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

louie-tsai · 2025-08-15T06:34:24Z

All Docker instances are up and healthy in my local environment.

Benchmark testing is in progress.

The original compose.yaml also works.

louie-tsai · 2025-08-16T18:09:26Z

@chensuyue
I don't understand what is wrong with the test failure. vLLM started normally, as shown in the diagram below.
After I rolled back all changes to compose.yaml and the test scripts, the issue is still there.
It doesn't look like an issue caused by this PR, since the change is only enabled by adding compose.perf.yaml during docker compose up.
Could you help with that?

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>


louie-tsai requested review from lvliang-intel and letonghan as code owners July 1, 2025 01:27

This was referenced Jul 1, 2025

[Feature] enable AMX support for vLLM on GNR/EMR/SPR #2045

Open

[Feature] Enable vLLM V1 feature and Tensor/Pipeline Parallel to improve Performance #2044

Closed

louie-tsai requested a review from chensuyue July 1, 2025 01:31

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from cccd778 to 7072d2c Compare July 1, 2025 01:34

chensuyue reviewed Jul 1, 2025

ChatQnA/kubernetes/helm/cpu-values.yaml (outdated)

chensuyue reviewed Jul 1, 2025

ChatQnA/docker_compose/intel/cpu/xeon/compose.perf.yaml

louie-tsai force-pushed the vllm-optimize branch from 4c2b923 to e93933e Compare July 3, 2025 17:33

louie-tsai requested a review from chensuyue July 3, 2025 17:34

louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from 3740497 to 2ee36fc Compare July 7, 2025 18:28

This was linked to issues Jul 9, 2025

[Feature] enable AMX support for vLLM on GNR/EMR/SPR #2045

Open

[Feature] Enable vLLM V1 feature and Tensor/Pipeline Parallel to improve Performance #2044

Closed

CICD-at-OPEA added the Stale label Aug 6, 2025

louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from 8e0c6d5 to b19733c Compare August 13, 2025 01:22

CICD-at-OPEA removed the Stale label Aug 13, 2025

louie-tsai force-pushed the vllm-optimize branch from b19733c to 9169670 Compare August 13, 2025 23:41

louie-tsai changed the title ~~Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.9.2~~ Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.10.0 Aug 14, 2025

louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from b5d3362 to f484e04 Compare August 15, 2025 06:32

louie-tsai force-pushed the vllm-optimize branch from f484e04 to 6d82af0 Compare August 15, 2025 06:40

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from 42b33da to bb1c060 Compare August 15, 2025 06:48

louie-tsai requested review from ZePan110, ftian1, lkk12014402, minmin-intel and rbrugaro as code owners August 15, 2025 06:48

louie-tsai force-pushed the vllm-optimize branch 2 times, most recently from b90bd59 to 83668ea Compare August 15, 2025 06:49

letonghan approved these changes Aug 15, 2025


louie-tsai force-pushed the vllm-optimize branch 3 times, most recently from 9b82ce3 to 368c98a Compare August 15, 2025 06:53

ZePan110 approved these changes Aug 15, 2025


louie-tsai force-pushed the vllm-optimize branch 6 times, most recently from e87a6d9 to fab7efa Compare August 16, 2025 18:09

louie-tsai added 7 commits August 16, 2025 11:09

changes to enable optimization from vLLM 0.9.2

f035e0a

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

adding CI test and new cpu-value-perf.yaml to address review feedback

b15833a

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

update docker compose for vllm 0.10.0

84a600e

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

update helm for vllm 0.10.0

c5ead24

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

add entrypoint for vllm

bd26554

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

For vLLM health check, use docker service name instead of host_ip

9f892e2

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

remove perf testing. only keep original compose.yaml testing

39f7187

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

louie-tsai force-pushed the vllm-optimize branch from fab7efa to 39f7187 Compare August 16, 2025 18:09
