Enable Xeon optimizations like Tensor Parallel and AMX from vLLM 0.10.0 #2106
base: main
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Description
Added an additional compose.perf.yaml file. When users want more vLLM optimizations, they apply one extra YAML file during docker compose:
docker compose -f compose.yaml -f compose.perf.yaml up
It includes most of the Xeon optimizations from the public vLLM 0.9.2 release, which is planned to ship this week.
The configuration assumes a system with 2 NUMA nodes and AMX support.
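As an illustration of what such an overlay might look like, here is a minimal sketch of a compose.perf.yaml file. The service name, thread ranges, and values below are hypothetical placeholders, not the actual PR contents; `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` are vLLM CPU-backend environment variables, and `--tensor-parallel-size 2` maps one tensor-parallel rank per NUMA node on the assumed 2-node system:

```yaml
# Hypothetical sketch of a compose.perf.yaml overlay; check the PR diff
# for the real file. Merged on top of compose.yaml with:
#   docker compose -f compose.yaml -f compose.perf.yaml up
services:
  vllm-service:                       # service name assumed from compose.yaml
    environment:
      VLLM_CPU_KVCACHE_SPACE: 40      # GiB reserved for the KV cache (example value)
      VLLM_CPU_OMP_THREADS_BIND: "0-31|32-63"  # pin OpenMP threads per NUMA node (example ranges)
    command: --model ${LLM_MODEL_ID} --tensor-parallel-size 2  # one TP rank per NUMA node
```

Because `docker compose -f a.yaml -f b.yaml` merges files left to right, the overlay only needs to declare the keys it changes; everything else is inherited from compose.yaml.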
Issues
#2045
#2044
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
vLLM 0.9.2 release
https://github.com/vllm-project/vllm/releases/tag/v0.9.2
https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo
Tests
Throughput speedup is ~1.3X. TTFT and TPOT are reduced to 75% of the original OPEA vLLM numbers.
perf_comparison.html