Nixl optimization for llama4 local attention #87

Open
wants to merge 59 commits into
base: pd-launch-branch
from

Conversation

mgoin
Member

@mgoin mgoin commented May 15, 2025

No description provided.

ekagra-ranjan and others added 30 commits May 14, 2025 12:31
…aft model to free ~1GB for llama 3 model (vllm-project#17326)

Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…llm-project#18013)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Andy Xie <andy.xning@gmail.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
tjtanaa and others added 23 commits May 15, 2025 09:53
… unquantizedMethod to reenable LLama4 BF16 (vllm-project#18205)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
…-project#18229)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
…attention on ROCm (vllm-project#18093)

Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com>
Signed-off-by: skylee-01 <497627264@qq.com>
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: David Xia <david@davidxia.com>
vllm-project#17973)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Aug 15, 2025