[Inference] Support group-wise quantize for weight_quantize op in GPU #71549

zeroRains · 2025-03-11T05:13:06Z

PR Category

Inference

PR Types

Performance

Description

Add GPU kernels for the group-wise quantization modes (int8, int4_col, int4_row) of the weight_quantize op.

Validated successfully on DeepSeek-V2 (the convert-to-CPU logic in the DeepSeek-V2 modeling code has been removed).

PaddlePaddle/PaddleNLP#10174

python ./predict/predictor.py --model_name_or_path deepseek-ai/DeepSeek-V2-Lite-Chat --dtype float16 --mode dynamic --decode_strategy greedy_search --inference_model 1 --block_attn 1 --append_attn 1 --quant_type weight_only_int4 --weightonly_group_size 64
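
For readers unfamiliar with group-wise quantization, the idea can be sketched in plain NumPy. This is a minimal CPU reference under stated assumptions, not the GPU kernel added in this PR: the function names `group_wise_quantize` and `group_wise_dequantize` are hypothetical, the weight is assumed to have shape `[k, n]` with groups formed along `k`, the scheme is assumed to be symmetric abs-max scaling, and int4 values are left unpacked in an int8 array (the `int4_col` / `int4_row` packing layouts are not reproduced here).

```python
import numpy as np


def group_wise_quantize(weight, group_size=64, bits=8):
    """Symmetric abs-max quantization with one scale per (group, column).

    Hypothetical reference helper; the layout is an assumption, not Paddle's.
    """
    k, n = weight.shape
    if k % group_size != 0:
        raise ValueError("k must be divisible by group_size")
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    # Split the k dimension into groups of `group_size` rows.
    grouped = weight.astype(np.float32).reshape(k // group_size, group_size, n)
    # One scale per group and output column: shape (k // group_size, n).
    scale = np.abs(grouped).max(axis=1) / qmax
    # Avoid division by zero for all-zero groups.
    safe_scale = np.where(scale == 0, 1.0, scale)
    q = np.rint(grouped / safe_scale[:, None, :])
    q = np.clip(q, -qmax, qmax).astype(np.int8)
    return q.reshape(k, n), scale


def group_wise_dequantize(q, scale, group_size=64):
    """Inverse of group_wise_quantize: multiply each group by its scale."""
    k, n = q.shape
    grouped = q.astype(np.float32).reshape(k // group_size, group_size, n)
    return (grouped * scale[:, None, :]).reshape(k, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((128, 4)).astype(np.float32)
    q, s = group_wise_quantize(w, group_size=64, bits=4)
    w_hat = group_wise_dequantize(q, s, group_size=64)
    print("scale shape:", s.shape)
    print("max abs error:", float(np.abs(w - w_hat).max()))
```

With `group_size=64`, as in the command above, each block of 64 consecutive rows in a column shares one scale, so an outlier only degrades the precision of its own group rather than the whole channel.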

paddle-bot · 2025-03-11T05:13:11Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-ci-bot · 2025-04-02T03:03:57Z

Sorry to inform you that the CIs for e6b68d3 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot · 2025-04-14T02:54:22Z

Sorry to inform you that the CIs for 9f7556c passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot · 2025-05-01T02:51:00Z

Sorry to inform you that the CIs for f194185 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot · 2025-05-18T03:24:19Z

Sorry to inform you that the CIs for 51c5483 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

zeroRains added 2 commits March 11, 2025 04:49

base

d4a0239

support weight_quantize with group_wize quantize in gpu

10acd29

paddle-bot bot added the contributor External developers label Mar 11, 2025

zeroRains added 2 commits March 12, 2025 05:39

support per_group_quant_gpu_int4_col_pack for weight quant

ef17ac2

add per_group_quant_gpu_int4_row_pack

2393d35

zeroRains marked this pull request as draft March 12, 2025 12:10

fix the bug for int4 kernel and change the test condition

a1a2e35

zeroRains marked this pull request as ready for review March 13, 2025 05:15

zeroRains added 7 commits March 13, 2025 12:27

add some debug info

e3f37ee

fix the bug in float16

0a26964

fix the bug in channel-wise

2c9a207

fix the conflict

e6b68d3

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

0e88abf

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

9b13be0

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

2b0e12b

zeroRains added 2 commits April 2, 2025 03:31

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

56a36af

update test case

9f7556c

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

f194185

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into wq

51c5483

yuanlehome self-requested a review May 13, 2025 03:21
