[PHI] Optimize Gather kernel with vectorization #72238

lshpku · 2025-04-14T05:34:53Z

PR Category

Performance Optimization

PR Types

Performance

Description

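Optimizes the performance of GatherGPUKernel with vectorization and merges the two existing Gather implementations into one.

Note: the two original implementations handled gathering along outer (high-order) axes and inner (low-order) axes respectively. I found this split unnecessary and merged them into a single implementation. Both calling interfaces are kept, however, because quite a few other kernels still depend on the deprecated one.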
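To illustrate why a single implementation can cover both cases, here is a minimal host-side C++ sketch. It is not Paddle's code, and the function name `gather_axis` is hypothetical. The idea is that any gather along `axis` can be computed on a `[outer, axis_size, inner]` view of the input: an outer-axis gather on a 2D tensor is just the case `outer == 1`, and an inner-axis gather is the case `inner == 1`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference gather (not Paddle's API). The input is viewed as
// [outer, axis_size, inner], where
//   outer = product of dims before `axis`
//   inner = product of dims after `axis`
// so one code path handles outer-, middle-, and inner-axis gathers.
template <typename T>
std::vector<T> gather_axis(const std::vector<T>& input,
                           const std::vector<int64_t>& shape,
                           const std::vector<int64_t>& index,
                           int axis) {
  int64_t outer = 1, inner = 1;
  for (int d = 0; d < axis; ++d) outer *= shape[d];
  for (int d = axis + 1; d < static_cast<int>(shape.size()); ++d) {
    inner *= shape[d];
  }
  const int64_t axis_size = shape[axis];
  const int64_t n_index = static_cast<int64_t>(index.size());

  // The output view is [outer, n_index, inner].
  std::vector<T> out(outer * n_index * inner);
  for (int64_t o = 0; o < outer; ++o)
    for (int64_t j = 0; j < n_index; ++j)
      for (int64_t k = 0; k < inner; ++k)
        out[(o * n_index + j) * inner + k] =
            input[(o * axis_size + index[j]) * inner + k];
  return out;
}
```

For example, gathering rows `{1, 0}` of a 2x3 tensor along axis 0 uses `outer = 1, inner = 3`, while gathering columns along axis 1 uses `outer = 2, inner = 1`; the loop body is identical in both cases.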

Performance Tests

A100, float16, with the index length equal to shape[axis]; times are in μs.

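shape	axis	Old time (μs)	New time (μs)	Speedup (old/new - 1)	Notes
[128, 1024*1024]	0	1,466	459	219.6%	2D, outer axis, 4x vectorizable
[128, 1024*1024+2]	0	1,472	834	76.5%	2D, outer axis, 2x vectorizable
[128, 1024*1024+1]	0	1,471	1,394	5.5%	2D, outer axis, not vectorizable
[262144, 256]	1	793	694	14.2%	2D, inner axis, not vectorizable
[16384, 4096]	1	884	720	22.7%	2D, inner axis, not vectorizable
[4096, 16384]	1	1,151	963	19.6%	2D, inner axis, not vectorizable
[128, 1024, 1024]	1	1,700	478	255.5%	3D, middle axis, 4x vectorizable
[128, 1024, 1024+2]	1	1,747	871	100.6%	3D, middle axis, 2x vectorizable
[128, 1024, 1024+1]	1	1,695	1,409	20.3%	3D, middle axis, not vectorizable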
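The Notes column follows a simple pattern: the vector width matches the largest of 4, 2, 1 that evenly divides the contiguous extent after `axis` (1024*1024 and 1024 give 4x, +2 gives 2x, +1 gives none, and an inner-axis gather on a 2D tensor has extent 1). Below is a minimal sketch of that rule, assuming this is how the width is selected; `choose_vec_width` is a hypothetical name, and a real kernel would likely also check base-pointer alignment.

```cpp
#include <cstdint>

// Hypothetical helper (not Paddle's API): return the widest vector width,
// in elements, that evenly divides the contiguous inner extent.
// max_width is assumed to be a power of two (4 matches the "4x" float16
// case in the table above).
int choose_vec_width(int64_t inner, int max_width = 4) {
  for (int w = max_width; w > 1; w /= 2) {
    if (inner % w == 0) return w;
  }
  return 1;  // fall back to scalar loads/stores
}
```

Divisibility matters because each contiguous run of `inner` elements is copied from a different source slice, so a vector must not straddle a run boundary. When `inner % w == 0`, every run also starts at a multiple of `w` elements, which keeps vector accesses aligned as long as the base pointer is.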

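The results show that this PR brings large speedups mainly in vectorizable cases. Non-vectorizable cases also improve slightly, because the index computation was optimized and the loop count was increased.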
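The PR does not spell out the exact index scheme, so the following is only a hypothetical host-side model of how vectorization, cheaper index math, and more loop iterations per thread can fit together. `gather_vectorized` and its parameters are illustrative, not Paddle's kernel. Each simulated thread walks a grid-stride loop over `VecSize`-element chunks and decomposes the chunk id into (outer, index, inner) once per chunk rather than once per element.

```cpp
#include <cstdint>

// Hypothetical model of a vectorized gather loop (not Paddle's kernel).
// Input view: [outer, axis_size, inner]; output view: [outer, n_index, inner].
// Assumes inner % VecSize == 0 (e.g. chosen by a width-selection rule).
template <typename T, int VecSize>
void gather_vectorized(const T* in, T* out, const int64_t* index,
                       int64_t outer, int64_t axis_size, int64_t n_index,
                       int64_t inner, int64_t num_threads) {
  static_assert(VecSize >= 1, "VecSize must be positive");
  const int64_t inner_vecs = inner / VecSize;
  const int64_t total_vecs = outer * n_index * inner_vecs;
  // Outer loop stands in for the GPU threads; inner loop is a grid-stride loop.
  for (int64_t t = 0; t < num_threads; ++t) {
    for (int64_t v = t; v < total_vecs; v += num_threads) {
      // Decompose the chunk id once: v -> (o, j, kv).
      const int64_t kv = v % inner_vecs;
      const int64_t j = (v / inner_vecs) % n_index;
      const int64_t o = v / (inner_vecs * n_index);
      const T* src = in + (o * axis_size + index[j]) * inner + kv * VecSize;
      T* dst = out + v * VecSize;
      // On a GPU this copy would be one vector load and one vector store.
      for (int e = 0; e < VecSize; ++e) dst[e] = src[e];
    }
  }
}
```

With `VecSize = 4`, the per-element div/mod work is amortized over four elements, and a smaller `num_threads` gives each thread more iterations of the stride loop.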

In addition, I ran shape-coverage tests over thousands of shapes and checked float32 performance for some of them; no issues were found.

Pcard-85711

paddle-bot · 2025-04-14T05:34:58Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

CLAassistant · 2025-04-14T05:34:59Z

All committers have signed the CLA.

[PHI] Optimize Gather kernel with vectorization

0498fa7

lshpku force-pushed the vectorize-gather-kernel-test branch from 05559d4 to 0498fa7 (April 14, 2025 05:36)

PaddlePaddle deleted a comment from CLAassistant Apr 14, 2025

lshpku closed this Apr 14, 2025
