[GPU] modify GroupQueryAttention decompose logic to fit gpu unsqueeze_broadcast_reshape_sdpa_fusion pattern #31507

bopeng1234 · 2025-07-29T01:44:30Z

For multi-query attention (MQA) models such as Qwen2.5-1.5B, the KV head count is 2 and the Q head count is 12.

The KV heads need to be broadcast to 12 before the SDPA computation.

The GPU plugin has a pattern, in unsqueeze_broadcast_reshape_sdpa_fusion.cpp, that fuses the KV broadcast into the SDPA kernel to improve performance.

This PR adjusts the GQA decompose logic to match that pattern.
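For readers unfamiliar with the broadcast step, here is a minimal pure-Python sketch of what an unsqueeze, broadcast, reshape sequence does on the head axis. It is an illustration only, not the OpenVINO decomposition code: the function name `broadcast_kv_heads` is hypothetical, and each string stands in for one KV head's tensor.

```python
def broadcast_kv_heads(kv, num_q_heads):
    """Repeat each KV head so the head count matches the Q head count.

    `kv` is a list with one entry per KV head; each entry is a placeholder
    for that head's tensor. Hypothetical helper, for illustration only.
    """
    num_kv_heads = len(kv)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("Q head count must be a multiple of KV head count")
    group = num_q_heads // num_kv_heads

    # Unsqueeze: [kv_heads, ...] -> [kv_heads, 1, ...]
    unsqueezed = [[head] for head in kv]
    # Broadcast: [kv_heads, 1, ...] -> [kv_heads, group, ...]
    broadcast = [row * group for row in unsqueezed]
    # Reshape: [kv_heads, group, ...] -> [kv_heads * group, ...]
    return [head for row in broadcast for head in row]


# Head counts quoted in the PR description: 2 KV heads, 12 Q heads.
expanded = broadcast_kv_heads(["kv0", "kv1"], 12)
print(expanded)
```

With these head counts, Q heads 0 to 5 read from the first KV head and Q heads 6 to 11 from the second. A fused kernel can apply the same index mapping while reading K and V, which is presumably what lets it skip materializing the expanded tensors.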

…a_fusion pattern for MQA model performance

rkazants

Please share the CVS JIRA ticket number.
Also, please add tests for the new pattern. I think this change could affect other models that use the existing patterns.

bopeng1234 · 2025-08-01T03:12:41Z

Added the JIRA ticket: https://jira.devtools.intel.com/browse/CVS-171437
I will add a test soon.

bopeng1234 · 2025-08-01T04:06:11Z

Hi @rkazants, I think there is an existing GQA test with 2 query heads and 1 KV head.

This PR only changes how the KV heads are broadcast to match the Q heads. The existing test case already covers and verifies this change, so I don't think we need to add a duplicate test case.

sgbihu · 2025-08-07T06:27:25Z

build_jenkins

bopeng1234 requested a review from a team as a code owner July 29, 2025 01:44

bopeng1234 requested review from itikhono and removed request for a team July 29, 2025 01:44

github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Jul 29, 2025

sys-openvino-ci added the ExternalIntelPR External contributor from Intel label Jul 29, 2025

modify GQA decompose logic to fit gpu unsqueeze_broadcast_reshape_sdp…

be4e135

…a_fusion pattern for MQA model performance

bopeng1234 force-pushed the gqa_multi_query_atten_fix_gpu branch from ecbe728 to be4e135 Compare July 30, 2025 01:46

mryzhov requested a review from CuriousPanCake July 30, 2025 07:44

Merge branch 'master' into gqa_multi_query_atten_fix_gpu

5117999

rkazants requested changes Jul 31, 2025
