
Commit 38708d4

【Inference】fix blha bug (#70466)
* fix blha bug
* fix blha bug
1 parent 0d307e2 commit 38708d4


1 file changed: +3 -1 lines changed

paddle/phi/kernels/fusion/gpu/block_attn.h

+3 -1
```diff
@@ -892,7 +892,9 @@ __global__ __launch_bounds__(THREADS_PER_BLOCK) void gqa_block_attention_kernel(
   float qk_maxs[GQA_SUB_PARTITION_SIZE];
 #pragma unroll
   for (int i = 0; i < GQA_SUB_PARTITION_SIZE; i++) {
-    qk_maxs[i] = -FLT_MAX;
+    // qk_maxs[i] = -FLT_MAX;
+    // initialize qk_maxs!!!
+    qk_maxs[i] = qk_smem[act_time_step * GQA_SUB_PARTITION_SIZE + i];
   }
 
   // threads in one block can process 'K_PER_ITER' keys
```
