
Commit 6d88448

fix: add tail-case handling for elementwise_add_f16x8_pack_kernel to … (#380)
* fix: add tail-case handling for elementwise_add_f16x8_pack_kernel to avoid out-of-bounds access
* Rename nx to idx
* Fix bug: nx -> idx

---------

Co-authored-by: Uper <41718895+Hyliu-BUAA@users.noreply.github.com>
1 parent b98ba1c commit 6d88448


1 file changed (+4, -0 lines)


kernels/elementwise/elementwise.cu

Lines changed: 4 additions & 0 deletions
@@ -117,6 +117,10 @@ __global__ void elementwise_add_f16x8_pack_kernel(half *a, half *b, half *c,
   // reinterpret as float4 and store 128 bits in 1 memory issue.
   if ((idx + 7) < N) {
     LDST128BITS(c[idx]) = LDST128BITS(pack_c[0]);
+  } else {
+    for (int i=0; idx+i<N; i++) {
+      c[idx+i] = __hadd(a[idx+i], b[idx+i]);
+    }
   }
 }
 
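The fix splits each thread's work into a full 128-bit path and a scalar tail. The following is a minimal, self-contained sketch of that pattern under stated assumptions, not the repository's code. The names add_f16x8_tail_kernel, add_half and add_on_gpu are hypothetical. A plain float4 reinterpret_cast stands in for the repository's LDST128BITS macro, and the addition goes through float rather than __hadd so the sketch also builds for GPUs without native FP16 arithmetic. Unlike the hunk above, which only shows the store, the sketch places the 128-bit loads inside the same bounds check, so a tail thread never reads past N. It assumes a CUDA-capable device and 16-byte-aligned buffers such as those returned by cudaMalloc.

```cuda
#include <cassert>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: half + half computed in float, then rounded back to half.
// Used instead of __hadd so the sketch also compiles for GPUs without native FP16 math.
__device__ __forceinline__ half add_half(half x, half y) {
  return __float2half(__half2float(x) + __half2float(y));
}

// Hypothetical kernel: each thread owns 8 consecutive halves (128 bits) starting at idx.
__global__ void add_f16x8_tail_kernel(const half *a, const half *b, half *c, int N) {
  int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 7 < N) {
    // Full pack: one 128-bit load per input and one 128-bit store.
    // Assumes 16-byte alignment; holds for cudaMalloc'd buffers since idx is a multiple of 8.
    float4 va = *reinterpret_cast<const float4 *>(a + idx);
    float4 vb = *reinterpret_cast<const float4 *>(b + idx);
    float4 vc;
    const half *pa = reinterpret_cast<const half *>(&va);
    const half *pb = reinterpret_cast<const half *>(&vb);
    half *pc = reinterpret_cast<half *>(&vc);
#pragma unroll
    for (int i = 0; i < 8; ++i) pc[i] = add_half(pa[i], pb[i]);
    *reinterpret_cast<float4 *>(c + idx) = vc;
  } else {
    // Tail: fewer than 8 in-bounds elements remain (possibly zero), so fall back
    // to scalar loads and stores that never touch an index >= N.
    for (int i = 0; idx + i < N; ++i) c[idx + i] = add_half(a[idx + i], b[idx + i]);
  }
}

// Hypothetical host wrapper: converts to half, runs the kernel, converts back.
// CUDA error checking is omitted for brevity.
std::vector<float> add_on_gpu(const std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  const int N = static_cast<int>(a.size());
  if (N == 0) return {};
  std::vector<half> ha(N), hb(N), hc(N);
  for (int i = 0; i < N; ++i) {
    ha[i] = __float2half(a[i]);
    hb[i] = __float2half(b[i]);
  }
  const size_t bytes = N * sizeof(half);
  half *da = nullptr, *db = nullptr, *dc = nullptr;
  cudaMalloc(&da, bytes);
  cudaMalloc(&db, bytes);
  cudaMalloc(&dc, bytes);
  cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);
  const int threads = 256;                 // each thread covers 8 elements
  const int per_block = threads * 8;       // 2048 elements per block
  const int blocks = (N + per_block - 1) / per_block;
  add_f16x8_tail_kernel<<<blocks, threads>>>(da, db, dc, N);
  cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(da);
  cudaFree(db);
  cudaFree(dc);
  std::vector<float> out(N);
  for (int i = 0; i < N; ++i) out[i] = __half2float(hc[i]);
  return out;
}
```

For N = 13, thread 0 takes the 128-bit path for elements 0 to 7, thread 1 falls into the scalar tail for elements 8 to 12, and every later thread has idx >= N, so its tail loop runs zero times.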
