fix: add tail-case handling for elementwise_add_f16x8_pack_kernel to … #380

lhycms · 2025-08-23T04:15:07Z

Background

In elementwise_add_f16x8_pack_kernel, each thread processes 8 half elements at once. When the input length N is not divisible by 8, the 8-element pack of the last active thread extends past the end of the arrays, so that thread may access memory out of bounds.

Changes

Added a tail-case handling branch to safely compute the remaining elements one by one:

} else {
    // Tail case: fewer than 8 elements remain from nx, so add them one at a time.
    for (int i = 0; nx + i < N; ++i) {
        d_c[nx + i] = __hadd(d_a[nx + i], d_b[nx + i]);
    }
}

Impact

Prevents potential out-of-bounds access when N % 8 != 0.
Keeps vectorized performance intact when N is a multiple of 8.
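
For reference, here is a minimal, self-contained sketch of how a guarded vector path and the scalar tail branch can fit together. It is not the code in kernels/elementwise/elementwise.cu: the kernel name add_f16x8_with_tail, the helper count_mismatches, the launch configuration, and the test sizes are all hypothetical, and the sketch assumes a device with native half arithmetic (for example, compiled with nvcc -arch=sm_53 or newer).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical kernel (not the repository's implementation): each thread owns
// 8 consecutive half elements starting at idx.
__global__ void add_f16x8_with_tail(const half *a, const half *b, half *c, int N) {
  int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 7 < N) {
    // Full pack: one 128-bit load per input, four half2 adds, one 128-bit store.
    float4 va = *reinterpret_cast<const float4 *>(a + idx);
    float4 vb = *reinterpret_cast<const float4 *>(b + idx);
    float4 vc;
    const half2 *pa = reinterpret_cast<const half2 *>(&va);
    const half2 *pb = reinterpret_cast<const half2 *>(&vb);
    half2 *pc = reinterpret_cast<half2 *>(&vc);
#pragma unroll
    for (int i = 0; i < 4; ++i) {
      pc[i] = __hadd2(pa[i], pb[i]);
    }
    *reinterpret_cast<float4 *>(c + idx) = vc;
  } else {
    // Tail: fewer than 8 elements remain (possibly none); add them one by one.
    for (int i = 0; idx + i < N; ++i) {
      c[idx + i] = __hadd(a[idx + i], b[idx + i]);
    }
  }
}

// Hypothetical host helper: runs the kernel on N elements and returns the
// number of outputs that differ from a host-side reference.
int count_mismatches(int N) {
  std::vector<half> ha(N), hb(N), hc(N);
  for (int i = 0; i < N; ++i) {
    ha[i] = __float2half(static_cast<float>(i % 16));  // small values, exact in half
    hb[i] = __float2half(1.0f);
  }
  size_t bytes = static_cast<size_t>(N) * sizeof(half);
  half *da = nullptr, *db = nullptr, *dc = nullptr;
  cudaMalloc(&da, bytes);
  cudaMalloc(&db, bytes);
  cudaMalloc(&dc, bytes);
  cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

  int threads = 256;                             // assumed block size
  int packs = (N + 7) / 8;                       // one thread per (partial) pack
  int blocks = (packs + threads - 1) / threads;  // round up so the tail is covered
  add_f16x8_with_tail<<<blocks, threads>>>(da, db, dc, N);
  cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);
  cudaFree(da);
  cudaFree(db);
  cudaFree(dc);

  int bad = 0;
  for (int i = 0; i < N; ++i) {
    if (__half2float(hc[i]) != static_cast<float>(i % 16) + 1.0f) ++bad;
  }
  return bad;
}

int main() {
  // 13 and 2049 leave a partial pack; 16 is an exact multiple of 8.
  const int sizes[] = {13, 16, 2049};
  for (int N : sizes) {
    printf("N=%d mismatches=%d\n", N, count_mismatches(N));
  }
  return 0;
}
```

With N = 13, the thread at idx = 0 takes the vector path and the thread at idx = 8 takes the scalar loop for elements 8 through 12. Threads with idx >= N enter the else branch but execute zero loop iterations. The grid size is rounded up so that a partial pack still gets a thread.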

…avoid out-of-bounds access

kernels/elementwise/elementwise.cu

DefTruth

LGTM

lhycms added 2 commits August 23, 2025 12:13

fix: add tail-case handling for elementwise_add_f16x8_pack_kernel to …

d498904

…avoid out-of-bounds access

Rename nx to idx

6a84e3e

DefTruth requested changes Aug 24, 2025

kernels/elementwise/elementwise.cu (outdated, resolved)

Fix bug: nx -> idx

ec82df5

lhycms requested a review from DefTruth August 24, 2025 12:41

DefTruth approved these changes Aug 24, 2025

DefTruth merged commit 6d88448 into xlite-dev:main Aug 25, 2025
