fix: bugs in dsps_fird_f32_aes3 (DSP-167) #110
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes critical bugs in the
dsps_fird_f32_aes3
optimized assembly implementation that were causing digital noise/corruption in the FIR filter output.Issues Fixed:
Incorrect register usage in offset_3 branch: The final accumulation was overwriting
f4
instead of usingf6
, causing partial loss of filter results.Incorrect instruction ordering in non-offset_0 branches: The order of
madd.s
andEE.LDF.128.IP
instructions in the tight FIR loops was leading to out-of-bounds data being added to FIR results..Potential out-of-bounds memory access: The optimized implementation could read 4 floats beyond the delay line buffer, which could crash if the buffer was placed at a memory boundary.
Root Cause:
The optimized AES3 implementation handles different memory alignments (offset_0, offset_1, offset_2, offset_3) but contained copy-paste errors and subtle timing issues in the non-aligned cases. The ANSI implementation worked correctly, but the optimized version would intermittently produce digital noise when the delay line pointer wasn't 16-byte aligned.
Changes Made:
.offset_3
section (line 204:madd.s f4, f3, f14
→madd.s f6, f3, f14
).offset_1
,.offset_2
, and.offset_3
loops for proper coefficient-data alignmentDocumentation Update Needed:
The requirement for delay line buffer to be
N + 4
floats (not justN
) should be documented fordsps_fird_f32_init
, as it's currently only mentioned fordsps_fir_f32_init
.Testing
The fix has been tested in a real-world PDM audio processing application where the digital noise was clearly audible before the fix and completely eliminated after.
Checklist
Before submitting a Pull Request, please ensure the following: