iha-taisei
Contributor

Closes #5352
This pull request addresses issue #5352 by unrolling the loops in the [SD]DOT kernels.
This improves performance by 1.3x on A64FX.
(two images attached)
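
For readers unfamiliar with the technique, here is a minimal sketch of loop unrolling applied to a dot product. It is not the code from this patch: it is portable scalar C, the function names `ddot_simple` and `ddot_unrolled4` are hypothetical, and the unroll factor of 4 is chosen only for illustration. The idea is that several independent accumulators let consecutive multiply-adds overlap instead of each one waiting on the previous result.

```c
#include <stddef.h>

/* Reference kernel: a single accumulator, so every multiply-add
   depends on the result of the previous one. */
double ddot_simple(size_t n, const double *x, const double *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Illustrative unrolled kernel (hypothetical name, unroll factor 4).
   Four independent accumulators break the dependency chain. */
double ddot_unrolled4(size_t n, const double *x, const double *y)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }

    /* Combine the partial sums. */
    double acc = (acc0 + acc1) + (acc2 + acc3);

    /* Tail loop: handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        acc += x[i] * y[i];

    return acc;
}
```

Because the unrolled version sums the products in a different order, its result can differ from the simple loop in the last bits for general floating-point inputs.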

martin-frbg added this to the 0.3.31 milestone on Jul 4, 2025
martin-frbg
Collaborator

Thank you. (That pronounced peak around 5000 looks intriguing, now that your patch has magnified it - there might be something else wrong, like a poorly chosen threshold for multithreading...)

martin-frbg merged commit df013c5 into OpenMathLib:develop on Jul 4, 2025
86 of 87 checks passed
Linked issue: performance improvements of [SD]DOT on A64FX.