Adding new HWY_AVX10_2 target #2348
Comments
Thanks for starting the discussion! Looks like GNR has also just been introduced/launched, but that supports 10.1, I think. Min/MaxNumber (Min with proper NaN handling per IEEE754:2019) and Min/MaxMagnitude look useful, as does F16 WidenMulPairwiseAdd. Would be very happy to see those added :) I agree we'd want to split the "AVX3" and "512-bit" aspects of x86_512-inl.h. How about I make a TODO for around 2025-03 to lay the groundwork by creating the HWY_AVX10_2 (or HWY_AVX102?) target/boilerplate? Would you later like to add some of its functionality?
MinMagnitude/MaxMagnitude ops are implemented in pull request #2353.
It is possible to go ahead and implement the HWY_AVX10_2 target as GCC 14, Clang 18, and Clang 19 have the |
I have added new HWY_AVX10_2 and HWY_AVX10_2_512 targets in pull request #2395.
Very nice, thanks for adding the targets already :D
Interesting news, https://www.phoronix.com/news/Intel-AVX10-Drops-256-Bit reports that 512-bit will now be required. I think this means we can remove the |
@jan-wassenberg I removed the separate HWY_AVX10_2_512 target and renamed HWY_AVX10_2_512 to just HWY_AVX10_2 in pull request #2563. I have also made some changes in pull request #2563 to ensure that the AVX3, AVX3_DL, and AVX3_SPR baselines are enabled if compiling for AVX10.2 as GCC 15 might fail to define the macros for various AVX512 instruction set extensions if |
The upcoming Intel AVX10.2 instruction set (which is described in the specification that can be found at https://www.intel.com/content/www/us/en/content-details/828965/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html) adds the following operations:
- IfThenElse(Lt(Abs(a), Abs(b)), a, b) (if both a[i] and b[i] are non-NaN)
- IfThenElse(Lt(Abs(a), Abs(b)), b, a) (if both a[i] and b[i] are non-NaN)

GCC 15 and Clang 20, which are currently under development and scheduled to be released in Spring 2025, will have support for the new AVX10.2 intrinsics.
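
For reference, the two select expressions listed above can be written out for a single lane. This is only a scalar sketch: the helper names `SelectSmallerMagnitude` and `SelectLargerMagnitude` are hypothetical (they are not Highway ops), and it assumes both inputs are non-NaN, as the list does.

```cpp
#include <cmath>

// Hypothetical scalar helpers, not part of Highway. Each mirrors one of the
// select expressions for a single lane, assuming neither input is NaN.

// IfThenElse(Lt(Abs(a), Abs(b)), a, b): keeps the operand whose magnitude is
// smaller; when the magnitudes are equal, the comparison is false and b wins.
inline float SelectSmallerMagnitude(float a, float b) {
  return std::fabs(a) < std::fabs(b) ? a : b;
}

// IfThenElse(Lt(Abs(a), Abs(b)), b, a): keeps the operand whose magnitude is
// larger; when the magnitudes are equal, the comparison is false and a wins.
inline float SelectLargerMagnitude(float a, float b) {
  return std::fabs(a) < std::fabs(b) ? b : a;
}
```

For example, with a = -5 and b = 2 the first helper yields 2 and the second yields -5, since the sign is preserved and only the magnitudes are compared.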
The new _mm*_cvttsp[h,s,d]_epi* intrinsics available on AVX10.2 should also fix the undefined behavior that arises with GCC when out-of-range floating-point vectors are converted to integer vectors (this issue is described in #2183).
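
To illustrate what a well-defined out-of-range conversion looks like, here is a scalar sketch of a saturating truncation from float to int32_t. The helper `SaturatingTruncToInt32` is hypothetical (it is neither a Highway op nor a compiler intrinsic). The clamping to the int32_t limits and the mapping of NaN to 0 are assumptions about the intended semantics; the AVX10.2 specification is the authority on what the instructions actually return.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not a Highway or compiler API.
// Truncates toward zero and clamps out-of-range inputs instead of invoking
// the undefined behavior of a plain static_cast on an out-of-range value.
inline int32_t SaturatingTruncToInt32(float v) {
  // Assumption: NaN maps to 0.
  if (std::isnan(v)) return 0;
  // 2147483648.0f is exactly 2^31, the first float above the int32_t range.
  if (v >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
  // -2^31 is representable as int32_t, so it maps to the minimum itself.
  if (v <= -2147483648.0f) return std::numeric_limits<int32_t>::min();
  // The value is now strictly inside the int32_t range, so the cast is defined.
  return static_cast<int32_t>(v);
}
```

The range checks come before the cast because the cast itself is what is undefined for out-of-range inputs in C++; checking afterwards would be too late.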
We also need to move some of the ops for 256-bit or smaller vectors that are currently implemented in the hwy/ops/x86_512-inl.h header on AVX3 targets into a separate header, as support for 512-bit vectors is optional on AVX10.2.