[DRAFT]Dnn38 arm #285
base: v3.6_for_ie_master
Temporary "const char *" objects can be destroyed before they reach the parser internals. Moving the strings to be parsed into a permanent container solves the problem.
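The failure mode and the fix can be illustrated with a minimal sketch. It assumes a hypothetical parser input that keeps raw `const char *` pointers for later use; `ParserInput` and `add` are illustrative names, not oneDNN's actual parser code.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical parser input that keeps raw C-string pointers for later use.
// Illustrative only; not oneDNN's parser.
struct ParserInput {
    std::vector<const char *> args;

    ParserInput() = default;
    // Copying would duplicate storage_ while args still points at the
    // original strings, so copies are disabled.
    ParserInput(const ParserInput &) = delete;
    ParserInput &operator=(const ParserInput &) = delete;

    // Buggy variant, for contrast:
    //   void add(const std::string &s) { args.push_back(s.c_str()); }
    // If `s` is a temporary, its buffer is released at the end of the
    // caller's full-expression and the stored pointer dangles.

    // Fixed variant: copy the string into a permanent container first.
    // std::deque keeps references to existing elements valid on push_back,
    // so earlier c_str() pointers stay valid. A std::vector<std::string>
    // would not guarantee this, because reallocation moves the strings.
    void add(const std::string &s) {
        storage_.push_back(s);
        args.push_back(storage_.back().c_str());
    }

private:
    std::deque<std::string> storage_;
};
```

With this layout, a caller can pass temporaries such as `in.add("--dt=" + std::to_string(i))` in a loop, and every pointer in `args` remains valid for the lifetime of `in`.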
src1 with different data types cannot be fused.
Also refactored the SDPA primitive integration to improve compilation performance. The new kernel currently supports only floating-point SDPA.
This allows oneDNN to build successfully with GCC 7.x.
Signed-off-by: Zhang fei <zhangfei@iscas.ac.cn>
The Level Zero query currently returns an incorrect result for device support of atomics. This commit reverts to using the OpenCL query until the issue is fixed in Level Zero.
3.5 squash list: [FORK][FIX] Corrected brgemm rd_step for bf16 compressed weights
3.5 squash list: [Fork][Fix] Fix avx2 bf16 reorder
[FORK][FEATURE] Enable avx2 jit reorder for bf16 data type
* fix matmul decompress test case
* save tmp
* [FORK][FIX] IP weights compression: scalar scale
  [FORK][FEATURE] InnerProduct primitive: squashed weight decompression
* [FORK][FIX] IP weights compression: max bcast blocking computation
  [FORK][FEATURE] InnerProduct primitive: squashed weight decompression
* fix compile issue
* fix crash issue
* try to fix compare issue
* continue fixing some accuracy issues
* fix f4_e2m1
* continue to fix f4_e2m1
* fix conflict on smoke_FC_(2|3)D_I8_sparse
* clean up debug and unused code
* revert this change, should affect test case

---------

Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
Co-authored-by: dmitrygo <dmitry.gorokhov@intel.com>
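For reference when checking the f4_e2m1 fixes above: f4_e2m1 is a 4-bit floating-point format with 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit, and no infinity or NaN encodings, as defined in the OCP Microscaling (MX) formats specification. The decoder below is a hypothetical scalar reference written for this note, not oneDNN's implementation.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical reference decoder for one f4_e2m1 value stored in the low
// 4 bits of `nibble`. Layout: [sign:1][exponent:2][mantissa:1], bias = 1.
// Representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
inline float f4_e2m1_to_float(std::uint8_t nibble) {
    const int sign = (nibble >> 3) & 0x1;
    const int exp = (nibble >> 1) & 0x3;
    const int mant = nibble & 0x1;

    float magnitude;
    if (exp == 0) {
        // Subnormal: 2^(1 - bias) * (0 + mant / 2) = mant * 0.5
        magnitude = 0.5f * static_cast<float>(mant);
    } else {
        // Normal: 2^(exp - bias) * (1 + mant / 2)
        magnitude = std::ldexp(1.0f + 0.5f * static_cast<float>(mant), exp - 1);
    }
    return sign ? -magnitude : magnitude;
}
```

A full lookup table of the 16 decoded values is an equally valid design and is often cheaper inside a kernel; the bitwise form is shown because it makes the encoding explicit.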
Co-authored-by: Xuxin, Zeng <xuxin.zeng@intel.com>
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
[FORK][FEATURE] InnerProduct primitive: squashed weight decompression
- allocate aux accumulator registers on the stack
- precompute grouped src sums
- optimize pointer arithmetic
- reduce the number of aux vectors required for the microkernel
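A common reason to precompute grouped src sums in weight decompression is zero-point handling. Assuming a group of weights shares a scale s and zero point z, s·Σ(w_k − z)·x_k = s·(Σ w_k·x_k − z·Σ x_k), so Σ x_k can be computed once per K-group of the source row and reused for every output channel instead of subtracting z from each weight. That this is what the commit above refers to is an assumption; the scalar sketch below only demonstrates the identity, and all names and layouts in it are hypothetical, not the oneDNN microkernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for one output channel. Weights are u8 with
// one (scale, zero point) pair per group of `group_size` elements along K.
// Assumes src.size() is a multiple of group_size.

// Sum of src over each K-group. Depends only on src, so it can be computed
// once and shared by all output channels.
std::vector<float> grouped_src_sums(const std::vector<float> &src,
        std::size_t group_size) {
    std::vector<float> sums(src.size() / group_size, 0.0f);
    for (std::size_t k = 0; k < src.size(); ++k)
        sums[k / group_size] += src[k];
    return sums;
}

// Naive version: subtract the zero point from every weight.
float dot_naive(const std::vector<float> &src,
        const std::vector<std::uint8_t> &wei, const std::vector<float> &scales,
        const std::vector<float> &zps, std::size_t group_size) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < src.size(); ++k) {
        const std::size_t g = k / group_size;
        acc += scales[g] * (static_cast<float>(wei[k]) - zps[g]) * src[k];
    }
    return acc;
}

// Version using precomputed sums: raw dot product per group, then a single
// zero-point correction per group.
float dot_with_src_sums(const std::vector<float> &src,
        const std::vector<std::uint8_t> &wei, const std::vector<float> &scales,
        const std::vector<float> &zps, const std::vector<float> &src_sums,
        std::size_t group_size) {
    float acc = 0.0f;
    for (std::size_t g = 0; g < src_sums.size(); ++g) {
        float raw = 0.0f;
        for (std::size_t k = g * group_size; k < (g + 1) * group_size; ++k)
            raw += static_cast<float>(wei[k]) * src[k];
        acc += scales[g] * (raw - zps[g] * src_sums[g]);
    }
    return acc;
}
```

The two functions agree up to floating-point rounding; the second moves the zero-point subtraction out of the innermost loop, which is where the savings come from when many output channels share the same src row.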
After migration to 3.8, the default value of runtime_scale_t is undef instead of f32.
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
 }

-inline float load_float_value(data_type_t dt, const void *ptr, dim_t idx) {
+FORCE_INLINE float load_float_value(data_type_t dt, const void *ptr, dim_t idx) {
I assume this was done to improve performance. A better alternative is probably to change the loading method in the kernel or implementation of interest: the switch below mostly cancels out any benefit of forced inlining. The function was not designed with performance in mind.
Feel free to resolve, it's just a general observation.
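The reviewer's suggestion can be sketched as follows: instead of force-inlining a generic loader that switches on the data type for every element, dispatch on the data type once, outside the hot loop, and let the inner loop use a load specialized at compile time. The `dt` enum, `load_float_value_generic`, and `sum_as_float` are hypothetical and deliberately simplified (no bf16/f16, `std::size_t` instead of `dim_t`); they are not oneDNN's `load_float_value` or `data_type_t`.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical stand-in for a runtime data type tag.
enum class dt { f32, s32, s8, u8 };

// Generic loader in the style under review: one switch per element.
inline float load_float_value_generic(dt t, const void *ptr, std::size_t idx) {
    switch (t) {
        case dt::f32: return static_cast<const float *>(ptr)[idx];
        case dt::s32:
            return static_cast<float>(static_cast<const std::int32_t *>(ptr)[idx]);
        case dt::s8:
            return static_cast<float>(static_cast<const std::int8_t *>(ptr)[idx]);
        case dt::u8:
            return static_cast<float>(static_cast<const std::uint8_t *>(ptr)[idx]);
    }
    return 0.0f;
}

// Type-specialized loop: the element type is fixed at compile time, so the
// loop body contains no per-element branch on the data type.
template <typename T>
float sum_as_float(const T *ptr, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<float>(ptr[i]);
    return acc;
}

// Dispatch on the runtime data type once per call, not once per element.
inline float sum_as_float(dt t, const void *ptr, std::size_t n) {
    switch (t) {
        case dt::f32: return sum_as_float(static_cast<const float *>(ptr), n);
        case dt::s32: return sum_as_float(static_cast<const std::int32_t *>(ptr), n);
        case dt::s8: return sum_as_float(static_cast<const std::int8_t *>(ptr), n);
        case dt::u8: return sum_as_float(static_cast<const std::uint8_t *>(ptr), n);
    }
    return 0.0f;
}
```

Calling `load_float_value_generic` inside a loop evaluates the switch on every iteration even when it is inlined; `sum_as_float(dt, ...)` evaluates it once and leaves a simple typed loop, which is the kind of change in the kernel itself that the comment points to.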
Checklist
General
- Do make test and make test_benchdnn_* pass locally for each commit?
Performance improvements
New features
Bug fixes
RFC PR