2088 commits
ce085ff
cpu: x64: jit_reorder: add verbose messages
dzarukin Mar 19, 2025
4832ccc
benchdnn: self: replace temporary "const char *" with "std::string"
dzarukin Mar 19, 2025
67c3042
cpu: x64: fixed memory leak in jit_uni_ncsp convolution impl
dzarukin Mar 20, 2025
607a318
ngen: update PVC WAR bug workaround
petercad Mar 22, 2025
caa770a
benchdnn: inputs: graph: fix test cases related to int8/f8 add
TaoLv Mar 5, 2025
5d7ed69
Yobodovs/amx blocking heuristics fixes (#2938)
yair-obodovsky Mar 25, 2025
7d418f5
xe: ocl: fix gemm_with_po verbose dispatch message
petercad Mar 25, 2025
931cc27
ngen: downstream nGEN
rjoursler Mar 21, 2025
068b775
cpu: aarch64: add brgemm bwd data support for block size 8 and 16
rpushkarr Mar 26, 2025
6bc6597
graph: backend: dnnl: introduce internal dnnl_sdpa op
Mar 26, 2025
71f1837
build: removed -fcf-protection build option for old GCC
vpirogov Mar 14, 2025
114cd34
benchdnn: add per test case timer
dzarukin Mar 21, 2025
1a2f7c3
benchdnn: add message for not found files
dzarukin Mar 21, 2025
6ffa939
benchdnn: add summary execute timer
dzarukin Mar 21, 2025
5f6951b
benchdnn: add summary create_pd and create_prim timers
dzarukin Mar 21, 2025
4107fb9
riscv64: update intrinsics
zhangfeiv0 Mar 21, 2025
32f71c2
riscv64: fix clang-format error
zhangfeiv0 Mar 21, 2025
37b4972
riscv64: fix clang-format error
zhangfeiv0 Mar 21, 2025
190bf99
riscv64: fix clang-format error
zhangfeiv0 Mar 21, 2025
f8c3ab0
riscv64: update cmake
zhangfeiv0 Mar 21, 2025
5de25f3
riscv64: update cmake
zhangfeiv0 Mar 22, 2025
2a35904
gpu: intel: sycl: workaround failing atomics support
mgouicem Mar 25, 2025
3e40d66
gpu: intel: sycl: level zero query fixup
mgouicem Mar 27, 2025
201b16a
xe: jit: gemm: reduce grf consumption for fp4 strategies
dyoussif Mar 27, 2025
5652be3
common, xe: sycl: improve logging when OpenCL install is missing
rjoursler Mar 27, 2025
5171c2a
gpu: intel: ocl: add s32 support for binary
yehudaorel Mar 12, 2025
e1c08d7
gpu: intel: ocl: enable s32 for binary primitive
yehudaorel Mar 17, 2025
4e44b16
tests: benchdnn: gpu: enable s32 dt in binary
yehudaorel Mar 18, 2025
6a7f559
xe: ocl: prevent double -> float literal conversion
yehudaorel Mar 18, 2025
4d61ba4
xe: ocl: prevent double -> float literal conversion fix
yehudaorel Mar 18, 2025
df1bbe2
xe: ocl: prevent double -> float clang fix
yehudaorel Mar 18, 2025
eaaf1c0
third_party: ngen: prepare for SYCL generator usage
echeresh Mar 25, 2025
a575629
gpu: enable SYCL generator for nGEN kernels
echeresh Mar 25, 2025
c667c39
xe, sycl: eliminate intermediate kernel binary
echeresh Mar 25, 2025
699de16
xe: jit: remove outdated comments
echeresh Mar 25, 2025
4c3596d
xe: jit: codegen: prevent IR -> nGEN assembly functional changes
rjoursler Mar 26, 2025
1b3890f
xe: jit: move require_signal_header into exec_cfg
rjoursler Mar 26, 2025
ff05a37
x64: brgemm: split brgemm_blocking function for tmm and vmm
ankalinin Mar 24, 2025
1498c83
x64: brgemm: update code for tmm brgemm blocking
ankalinin Mar 25, 2025
762e317
generic: sycl: Adding support for RNN FWD r2l, sum & concat
ShanoToni Feb 4, 2025
9bef39e
ngen: fix missing field initialization warning
rjoursler Mar 26, 2025
7dc74a9
cmake: limit host compiler dpcpp warning divergence
rjoursler Mar 26, 2025
dc9eca4
graph: backend: dnnl: add reshape pass to support 5D GQA
Mar 21, 2025
df65f8c
graph: backend: dnnl: refine check and layout propagation for gqa
Mar 26, 2025
879eefd
benchdnn: inputs: graph: add gqa v2 case
Mar 21, 2025
396fdfc
all: clean graph compiler backend
TaoLv Mar 25, 2025
61821f1
xe: ir: add 4-bit types
atkassen Dec 16, 2024
2205a94
xe: jit: codegen: remove unused parameter
atkassen Mar 18, 2025
0b17a99
xe: jit: codegen: use offset-based interfaces
atkassen Mar 21, 2025
aa20318
xe: jit: ir: adjust sizes/offsets for packed types
atkassen Mar 21, 2025
42aaded
xe: jit: ir: add assertion for sub-byte type packing
atkassen Mar 25, 2025
91c60af
xe: jit: reorder: remove hf8 workarounds
atkassen Mar 18, 2025
797b867
xe: jit: reorder: prevent scalar mov in 2d impl
atkassen Mar 18, 2025
11a8548
xe: jit: reorder: enable fp4 up-convert
atkassen Mar 18, 2025
425ae14
xe: jit: reorder: enable fp4 down-convert
atkassen Mar 24, 2025
bb6819e
xe: jit: codegen: fix dst width for asm dumping
atkassen Mar 24, 2025
633978a
xe: jit: address clang-tidy complaints
atkassen Mar 24, 2025
333ef80
graph: interface: op: matmul supports mixed data types
TaoLv Mar 5, 2025
697db52
graph: interface: op: softmax supports mixed data types
TaoLv Mar 12, 2025
60c6d5e
graph: interface: op: binary ops support mixed data types
TaoLv Mar 12, 2025
27f277c
examples: graph: sdpa: define with f32 intermediate data type
TaoLv Mar 12, 2025
fe974de
graph: backend: dnnl: ukernel sdpa only supports f32 intermediates
TaoLv Mar 13, 2025
00dbb2a
graph: backend: dnnl: pattern: sdpa: remove xf16 check from gpu pattern
TaoLv Mar 21, 2025
50e3ea3
benchdnn: inputs: graph: add sdpa cases with f32 intermediate type
TaoLv Mar 13, 2025
f19ecf9
benchdnn: inputs: graph: test f32 intermediates for implicit mask
TaoLv Mar 21, 2025
9b54354
doc: graph: op: update supported data types
TaoLv Mar 17, 2025
8a04bcc
graph: backend: dnnl: support intermediate data type in decomp kernel
Mar 27, 2025
aeaa73f
cpu: aarch64: default num_threads to max for acl_threadpool
Sqvid Mar 25, 2025
11aa54d
doc: graph: fusion patterns restructure (#2952)
ranukund Mar 31, 2025
c6fab66
x64: conv: add f16 jit dw conv for avx512_core_fp16
tczeszun Mar 21, 2025
a03a5bb
xe: jit: gemm: db: reinfo a BOS and SOS strategy
dyoussif Mar 11, 2025
bebdb6b
xe: jit: gemm: handle data type alignment requirements more strictly
dyoussif Mar 14, 2025
d1c9a1b
xe: jit: gemm: db: fixup strategy alignments
dyoussif Mar 17, 2025
3bfdbfa
xe: jit: gemm: db: fixup out of regs
dyoussif Mar 18, 2025
616cbaa
xe: avoid copies
atkassen Mar 7, 2025
a6b3c47
xe: add missing ctors/dtors/assignment operators
atkassen Mar 7, 2025
71387b6
xe: remove unnecessary/dangerous moves
atkassen Mar 7, 2025
46b79d4
xe: remove unused code
atkassen Mar 7, 2025
7c65b29
xe: jit: codegen: remove dead code
atkassen Mar 27, 2025
5d9f7ad
xe: jit: ir: avoid overflow
atkassen Mar 31, 2025
3ff9c05
xe: jit: ir: use `type.packing()` interface
atkassen Mar 31, 2025
c2f590d
xe: jit: codegen: gracefully handle bad float division
atkassen Mar 31, 2025
d54fe49
xe: jit: address clang-tidy complaints
atkassen Mar 7, 2025
4d9a89c
doc: fixup fp8 support documentation
kealan-barbieri Mar 17, 2025
b04fcca
xe: jit: conv: reduce min hw for fp8 support
kealan-barbieri Mar 17, 2025
e20f9b5
doc: Add Xe2 architectures
kealan-barbieri Mar 21, 2025
d467687
xe: ocl: reorder: allow more type combinations
atkassen Mar 26, 2025
224bb94
tests: benchdnn: adjust reorder fill range for hf8
atkassen Mar 26, 2025
28b4c27
xe: sdpa: Update configs for head sizes of 128 and 256
umar456 Mar 25, 2025
f46c8a3
xe: sdpa: move all 64 head size configurations to xe2 for LNL and BMG
umar456 Mar 26, 2025
ca053d7
xe: sdpa: Refactor condition macros in init function
umar456 Mar 26, 2025
e4e4818
xe: sdpa: Add new configurations for head size of 256 on xe2
umar456 Mar 26, 2025
1521345
xe: sdpa: additional sdpa config updates from expanded configuration
umar456 Apr 1, 2025
d13fd38
cpu: x64: fix invalid immediate encoding in cpu_reducer
rjoursler Mar 31, 2025
b552ce7
cpu: x64: fix invalid immediate offset in avx 1x1 convolution
rjoursler Mar 31, 2025
2d41f6f
cpu: binary: disable broadcast check for select op
avmanerikar Apr 1, 2025
55d6ac5
graph: backend: dnnl: verbose log enhancement
rongzha1 Mar 26, 2025
f5baa4b
scripts: verbose converter: strengthen type-hinting for attributes
atkassen Apr 1, 2025
84c7f3f
scripts: verbose converter: simplify attribute formatting
atkassen Apr 1, 2025
2074cf4
scripts: verbose converter: use "any" as default binary po tag
atkassen Apr 1, 2025
5cf9db9
cpu: remove extra size checks
ankalinin Apr 2, 2025
0b6e3ba
x64: update has_large_size function
ankalinin Apr 2, 2025
bef2e40
cpu: conv_list: add x8:s8:f16 combination
dzarukin Apr 1, 2025
132703e
benchdnn: prim_ref: use tag::any for binary and f32 for sum po
dzarukin Apr 2, 2025
7b39578
ngen: workaround for SYCL + GCC 12.3 compiler bug
petercad Apr 1, 2025
48e6b97
xe: ocl: enable ref fp4 conv
kealan-barbieri Mar 28, 2025
d30d609
src: common: limit fp4 convs to even dims
kealan-barbieri Mar 28, 2025
1dcf33e
tests: benchdnn: enable fp4 conv tests, inputs
kealan-barbieri Mar 28, 2025
5f3a9dc
xe: jit: conv: enable fp4 support
kealan-barbieri Mar 28, 2025
55b000c
src: common: add convolution pack scratchpad tag
kealan-barbieri Mar 31, 2025
c37b483
xe: ocl: ref convolution dst fp4 support
kealan-barbieri Mar 31, 2025
ef5e699
benchdnn: graph: support op kind rewrite for binary/eltwise
wzt1997 Mar 25, 2025
a27c348
benchdnn: graph: improve case log with new knob
wzt1997 Mar 25, 2025
a67c2ed
benchdnn: graph: inputs: use op-kind rewrite for scale in SDPA
wzt1997 Mar 25, 2025
1495519
benchdnn: graph: improve doc for --op-kind knob
wzt1997 Mar 25, 2025
bc555fd
benchdnn: graph: inputs: use op kind rewrite for binary op testing
wzt1997 Mar 26, 2025
cba91c3
benchdnn: graph: inputs: use op kind rewrite for eltwise op testing
wzt1997 Mar 26, 2025
7388893
cpu: x64: matmul: correct blocked_B layout initialization (#3007)
xuxinzen Apr 3, 2025
a474e3a
benchdnn: graph: separate mem filling and create mem from graph path
wzt1997 Mar 20, 2025
06a7e82
benchdnn: graph: remove useless value for reduction
wzt1997 Mar 20, 2025
dde3af7
benchdnn: graph: use default value from benchdnn for no ref mem
wzt1997 Mar 20, 2025
5994eb7
benchdnn: graph: remove check for no_ref_mem
wzt1997 Mar 20, 2025
4fc4e5a
fixup: build: bumped version to v3.8.0
vpirogov Apr 3, 2025
64f78da
github: workflows: bump KyleMayes/install-llvm-action
dependabot[bot] Feb 24, 2025
7584871
xe: sdpa: add configs for head_size of 512
syurkevi Mar 28, 2025
077763a
tests: sdpa: add complex_fusion tests for head size 512
syurkevi Mar 28, 2025
b818143
xe: sdpa: enable 32-wide block loads for DG2
syurkevi Mar 28, 2025
ce928e6
xe: sdpa: refactor config selection to separate header
syurkevi Apr 2, 2025
bfc4cac
xe: sdpa: update configs for xe2 granularity
syurkevi Apr 2, 2025
fce8dda
xe: sdpa: address coverity issues
syurkevi Apr 3, 2025
814d0e9
xe: sdpa: enable head size 576 for f16
syurkevi Apr 3, 2025
642d110
xe: jit: gemm: handle sub-byte 'any' tags
dyoussif Mar 27, 2025
8bbf199
xe: jit: gemm: fixup out of reg
dyoussif Mar 27, 2025
fdefc68
benchdnn: matmul: remove invalid int4 zp cases
dyoussif Mar 27, 2025
f6ed545
github: workflows: bump lukka/get-cmake from 3.31.5 to 3.31.6
dependabot[bot] Mar 24, 2025
a1e553e
scripts: verbose converter: allow post-op duck typing
atkassen Apr 3, 2025
f840512
graph: backend: dnnl: fix genindex build on NV GPU
Jiexin-Zheng Apr 2, 2025
decb08c
gtests: graph: fix incorrect layout expectation
Jiexin-Zheng Apr 1, 2025
19bfa32
graph: backend: dnnl: disable binary+sqrt fusion on NV GPU
Jiexin-Zheng Apr 2, 2025
910e36d
gtests: graph: unit: add binary+sqrt case
Jiexin-Zheng Apr 2, 2025
032bc7a
gtests: graph: unit: add compile option for ptx
Jiexin-Zheng Apr 2, 2025
41ef402
graph: backend: dnnl: fix sdpa build on NV GPU
Jiexin-Zheng Apr 3, 2025
5243796
benchdnn: graph: fix emplace
TaoLv Apr 1, 2025
69545e2
benchdnn: graph: fix naming style of deserialized_lt_t
TaoLv Apr 1, 2025
d727bbe
benchdnn: graph: fix naming style of sycl_deletor_t
TaoLv Apr 1, 2025
af78dcf
graph: utils: pm: check pointer before dereference
TaoLv Apr 1, 2025
9554374
graph: backend: dnnl: avoid unnecessary copy
TaoLv Apr 1, 2025
c960e96
src: gpu: intel: jit: gemm: add dual (src+wei) vector zero points
hidefromkgb Apr 2, 2025
b073921
gpu: intel: sycl: l0: remove dependency to OCL for atomics query
mgouicem Apr 3, 2025
cea5462
cmake, doc: add GROUP_NORMALIZATION value for ONEDNN_ENABLE_PRIMITIVE
mzhukova Apr 4, 2025
4795c31
xe: sdpa: Add support for bottom right causal mask type
umar456 Apr 3, 2025
9674952
xe: sdpa: Use append instead of set for opencl argument assignment
umar456 Apr 3, 2025
b08cd59
xe: sdpa: pass attn_mask_type as int using compiler definitions in ocl
umar456 Apr 4, 2025
03fafc7
[FORK][FEATURE] Enable jit sse41 NxN convolution for grayscale input
Jun 5, 2018
8b35c43
[FORK][FEATURE] Support of strided blobs for [de]convolution and simp…
luweizhou2016 Dec 19, 2023
9adee10
[FORK][FEATURE] Updated sse41 jit convolutions to support padded chan…
Oct 26, 2018
1e093be
[FORK][FEATURE] Introduced Depthwise and Quantization post ops
Sep 24, 2020
6afdaff
[FORK][FEATURE] TBB_AUTO was enabled
alexey-varyzgin May 14, 2019
fff5ef7
[FIX] nchw_pooling dense fix
alexey-varyzgin Nov 14, 2019
ecee1ae
[FORK][FEATURE] Enabled BWD (JIT/GEMM) FP32/BF16 Convolutions + Depthw…
Oct 21, 2020
17482fa
[FIX] Fixes for MKLDNN to enable LTO
ilya-lavrenov May 18, 2020
d3b9514
[FIX] [MSVC] Enabling SIMD functionality for VS2019
Aug 12, 2020
133bbd8
[FIX] Add several uni instruction wrappers into jit_generator
AlexPeskov Oct 26, 2020
26b541b
[FIX] Fix name matching with system struct 'user' in llvm-android too…
AlexPeskov Nov 16, 2020
177e84c
[FORK][FEATURE] Added JIT FP32/BF16 Softmax for arbitrary inner_size
Dec 4, 2020
3195eab
[FORK][FEATURE] Added support of hsigmoid, round_half_to_even, round_…
a-sidorova Aug 27, 2020
6fa3d46
[FIX] Limit applicability of is_1stconv logic for JIT FP32/BF16 AVX51…
Dec 9, 2020
7f90108
[FIX] [WA] Removed kernel_outside_src condition on JIT FP32/BF16 Conv…
Dec 9, 2020
1d9879c
[FORK][FEATURE] Added custom version of JIT DW FP32/BF16 Convolution …
Dec 14, 2020
cc7cdc3
[FORK][FEATURE] Asymmetric quantization for activations
Nov 20, 2020
48f1985
[FORK][FEATURE] Added 3D DW case support for JIT INT8 Convolutions
Dec 14, 2020
33c8075
[FORK][FEATURE] Added JIT AVX512/AVX2 FP32 Planar Convolution impleme…
Jan 2, 2021
a98a81c
[FORK][FEATURE] Binary networks support
Jan 21, 2021
8692be6
[FIX] Accommodating oneTBB (with hybrid cores support) that
myshevts Nov 24, 2020
e30973e
[FIX] [WA] Fixed fallback on ref conv in case exceeding scratchpad limit
Feb 26, 2021
805bfb2
[FORK][FEATURE] Returned old behavior for fp32 avx2 1x1 conv with dw …
antonvor Feb 16, 2021
20e65b5
[FIX] Updated SoftPlus
a-sidorova Apr 12, 2021
8d96cec
[FIX] Disable reorder JIT if both inputs and outputs are batch-strided.
IvanNovoselov Jun 8, 2021
e69c895
[FIX] Include TBB headers as system
AlexPeskov Oct 26, 2020
c9fa057
[FORK][FEATURE] nspc layout support for convolutions
luweizhou2016 Jul 26, 2024
88fffb8
[FIX] set scale = 1.f in case of signed input on platforms without vnni
antonvor May 26, 2021
776be72
[FIX] Memory descriptor dynamism related changes
maxnick Jul 23, 2021
2320b7e
[FORK][FEATURE] Added prelu as binary post op
antonvor Aug 2, 2021
5ef9946
[FORK][FEATURE] Depthwise and Quantization post ops for Gemm Convolut…
antonvor Aug 23, 2021
ff5f753
[FORK][FIX] perf fixes for quantization post ops
antonvor Sep 16, 2021
3850908
[FIX] todo: fix assert(idx < max_idx)
antonvor Sep 16, 2021
610db3e
[FIX] [1D] Enlarge support
alexey-varyzgin Oct 22, 2021
79d3c93
[FIX] Hash utility functions were extracted to a separate module for …
maxnick Nov 29, 2021
3fe61e6
[FIX] Desc similar_to routine consider start stride
maxnick Jan 14, 2022
0df98e3
[FIX] Desc similar_to routine use stride cmp mask
maxnick Jan 26, 2022
f3fb464
[FIX] added some legacy parallel methods to fix perf issues
antonvor Jan 17, 2022
d3f0e18
[FORK][FEATURE] Migrate legacy post ops and zero points on runtime da…
luweizhou2016 Jul 26, 2024
589807f
[FIX] fix ci error
luo-cheng2021 May 6, 2022
292d3ba
[FIX] [WA] stride=2, left pad =1, kw*kh=1*1 may crash
luo-cheng2021 May 8, 2022
cc2d3ef
[FORK][FEATURE] gemm_conv support binary post ops
luo-cheng2021 May 9, 2022
b245a31
[FORK][FIX] prelu post ops fix
EgorDuplensky Jul 22, 2022
86f9ea2
[FORK][FEATURE] fork dw conv support binary postops
luo-cheng2021 May 11, 2022
039b72c
[FORK][FEATURE] gemm bf16 support binary postops & sse4.1 1x1 binary t…
luo-cheng2021 May 12, 2022
ada2d14
[FORK][FEATURE] avx512 fork bf16 dw support binary postops
luo-cheng2021 May 12, 2022
6541345
[FIX] fork dw conv may overflow on width tail
luo-cheng2021 May 13, 2022
5657cb5
[FORK][FEATURE] gemm int8 support binary postops
luo-cheng2021 May 17, 2022
c11d0e9
[FORK][FEATURE] Add log to jit dump code
luweizhou2016 Dec 20, 2023
7b20140
[FIX] [WA] Disabled weights md transpose in FC to prevent perf degrad…
Dec 16, 2020
83d2dc5
[FIX] Remove average pooling exclude padding limitation
EgorDuplensky Oct 25, 2022
5b8d7f0
[FIX] Added support for exceptions during code generation
lohika-denis-kotov Sep 8, 2022
e8746ba
[FIX] fix cpu convolution qdq testcase fail issue when using scratchpad
liubo-intel Nov 3, 2022
1bdec8d
[FIX] CPU: x64: fix issue in eltwise post ops to allow multi-instance…
usstq Sep 30, 2022
eff8e6d
[FORK][FEATURE] Add jit debug trace log dump in GCC debug mode
usstq Nov 3, 2022
4d9f59b
[FIX] Fix seg fault in parallel function with ITT build
EgorDuplensky Nov 16, 2022
6befa4d
[FIX] Add option to explicitly disable XBYAK_NO_EXCEPTION
EgorDuplensky Jul 25, 2024
68504db
[FIX] Extend AMX deconv to support oscale+eltwise+eltwise post ops.
luweizhou2016 Oct 31, 2022
bc63709
[FIX] Fixed compilation for 32bits
ilya-lavrenov Jan 9, 2023
f7265c6
[FORK][FEATURE] jit_uni_reorder: relaxed isa condition to enable FP16…
antonvor Jul 14, 2023
436d88b
[FORK][FEATURE] cpu: Unify oc_block for inner product with heuristic
luweizhou2016 Sep 1, 2023
0a0c847
[FIX][WA] Apply reorder WA caused by compiler issue on AVX2 windows …
luweizhou2016 Dec 25, 2023
5c7d70c
[FORK][FIX][x64] Refactor avx2 binary PReLU and fix reg conflicts
maxnick Apr 17, 2024
823850e
[Fork][Fix] Deconv update the limitation.
luweizhou2016 Mar 14, 2024
ccf8276
[FIX] Fix warning caused by missing header file.
luweizhou2016 Jun 21, 2024
a7d0867
[FORK][FEATURE] Cc based on master (#135)
zhwuwuwu Jul 7, 2022
58dbced
[FORK] [FEATURE] cpu: add inner product with sparse packed weights
jianan-gu Nov 25, 2022
fcf8998
[FORK][FEATURE] InnerProduct primitive: squashed weight decompression
luweizhou2016 Jul 24, 2024
3d9f274
[FORK][FEATURE] Support (f32,bf16,f32) inner-product
May 22, 2024
3523e9e
[FORK][FEATURE] Enable avx2 jit reorder for bf16 data type
May 22, 2024
36a18db
[FORK][FEATURE] IP weights compression: mxfp4 (wei=f4e2m1, scales=f8e…
Jul 29, 2024
51a48c6
[X64] Fixed need_mask_register for eltwise injectors
a-sidorova Aug 26, 2024
f2b408c
[FORK][FIX] Fixed debug assert in jit_io_helper_t
Aug 29, 2024
4f0f1f3
temp fix
azhai219 May 12, 2025
cd20975
[CPU][fix] fix matmul decompress test case for migration v3.8 (#1)
tiger100256-hu Jun 4, 2025
40d9d5c
cpu: x64: matmul: correct LDD when M == 1 (#2)
tiger100256-hu Jun 4, 2025
e4d97f3
cpu: x64: guard macro definitions to avoid potential Wundef hits
dzarukin Apr 25, 2025
54e4db8
[FORK][FIX] Fix missing 'map' include introduced by xbyak debug logic
tiger100256-hu Jun 5, 2025
5d91309
[FORK][FIX] IP weights compression: max bcast blocking computation
Jan 20, 2025
2f9e73e
[FORK][FEATURE] DQ IP: performance enhancements
Jan 21, 2025
0f9b7fb
[FORK][FIX] fix IP compress test case after migration v3.8 on avx2
tiger100256-hu Jun 6, 2025
cd0b8d8
[FORK][FIX] fix args checking issue
tiger100256-hu Jun 6, 2025
9ad2016
[FORK][FIX] add missing override
tiger100256-hu Jun 9, 2025
fb38e19
[FORK][FIX] Fix conditional compilation
tiger100256-hu Jun 10, 2025
38c7c03
[FORK][FIX] fix LLM FP16 Failed on avx512 and avx2
tiger100256-hu Jun 11, 2025
2577b22
[FORK][FIX] fix riscv cmake issue
tiger100256-hu Jun 11, 2025
902df2e
[FORK][FIX] fix crash of convolution 1x1 int8 model on SPR (#9)
tiger100256-hu Jun 11, 2025
a64d9a0
[ARM] Hide x64 dependent implementation under macro
alvoron Jun 12, 2025
c764191
[ARM] ARM 32bits support for oneDNN
alvoron Dec 17, 2024
d8054b4
[ARM] Added ARM32 ACL kernels calls
alvoron Dec 17, 2024
328e942
[MERGE THIS INTO ANOTHER COMMIT] brgemm_matmul_matrix_B_reorder_t fix
alvoron Jun 12, 2025
ed34759
[FORK][FEATURE][ARM] Enable f16 ACL post-op
alvoron Dec 17, 2024
301e94f
[FORK][ARM][FIX] Fix ACL configuration and skip failed tests
alvoron Jun 12, 2025
1a918af
[ARM] New heuristic for winograd and gemm (ACL)
allnes Feb 18, 2025
634c2db
[ARM][FORK] Resolve float32_t type on 32-bit platforms
alvoron Dec 23, 2024
7eb6272
[ARM][FORK][FIX] Set CMAKE_CXX_STANDARD to 20 on Android
alvoron Jun 12, 2025
9b9b876
[ARM][FORK][FIX] Use FORCE_INLINE for load_float_value
aobolensk Jul 10, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
File renamed without changes.
4 changes: 4 additions & 0 deletions .clang-tidy
@@ -1,3 +1,7 @@
HeaderFilterRegex: '/(examples|include|src|tests)/.*\.hpp'

FormatStyle: file

Checks: >
-*,
readability-identifier-naming,
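For reference, a minimal sketch of how this clang-tidy configuration could be exercised locally (the compilation database path and the source file shown are only placeholders):

    # run the naming checks enabled above against one translation unit
    clang-tidy -p build src/common/memory.cpp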
60 changes: 32 additions & 28 deletions .github/CODEOWNERS
@@ -1,5 +1,5 @@
#===============================================================================
# Copyright 2019-2024 Intel Corporation
# Copyright 2019-2025 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,44 +15,48 @@
#===============================================================================

# Default
* @oneapi-src/onednn-arch @intel-innersource/dnn-arch
* @uxlfoundation/onednn-arch

# Github automation
/.github/ @oneapi-src/onednn-devops
/.github/ @uxlfoundation/onednn-devops

# CPU Engine
/src/cpu/aarch64/ @oneapi-src/onednn-cpu-aarch64 @intel-innersource/dnn-arch
/src/cpu/x64/ @oneapi-src/onednn-cpu-x64 @intel-innersource/dnn-cpu
/src/cpu/rnn/ @oneapi-src/onednn-cpu-x64 @intel-innersource/dnn-cpu
/src/cpu/aarch64/ @uxlfoundation/onednn-cpu-aarch64
/src/cpu/x64/ @uxlfoundation/onednn-cpu-x64
/src/cpu/rnn/ @uxlfoundation/onednn-cpu-x64

# GPU Engine
/src/gpu/amd/ @oneapi-src/onednn-gpu-amd @intel-innersource/dnn-arch
/src/gpu/intel/ @oneapi-src/onednn-gpu-intel @intel-innersource/dnn-gpu
/src/gpu/nvidia/ @oneapi-src/onednn-gpu-nvidia @intel-innersource/dnn-arch
/src/gpu/generic/ @oneapi-src/onednn-arch @intel-innersource/dnn-arch @intel-innersource/dnn-gpu
/src/gpu/generic/sycl/ @oneapi-src/onednn-gpu-generic @intel-innersource/dnn-arch @intel-innersource/dnn-gpu
/src/gpu/amd/ @uxlfoundation/onednn-gpu-amd
/src/gpu/intel/ @uxlfoundation/onednn-gpu-intel
/src/gpu/nvidia/ @uxlfoundation/onednn-gpu-nvidia
/src/gpu/generic/ @uxlfoundation/onednn-arch
/src/gpu/generic/sycl/ @uxlfoundation/onednn-gpu-generic

# Tests
/tests/benchdnn/inputs/ @oneapi-src/onednn-maintain @intel-innersource/dnn-arch @intel-innersource/dnn-cpu @intel-innersource/dnn-gpu
/tests/benchdnn/graph/ @oneapi-src/onednn-graph @oneapi-src/onednn-arch @intel-innersource/dnn-graph @intel-innersource/dnn-arch
/tests/benchdnn/inputs/graph/ @oneapi-src/onednn-graph @oneapi-src/onednn-arch @intel-innersource/dnn-graph @intel-innersource/dnn-arch
/tests/gtests/graph/ @oneapi-src/onednn-graph @intel-innersource/dnn-graph
/tests/benchdnn/inputs/ @uxlfoundation/onednn-maintain
/tests/benchdnn/graph/ @uxlfoundation/onednn-graph @uxlfoundation/onednn-arch
/tests/benchdnn/inputs/graph/ @uxlfoundation/onednn-graph @uxlfoundation/onednn-arch
/tests/gtests/graph/ @uxlfoundation/onednn-graph

# Graph API
/src/graph/ @oneapi-src/onednn-graph @intel-innersource/dnn-graph

# Graph compiler
/src/graph/backend/graph_compiler/ @intel-innersource/dnn-compiler
/tests/gtests/graph/unit/backend/graph_compiler/ @intel-innersource/dnn-compiler
/src/graph/ @uxlfoundation/onednn-graph

# Documentation
*.md @oneapi-src/onednn-doc @oneapi-src/onednn-arch @intel-innersource/dnn-doc @intel-innersource/dnn-arch
/doc/ @oneapi-src/onednn-doc @oneapi-src/onednn-arch @intel-innersource/dnn-doc @intel-innersource/dnn-arch
*.md @uxlfoundation/onednn-doc @uxlfoundation/onednn-arch
/doc/ @uxlfoundation/onednn-doc @uxlfoundation/onednn-arch

# Third party components
/third-party/ @uxlfoundation/onednn-arch
/third_party/level_zero/ @uxlfoundation/onednn-gpu-intel
/third_party/mdapi/ @uxlfoundation/onednn-gpu-intel
/third_party/ngen/ @uxlfoundation/onednn-gpu-intel
/third_party/xbyak/ @uxlfoundation/onednn-cpu-x64
/third_party/xbyak_aarch64/ @uxlfoundation/onednn-cpu-aarch64

# Governance and process
/.github/CODEOWNERS @oneapi-src/onednn-maintain
/SECURITY.md @oneapi-src/onednn-maintain
/MAINTAINERS.md @oneapi-src/onednn-maintain
/CONTRIBUTING.md @oneapi-src/onednn-maintain
/CODING_STANDARDS.md @oneapi-src/onednn-maintain
/CODE_OF_CONDUCT.md @oneapi-src/onednn-maintain
/.github/CODEOWNERS @uxlfoundation/onednn-maintain
/SECURITY.md @uxlfoundation/onednn-maintain
/MAINTAINERS.md @uxlfoundation/onednn-maintain
/CONTRIBUTING.md @uxlfoundation/onednn-maintain
/CODING_STANDARDS.md @uxlfoundation/onednn-maintain
/CODE_OF_CONDUCT.md @uxlfoundation/onednn-maintain
10 changes: 5 additions & 5 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -12,7 +12,7 @@ factors are considered important to reproduce an issue.

# Version
Report oneDNN version and githash. Version information is printed to stdout
in [verbose mode](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html).
in [verbose mode](https://uxlfoundation.github.io/oneDNN/dev_guide_verbose.html).

# Environment
oneDNN includes hardware-specific optimizations and may behave
@@ -28,10 +28,10 @@ the following information to help reproduce the issue:

# Steps to reproduce
Please check that the issue is reproducible with the latest revision on
master. Include all the steps to reproduce the issue.
main. Include all the steps to reproduce the issue.

You can use [verbose mode](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html)
and [benchdnn](https://github.com/oneapi-src/oneDNN/tree/master/tests/benchdnn)
You can use [verbose mode](https://uxlfoundation.github.io/oneDNN/dev_guide_verbose.html)
and [benchdnn](https://github.com/uxlfoundation/oneDNN/tree/main/tests/benchdnn)
to validate correctness of all primitives the library supports. If this does not
work a short C/C++ program or modified unit tests demonstrating the issue
will greatly help with the investigation.
@@ -40,7 +40,7 @@ will greatly help with the investigation.
Document behavior you observe. For performance defects, like performance
regressions or a function being slow, provide a log including output generated
by your application in
[verbose mode](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html).
[verbose mode](https://uxlfoundation.github.io/oneDNN/dev_guide_verbose.html).

# Expected behavior
Document behavior you expect.
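As a quick illustration of the verbose mode this template refers to, a hedged sketch (ONEDNN_VERBOSE is the environment variable documented in the linked guide; the application name is a placeholder):

    # print primitive creation/execution details to stdout while reproducing
    ONEDNN_VERBOSE=1 ./my_reproducer 2>&1 | tee onednn_verbose.log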
132 changes: 0 additions & 132 deletions .github/automation/.azure-pipeline.yml

This file was deleted.

54 changes: 54 additions & 0 deletions .github/automation/aarch64/build.sh
@@ -0,0 +1,54 @@
#! /bin/bash

# *******************************************************************************
# Copyright 2024 Arm Limited and affiliates.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# *******************************************************************************

# Build oneDNN for aarch64.

set -o errexit -o pipefail -o noclobber

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"

# Defines MP, CC, CXX and OS.
source ${SCRIPT_DIR}/common.sh

export ACL_ROOT_DIR=${ACL_ROOT_DIR:-"${PWD}/ComputeLibrary"}

CMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE:-"Release"}
ONEDNN_TEST_SET=${ONEDNN_TEST_SET:-"SMOKE"}
ONEDNN_BUILD_GRAPH=${ONEDNN_BUILD_GRAPH:-"ON"}

if [[ "$ONEDNN_ACTION" == "configure" ]]; then
set -x
cmake \
-Bbuild -S. \
-DDNNL_USE_ACL=ON \
-DONEDNN_BUILD_GRAPH=$ONEDNN_BUILD_GRAPH \
-DDNNL_CPU_RUNTIME=$ONEDNN_THREADING \
-DONEDNN_WERROR=ON \
-DDNNL_BUILD_FOR_CI=ON \
-DONEDNN_TEST_SET=$ONEDNN_TEST_SET \
-DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE
set +x
elif [[ "$ONEDNN_ACTION" == "build" ]]; then
set -x
cmake --build build
set +x
else
echo "Unknown action: $ONEDNN_ACTION"
exit 1
fi
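For context, a minimal sketch of driving this script outside of CI (ONEDNN_ACTION, ONEDNN_THREADING, and ACL_ROOT_DIR are the variables the script reads above; the concrete values shown are assumptions):

    # hypothetical local run of .github/automation/aarch64/build.sh
    export ACL_ROOT_DIR="${PWD}/ComputeLibrary"   # prebuilt Arm Compute Library checkout
    export ONEDNN_THREADING=OMP                   # forwarded to -DDNNL_CPU_RUNTIME
    ONEDNN_ACTION=configure .github/automation/aarch64/build.sh
    ONEDNN_ACTION=build     .github/automation/aarch64/build.sh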
81 changes: 81 additions & 0 deletions .github/automation/aarch64/build_acl.sh
@@ -0,0 +1,81 @@
#! /bin/bash

# *******************************************************************************
# Copyright 2020-2025 Arm Limited and affiliates.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# *******************************************************************************

# Build ACL from github.

set -o errexit -o pipefail -o noclobber

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"

# Defines MP, CC, CXX and OS.
source ${SCRIPT_DIR}/common.sh

ACL_BUILD_TYPE=${ACL_BUILD_TYPE:-"Release"}
ACL_ROOT_DIR=${ACL_ROOT_DIR:-"${PWD}/ComputeLibrary"}
ACL_REPO="https://github.com/ARM-software/ComputeLibrary.git"

if [[ "$ACL_THREADING" == "OMP" ]]; then
ACL_OPENMP=1
elif [[ "$ACL_THREADING" == "SEQ" ]]; then
ACL_OPENMP=0
fi

if [[ "$OS" == "Linux" ]]; then
ACL_MULTI_ISA_SUPPORT=1
if [[ "$ACL_THREADING" == "OMP" ]]; then
ACL_OPENMP=1
elif [[ "$ACL_THREADING" == "SEQ" ]]; then
ACL_OPENMP=0
fi
ACL_OS="linux"
elif [[ "$OS" == "Darwin" ]]; then
ACL_MULTI_ISA_SUPPORT=0
ACL_OPENMP=0
ACL_OS="macos"
else
echo "Unknown OS: $OS"
exit 1
fi

if [[ "$ACL_BUILD_TYPE" == "Release" ]]; then
ACL_DEBUG=0
elif [[ "$ACL_BUILD_TYPE" == "Debug" ]]; then
ACL_DEBUG=1
else
echo "Unknown build config: $ACL_BUILD_TYPE"
exit 1
fi

if [[ "$ACL_ACTION" == "clone" ]]; then
set -x
git clone --branch $ACL_VERSION --depth 1 $ACL_REPO $ACL_ROOT_DIR
set +x
elif [[ "$ACL_ACTION" == "build" ]]; then
set -x
cd $ACL_ROOT_DIR
set -x
scons $MP Werror=0 debug=$ACL_DEBUG neon=1 opencl=0 embed_kernels=0 \
os=$ACL_OS arch=armv8.2-a build=native multi_isa=$ACL_MULTI_ISA_SUPPORT \
fixed_format_kernels=1 cppthreads=0 openmp=$ACL_OPENMP examples=0 \
validation_tests=0
set +x
else
echo "Unknown action: $ACL_ACTION"
exit 1
fi
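Similarly, a hedged invocation sketch for the ACL helper above (ACL_ACTION, ACL_THREADING, and ACL_VERSION are read by the script; the tag shown is only an example):

    # hypothetical local run of .github/automation/aarch64/build_acl.sh
    export ACL_VERSION=v24.09     # example Compute Library tag; CI may pin a different one
    export ACL_THREADING=OMP
    ACL_ACTION=clone .github/automation/aarch64/build_acl.sh
    ACL_ACTION=build .github/automation/aarch64/build_acl.sh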