Fixes zero-dim matmatmul & matvecmul #2958
CUDA.jl Benchmarks
| Benchmark suite | Current: 10dbaec | Previous: 1e35ff7 | Ratio |
|---|---|---|---|
| latency/precompile | 56839170229.5 ns | 57073264350 ns | 1.00 |
| latency/ttfp | 8293961507.5 ns | 8361242115.5 ns | 0.99 |
| latency/import | 4493983266 ns | 4520871512 ns | 0.99 |
| integration/volumerhs | 9608403 ns | 9609145.5 ns | 1.00 |
| integration/byval/slices=1 | 146729 ns | 146784 ns | 1.00 |
| integration/byval/slices=3 | 425688 ns | 425930 ns | 1.00 |
| integration/byval/reference | 144944 ns | 144913 ns | 1.00 |
| integration/byval/slices=2 | 286200 ns | 286275 ns | 1.00 |
| integration/cudadevrt | 103408 ns | 103477 ns | 1.00 |
| kernel/indexing | 14162 ns | 14088 ns | 1.01 |
| kernel/indexing_checked | 14910 ns | 14920 ns | 1.00 |
| kernel/occupancy | 674.2215189873418 ns | 670.5283018867924 ns | 1.01 |
| kernel/launch | 2167 ns | 2192.1111111111113 ns | 0.99 |
| kernel/rand | 15573 ns | 18597.5 ns | 0.84 |
| array/reverse/1d | 20062 ns | 19990 ns | 1.00 |
| array/reverse/2dL_inplace | 66968 ns | 66851 ns | 1.00 |
| array/reverse/1dL | 70236 ns | 70214 ns | 1.00 |
| array/reverse/2d | 22059 ns | 21764 ns | 1.01 |
| array/reverse/1d_inplace | 11396 ns | 9644 ns | 1.18 |
| array/reverse/2d_inplace | 13382 ns | 11083 ns | 1.21 |
| array/reverse/2dL | 74226.5 ns | 73680.5 ns | 1.01 |
| array/reverse/1dL_inplace | 66779 ns | 66780 ns | 1.00 |
| array/copy | 20954.5 ns | 20656 ns | 1.01 |
| array/iteration/findall/int | 159452.5 ns | 157234 ns | 1.01 |
| array/iteration/findall/bool | 141317 ns | 139637.5 ns | 1.01 |
| array/iteration/findfirst/int | 162645 ns | 161491 ns | 1.01 |
| array/iteration/findfirst/bool | 163137.5 ns | 161981.5 ns | 1.01 |
| array/iteration/scalar | 73710 ns | 72914 ns | 1.01 |
| array/iteration/logical | 218890.5 ns | 215503 ns | 1.02 |
| array/iteration/findmin/1d | 54171.5 ns | 52893.5 ns | 1.02 |
| array/iteration/findmin/2d | 97228 ns | 96673.5 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 44423 ns | 43374 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 44937 ns | 44924.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 61859 ns | 61289 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 89117 ns | 89013 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88328.5 ns | 88275 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 37481 ns | 37043 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 42641 ns | 43018 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 59997 ns | 59774 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52520 ns | 52409 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72293 ns | 72278 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 44124 ns | 43540 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 55556.5 ns | 45057.5 ns | 1.23 |
| array/reductions/mapreduce/Int64/dims=2 | 62182 ns | 61470 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 89096.5 ns | 88923 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88416.5 ns | 88349 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36661.5 ns | 36698 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41802 ns | 41442 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 60068 ns | 59908 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52820 ns | 52585 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 72324 ns | 72014 ns | 1.00 |
| array/broadcast | 19976 ns | 20078 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 12947 ns | 12908 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 217098 ns | 213437 ns | 1.02 |
| array/copyto!/gpu_to_cpu | 284144 ns | 283206 ns | 1.00 |
| array/accumulate/Int64/1d | 125267 ns | 124198 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 83680 ns | 83165 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 158512 ns | 157631 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1709548 ns | 1709733 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 967025 ns | 966057.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109355 ns | 108414 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80542 ns | 79731.5 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 147497 ns | 146657 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1619041 ns | 1616606.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698387 ns | 697417 ns | 1.00 |
| array/construct | 1279.5 ns | 1271.5 ns | 1.01 |
| array/random/randn/Float32 | 45369 ns | 45612 ns | 0.99 |
| array/random/randn!/Float32 | 24921 ns | 24822 ns | 1.00 |
| array/random/rand!/Int64 | 27323 ns | 27264 ns | 1.00 |
| array/random/rand!/Float32 | 8827 ns | 8854 ns | 1.00 |
| array/random/rand/Int64 | 30118 ns | 29823 ns | 1.01 |
| array/random/rand/Float32 | 13144.5 ns | 13073 ns | 1.01 |
| array/permutedims/4d | 60086 ns | 59525 ns | 1.01 |
| array/permutedims/2d | 53908 ns | 53919 ns | 1.00 |
| array/permutedims/3d | 54858.5 ns | 54583 ns | 1.01 |
| array/sorting/1d | 2758934.5 ns | 2757051 ns | 1.00 |
| array/sorting/by | 3345603 ns | 3344047 ns | 1.00 |
| array/sorting/2d | 1081937 ns | 1080794 ns | 1.00 |
| cuda/synchronization/stream/auto | 1060.8 ns | 1034 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 8238.4 ns | 8105 ns | 1.02 |
| cuda/synchronization/stream/blocking | 811.7555555555556 ns | 796.4842105263158 ns | 1.02 |
| cuda/synchronization/context/auto | 1182.3 ns | 1198.2 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 6948.4 ns | 8018.6 ns | 0.87 |
| cuda/synchronization/context/blocking | 905.9090909090909 ns | 918.6428571428571 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report
❌ Patch coverage is

Additional details and impacted files: coverage diff

| | master | #2958 | +/- |
|---|---|---|---|
| Coverage | 89.30% | 89.46% | +0.15% |
| Files | 150 | 150 | |
| Lines | 13084 | 13087 | +3 |
| Hits | 11685 | 11708 | +23 |
| Misses | 1399 | 1379 | -20 |

d335b2c to 10dbaec
10dbaec to 8f8a3da
Closes #2952 and #2607.
The (m × 0) * (0 × n) matmatmul and the (m × 0) * (0) matvecmul edge cases should probably be tested in the GPUArrays.jl test suite (for all GPU backends). I'll add a PR there (JuliaGPU/GPUArrays.jl#646).
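
For reference, here is a minimal sketch of the two edge cases using plain CPU arrays from Julia's `LinearAlgebra` standard library. It is not the test code from this PR or from GPUArrays.jl; the sizes and variable names are illustrative assumptions. It only shows the results one would expect a GPU backend to reproduce: a product over an empty inner dimension is an empty sum, so the output is all zeros, and the five-argument `mul!` reduces to scaling the destination by `β`.

```julia
using LinearAlgebra

m, n = 3, 4                      # illustrative sizes (assumption)
A = zeros(Float32, m, 0)         # m × 0 matrix
B = zeros(Float32, 0, n)         # 0 × n matrix
x = zeros(Float32, 0)            # length-0 vector

C = A * B                        # matmatmul over an empty inner dimension
y = A * x                        # matvecmul over an empty inner dimension

# Five-argument mul! computes D = α*A*B + β*D; with an empty inner
# dimension the product term vanishes and only β*D remains.
D = ones(Float32, m, n)
mul!(D, A, B, 1f0, 2f0)

# On a machine with a CUDA device, wrapping A, B and x in CuArray
# (from CUDA.jl) would exercise the GPU code path instead.
```

Comparing the GPU results against these CPU results is one way a backend-independent test could be phrased.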