@tam724 (Contributor) commented Nov 4, 2025

Closes #2952 and #2607.
The (m x 0) * (0 x n) matrix-matrix and (m x 0) * (0) matrix-vector edge cases should probably be tested in the GPUArrays.jl test suite (for all GPU backends). I'll open a PR there (JuliaGPU/GPUArrays.jl#646).
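For reference, here is a minimal sketch of what such a test would check. It is plain Python rather than Julia so it runs without a GPU, and it is not CUDA.jl or GPUArrays.jl code. It assumes BLAS-style semantics, C := alpha*A*B + beta*C (the convention of `gemm`/`gemv` and of Julia's 5-argument `mul!`). With an inner dimension of 0, every entry of A*B is an empty sum, i.e. 0, so the result must be beta*C, or all zeros when beta is 0. The helpers `gemm` and `gemv` below are hypothetical names used only for illustration.

```python
# Hypothetical CPU reference for the empty-inner-dimension edge cases.
# Matrices are lists of rows; vectors are plain lists.

def gemm(alpha, A, B, beta, C):
    """Return alpha*A*B + beta*C for A (m x k), B (k x n), C (m x n)."""
    m = len(C)
    n = len(C[0]) if m > 0 else 0
    k = len(B)  # inner dimension; 0 when B is a (0 x n) matrix
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # For k == 0 this is an empty sum, which evaluates to 0.
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            # BLAS rule: if beta == 0, C is not read (it may be uninitialized).
            scaled = 0 if beta == 0 else beta * C[i][j]
            row.append(alpha * acc + scaled)
        out.append(row)
    return out


def gemv(alpha, A, x, beta, y):
    """Return alpha*A*x + beta*y for A (m x k), x (length k), y (length m)."""
    k = len(x)
    return [
        alpha * sum(A[i][p] * x[p] for p in range(k))
        + (0 if beta == 0 else beta * y[i])
        for i in range(len(y))
    ]


if __name__ == "__main__":
    nan = float("nan")  # stand-in for garbage in an uninitialized output buffer
    A = [[], [], []]    # 3 x 0
    B = []              # 0 x 2
    print(gemm(1.0, A, B, 0.0, [[nan, nan] for _ in range(3)]))
    print(gemv(1.0, A, [], 0.0, [nan, nan, nan]))
```

The `beta == 0` branch mirrors the reference BLAS rule that C need not be set on input when beta is zero, so stale values in the output buffer must not leak into the result even though no multiply-adds are performed.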

@github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 10dbaec | Previous: 1e35ff7 | Ratio (Current / Previous) |
|---|---|---|---|
| latency/precompile | 56839170229.5 ns | 57073264350 ns | 1.00 |
| latency/ttfp | 8293961507.5 ns | 8361242115.5 ns | 0.99 |
| latency/import | 4493983266 ns | 4520871512 ns | 0.99 |
| integration/volumerhs | 9608403 ns | 9609145.5 ns | 1.00 |
| integration/byval/slices=1 | 146729 ns | 146784 ns | 1.00 |
| integration/byval/slices=3 | 425688 ns | 425930 ns | 1.00 |
| integration/byval/reference | 144944 ns | 144913 ns | 1.00 |
| integration/byval/slices=2 | 286200 ns | 286275 ns | 1.00 |
| integration/cudadevrt | 103408 ns | 103477 ns | 1.00 |
| kernel/indexing | 14162 ns | 14088 ns | 1.01 |
| kernel/indexing_checked | 14910 ns | 14920 ns | 1.00 |
| kernel/occupancy | 674.2215189873418 ns | 670.5283018867924 ns | 1.01 |
| kernel/launch | 2167 ns | 2192.1111111111113 ns | 0.99 |
| kernel/rand | 15573 ns | 18597.5 ns | 0.84 |
| array/reverse/1d | 20062 ns | 19990 ns | 1.00 |
| array/reverse/2dL_inplace | 66968 ns | 66851 ns | 1.00 |
| array/reverse/1dL | 70236 ns | 70214 ns | 1.00 |
| array/reverse/2d | 22059 ns | 21764 ns | 1.01 |
| array/reverse/1d_inplace | 11396 ns | 9644 ns | 1.18 |
| array/reverse/2d_inplace | 13382 ns | 11083 ns | 1.21 |
| array/reverse/2dL | 74226.5 ns | 73680.5 ns | 1.01 |
| array/reverse/1dL_inplace | 66779 ns | 66780 ns | 1.00 |
| array/copy | 20954.5 ns | 20656 ns | 1.01 |
| array/iteration/findall/int | 159452.5 ns | 157234 ns | 1.01 |
| array/iteration/findall/bool | 141317 ns | 139637.5 ns | 1.01 |
| array/iteration/findfirst/int | 162645 ns | 161491 ns | 1.01 |
| array/iteration/findfirst/bool | 163137.5 ns | 161981.5 ns | 1.01 |
| array/iteration/scalar | 73710 ns | 72914 ns | 1.01 |
| array/iteration/logical | 218890.5 ns | 215503 ns | 1.02 |
| array/iteration/findmin/1d | 54171.5 ns | 52893.5 ns | 1.02 |
| array/iteration/findmin/2d | 97228 ns | 96673.5 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 44423 ns | 43374 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 44937 ns | 44924.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 61859 ns | 61289 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 89117 ns | 89013 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88328.5 ns | 88275 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 37481 ns | 37043 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 42641 ns | 43018 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 59997 ns | 59774 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52520 ns | 52409 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72293 ns | 72278 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 44124 ns | 43540 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 55556.5 ns | 45057.5 ns | 1.23 |
| array/reductions/mapreduce/Int64/dims=2 | 62182 ns | 61470 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 89096.5 ns | 88923 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88416.5 ns | 88349 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36661.5 ns | 36698 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41802 ns | 41442 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 60068 ns | 59908 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52820 ns | 52585 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 72324 ns | 72014 ns | 1.00 |
| array/broadcast | 19976 ns | 20078 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 12947 ns | 12908 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 217098 ns | 213437 ns | 1.02 |
| array/copyto!/gpu_to_cpu | 284144 ns | 283206 ns | 1.00 |
| array/accumulate/Int64/1d | 125267 ns | 124198 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 83680 ns | 83165 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 158512 ns | 157631 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1709548 ns | 1709733 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 967025 ns | 966057.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109355 ns | 108414 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80542 ns | 79731.5 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 147497 ns | 146657 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1619041 ns | 1616606.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698387 ns | 697417 ns | 1.00 |
| array/construct | 1279.5 ns | 1271.5 ns | 1.01 |
| array/random/randn/Float32 | 45369 ns | 45612 ns | 0.99 |
| array/random/randn!/Float32 | 24921 ns | 24822 ns | 1.00 |
| array/random/rand!/Int64 | 27323 ns | 27264 ns | 1.00 |
| array/random/rand!/Float32 | 8827 ns | 8854 ns | 1.00 |
| array/random/rand/Int64 | 30118 ns | 29823 ns | 1.01 |
| array/random/rand/Float32 | 13144.5 ns | 13073 ns | 1.01 |
| array/permutedims/4d | 60086 ns | 59525 ns | 1.01 |
| array/permutedims/2d | 53908 ns | 53919 ns | 1.00 |
| array/permutedims/3d | 54858.5 ns | 54583 ns | 1.01 |
| array/sorting/1d | 2758934.5 ns | 2757051 ns | 1.00 |
| array/sorting/by | 3345603 ns | 3344047 ns | 1.00 |
| array/sorting/2d | 1081937 ns | 1080794 ns | 1.00 |
| cuda/synchronization/stream/auto | 1060.8 ns | 1034 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 8238.4 ns | 8105 ns | 1.02 |
| cuda/synchronization/stream/blocking | 811.7555555555556 ns | 796.4842105263158 ns | 1.02 |
| cuda/synchronization/context/auto | 1182.3 ns | 1198.2 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 6948.4 ns | 8018.6 ns | 0.87 |
| cuda/synchronization/context/blocking | 905.9090909090909 ns | 918.6428571428571 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.46%. Comparing base (b30cae9) to head (8f8a3da).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/cublas/linalg.jl | 60.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2958      +/-   ##
==========================================
+ Coverage   89.30%   89.46%   +0.15%     
==========================================
  Files         150      150              
  Lines       13084    13087       +3     
==========================================
+ Hits        11685    11708      +23     
+ Misses       1399     1379      -20     

@kshyatt merged commit 2e983fe into JuliaGPU:master on Nov 12, 2025
3 checks passed

Development

Successfully merging this pull request may close these issues.

Wrong matmul with empty matrices