@kshyatt
Member

@kshyatt kshyatt commented Nov 1, 2025

Hopefully fixes #2945; it worked for me locally. The deleted code in lib/cusparse/generic.jl is no longer needed because we're firmly in the CUSPARSE 12.x era.
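
For readers who want to check sparse matrix-vector products on their own install, here is a minimal sketch. It is not the reproducer from #2945: the helper name `check_spmv`, the matrix size, and the density are made up for illustration, and it assumes a working CUDA GPU. It compares GPU products against the CPU `SparseArrays` reference for both storage formats and for plain, transposed, and adjoint operands.

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

# Hypothetical helper: returns true when every GPU product matches the CPU result.
function check_spmv(A::SparseMatrixCSC{T}, x::Vector{T}, z::Vector{T}) where {T}
    dx, dz = CuArray(x), CuArray(z)
    ok = true
    for fmt in (CuSparseMatrixCSC, CuSparseMatrixCSR)
        dA = fmt(A)                                   # upload in this storage format
        ok &= Array(dA * dx) ≈ A * x                  # y = A x
        ok &= Array(transpose(dA) * dz) ≈ transpose(A) * z
        ok &= Array(dA' * dz) ≈ A' * z
    end
    return ok
end

if CUDA.functional()
    # A non-square matrix makes swapped dimensions easier to notice.
    A = sprand(Float32, 50, 30, 0.2)
    x = rand(Float32, 30)
    z = rand(Float32, 50)
    println("GPU results match CPU: ", check_spmv(A, x, z))
end
```

A `false` result would point at a mismatch somewhere in the six products; printing each comparison separately would narrow it down.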

@github-actions
Contributor

github-actions bot commented Nov 1, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/lib/cusparse/generic.jl b/lib/cusparse/generic.jl
index 64d904c39..dcf1c939d 100644
--- a/lib/cusparse/generic.jl
+++ b/lib/cusparse/generic.jl
@@ -159,7 +159,7 @@ function mv!(transa::SparseChar, alpha::Number, A::Union{CuSparseMatrixCSC{TA},C
     transa = T <: Real && transa == 'C' ? 'T' : transa
 
     descA = CuSparseMatrixDescriptor(A, index)
-    m,n = size(A)
+    m, n = size(A)
 
     if transa == 'N'
         chkmvdims(X,n,Y,m)
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index 0413197df..293f61f87 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -154,7 +154,7 @@ nB = 2
                         end
                         @testset "A * CuSparseVector" begin
                             @testset "A * b" begin
-                                c  = opa(geam_A) * b_spvec
+                                c = opa(geam_A) * b_spvec
                                 dc = opa(d_geam_A) * db_spvec
                                 @test c ≈ collect(dc)
                             end

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 685dc94 | Previous: 1e35ff7 | Ratio |
|---|---|---|---|
| latency/precompile | 56619126061 ns | 57073264350 ns | 0.99 |
| latency/ttfp | 8291755377 ns | 8361242115.5 ns | 0.99 |
| latency/import | 4492967306 ns | 4520871512 ns | 0.99 |
| integration/volumerhs | 9595387 ns | 9609145.5 ns | 1.00 |
| integration/byval/slices=1 | 146728 ns | 146784 ns | 1.00 |
| integration/byval/slices=3 | 425585 ns | 425930 ns | 1.00 |
| integration/byval/reference | 144984 ns | 144913 ns | 1.00 |
| integration/byval/slices=2 | 286453.5 ns | 286275 ns | 1.00 |
| integration/cudadevrt | 103539 ns | 103477 ns | 1.00 |
| kernel/indexing | 14186 ns | 14088 ns | 1.01 |
| kernel/indexing_checked | 14975 ns | 14920 ns | 1.00 |
| kernel/occupancy | 667.746835443038 ns | 670.5283018867924 ns | 1.00 |
| kernel/launch | 2142.5555555555557 ns | 2192.1111111111113 ns | 0.98 |
| kernel/rand | 15857 ns | 18597.5 ns | 0.85 |
| array/reverse/1d | 19903 ns | 19990 ns | 1.00 |
| array/reverse/2dL_inplace | 66810 ns | 66851 ns | 1.00 |
| array/reverse/1dL | 69983.5 ns | 70214 ns | 1.00 |
| array/reverse/2d | 21749 ns | 21764 ns | 1.00 |
| array/reverse/1d_inplace | 9505 ns | 9644 ns | 0.99 |
| array/reverse/2d_inplace | 10943.5 ns | 11083 ns | 0.99 |
| array/reverse/2dL | 73754 ns | 73680.5 ns | 1.00 |
| array/reverse/1dL_inplace | 66778 ns | 66780 ns | 1.00 |
| array/copy | 20644 ns | 20656 ns | 1.00 |
| array/iteration/findall/int | 157348 ns | 157234 ns | 1.00 |
| array/iteration/findall/bool | 140113 ns | 139637.5 ns | 1.00 |
| array/iteration/findfirst/int | 160550.5 ns | 161491 ns | 0.99 |
| array/iteration/findfirst/bool | 161430 ns | 161981.5 ns | 1.00 |
| array/iteration/scalar | 73049 ns | 72914 ns | 1.00 |
| array/iteration/logical | 216323 ns | 215503 ns | 1.00 |
| array/iteration/findmin/1d | 49769 ns | 52893.5 ns | 0.94 |
| array/iteration/findmin/2d | 96270.5 ns | 96673.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43423.5 ns | 43374 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 44705.5 ns | 44924.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 61451 ns | 61289 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88814 ns | 89013 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87769 ns | 88275 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 36777 ns | 37043 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 43032 ns | 43018 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 59697 ns | 59774 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52390 ns | 52409 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 71977 ns | 72278 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43233 ns | 43540 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 45104 ns | 45057.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 61556 ns | 61470 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89027 ns | 88923 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87961 ns | 88349 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36574 ns | 36698 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41689 ns | 41442 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 60043 ns | 59908 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52538 ns | 52585 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 71827 ns | 72014 ns | 1.00 |
| array/broadcast | 19968 ns | 20078 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11432 ns | 12908 ns | 0.89 |
| array/copyto!/cpu_to_gpu | 214849 ns | 213437 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 282616 ns | 283206 ns | 1.00 |
| array/accumulate/Int64/1d | 124396 ns | 124198 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83351.5 ns | 83165 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158000 ns | 157631 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1708619 ns | 1709733 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966738 ns | 966057.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109104.5 ns | 108414 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80023 ns | 79731.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147348 ns | 146657 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1617868.5 ns | 1616606.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697983 ns | 697417 ns | 1.00 |
| array/construct | 1255.4 ns | 1271.5 ns | 0.99 |
| array/random/randn/Float32 | 48671 ns | 45612 ns | 1.07 |
| array/random/randn!/Float32 | 24759.5 ns | 24822 ns | 1.00 |
| array/random/rand!/Int64 | 27414 ns | 27264 ns | 1.01 |
| array/random/rand!/Float32 | 8826.666666666666 ns | 8854 ns | 1.00 |
| array/random/rand/Int64 | 31377 ns | 29823 ns | 1.05 |
| array/random/rand/Float32 | 13141 ns | 13073 ns | 1.01 |
| array/permutedims/4d | 60145 ns | 59525 ns | 1.01 |
| array/permutedims/2d | 54287 ns | 53919 ns | 1.01 |
| array/permutedims/3d | 54742 ns | 54583 ns | 1.00 |
| array/sorting/1d | 2777773 ns | 2757051 ns | 1.01 |
| array/sorting/by | 3368935 ns | 3344047 ns | 1.01 |
| array/sorting/2d | 1088340.5 ns | 1080794 ns | 1.01 |
| cuda/synchronization/stream/auto | 1012.8181818181819 ns | 1034 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7589.5 ns | 8105 ns | 0.94 |
| cuda/synchronization/stream/blocking | 818.8602150537635 ns | 796.4842105263158 ns | 1.03 |
| cuda/synchronization/context/auto | 1155.3 ns | 1198.2 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 8438.4 ns | 8018.6 ns | 1.05 |
| cuda/synchronization/context/blocking | 896.4255319148937 ns | 918.6428571428571 ns | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.
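
The Ratio column in the benchmark comment is consistent with Current divided by Previous, rounded to two decimals, so values below 1.00 mean the current commit measured faster. As a rough sketch, the snippet below recomputes the ratio for a handful of rows copied from that comment and flags those that moved by more than 5%. The 5% threshold and the names `rows`, `ratio`, and `flagged` are arbitrary choices for illustration, not something the benchmark workflow defines.

```julia
# (name, current_ns, previous_ns) copied from the benchmark comment
rows = [
    ("kernel/rand", 15857.0, 18597.5),
    ("array/copyto!/gpu_to_gpu", 11432.0, 12908.0),
    ("array/random/randn/Float32", 48671.0, 45612.0),
    ("array/copy", 20644.0, 20656.0),
]

# Ratio as reported in the comment: current / previous, two decimals.
ratio(current, previous) = round(current / previous; digits = 2)

# Assumed threshold: flag anything that moved by more than 5% either way.
flagged = [(name, ratio(c, p)) for (name, c, p) in rows if abs(c / p - 1) > 0.05]

for (name, r) in flagged
    println(rpad(name, 30), r)
end
```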

Member

@maleadt maleadt left a comment

LGTM, thanks!

@maleadt maleadt added the bugfix and cuda libraries labels Nov 7, 2025
@kshyatt kshyatt enabled auto-merge (squash) November 10, 2025 07:49
@ytdHuang

Hi, would it be possible to tag a new (maybe patch) release after this PR?

Since this is an important bugfix, I would like to set the compat bounds for our package to exclude 5.9.0 through 5.9.3.
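
One way a downstream package could express "skip 5.9.0 through 5.9.3" is a compat entry that unions two ranges. The sketch below is only an illustration: `5.9.4` is a placeholder for whichever release ends up containing the fix (no version had been tagged at this point in the thread), the 5.8 lower bound is an assumption about the hypothetical package, and `Pkg.Types.semver_spec` is an internal Pkg helper rather than documented public API.

```julia
using Pkg

# Hypothetical entry in a downstream Project.toml:
#   [compat]
#   CUDA = "~5.8, 5.9.4"
# "~5.8" covers [5.8.0, 5.9.0) and "5.9.4" covers [5.9.4, 6.0.0);
# both version numbers are placeholders, see the note above.
spec = Pkg.Types.semver_spec("~5.8, 5.9.4")

# Check which versions the entry would admit.
for v in (v"5.8.2", v"5.9.0", v"5.9.3", v"5.9.4")
    println(v, " allowed: ", v in spec)
end
```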

@kshyatt kshyatt merged commit b30cae9 into master Nov 12, 2025
3 checks passed
@kshyatt kshyatt deleted the ksh/mv branch November 12, 2025 01:24
@codecov

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.45%. Comparing base (11256ab) to head (a9f0f11).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2957       +/-   ##
===========================================
+ Coverage   12.17%   89.45%   +77.27%     
===========================================
  Files         147      150        +3     
  Lines       12870    13084      +214     
===========================================
+ Hits         1567    11704    +10137     
+ Misses      11303     1380     -9923     

@kshyatt
Member Author

kshyatt commented Nov 12, 2025

@ytdHuang I will tag a new version later today. I got another fix in on master and am trying to get #2962 in as well, but either way I will tag by EOD.

@ytdHuang

@kshyatt Thank you very much!

Labels

bugfix: This gets something working again.
cuda libraries: Stuff about CUDA library wrappers.

Development

Successfully merging this pull request may close these issues.

Wrong sparse matrix-vector multiplication after v5.9+

4 participants