Fix complex CSC * dense vec #2957
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/lib/cusparse/generic.jl b/lib/cusparse/generic.jl
index 64d904c39..dcf1c939d 100644
--- a/lib/cusparse/generic.jl
+++ b/lib/cusparse/generic.jl
@@ -159,7 +159,7 @@ function mv!(transa::SparseChar, alpha::Number, A::Union{CuSparseMatrixCSC{TA},C
transa = T <: Real && transa == 'C' ? 'T' : transa
descA = CuSparseMatrixDescriptor(A, index)
- m,n = size(A)
+ m, n = size(A)
if transa == 'N'
chkmvdims(X,n,Y,m)
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index 0413197df..293f61f87 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -154,7 +154,7 @@ nB = 2
end
@testset "A * CuSparseVector" begin
@testset "A * b" begin
- c = opa(geam_A) * b_spvec
+ c = opa(geam_A) * b_spvec
dc = opa(d_geam_A) * db_spvec
@test c ≈ collect(dc)
end
CUDA.jl Benchmarks
| Benchmark suite | Current: 685dc94 | Previous: 1e35ff7 | Ratio |
|---|---|---|---|
| latency/precompile | 56619126061 ns | 57073264350 ns | 0.99 |
| latency/ttfp | 8291755377 ns | 8361242115.5 ns | 0.99 |
| latency/import | 4492967306 ns | 4520871512 ns | 0.99 |
| integration/volumerhs | 9595387 ns | 9609145.5 ns | 1.00 |
| integration/byval/slices=1 | 146728 ns | 146784 ns | 1.00 |
| integration/byval/slices=3 | 425585 ns | 425930 ns | 1.00 |
| integration/byval/reference | 144984 ns | 144913 ns | 1.00 |
| integration/byval/slices=2 | 286453.5 ns | 286275 ns | 1.00 |
| integration/cudadevrt | 103539 ns | 103477 ns | 1.00 |
| kernel/indexing | 14186 ns | 14088 ns | 1.01 |
| kernel/indexing_checked | 14975 ns | 14920 ns | 1.00 |
| kernel/occupancy | 667.746835443038 ns | 670.5283018867924 ns | 1.00 |
| kernel/launch | 2142.5555555555557 ns | 2192.1111111111113 ns | 0.98 |
| kernel/rand | 15857 ns | 18597.5 ns | 0.85 |
| array/reverse/1d | 19903 ns | 19990 ns | 1.00 |
| array/reverse/2dL_inplace | 66810 ns | 66851 ns | 1.00 |
| array/reverse/1dL | 69983.5 ns | 70214 ns | 1.00 |
| array/reverse/2d | 21749 ns | 21764 ns | 1.00 |
| array/reverse/1d_inplace | 9505 ns | 9644 ns | 0.99 |
| array/reverse/2d_inplace | 10943.5 ns | 11083 ns | 0.99 |
| array/reverse/2dL | 73754 ns | 73680.5 ns | 1.00 |
| array/reverse/1dL_inplace | 66778 ns | 66780 ns | 1.00 |
| array/copy | 20644 ns | 20656 ns | 1.00 |
| array/iteration/findall/int | 157348 ns | 157234 ns | 1.00 |
| array/iteration/findall/bool | 140113 ns | 139637.5 ns | 1.00 |
| array/iteration/findfirst/int | 160550.5 ns | 161491 ns | 0.99 |
| array/iteration/findfirst/bool | 161430 ns | 161981.5 ns | 1.00 |
| array/iteration/scalar | 73049 ns | 72914 ns | 1.00 |
| array/iteration/logical | 216323 ns | 215503 ns | 1.00 |
| array/iteration/findmin/1d | 49769 ns | 52893.5 ns | 0.94 |
| array/iteration/findmin/2d | 96270.5 ns | 96673.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43423.5 ns | 43374 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 44705.5 ns | 44924.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 61451 ns | 61289 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88814 ns | 89013 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87769 ns | 88275 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 36777 ns | 37043 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 43032 ns | 43018 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 59697 ns | 59774 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52390 ns | 52409 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 71977 ns | 72278 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43233 ns | 43540 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 45104 ns | 45057.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 61556 ns | 61470 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89027 ns | 88923 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87961 ns | 88349 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36574 ns | 36698 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41689 ns | 41442 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 60043 ns | 59908 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52538 ns | 52585 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 71827 ns | 72014 ns | 1.00 |
| array/broadcast | 19968 ns | 20078 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11432 ns | 12908 ns | 0.89 |
| array/copyto!/cpu_to_gpu | 214849 ns | 213437 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 282616 ns | 283206 ns | 1.00 |
| array/accumulate/Int64/1d | 124396 ns | 124198 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83351.5 ns | 83165 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158000 ns | 157631 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1708619 ns | 1709733 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966738 ns | 966057.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109104.5 ns | 108414 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80023 ns | 79731.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147348 ns | 146657 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1617868.5 ns | 1616606.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697983 ns | 697417 ns | 1.00 |
| array/construct | 1255.4 ns | 1271.5 ns | 0.99 |
| array/random/randn/Float32 | 48671 ns | 45612 ns | 1.07 |
| array/random/randn!/Float32 | 24759.5 ns | 24822 ns | 1.00 |
| array/random/rand!/Int64 | 27414 ns | 27264 ns | 1.01 |
| array/random/rand!/Float32 | 8826.666666666666 ns | 8854 ns | 1.00 |
| array/random/rand/Int64 | 31377 ns | 29823 ns | 1.05 |
| array/random/rand/Float32 | 13141 ns | 13073 ns | 1.01 |
| array/permutedims/4d | 60145 ns | 59525 ns | 1.01 |
| array/permutedims/2d | 54287 ns | 53919 ns | 1.01 |
| array/permutedims/3d | 54742 ns | 54583 ns | 1.00 |
| array/sorting/1d | 2777773 ns | 2757051 ns | 1.01 |
| array/sorting/by | 3368935 ns | 3344047 ns | 1.01 |
| array/sorting/2d | 1088340.5 ns | 1080794 ns | 1.01 |
| cuda/synchronization/stream/auto | 1012.8181818181819 ns | 1034 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 7589.5 ns | 8105 ns | 0.94 |
| cuda/synchronization/stream/blocking | 818.8602150537635 ns | 796.4842105263158 ns | 1.03 |
| cuda/synchronization/context/auto | 1155.3 ns | 1198.2 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 8438.4 ns | 8018.6 ns | 1.05 |
| cuda/synchronization/context/blocking | 896.4255319148937 ns | 918.6428571428571 ns | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
maleadt left a comment
LGTM, thanks!
Hi, would it be possible to tag a new (maybe patch) release after this PR? Since it is an important bugfix, I would like to set compat for our package to ignore
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2957       +/-   ##
===========================================
+ Coverage   12.17%   89.45%   +77.27%
===========================================
  Files         147      150        +3
  Lines       12870    13084      +214
===========================================
+ Hits         1567    11704    +10137
+ Misses      11303     1380     -9923
@kshyatt Thank you very much!
Hopefully fixes #2945; it worked for me locally. We no longer need the deleted code in
lib/cusparse/generic.jl because we're firmly in the CUSPARSE 12.x era.
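
For readers who want a CPU-side reference for this kind of product, here is a minimal sketch using only the `SparseArrays` and `LinearAlgebra` standard libraries. It is not the PR's test code and does not touch the GPU; the matrix and vector values are made up for illustration. It also shows why the `mv!` line `transa = T <: Real && transa == 'C' ? 'T' : transa` is only safe for real element types: for complex data the adjoint and the transpose give different results.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical 3x3 complex CSC matrix and dense vector (values are illustrative only).
A = sparse([1, 2, 3, 1], [1, 2, 3, 3],
           ComplexF64[1 + 2im, 3 - 1im, 2im, 4 + 0im], 3, 3)
x = ComplexF64[1 - 1im, 2 + 0im, 0.5im]

# Dense copy used as the reference for all three operations.
Ad = Matrix(A)

y_n = A * x             # 'N': no transpose
y_t = transpose(A) * x  # 'T': transpose
y_c = adjoint(A) * x    # 'C': conjugate transpose

@assert y_n ≈ Ad * x
@assert y_t ≈ transpose(Ad) * x
@assert y_c ≈ adjoint(Ad) * x

# Complex data: 'C' and 'T' are different operations.
@assert !(y_c ≈ y_t)

# Real data: 'C' and 'T' coincide, so mapping 'C' to 'T' loses nothing.
Ar = real.(A)
xr = real.(x)
@assert adjoint(Ar) * xr ≈ transpose(Ar) * xr
```

A GPU result from CUDA.jl could be compared against `y_n`, `y_t`, and `y_c` after `collect`-ing it back to the host, in the same spirit as the `@test c ≈ collect(dc)` line in the quoted test hunk.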