Draft: Performance fixes for DualLinearProblems #776

jClugstor · 2025-09-10T22:14:54Z

Checklist

Appropriate tests were added
Any code changes were done in a way that does not break public API
All documentation related to code changes were updated
The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
Any new documentation only uses public API

Additional context

jClugstor · 2025-09-10T22:23:03Z

@oscardssmith this caches a lot more stuff during init, so should help performance.

The biggest source of allocations and something that takes a large chunk of time is the creation of the solution here:

LinearSolve.jl/ext/LinearSolveForwardDiffExt.jl, line 115 in 547db8a:

function linearsolve_dual_solution(u::AbstractArray, partials,

which can definitely be improved.

ext/LinearSolveForwardDiffExt.jl

jClugstor · 2025-09-11T21:23:05Z

Now I'm down to solve!(::DualLinearCache) only taking one alloc.

The getfields everywhere are there because I overrode getproperty for DualLinearCache to forward to the getproperty of LinearCache where appropriate, but of course that's not type stable. Maybe using Val would help if we really wanted to switch back to getproperty; not sure.
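To illustrate the trade-off, here is a minimal sketch with hypothetical stand-in types (InnerCache and OuterCache are not the real LinearCache and DualLinearCache, and the field names are assumptions made for the example). It puts a forwarding getproperty override next to direct getfield access:

```julia
# Hypothetical stand-ins for LinearCache and DualLinearCache; not the real structs.
struct InnerCache
    u::Vector{Float64}
    iters::Int
end

struct OuterCache
    inner::InnerCache
    partials_u::Vector{Float64}
end

# Forward unknown property names to the wrapped cache. The return type depends
# on the value of `name`, so inference relies on `name` being a known constant.
function Base.getproperty(c::OuterCache, name::Symbol)
    if name === :inner || name === :partials_u
        return getfield(c, name)
    else
        return getproperty(getfield(c, :inner), name)
    end
end

# getfield bypasses the override, so the field type is always known statically.
inner_u(c::OuterCache) = getfield(getfield(c, :inner), :u)

c = OuterCache(InnerCache([1.0, 2.0], 3), [0.5, 0.5])
c.iters        # forwarded to the inner cache
inner_u(c)     # direct field access, no forwarding
```

Whether the forwarding version infers well depends on constant propagation of the property name at each call site; getfield sidesteps that question at the cost of noisier code.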

So with this branch:

using LinearSolve
using ForwardDiff
using Test
using Chairmarks

function h(p)
    (A=[p[1] p[2]+1 p[2]^3;
            3*p[1] p[1]+5 p[2]*p[1]-4;
            p[2]^2 9*p[1] p[2]],
        b=[p[1] + 1, p[2] * 2, p[1]^2])
end

A, b = h([ForwardDiff.Dual(5.0, 1.0, 0.0), ForwardDiff.Dual(5.0, 0.0, 1.0)])

prob = LinearProblem(A, b)
@b solve(prob, LUFactorization())
cache = init(prob, LUFactorization())
@b solve!(cache)
371.835 ns (1 allocs: 48 bytes)

Then LinearSolve 3.17:

@b solve!(cache)
74.021 ns (1 allocs: 48 bytes)

So this is a lot better than current main, but still roughly 5x slower than 3.17 (about 372 ns vs. 74 ns).

debug_dual.jl

oscardssmith · 2025-09-15T20:07:49Z

ext/LinearSolveForwardDiffExt.jl

+    cache.primal_u_cache .= cache.linear_cache.u
+
+    # Store solution metadata without copying - we'll return this
+    primal_sol = sol


Delete this assignment and just return sol?

oscardssmith · 2025-09-15T20:10:30Z

ext/LinearSolveForwardDiffExt.jl

+function update_partials_list!(partial_matrix::AbstractVector{T}, list_cache) where {T}
+    p = eachindex(first(partial_matrix))
+    for i in p
+        for j in eachindex(partial_matrix)
+            list_cache[i][j] = partial_matrix[j][i]
+        end
+    end
+    return list_cache
+end
+
+function update_partials_list!(partial_matrix, list_cache)


These functions seem to just shuffle data back and forth. Can you not do this without the shuffling?

The issue is that we need to take a matrix of ForwardDiff.Partials and convert it into a list of matrices, where each matrix holds the corresponding entries of the Partials. We need this in order to have matrices that mul! and LinearSolve can actually act on, because arithmetic isn't defined for ForwardDiff.Partials. So I'm not sure there's a way to accomplish that without this.
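As a dependency-free sketch of that conversion, the snippet below uses plain tuples as a stand-in for ForwardDiff.Partials (an assumption made so the example runs without packages) and a hypothetical helper name, split_partials!. Entry i of every tuple ends up in the i-th preallocated matrix:

```julia
# Copy entry i of every tuple in `partial_matrix` into `list_cache[i]`.
# Tuples stand in for ForwardDiff.Partials; `split_partials!` is a made-up name.
function split_partials!(list_cache, partial_matrix)
    n = length(first(partial_matrix))      # number of partials per entry
    for i in 1:n
        dest = list_cache[i]
        for j in eachindex(partial_matrix)
            dest[j] = partial_matrix[j][i]
        end
    end
    return list_cache
end

base = [1.0 2.0; 3.0 4.0]
partial_matrix = [(x, 10x) for x in base]          # 2x2 matrix of 2-tuples
list_cache = [zeros(2, 2) for _ in 1:2]            # allocated once, reused afterwards
split_partials!(list_cache, partial_matrix)
```

Because list_cache is allocated up front, repeated calls only overwrite existing storage.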

I guess one thing we could do is precompute the list of the necessary RHS matrices during init, since that only needs to be computed once, but it would need to be recomputed if A or b changes.
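For reference, the identity behind those RHS matrices, assuming the standard forward-mode rule for A x = b: differentiating gives A dx = db - dA x for each partial direction. Below is a small sketch with made-up 2x2 data (all values and variable names are illustrative, not taken from the extension). It uses five-argument mul! to build the right-hand side in place and reuses one factorization for both solves:

```julia
using LinearAlgebra

A  = [4.0 1.0; 2.0 3.0]
b  = [1.0, 2.0]
dA = [1.0 0.0; 0.0 2.0]    # partials of A along one direction (illustrative)
db = [0.5, -1.0]           # partials of b along the same direction

F = lu(A)                  # factorize once; reused for primal and partial solves
x = F \ b                  # primal solution

rhs = copy(db)
mul!(rhs, dA, x, -1.0, 1.0)    # rhs = dA*x*(-1) + rhs*1, i.e. db - dA*x, in place
dx = F \ rhs                   # partials of the solution
```

If A or b changes, the factorization and right-hand sides have to be refreshed, which is the recomputation cost mentioned above.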

jClugstor · 2025-09-15T21:14:09Z

FYI, these changes also broke the nested dual number handling, so I'm trying to figure that out.

jClugstor · 2025-09-16T17:50:53Z

Nested Duals are back to working

ChrisRackauckas · 2025-09-21T07:01:34Z

what's left here?

jClugstor and others added 5 commits September 9, 2025 12:18

add Dual problem JET tests

d0ba95b

Update test/nopre/jet.jl

117d266

use dual_prob

5544871

precache more stuff

2a0ed86

add more caching

d9cf955

oscardssmith reviewed Sep 11, 2025

ext/LinearSolveForwardDiffExt.jl (outdated)

jClugstor added 2 commits September 11, 2025 14:14

use five arg mul!, improve caching

357579a

use getfield for DualCache

e13012f

oscardssmith reviewed Sep 15, 2025

debug_dual.jl (outdated)

oscardssmith reviewed Sep 15, 2025


jClugstor force-pushed the forwarddiff_performance_fix branch from cb1580b to e13012f Compare September 15, 2025 21:29

jClugstor added 2 commits September 16, 2025 12:15

fix nested Duals

c902d61

branch for nested duals

5eb0356
