[WIP] Switch to unconstrained optimizers - step 1 #840
Conversation
I'll look into the failures on evaluation of the logdet in the model for the Dyestuff2 data tomorrow. It should be a straightforward evaluation because the model (should) converge to \theta = 0.
I had the wrong location for the first failure in the tests - it was in checking the fit of
I am trying to determine where the iterations in LN_BOBYQA start to differ across the architectures. I propose:
Any other ideas on how to proceed are welcome.
I would expect item two to be the more fruitful one, i.e. to focus on differences related to evaluating just the objective function across microarchitectures and leave out the effect of the optimization algorithm. I think it would be interesting to see differences in the initial values and in a value where the two start to differ. It doesn't matter which of the microarchitectures created the value, but the same value should be used when evaluating on both - again, to leave out the effect of the optimization. I'm wondering whether the objective function is ill-conditioned for some input parameters. (Here, I mean ill-conditioned in the general sense that applies to an arbitrary function, not just the conditioning of the linear solve.) I think it would be beneficial to stick with BOBYQA for now. It means one less moving part, and I don't suspect that disabling the bounds in BOBYQA by setting the lower bound to -Inf will make a difference.
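The "general sense" of ill-conditioning mentioned above can be made concrete via the relative condition number \kappa(x) = |x f'(x) / f(x)|, which measures how much a relative perturbation of the input is amplified in the output of an arbitrary scalar function. A minimal sketch, in Python as a language-agnostic illustration (the function and evaluation points here are hypothetical stand-ins, not the MixedModels objective):

```python
import math

def rel_condition(f, x, h=1e-6):
    """Estimate kappa(x) = |x * f'(x) / f(x)| with a central finite difference."""
    fprime = (f(x + h) - f(x - h)) / (2 * h)
    return abs(x * fprime / f(x))

# log is ill-conditioned near x = 1, where f(x) -> 0 ...
print(rel_condition(math.log, 1.001))  # large, on the order of 1000
# ... and well-conditioned away from it
print(rel_condition(math.log, 10.0))   # about 0.43
```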
In addition to checking for consistency in the evaluations at the initial values, which tend to have exact integer parameter values for the first few evaluations of the objective in the optimization, I was thinking of using extended precision to evaluate a "reference" value for the result. For a simple model, such as that for the penicillin data, the calculations are all scalar evaluations except for a rankUpdate operation and the Cholesky factorization of the resultant 6 by 6 positive definite matrix. @andreasnoack or @palday Would you have recommendations for what extended precision representation to use? There is BigFloat but I think there is another representation that essentially extends the mantissa of a Float64. I can't remember the name, unfortunately.
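The reference-value idea can be prototyped with any extended-precision arithmetic. As a language-agnostic sketch, Python's decimal module plays the role that BigFloat would play in Julia here (the computation shown is a generic square root, not the MixedModels objective):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50          # ~50 significant digits, far beyond Float64

ref = Decimal(2).sqrt()         # extended-precision "reference" value
f64 = Decimal(repr(2 ** 0.5))   # the Float64 result, as printed

print(ref)                      # 1.4142135623730950488...
print(abs(ref - f64))           # discrepancy near the 17th digit, as expected
```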
@andreasnoack I am okay with using LN_BOBYQA as the default for a while longer but definitely feel we should eliminate the lowerbd stuff if we are not going to use it. It may be a result of getting older and less mentally flexible, but I am finding the current code to be overly complex in places. Like any large project, there is a tendency to add code to patch special cases and justify this by adding more tests. Eventually a person looking at the code for the first time is left to wonder "Why is there all this code about a lower bound when the lower bound is always set to -Inf?". If you don't know the history, it is nonsense.
It seems that some early differences in the evaluation of the objective occur even for a simple model like that for the penicillin data. On a MacBook with multiple threads there are differences in the evaluations of the objective according to whether AppleAccelerate or OpenBLAS is used.
dmbates@MacBookPro MixedModels % julia -tauto --sta=no --quiet --proj # use OpenBLAS
julia> using MixedModels
julia> include("./test/modelcache.jl")
models (generic function with 1 method)
julia> m = refit!(first(models(:penicillin)); fitlog=true);
julia> first(m.optsum.fitlog, 8)
8-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.0, 1.0], 364.6267798165887)
([2.0, 1.0], 369.17598910041096)
([1.0, 2.0], 341.1011743244961)
([0.0, 1.0], 441.9062007516885)
([1.0, 0.0], 634.3549438457203)
([1.444405652744627, 1.5955538627700872], 340.7336990803933)
([0.49423753019996997, 1.907291795816359], 376.6434207973661)
([1.8130833383192728, 1.5302193144504488], 343.9702954340243)
julia> last(m.optsum.fitlog, 4)
4-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.5375663750437891, 3.2196669828561943], 332.18834868575027)
([1.5374512614983462, 3.219888903724198], 332.1883489935133)
([1.5375635758177966, 3.219607909552041], 332.18834870145923)
([1.5375908459147303, 3.2196603414283036], 332.1883486779673)
dmbates@MacBookPro MixedModels % julia -tauto --quiet --proj # use AppleAccelerate
[ Info: using OhMyREPL.jl
[ Info: using AppleAccelerate.jl
[ Info: using Revise.jl
julia> using MixedModels
julia> include("./test/modelcache.jl")
models (generic function with 1 method)
julia> m = refit!(first(models(:penicillin)); fitlog=true);
julia> first(m.optsum.fitlog, 8)
8-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.0, 1.0], 364.62677981658675)
([2.0, 1.0], 369.17598910040977)
([1.0, 2.0], 341.1011743245018)
([0.0, 1.0], 441.9062007516885)
([1.0, 0.0], 634.3549438457203)
([1.4444056527446194, 1.5955538627700523], 340.73369908052746)
([0.49423753020029537, 1.9072917958173385], 376.6434207973354)
([1.8130833383206328, 1.5302193144497196], 343.9702954341842)
julia> last(m.optsum.fitlog, 4)
4-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.5375832064702692, 3.219592495951509], 332.1883486973754)
([1.5375061859551695, 3.2196562755740685], 332.1883487819558)
([1.5375898376921937, 3.219692275843751], 332.188348672713)
([1.5375939045981573, 3.219792193110907], 332.1883486700085)

So right from the beginning the evaluation of the objective is a bit different for AppleAccelerate vs OpenBLAS, and from there on the trajectories are slightly different.
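One plausible source of such discrepancies, offered as an assumption rather than a diagnosis: floating-point addition is not associative, so two BLAS builds that accumulate dot products in different orders (blocking, SIMD width, threading) can legitimately differ by a few ULPs on the same data. A language-agnostic illustration in Python:

```python
# Floating-point addition is not associative, so accumulation order matters.
a, b, c = 0.1, 0.2, 0.3

left  = (a + b) + c   # strict left-to-right accumulation
right = a + (b + c)   # a different association, as a blocked kernel might use

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```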
On an x86_64 machine I get, with OpenBLAS,
dmbates@plinth:~/.julia/dev/MixedModels$ julia -t8 --sta=no --quiet --proj
julia> using MixedModels
julia> include("./test/modelcache.jl")
models (generic function with 1 method)
julia> m = refit!(only(models(:penicillin)); fitlog=true);
julia> first(m.optsum.fitlog, 8)
8-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.0, 1.0], 364.6267798165887)
([2.0, 1.0], 369.17598910041096)
([1.0, 2.0], 341.1011743244961)
([0.0, 1.0], 441.9062007516885)
([1.0, 0.0], 634.3549438457203)
([1.444405652744627, 1.5955538627700872], 340.7336990803923)
([0.4942375301999674, 1.9072917958163513], 376.64342079733973)
([1.8130833383189084, 1.5302193144504623], 343.97029543404193)
julia> last(m.optsum.fitlog, 4)
4-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.5368792590556977, 3.2176454905841227], 332.1883597436415)
([1.5375182176811606, 3.219536746361558], 332.18834878933217)
([1.537643747886617, 3.2205288361596702], 332.18834933611157)
([1.5375133789110618, 3.219886141303852], 332.18834878807843)

and with MKL
dmbates@plinth:~/.julia/dev/MixedModels$ julia -t8 --quiet --proj
julia> using MixedModels
julia> include("./test/modelcache.jl")
models (generic function with 1 method)
julia> m = refit!(only(models(:penicillin)); fitlog=true);
julia> first(m.optsum.fitlog, 8)
8-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.0, 1.0], 364.62677981659255)
([2.0, 1.0], 369.1759891004115)
([1.0, 2.0], 341.10117432449795)
([0.0, 1.0], 441.90620075169636)
([1.0, 0.0], 634.3549438457203)
([1.4444056527446676, 1.5955538627700978], 340.73369908034823)
([0.4942375301998896, 1.9072917958160005], 376.64342079740277)
([1.8130833383190654, 1.530219314450679], 343.97029543397673)
julia> last(m.optsum.fitlog, 4)
4-element Vector{Tuple{Vector{Float64}, Float64}}:
([1.5375663667436432, 3.2196670002814987], 332.1883486857041)
([1.537451252957679, 3.21988892102474], 332.18834899335576)
([1.5375634872760895, 3.2196078285002994], 332.18834870172884)
([1.5375910290916945, 3.2196608178035246], 332.18834867791)
Some of the most interesting results are for the parameter vectors
It is even more simplified in the case of the parameter vector [0.0, 1.0]:
julia> m.L[3]
6×6 Matrix{Float64}:
5.0 0.0 0.0 0.0 0.0 0.0
0.0 5.0 0.0 0.0 0.0 0.0
0.0 0.0 5.0 0.0 0.0 0.0
0.0 0.0 0.0 5.0 0.0 0.0
0.0 0.0 0.0 0.0 5.0 0.0
0.0 0.0 0.0 0.0 0.0 5.0
julia> m.L[1]
24×24 Diagonal{Float64, Vector{Float64}}:
1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1.0
julia> m.L[6] # the last diagonal block (conceptually this is LowerTriangular, the [1,2] position is used as scratch storage)
2×2 Matrix{Float64}:
2.4 3308.0
55.1333 12.5951
julia> pwrss(m) # penalized sum of squared residuals
158.6355555555629
julia> abs2(last(last(m.L))) # the square of the [2,2] element of m.L[6]
158.6355555555629
julia> 2 * 6 * log(5.) # logdet of the [2,2] block of L (recall the logdet is the logarithm of the determinant of L*L', not just L)
19.313254949209202
julia> logdet(m)
19.313254949209202
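The bookkeeping in that transcript can be cross-checked in any language. A sketch in Python, with the block sizes and values read off the REPL output above:

```python
import math

# With theta = [0.0, 1.0], L has a 24x24 identity block (contributing 0) and a
# 6x6 block equal to 5*I.  logdet refers to det(L*L'), hence the factor of 2.
logdet_val = 2 * (24 * math.log(1.0) + 6 * math.log(5.0))
print(logdet_val)     # approximately 19.3132549492092, matching logdet(m)

# The [2,2] element of the last block squares to the penalized RSS.
print(math.sqrt(158.6355555555629))   # approximately 12.5951, as displayed in m.L[6]
```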
When using MKL on x86_64 the last element of m.L is slightly different:
julia> abs2(last(last(m.L)))
158.6355555555715

We can reproduce the calculation using BigFloat. First note that
julia> m.L[5] == m.A[5] ./ 5.0
true

So for the downdate of last(m.A):
julia> BigFloat.(last(m.A))
2×2 Matrix{BigFloat}:
144.0 3308.0
3308.0 76582.0
julia> m.L[5]
2×6 Matrix{Float64}:
4.8 4.8 4.8 4.8 4.8 4.8
120.8 105.4 119.6 109.8 110.2 95.8
julia> L5 = BigFloat.(m.A[5]) ./ 5
2×6 Matrix{BigFloat}:
4.8 4.8 4.8 4.8 4.8 4.8
120.8 105.4 119.6 109.8 110.2 95.8
julia> first(L5)
4.800000000000000000000000000000000000000000000000000000000000000000000000000028
julia> L6 = cholesky(Symmetric(BigFloat.(last(m.A)) - L5 * L5', :L))
Cholesky{BigFloat, Matrix{BigFloat}}
L factor:
2×2 LowerTriangular{BigFloat, Matrix{BigFloat}}:
2.4 ⋅
55.1333 12.5951
julia> last(L6.L)
12.59506076029629781458345910991177392468429196879571875004312834742566832337298
julia> last(last(m.L))
12.595060760296931

When using OpenBLAS on x86_64 the results are
julia> m = refit!(only(models(:penicillin)); fitlog=true);
julia> objective!(m, [0.0, 1.0])
441.9062007516885
julia> last(last(m.L))
so the OpenBLAS result is slightly closer to the BigFloat value. Frankly I am a bit surprised that there is a difference in the Cholesky factor of a 2 by 2 symmetric matrix.
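Because every entry of last(m.A) and of L5 = m.A[5]/5 above is a ratio of small integers, the BigFloat reference can also be cross-checked with exact rational arithmetic. A sketch using Python's fractions module, with the values taken from the transcripts above:

```python
from fractions import Fraction as F
import math

# last(m.A) and the exact L5 = m.A[5] / 5, taken from the transcripts
A6 = [[F(144), F(3308)], [F(3308), F(76582)]]
row1 = [F(24, 5)] * 6                                     # 4.8 repeated
row2 = [F(n, 5) for n in (604, 527, 598, 549, 551, 479)]  # 120.8, 105.4, ...

# exact downdate: S = A6 - L5 * L5'
s11 = A6[0][0] - sum(x * x for x in row1)
s12 = A6[0][1] - sum(x * y for x, y in zip(row1, row2))
s22 = A6[1][1] - sum(y * y for y in row2)

# [2,2] element of the Cholesky factor: l22^2 = s22 - s12^2 / s11, exactly
l22_sq = s22 - s12 * s12 / s11
print(l22_sq)              # 35693/225, i.e. the pwrss value 158.63555...
print(math.sqrt(l22_sq))   # agrees with the BigFloat value 12.5950607602962...
```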
Does anyone have access to a Windows computer to check these computations with OpenBLAS and with MKL? I don't have any Windows machines. I suppose we could put something like this into the CI runs.
I don't think the difference in the Cholesky matters too much here. For the same input, I get very similar results for OpenBLAS and MKL. With
julia> repr(m.A[6] - m.L[5]*m.L[5]')
"[5.7599999999999625 132.31999999999925; 132.31999999999925 3198.3199999999924]"

I'm getting, with OpenBLAS,
julia> cholesky([5.7599999999999625 132.31999999999925; 132.31999999999925 3198.3199999999924]).L[2,2]
12.595060760296588

and with MKL
julia> cholesky([5.7599999999999625 132.31999999999925; 132.31999999999925 3198.3199999999924]).L[2,2]
12.59506076029659

which are only one ULP apart, so I think the difference between the two Cholesky factors comes from the inputs. With OpenBLAS, there is a one-ULP error in m.L[5]:
julia> print(m.L[5])
[4.800000000000001 4.800000000000001 4.800000000000001 4.800000000000001 4.800000000000001 4.800000000000001; 120.80000000000001 105.4 119.60000000000001 109.80000000000001 110.2 95.80000000000001]

so I have
julia> m.A[5]/5 == m.L[5]
false

Just speculating, but maybe the downdate is computed differently. Update: it could be that OpenBLAS is multiplying by the reciprocal instead of dividing:
julia> (m.A[5]*(1/5))[2,6] == m.L[5][2,6]
true
julia> (m.A[5]/5)[2,6] == m.L[5][2,6]
false
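The reciprocal-multiply hypothesis can be demonstrated without any BLAS at all: dividing by 5 is a single correctly rounded operation, while multiplying by the precomputed reciprocal 1/5 (itself inexact in binary) incurs a second rounding. In Python, as a language-agnostic IEEE-754 illustration using the 479/5 = 95.8 entry from the transcripts:

```python
import math

x = 479.0            # the [2,6] entry of m.A[5] (5 * 95.8)

div   = x / 5        # one correctly rounded operation
recip = x * (1 / 5)  # reciprocal is already inexact, then a second rounding

print(div)                           # 95.8
print(recip)                         # 95.80000000000001
print(recip - div == math.ulp(div))  # True: the two results are one ULP apart
```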
That's interesting. The calculation of
I would like to follow up on this thread but I am having difficulty thinking of what to do next. Suggestions welcome.
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #840 +/- ##
==========================================
- Coverage 96.27% 95.62% -0.66%
==========================================
Files 38 38
Lines 3654 3682 +28
==========================================
+ Hits 3518 3521 +3
- Misses 136 161 +25
I am beginning to see the problem with the style enforcer. For some reason JuliaFormatter on my computer changes successive ':' arguments by removing the spaces around them.
I think part of the problem is the difference between v1 and v2 of JuliaFormatter -- I had similar problems at some point when I was running a local version different from the version in the action.
I just noticed that the forwarddiff tests take 15-20 seconds on my laptop whereas the finitediff tests take about half a second. There is one more test in forwarddiff, but it is evaluating a 3 by 3 Hessian.
In addition to removing the lowerbd field of optsum, I was thinking of changing the structure of fitlog to be a matrix table.
I'm onboard with these changes, but please go piecemeal. 😄
osj = optsum
opt = Opt(osj)
pmj = m.parmap[j]
lbj = pmj[2] == pmj[3] ? zero(T) : T(-Inf)
This pattern has come up often enough that I wonder if it makes sense to define a method lowerbd(parmapj).
Sorry for the silence here. I was busy with vacation. I just tried this out with the test cases that we have, and this branch seems to avoid any of the issues that we'd hit in the past, i.e. both #705 and #833. No regressions appeared either, so I'm in favor of moving forward. Regarding #840 (comment), I don't think it has any immediate implications for the transition to unconstrained optimization. However, it might be worth analyzing whether the rank downdate introduces numerical issues under some conditions and whether they can be avoided. Such issues would be unrelated to the optimization approach, but they might explain the required changes to the test values here.
I added a timingtable function:
julia> timingtable()
Table with 6 columns and 16 rows:
modnum bkend optmzr feval objtiv time
┌─────────────────────────────────────────────────────
1 │ 1 nlopt LN_NEWUOA 81 2.37722e5 0.424785
2 │ 1 nlopt LN_BOBYQA 107 2.37722e5 0.560039
3 │ 1 prima newuoa 92 2.37722e5 0.483105
4 │ 1 prima bobyqa 104 2.37722e5 0.546217
5 │ 2 nlopt LN_NEWUOA 50 2.37586e5 0.249036
6 │ 2 nlopt LN_BOBYQA 46 2.37586e5 0.230194
7 │ 2 prima newuoa 47 2.37586e5 0.235359
8 │ 2 prima bobyqa 46 2.37586e5 0.230809
9 │ 3 nlopt LN_NEWUOA 196 2.37649e5 1.13438
10 │ 3 nlopt LN_BOBYQA 170 2.37649e5 0.985796
11 │ 3 prima newuoa 203 2.37649e5 1.1752
12 │ 3 prima bobyqa 173 2.37649e5 1.00095
13 │ 4 nlopt LN_NEWUOA 181 2.37647e5 1.05708
14 │ 4 nlopt LN_BOBYQA 490 2.37647e5 2.85961
15 │ 4 prima newuoa 211 2.37647e5 1.23718
16 │ 4 prima bobyqa 293 2.37647e5 1.71672

allowing us to see which are faster and slower (mostly determined by the number of function evaluations, but not entirely). Shall I add a column for the number of parameters in the model?
How about each optimizer's final objective value and that value's deviation from the best value across optimizers?
Yes.
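A deviation-from-best column is cheap to compute from the existing objective column. A hypothetical sketch in Python (the row values are invented for illustration and are not output of timingtable):

```python
# Hypothetical rows for a single model: (backend, optimizer, objective).
# The objective values are invented for illustration.
rows = [
    ("nlopt", "LN_NEWUOA", 237722.10),
    ("nlopt", "LN_BOBYQA", 237722.13),
    ("prima", "newuoa",    237722.11),
    ("prima", "bobyqa",    237722.12),
]

best = min(obj for _, _, obj in rows)   # best objective within the model
for bkend, opt, obj in rows:
    # deviation from the best value across optimizers
    print(f"{bkend:6s} {opt:10s} {obj:.2f} {obj - best:.2e}")
```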
Finally this version is green on the tests except for the nightly version of Julia, and that failure is due to compilation failures in other packages. I will hold off on declaring this to be v5.0.0 until further checking and enhancements take place. I think we should remove the rectify! step after the optimization to flip the signs on columns of \lambda where the diagonal element converges to a negative value. test/prima.jl is giving errors so I disabled it. I'm too tired to work out what is happening there.
Yep, you can do the
@palday Would you be willing to do the honors and set the new version number, adjust the NEWS.md file, etc., and make a new release? You are much more familiar with the details of doing that than I am.
@dmbates There are a few small things that I want to do in a follow-up PR before making the breaking release, but I've added an appropriate NEWS entry and marked the version as