Conversation

@dpo (Member) left a comment

Do you have numerical results?

@MaxenceGollier (Collaborator, Author)

I am trying to make the tests pass; I will show numerical experiments afterwards. It seems there is a type instability hidden somewhere; I am investigating.
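
(Aside: a minimal way to hunt for such instabilities with Base tooling; the function below is a toy illustration, not code from this PR:)

using InteractiveUtils  # provides @code_warntype outside the REPL

# Toy instability: returns an Int for positive input but a Float64 otherwise,
# so inference can only conclude Union{Float64, Int64}.
unstable(x::Int) = x > 0 ? x : 0.0

@code_warntype unstable(1)  # the Union-typed Body is the red flag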

codecov bot commented Aug 25, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.41%. Comparing base (e0f214d) to head (bf1029a).
⚠️ Report is 202 commits behind head on master.

Files with missing lines    Patch %    Lines
src/R2N.jl                  77.77%     2 Missing ⚠️
src/TR_alg.jl               80.00%     2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #204       +/-   ##
===========================================
+ Coverage   61.53%   84.41%   +22.88%     
===========================================
  Files          11       13        +2     
  Lines        1292     1630      +338     
===========================================
+ Hits          795     1376      +581     
+ Misses        497      254      -243     


@MaxenceGollier (Collaborator, Author) commented Aug 25, 2025

Numerical Experiment

using LinearAlgebra, Random
using ProximalOperators
using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization, SolverCore

Random.seed!(123)   

compound = 1
nz = 10 * compound
bpdn, bpdn_nls, sol = bpdn_model(compound)
λ = 1.0

h = NormL1(λ)
reg_nlp = RegularizedNLPModel(LSR1Model(bpdn), h)
solver = R2NSolver(reg_nlp)
stats = RegularizedExecutionStats(reg_nlp)

x0 = 100*randn(Float64, reg_nlp.model.meta.nvar)
solve!(solver, reg_nlp, stats, x = x0, σk = 1.0, atol = 1e-8, rtol = 1e-8, compute_opnorm = true, verbose = 1)
solve!(solver, reg_nlp, stats, x = x0, σk = 1.0, atol = 1e-8, rtol = 1e-8, compute_opnorm = false, verbose = 1)

yields

[ Info:  outer  inner     f(x)     h(x)  √(ξ1/ν)        ρ        σ      ‖x‖      ‖s‖      ‖B‖ R2N 
[ Info:      0      1  9.3e+05  3.9e+04  1.4e+03  1.5e+00  1.0e+00  2.2e+03  6.9e+02  1.0e+00               ↘
[ Info:      1     11  2.3e+05  3.2e+04  6.8e+02  1.3e+00  3.3e-01  1.8e+03  5.1e+02  1.0e+00               ↘
[ Info:      2     18  1.4e+04  2.9e+04  1.7e+02  1.2e+00  1.1e-01  1.6e+03  2.3e+02  1.0e+00               ↘
[ Info:      3     39  1.9e+02  2.6e+04  2.6e+01  1.7e+00  3.7e-02  1.5e+03  4.1e+02  1.0e+00               ↘
[ Info:      4     50  5.6e+01  1.8e+04  2.0e+01  1.4e+00  1.2e-02  1.1e+03  5.5e+02  1.0e+00               ↘
[ Info:      5     66  4.9e+01  1.1e+04  2.0e+01  1.2e+00  4.1e-03  7.1e+02  4.9e+02  1.0e+00               ↘
[ Info:      6     29  4.2e+01  5.7e+03  2.0e+01  1.7e+00  1.4e-03  3.7e+02  6.0e+01  1.0e+00               ↘
[ Info:      7     37  4.2e+01  5.3e+03  1.9e+01  1.1e+00  4.6e-04  3.4e+02  2.3e+02  1.0e+00               ↘
[ Info:      8     31  3.7e+01  2.8e+03  1.9e+01  1.2e+00  1.5e-04  1.8e+02  1.3e+02  1.0e+00               ↘
[ Info:      9     21  3.3e+01  1.4e+03  1.8e+01  1.8e+00  5.1e-05  9.7e+01  2.4e+01  1.0e+00               ↘
[ Info:     10     20  3.2e+01  1.0e+03  1.6e+01  1.3e+00  1.7e-05  7.9e+01  5.3e+01  1.0e+00               ↘
[ Info:     11     11  2.5e+01  3.6e+02  1.4e+01  1.5e+00  5.6e-06  3.5e+01  2.5e+01  1.0e+00               ↘
[ Info:     12      5  1.2e+01  9.3e+01  8.9e+00  1.3e+00  1.9e-06  1.4e+01  1.2e+01  1.0e+00               ↘
[ Info:     13      0  2.5e+00  3.7e+00  2.2e+00  1.1e+00  6.3e-07  1.8e+00  1.8e+00  1.0e+00               ↘
[ Info:     14      0  2.1e+00  0.0e+00  4.7e-08  1.1e+00  2.1e-07  0.0e+00  1.8e+00  1.0e+00               ↘
[ Info: R2N: terminating with √(ξ1/ν) = 4.713904811330292e-8
[ Info:  outer  inner     f(x)     h(x)  √(ξ1/ν)        ρ        σ      ‖x‖      ‖s‖      ‖B‖ R2N
[ Info:      0     12  9.3e+05  3.9e+04  1.4e+03  1.5e+00  1.0e+00  2.2e+03  6.9e+02  9.9e-01               ↘
[ Info:      1     11  2.3e+05  3.2e+04  6.8e+02  1.3e+00  3.3e-01  1.8e+03  5.1e+02  9.9e-01               ↘
[ Info:      2     18  1.4e+04  2.9e+04  1.7e+02  1.2e+00  1.1e-01  1.6e+03  2.2e+02  9.9e-01               ↘
[ Info:      3     43  1.9e+02  2.6e+04  2.6e+01  1.8e+00  3.7e-02  1.5e+03  4.2e+02  9.9e-01               ↘
[ Info:      4     64  5.1e+01  1.9e+04  2.0e+01  1.4e+00  1.2e-02  1.1e+03  5.1e+02  9.9e-01               ↘
[ Info:      5     75  3.7e+01  1.2e+04  2.0e+01  1.2e+00  4.1e-03  7.5e+02  4.8e+02  9.9e-01               ↘
[ Info:      6     41  4.5e+01  8.9e+03  2.0e+01  1.5e+00  1.4e-03  5.3e+02  1.2e+02  9.9e-01               ↘
[ Info:      7     39  5.6e+01  8.2e+03  1.9e+01  1.4e+00  4.6e-04  5.2e+02  1.2e+02  9.9e-01               ↘
[ Info:      8     38  6.3e+01  7.5e+03  1.9e+01  1.1e+00  1.5e-04  5.0e+02  2.2e+02  9.9e-01               ↘
[ Info:      9     40  5.2e+01  5.8e+03  1.9e+01  1.3e+00  5.1e-05  3.9e+02  1.1e+02  9.9e-01               ↘
[ Info:     10     32  5.6e+01  5.0e+03  1.9e+01  1.7e+00  1.7e-05  3.4e+02  5.1e+01  9.9e-01               ↘
[ Info:     11     37  5.5e+01  4.5e+03  1.9e+01  1.1e+00  5.6e-06  3.2e+02  2.1e+02  9.9e-01               ↘
[ Info:     12     20  3.4e+01  2.7e+03  1.9e+01  1.7e+00  1.9e-06  1.8e+02  4.4e+01  9.9e-01               ↘
[ Info:     13     24  3.9e+01  2.2e+03  1.8e+01  1.3e+00  6.3e-07  1.6e+02  7.8e+01  9.9e-01               ↘
[ Info:     14     16  4.8e+01  1.4e+03  1.6e+01  1.6e+00  2.1e-07  1.1e+02  3.8e+01  9.9e-01               ↘
[ Info:     15     14  4.7e+01  1.0e+03  1.5e+01  1.4e+00  7.0e-08  8.8e+01  4.7e+01  9.9e-01               ↘
[ Info:     16     19  3.7e+01  5.6e+02  1.4e+01  1.4e+00  2.3e-08  5.2e+01  3.8e+01  9.9e-01               ↘
[ Info:     17      6  1.5e+01  1.6e+02  1.1e+01  1.4e+00  7.7e-09  1.9e+01  1.5e+01  9.9e-01               ↘
[ Info:     18      3  5.8e+00  2.2e+01  4.9e+00  1.3e+00  2.6e-09  6.3e+00  5.4e+00  9.9e-01               ↘
[ Info:     19      0  2.2e+00  1.4e+00  1.3e+00  1.2e+00  8.6e-10  1.3e+00  1.3e+00  9.9e-01               ↘
[ Info:     20      0  2.1e+00  0.0e+00  4.7e-08  1.2e+00  2.9e-10  0.0e+00  1.3e+00  9.9e-01               ↘
[ Info: R2N: terminating with √(ξ1/ν) = 4.686513340103464e-8

so computing the exact norm roughly halves the number of outer iterations in this case.
On the other hand, the approximation is not that bad on average. Here I run the same experiment 1000 times, accumulating the total number of inner iterations (via the callback below) and the execution time for each setting:

using LinearAlgebra, Random
using ProximalOperators
using NLPModels, NLPModelsModifiers, RegularizedProblems, RegularizedOptimization, SolverCore

Random.seed!(123)   

compound = 1
nz = 10 * compound
bpdn, bpdn_nls, sol = bpdn_model(compound)
λ = 1.0

h = NormL1(λ)
reg_nlp = RegularizedNLPModel(LSR1Model(bpdn), h)
solver = R2NSolver(reg_nlp)
stats = RegularizedExecutionStats(reg_nlp)

avg_num_iter_compute_true = 0
avg_num_iter_compute_false = 0
avg_time_compute_true = 0
avg_time_compute_false = 0

callback = (nlp, solver, stats) -> begin
  if stats.iter == 0
    set_solver_specific!(stats, :total_iter, 0)
  else
    set_solver_specific!(stats, :total_iter, stats.solver_specific[:total_iter] + solver.substats.iter)
  end
end

Nrun = 1000
for i = 1:Nrun
    local x0 = 100 * randn(Float64, reg_nlp.model.meta.nvar)

    solve!(solver, reg_nlp, stats; x = x0, σk = 1.0, atol = 1e-8, rtol = 1e-8,
           compute_opnorm = true, callback = callback)
    global avg_num_iter_compute_true += stats.solver_specific[:total_iter]
    global avg_time_compute_true += stats.elapsed_time

    solve!(solver, reg_nlp, stats; x = x0, σk = 1.0, atol = 1e-8, rtol = 1e-8,
           compute_opnorm = false, callback = callback)
    global avg_num_iter_compute_false += stats.solver_specific[:total_iter]
    global avg_time_compute_false += stats.elapsed_time
end

println("Average number of iterations with compute_opnorm = true : $(avg_num_iter_compute_true / Nrun)")
println("Average computation time with compute_opnorm = true : $(avg_time_compute_true / Nrun)")
println("")
println("Average number of iterations with compute_opnorm = false : $(avg_num_iter_compute_false / Nrun)")
println("Average computation time with compute_opnorm = false : $(avg_time_compute_false / Nrun)")

which returns

Average number of iterations with compute_opnorm = true : 575.015
Average computation time with compute_opnorm = true : 0.023134000301361084

Average number of iterations with compute_opnorm = false : 593.003
Average computation time with compute_opnorm = false : 0.013637000322341919

so the approximate norm is about twice as fast on average, at the cost of roughly 3% more inner iterations.

@MaxenceGollier (Collaborator, Author)

I ran the same experiment with the L0 norm, and the algorithm then sometimes fails when compute_opnorm = false. I am guessing this is because that regularizer is not convex.
Perhaps I should set the default value of compute_opnorm to true and document the caveat.

@MohamedLaghdafHABIBOULLAH (Collaborator)

Thank you @MaxenceGollier for the experiments. I was afraid the L0 norm would fail because of its instability: we need a good approximation of $$B_k$$ to ensure sufficient decrease, otherwise we may get a negative $$\xi_{cp}$$, which makes the algorithm fail.

@MohamedLaghdafHABIBOULLAH (Collaborator)

@dpo @MaxenceGollier
Instead of the $$l_0$$ norm, we could consider alternative surrogates such as the Capped $$l_1$$. We already have an implementation for the matrix case, and I can readily extend it to the vector case.
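
(For reference, the usual separable form of the capped $$l_1$$ penalty in the vector case, with cap parameter $$\theta > 0$$:

$$h(x) = \lambda \sum_{i=1}^{n} \min(|x_i|, \theta),$$

which equals $$\lambda\theta\|x\|_0$$ whenever every nonzero entry satisfies $$|x_i| \ge \theta$$, yet is Lipschitz continuous, unlike the $$l_0$$ norm.)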

@dpo (Member) commented Aug 26, 2025

I think this needs to be investigated. Convexity should have nothing to do with it. The descent lemma holds for any lsc h.
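
(For context, a standard version of that argument, assuming $$\nabla f$$ is $$L$$-Lipschitz; convexity of $$h$$ is nowhere used. The descent lemma gives

$$f(x+s) \le f(x) + \nabla f(x)^\top s + \tfrac{L}{2}\|s\|^2,$$

and comparing the prox step $$s$$ against the candidate $$s = 0$$ in the model it minimizes gives

$$\nabla f(x)^\top s + \tfrac{1}{2\nu}\|s\|^2 + h(x+s) \le h(x).$$

Adding the two inequalities,

$$f(x+s) + h(x+s) \le f(x) + h(x) - \left(\tfrac{1}{2\nu} - \tfrac{L}{2}\right)\|s\|^2,$$

which is a genuine decrease for any proper lsc $$h$$ as soon as $$\nu < 1/L$$.)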

@MohamedLaghdafHABIBOULLAH (Collaborator) commented Aug 26, 2025

I agree, I'll look into it in more detail. I think we should consider an easily trackable example rather than BPDN, since it is random and not straightforward to reproduce.

Any ideas, @dpo? Perhaps a nontrivial low-dimensional function, like Rosenbrock, where we add the $$l_0$$ norm as a regularizer?
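
(A minimal sketch of such a deterministic setup, assuming the same constructors as in the snippets above; ADNLPModel and the λ value are illustrative choices:)

using ADNLPModels, ProximalOperators
using NLPModelsModifiers, RegularizedProblems

# 2D Rosenbrock from its standard starting point, with an l0 regularizer;
# no randomness anywhere, so failures are reproducible.
rosenbrock = ADNLPModel(x -> (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2, [-1.2, 1.0])
reg_nlp = RegularizedNLPModel(LSR1Model(rosenbrock), NormL0(1.0))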

@dpo (Member) commented Sep 11, 2025

It would be good to try computing several Rayleigh quotients here.
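
(For reference, this is standard linear algebra rather than anything PR-specific: for symmetric $$B$$, power iteration and its Rayleigh quotients read

$$v_{k+1} = \frac{B v_k}{\|B v_k\|}, \qquad \rho_k = v_k^\top B v_k \quad (\|v_k\| = 1),$$

and $$|\rho_k| \to \max_i |\lambda_i(B)| = \|B\|_2$$ for a generic starting vector, so a handful of iterations yields a cheap estimate of the operator norm at one product with $$B$$ per iteration.)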

MaxenceGollier changed the title from "Add option to approximate the operator norm with a single Rayleigh quotient in R2N" to "Add option to approximate the operator norm with Rayleigh quotients" (Sep 15, 2025)
@MaxenceGollier (Collaborator, Author)

It would be good to try computing several Rayleigh quotients here.

Done @dpo.

@MohamedLaghdafHABIBOULLAH (Collaborator)

@MaxenceGollier Could you please rebase this branch? I think it would be very useful, especially for the reproducibility of the results.

@MaxenceGollier (Collaborator, Author) commented Sep 18, 2025

Interesting, the R2N test with LBFGS and L0 fails here: @MohamedLaghdafHABIBOULLAH, is it expected behaviour?
I could add safeguards to prevent this from happening, something like

while $$\xi < 0$$
  compute a Rayleigh quotient to improve the estimate of $$\lambda_{\max}$$
end

What do you think, @dpo? I might just increase the default number of iterations.
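
(A minimal sketch of that safeguard, under assumed names; refine_opnorm and its arguments are illustrative, not the PR's API:)

using LinearAlgebra

# Take extra power-method steps to tighten a lower estimate of λmax(B);
# the caller would invoke this while ξ < 0 and then recompute the step.
function refine_opnorm(B, v::AbstractVector; maxiter::Int = 10)
    v = v / norm(v)
    est = dot(v, B * v)                # Rayleigh quotient at the current vector
    for _ in 1:maxiter
        w = B * v
        v = w / norm(w)                # power-method update
        est = max(est, dot(v, B * v))  # keep the best lower bound on λmax
    end
    return est, v
end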

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds an option to approximate the operator norm using the power method with Rayleigh quotients instead of relying solely on Arpack. The implementation provides a configurable parameter opnorm_maxiter that controls how many power method iterations to use, with negative values falling back to the original Arpack-based approach.

Key changes:

  • Added a new power_method! function that implements the power method algorithm for operator norm approximation (see the sketch after this list)
  • Modified TR and R2N solvers to support the new opnorm_maxiter parameter and use power method when appropriate
  • Updated test allocation checks to include R2N solver and removed TR solver from the test suite
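
(A hedged sketch of what such a routine could look like; the actual signature and details in src/utils.jl may differ, and opnorm_maxiter here is just the iteration budget described above:)

using LinearAlgebra

# Power method with a Rayleigh-quotient estimate; q is a preallocated
# work vector that is overwritten in place (hence the bang).
function power_method_sketch!(q::AbstractVector, B; opnorm_maxiter::Int = 5)
    q ./= norm(q)
    λ = zero(eltype(q))
    for _ in 1:opnorm_maxiter
        w = B * q          # one operator application
        λ = dot(q, w)      # Rayleigh quotient, since q has unit norm
        q .= w ./ norm(w)
    end
    return abs(λ)          # operator-norm estimate for symmetric B
end

(An allocation-free version would reuse a second preallocated buffer with mul!, which is presumably why the solvers allocate an extra vector.)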

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File                 Description
src/utils.jl         Adds the power_method! function implementation
src/TR_alg.jl        Integrates the power method option into the TR solver (new parameter and vector allocation)
src/R2N.jl           Integrates the power method option into the R2N solver (new parameter and vector allocation)
test/test_allocs.jl  Updates allocation tests to include R2N and remove the TR solver


@MohamedLaghdafHABIBOULLAH (Collaborator)

Interesting, the R2N test with LBFGS and L0 fails here […] What do you think? @dpo.

In general, if $$\xi_{cp}$$ is small and negative, we should terminate.

@dpo (Member) commented Sep 21, 2025

I see

   R2N: failed to compute a step: ξ = -3.6563955752022537e-9

in the log. I guess we didn't stop because of $$\nu_k$$. Can you reproduce this locally? It could be an indication that the approximation of $$\Vert B_k \Vert$$ (and therefore the value of $$\nu_k$$) is not right, in the sense that $$\nu_k$$ is not small enough to produce sufficient decrease. Or it could be rounding errors. To figure it out, you have to step through the evaluation of the prox.
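
(One way to connect the two hypotheses, assuming $$\nu_k$$ is derived from the norm estimate roughly as $$\nu_k \approx 1/(\sigma_k + \Vert B_k \Vert_{\mathrm{est}})$$, which is an assumption for illustration, not a statement about the actual code:

$$\Vert B_k \Vert_{\mathrm{est}} < \Vert B_k \Vert \;\Longrightarrow\; \nu_k \text{ too large} \;\Longrightarrow\; \tfrac{1}{2\nu_k} - \tfrac{L}{2} \text{ may turn negative},$$

in which case the model decrease $$\xi$$ is no longer guaranteed nonnegative; a tiny negative value like the one in the log is consistent with either that or plain rounding.)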

@MaxenceGollier (Collaborator, Author)

No, I cannot reproduce it... We should add a seed in the tests.

MaxenceGollier requested a review from dpo (September 23, 2025)
@MohamedLaghdafHABIBOULLAH (Collaborator)

@MaxenceGollier @dpo should we move ahead with merging? This PR is a key step to guarantee reproducible and trustworthy results.

@MaxenceGollier (Collaborator, Author)

In my opinion we can merge, @dpo.

@MohamedLaghdafHABIBOULLAH (Collaborator)

It would be beneficial to do the same for LM and align it with R2N and TR.

MaxenceGollier merged commit aa0778c into JuliaSmoothOptimizers:master on Sep 29, 2025 (13 of 14 checks passed).
MaxenceGollier deleted the rayleigh branch on September 29, 2025.