Optimization methods

Jorge Cabral

Optimization

In the “Generalized Maximum Entropy framework” vignette a solution for the constrained optimization problem was proposed to compute \(\widehat{\boldsymbol{\beta}}^{GME}\). However, a more efficient way is to solve the unconstrained dual formulation of the problem, which is a function of \(\hat {\boldsymbol\lambda}\). The resulting function, \(\mathcal{M}(\boldsymbol{\lambda})\), is referred to as the minimal value function [1].
Let
\[\begin{equation} \Omega_k \left( \widehat{\boldsymbol\lambda}\right) = \sum_{m=1}^M exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk}) \end{equation}\] and \[\begin{equation} \Psi_n \left( \widehat{\boldsymbol\lambda}\right) = \sum_{j=1}^J exp(-\hat\lambda_n v_{n}). \end{equation}\]

The primal–dual relationship is

\[\begin{equation} \underset{\mathbf{p},\mathbf{w}}{\operatorname{argmax}} \left\{-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln \mathbf{w} \right\} = \underset{\boldsymbol{\lambda}}{\operatorname{argmin}} \left\{ \mathbf{y}'\boldsymbol{\lambda} - \sum_{k=1}^{K+1}log\left( \Omega_k(\boldsymbol{\lambda})\right) - \sum_{n=1}^{N}log\left( \Psi_n(\boldsymbol{\lambda})\right)\right\} \equiv \mathcal{M}(\boldsymbol{\lambda}). \end{equation}\]

\(\mathcal{M}(\boldsymbol{\lambda})\), may be interpreted as a constrained expected log-likelihood function. Estimation is considerably simplified using this dual version. The analytical gradient of the dual problem is

\[\begin{equation} \nabla_{\boldsymbol{\lambda}}\mathcal{M} = \mathbf{y}-\mathbf{XZp} - \mathbf{Vw} \end{equation}\]

\(\boldsymbol{\hat\lambda}\) is then used to solve

\[\begin{equation} \hat p_{km} = \frac{exp(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk})}{\Omega_k(\boldsymbol{\lambda})} \end{equation}\] and \[\begin{equation} \hat w_{nj} = \frac{exp(-\hat\lambda_n v_{n})}{\Psi_n(\boldsymbol{\lambda})}. \end{equation}\]

\(\widehat{\boldsymbol{\beta}}^{GME}\) and \(\boldsymbol{\hat \epsilon}\) are given by

\[\begin{equation} \widehat{\boldsymbol{\beta}}^{GME}=\mathbf{Z\hat p} \end{equation}\] and \[\begin{equation} \boldsymbol{\hat \epsilon}=\mathbf{V\hat w}. \end{equation}\]

Examples

Different numerical optimization options will be explained using dataGCE (see “Generalized Maximum Entropy framework”).

library(GCEstim)

Assume that \(z_k^{upper}=50\), \(k\in\left\lbrace 0,\dots,6\right\rbrace\) is a “wide upper bound” for the signal support space and set \(M=5\) and \(J=3\).
The primal version of the optimization problem previously defined can be solved by the optimization methods:

res.lmgce.50.primal.solnp <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50, 50),
    twosteps.n = 0,
    method = "primal.solnp"
  )

res.lmgce.50.primal.solnl <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50, 50),
    twosteps.n = 0,
    method = "primal.solnl"
  )

Both methods converged which can be confirmed by the \(0\) returned by

res.lmgce.50.primal.solnp$convergence
#> [1] 0
res.lmgce.50.primal.solnl$convergence
#> [1] 0

The dual version of the optimization problem can be solved by the optimization methods:

res.lmgce.50.dual.BFGS <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50, 50),
    twosteps.n = 0,
    method = "dual.BFGS"
  )

res.lmgce.50.dual.CG <-
  GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50, 50),
    twosteps.n = 0,
    method = "dual.CG"
  )

Any error or warning produced by the optimization procedure results in outputting \(1\). Note that dual.BFGS converged while dual.CG did not. Conjugate gradient methods will generally be more fragile but may be successful in much larger optimization problems.

res.lmgce.50.dual.BFGS$convergence
#> [1] 0
res.lmgce.50.dual.CG$convergence
#> [1] 1

The values of \(\boldsymbol{\hat\lambda}\), \(\hat {\mathbf{p}}\) and \(\hat {\mathbf{w}}\) for dual.BFGS are, respectively,

res.lmgce.50.dual.BFGS$lambda
#>   [1] -0.0054315581 -0.0362851075 -0.0385981482  0.0566161704  0.0138698331
#>   [6] -0.0024651553 -0.0227793090 -0.0023567234 -0.0187668165  0.0205529920
#>  [11]  0.0189754236  0.0350418904  0.0337957319 -0.0092406559 -0.0217101748
#>  [16] -0.0380872670  0.0233102524  0.0106105899 -0.0450483760 -0.0023395159
#>  [21]  0.0198291478  0.0275226658 -0.0629130452 -0.0001487608  0.0125237687
#>  [26] -0.0328692845 -0.0732469503 -0.0191117768  0.0336050468  0.0562760350
#>  [31]  0.0218128341  0.0231869574  0.0753808406  0.0281004090  0.0402667764
#>  [36] -0.0231789955  0.0462642119  0.0148705520 -0.0002027890 -0.0038059321
#>  [41] -0.0544474657  0.0190416427 -0.0256211058  0.0044666854  0.0068425455
#>  [46] -0.0116029419  0.0503567564  0.0075374883 -0.0374695950 -0.0167435100
#>  [51]  0.0026509575  0.0025613560 -0.0260627773  0.0593285252  0.0336042544
#>  [56]  0.0366290162 -0.0255469000  0.0502924605 -0.0155720536 -0.0143055455
#>  [61]  0.0350497334  0.0061215759 -0.0734237406 -0.0677698468 -0.0422326454
#>  [66] -0.0352340691 -0.0018549268 -0.0410253228 -0.0167136632  0.0272622276
#>  [71]  0.0373657555  0.0137718621 -0.0601873982 -0.1033502240  0.0500338667
#>  [76]  0.0342158472 -0.0184214830  0.0803730211 -0.0274860122 -0.0016426092
#>  [81] -0.0037500133  0.0228453331 -0.0113385292 -0.0249442628 -0.0122691137
#>  [86]  0.0286318993 -0.0391985721  0.0292200582 -0.0147311748 -0.0174859191
#>  [91]  0.0601325584  0.0338395555 -0.0208172623 -0.0356295971  0.0043824510
#>  [96] -0.0199980639  0.0574606377  0.0030525401 -0.0212540400 -0.0123546793
res.lmgce.50.dual.BFGS$p
#>                   p_1       p_2       p_3       p_4       p_5
#> (Intercept) 0.1918591 0.1958458 0.1999154 0.2040696 0.2083101
#> X001        0.1986565 0.1993260 0.1999977 0.2006718 0.2013481
#> X002        0.2150038 0.2072256 0.1997287 0.1925031 0.1855389
#> X003        0.1897860 0.1947609 0.1998662 0.2051053 0.2104817
#> X004        0.1706791 0.1842217 0.1988389 0.2146158 0.2316445
#> X005        0.1473696 0.1699513 0.1959933 0.2260257 0.2606601
res.lmgce.50.dual.BFGS$w
#>           p_1       p_2       p_3
#> 1   0.3442535 0.3332154 0.3225311
#> 2   0.4079385 0.3281286 0.2639329
#> 3   0.4127877 0.3274530 0.2597593
#> 4   0.2284563 0.3208720 0.4506717
#> 5   0.3060097 0.3325652 0.3614251
#> 6   0.3382756 0.3333090 0.3284154
#> 7   0.3797836 0.3312674 0.2889489
#> 8   0.3380577 0.3333111 0.3286311
#> 9   0.3714900 0.3319290 0.2965810
#> 10  0.2931728 0.3316500 0.3751771
#> 11  0.2961820 0.3318977 0.3719202
#> 12  0.2661900 0.3284753 0.4053347
#> 13  0.2684620 0.3288112 0.4027267
#> 14  0.3519758 0.3329920 0.3150321
#> 15  0.3775700 0.3314560 0.2909740
#> 16  0.4117161 0.3276056 0.2606783
#> 17  0.2879456 0.3311704 0.3808839
#> 18  0.3123514 0.3328835 0.3547651
#> 19  0.4263360 0.3253617 0.2483024
#> 20  0.3380232 0.3333114 0.3286654
#> 21  0.2945519 0.3317661 0.3736820
#> 22  0.2800414 0.3303239 0.3896347
#> 23  0.4639038 0.3180469 0.2180492
#> 24  0.3336309 0.3333332 0.3330359
#> 25  0.3086226 0.3327068 0.3586706
#> 26  0.4007893 0.3290534 0.2701573
#> 27  0.4855351 0.3128644 0.2016005
#> 28  0.3722015 0.3318771 0.2959214
#> 29  0.2688105 0.3288616 0.4023279
#> 30  0.2290266 0.3210171 0.4499563
#> 31  0.2907793 0.3314382 0.3777825
#> 32  0.2881785 0.3311931 0.3806284
#> 33  0.1983037 0.3117136 0.4899826
#> 34  0.2789653 0.3301971 0.3908376
#> 35  0.2567696 0.3269411 0.4162893
#> 36  0.3806119 0.3311946 0.2881935
#> 37  0.2461734 0.3249338 0.4288927
#> 38  0.3040730 0.3324506 0.3634764
#> 39  0.3337390 0.3333332 0.3329278
#> 40  0.3409735 0.3332754 0.3257511
#> 41  0.4461098 0.3217839 0.2321063
#> 42  0.2960555 0.3318877 0.3720568
#> 43  0.3856799 0.3307230 0.2835971
#> 44  0.3244409 0.3332535 0.3423055
#> 45  0.3197457 0.3331461 0.3471082
#> 46  0.3567894 0.3327955 0.3104151
#> 47  0.2390807 0.3234168 0.4375025
#> 48  0.3183771 0.3331062 0.3485167
#> 49  0.4104210 0.3277876 0.2617915
#> 50  0.3673234 0.3322148 0.3004618
#> 51  0.3280457 0.3333052 0.3386491
#> 52  0.3282239 0.3333071 0.3384690
#> 53  0.3865978 0.3306328 0.2827694
#> 54  0.2239385 0.3196870 0.4563745
#> 55  0.2688120 0.3288618 0.4023262
#> 56  0.2633103 0.3280307 0.4086591
#> 57  0.3855257 0.3307380 0.2837363
#> 58  0.2391913 0.3234415 0.4373672
#> 59  0.3649161 0.3323655 0.3027184
#> 60  0.3623179 0.3325162 0.3051659
#> 61  0.2661757 0.3284732 0.4053511
#> 62  0.3211679 0.3331835 0.3456487
#> 63  0.4859039 0.3127701 0.2013261
#> 64  0.4740870 0.3156935 0.2102196
#> 65  0.4204179 0.3263118 0.2532703
#> 66  0.4057370 0.3284225 0.2658405
#> 67  0.3370500 0.3333196 0.3296304
#> 68  0.4178820 0.3267016 0.2554164
#> 69  0.3672621 0.3322188 0.3005192
#> 70  0.2805272 0.3303802 0.3890926
#> 71  0.2619788 0.3278179 0.4102033
#> 72  0.3061996 0.3325760 0.3612244
#> 73  0.4581794 0.3193017 0.2225189
#> 74  0.5472808 0.2943768 0.1583423
#> 75  0.2396362 0.3235407 0.4368231
#> 76  0.2676950 0.3286993 0.4036057
#> 77  0.3707781 0.3319801 0.2972418
#> 78  0.1907265 0.3089189 0.5003545
#> 79  0.3895584 0.3303318 0.2801098
#> 80  0.3366239 0.3333225 0.3300536
#> 81  0.3408608 0.3332771 0.3258621
#> 82  0.2888241 0.3312555 0.3799204
#> 83  0.3562497 0.3328197 0.3109306
#> 84  0.3842740 0.3308583 0.2848677
#> 85  0.3581501 0.3327320 0.3091179
#> 86  0.2779770 0.3300782 0.3919448
#> 87  0.4140474 0.3272711 0.2586815
#> 88  0.2768852 0.3299441 0.3931706
#> 89  0.3631906 0.3324670 0.3043424
#> 90  0.3688510 0.3321137 0.2990353
#> 91  0.2226094 0.3193264 0.4580642
#> 92  0.2683820 0.3287996 0.4028184
#> 93  0.3757233 0.3316066 0.2926701
#> 94  0.4065653 0.3283129 0.2651218
#> 95  0.3246078 0.3332565 0.3421356
#> 96  0.3740308 0.3317394 0.2942299
#> 97  0.2270441 0.3205083 0.4524476
#> 98  0.3272472 0.3332961 0.3394567
#> 99  0.3766264 0.3315337 0.2918399
#> 100 0.3583249 0.3327236 0.3089514

The following table shows a comparison of efficiency between all the optimization methods available. Note that in the current version, the default optimization parameters of each dependency package are used.

method.opt <-
  c(
    "primal.solnl",
    "primal.solnp",
    "dual.BFGS",
    "dual.CG",
    "dual.L-BFGS-B",
    "dual.Rcgmin",
    "dual.bobyqa",
    "dual.newuoa",
    "dual.nlminb",
    "dual.nlm",
    "dual.lbfgs",
    "dual.lbfgsb3c"
  )

compare_methods <- data.frame(
  method = method.opt,
  time = NA,
  r.squared = NA,
  error.measure = NA,
  error.measure.cv.mean = NA,
  beta.error = NA,
  nep = NA,
  nep.cv.mean = NA,
  convergence = NA)

for (i in 1:length(method.opt)) {
start.time <- Sys.time()
res.method <- 
  lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50,50),
    twosteps.n = 0,
    method = method.opt[i]
    )

compare_methods$time[i] <- difftime(Sys.time(),
                                    start.time,
                                    units = "secs")
compare_methods$r.squared[i] <- summary(res.method)$r.squared
compare_methods$error.measure[i] <- res.method$error.measure
compare_methods$error.measure.cv.mean[i] <- res.method$error.measure.cv.mean
compare_methods$beta.error[i] <- accmeasure(coef(res.method), coef.dataGCE)
compare_methods$nep[i] <- res.method$nep
compare_methods$nep.cv.mean [i] <- res.method$nep.cv.mean
compare_methods$convergence[i] <- res.method$convergence
}

compare_methods_ordered <- compare_methods[order(compare_methods$time),]

compare_methods_ordered$convergence <- factor(compare_methods_ordered$convergence,
                                              levels = c(0,1),
                                              labels = c(TRUE, FALSE))
optimization method time (s) \(RMSE_{\widehat{y}}\) \(CV \text{-} RMSE_{\widehat{y}}\) \(RMSE_{\widehat{\beta}}\) Convergence
dual.lbfgsb3c 0.372 0.411 0.441 1.578 TRUE
dual.BFGS 0.385 0.411 0.441 1.575 TRUE
dual.L-BFGS-B 0.539 0.411 0.440 1.586 FALSE
dual.CG 0.603 0.436 0.697 2.948 FALSE
dual.lbfgs 0.627 0.411 0.441 1.575 TRUE
dual.nlminb 2.263 0.411 0.441 1.575 TRUE
dual.nlm 2.562 0.411 0.441 1.575 TRUE
dual.Rcgmin 2.816 0.411 0.441 1.575 TRUE
primal.solnp 4.747 0.411 0.441 1.575 TRUE
primal.solnl 5.503 0.411 0.441 1.575 TRUE
dual.newuoa 56.082 0.411 0.441 1.579 TRUE
dual.bobyqa 67.649 0.638 0.448 2.915 TRUE

References

1.
Golan A, Judge GG, Miller D. Maximum Entropy Econometrics : Robust Estimation with Limited Data. Wiley; 1996.
2.
Ye Y. Interior Algorithms for Linear, Quadratic, and Linearly Constrained Non-Linear Programming. PhD thesis. Department of ESS, Stanford University; 1987.
3.
Ghalanos A, Theussl S. Rsolnp: General Non-Linear Optimization Using Augmented Lagrange Multiplier Method.; 2015.
4.
Chen X, Yin X. NlcOptim: Solve Nonlinear Optimization with Nonlinear Constraints.; 2019. https://CRAN.R-project.org/package=NlcOptim
5.
Broyden CG. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics. 1970;6:76-90. doi:10.1093/imamat/6.1.76
6.
Fletcher R. A new approach to variable metric algorithms. The Computer Journal. 1970;13:317-322. doi:10.1093/comjnl/13.3.317
7.
Goldfarb D. A family of variable-metric methods derived by variational means. Mathematics of computation. 1970;24:23-26. doi:10.1090/s0025-5718-1970-0258249-6
8.
Shanno DF. Conditioning of quasi-newton methods for function minimization. Mathematics of computation. 1970;24:647-656. doi:10.1090/s0025-5718-1970-0274029-x
9.
Fletcher R, Reeves CM. Function minimization by conjugate gradients. The Computer Journal. 1964;7:149-154. doi:10.1093/comjnl/7.2.149
10.
Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing. 1995;16:1190-1208. doi:10.1137/0916069
11.
Nash JC, Varadhan R. Unifying optimization algorithms to aid software system users: Optimx for r. Journal of Statistical Software. 2011;43:1-14. doi:10.18637/jss.v043.i09
12.
Dai YH, Yuan Y. An efficient hybrid conjugate gradient method for unconstrained optimization. Annals of Operations Research. 2001;103:33-47. doi:10.1023/A:1012930416777
13.
Powell M. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report, Department of Applied Mathematics and Theoretical Physics. Published online January 2009.
14.
Bates D, Mullen KM, Nash JC, Varadhan R. Minqa: Derivative-Free Optimization Algorithms by Quadratic Approximation.; 2024. https://CRAN.R-project.org/package=minqa
15.
Gay DM. Usage summary for selected optimization routines. Computing Science Technical Report. 1990;153.
16.
Dennis JE, Schnabel RB. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall; 1983.
17.
Schnabel RB, Koonatz JE, Weiss BE. A modular system of algorithms for unconstrained minimization. ACM transactions on mathematical software Association for Computing Machinery. 1985;11:419-440. doi:10.1145/6187.6192
18.
Coppola A, Stewart B, Okazaki N. Lbfgs: Limited-Memory BFGS Optimization.; 2022. https://CRAN.R-project.org/package=lbfgs
19.
Fidler ML, Nash JC, Zhu C, Byrd R, Nocedal J, Morales JL. Lbfgsb3c: Limited Memory BFGS Minimizer with Bounds on Parameters with Optim() ’c’ Interface.; 2024. https://CRAN.R-project.org/package=lbfgsb3c

Acknowledgements

This work was supported by Fundação para a Ciência e Tecnologia (FCT) through CIDMA and projects https://doi.org/10.54499/UIDB/04106/2020 and https://doi.org/10.54499/UIDP/04106/2020.

mirror server hosted at Truenetwork, Russian Federation.