What is the Levenberg–Marquardt Algorithm?
Both the Gradient Descent and Gauss-Newton methods are iterative algorithms, which means they use a series of calculations (based on guesses for x-values) to find a solution. The gradient descent differs in that at each iteration, the solution updates by choosing values that make the function value smaller. More specifically, the sum of the squared errors is reduced by moving toward the direction of steepest descent. At each iteration, the Levenberg–Marquardt Algorithm chooses either the gradient descent or GN and updates the solution.
The iterative update is dependent on the value of an algorithmic parameter, λ— a non-negative damping factor which smooths out the graph. The update is Gauss-Newton if λ is small (i.e. close to the optimal value) and a gradient descent if λ is large (Gavin, 2007). The Gauss-Newton is more accurate and faster than the gradient descent when close to the minimum error. Therefore, so the algorithm will migrate towards the GN algorithm as soon as possible.
Advantages and Disadvantages
- As there are two possible options for the algorithm’s direction at each iteration, the LM is more robust than the Gauss-Newton (Nelles, 2001).
- It’s faster to converge than either the GN or gradient descent on its own.
- It can handle models with multiple free parameters— which aren’t precisely known (note that for very large sets, the algorithm can be slow).
- If your initial guess is far from the mark, the algorithm can still find an optimal solution.
- For flat functions (i.e. smooth functions with derivatives that vanish at a certain point), the algorithm can get “…lost in parameter space” (Transtrum & Sethna, 2012).
- In some cases, the LM can be very slow to converge. This is particularly true if the model has more than ten parameters (Waterfall et. al, 2006; Gutenkunst et.al, 2007) which requires the algorithm to inch slowly along a narrowly defined crawl space.
Gavin (2007). The Levenberg-Marquardt method for
nonlinear least squares curve-fitting problems. Retrieved January 22, 2017 from: http://people.duke.edu/~hpgavin/ce281/lm.pdf
Gutenkunst, R. et al. (2007). Universally sloppy parameter sensitivities in systems biology models. PLoS Comput Biol 3: 1871–1878. e189.
Nelles, O. (2001) Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Retrieved January 24, 2001 from: https://books.google.com/books?id=7qHDgwMRqM4C
Marquardt, D., “An Algorithm for Least-Squares Estimation of Nonlinear Parameters,” SIAM Journal on Applied Mathematics, Vol. 11, No. 2, June 1963, pp. 431–441.
Ng, A. (2017). CS22 Lecture Notes. Retrieved January 22, 2018 from: http://cs229.stanford.edu/notes/cs229-notes1.pdf
Transtrum, M. & Sethna, J. (2012). Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization. Retrieved January 25, 2018 from: https://arxiv.org/pdf/1201.5885.pdf
Waterfall, J. et al. (2006). Sloppy-Model Universality Class and the Vandermonde Matrix. Physical Review Letters 97 150601.