# trainlm

Levenberg-Marquardt backpropagation

## Syntax

```
net.trainFcn = 'trainlm'
[net,tr] = train(net,...)
```

## Description

`trainlm` is a network training function that updates weight and bias values according to Levenberg-Marquardt optimization.

`trainlm` is often the fastest backpropagation algorithm in the toolbox, and is highly recommended as a first-choice supervised algorithm, although it does require more memory than other algorithms.

`net.trainFcn = 'trainlm'` sets the network `trainFcn` property.

`[net,tr] = train(net,...)` trains the network with `trainlm`.

Training occurs according to `trainlm` training parameters, shown here with their default values:

| Parameter | Default | Description |
| --- | --- | --- |
| `net.trainParam.epochs` | `1000` | Maximum number of epochs to train |
| `net.trainParam.goal` | `0` | Performance goal |
| `net.trainParam.max_fail` | `6` | Maximum validation failures |
| `net.trainParam.min_grad` | `1e-7` | Minimum performance gradient |
| `net.trainParam.mu` | `0.001` | Initial `mu` |
| `net.trainParam.mu_dec` | `0.1` | `mu` decrease factor |
| `net.trainParam.mu_inc` | `10` | `mu` increase factor |
| `net.trainParam.mu_max` | `1e10` | Maximum `mu` |
| `net.trainParam.show` | `25` | Epochs between displays (`NaN` for no displays) |
| `net.trainParam.showCommandLine` | `false` | Generate command-line output |
| `net.trainParam.showWindow` | `true` | Show training GUI |
| `net.trainParam.time` | `inf` | Maximum time to train in seconds |
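For example, a minimal sketch of overriding a few of these defaults before training (the specific values here are illustrative, not recommendations):

```
net = feedforwardnet(10, 'trainlm');
net.trainParam.epochs = 500;        % stop after at most 500 epochs
net.trainParam.goal = 1e-5;         % stop when performance falls below 1e-5
net.trainParam.mu = 0.005;          % illustrative non-default initial mu
net.trainParam.showWindow = false;  % suppress the training GUI
```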

Validation vectors are used to stop training early if the network performance on the validation vectors fails to improve or remains the same for `max_fail` epochs in a row. Test vectors are used as a further check that the network is generalizing well, but do not have any effect on training.

## Network Use

You can create a standard network that uses `trainlm` with `feedforwardnet` or `cascadeforwardnet`.

To prepare a custom network to be trained with `trainlm`,

1. Set `net.trainFcn` to `'trainlm'`. This sets `net.trainParam` to `trainlm`’s default parameters.

2. Set `net.trainParam` properties to desired values.

In either case, calling `train` with the resulting network trains the network with `trainlm`.
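As a brief sketch of these two steps, assuming inputs `x` and targets `t` are already defined (the parameter value here is illustrative):

```
net = cascadeforwardnet(10);   % or any custom network
net.trainFcn = 'trainlm';      % step 1: select trainlm (resets net.trainParam)
net.trainParam.max_fail = 10;  % step 2: adjust desired parameters
net = train(net, x, t);        % train the network with trainlm
```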

See `help feedforwardnet` and `help cascadeforwardnet` for examples.

## Examples


This example shows how to train a neural network using the `trainlm` training function.

Here a neural network is trained to predict body fat percentages.

```
% Load the body fat dataset: inputs x and target body fat percentages t
[x, t] = bodyfat_dataset;
% Create a feedforward network with 10 hidden neurons, trained by trainlm
net = feedforwardnet(10, 'trainlm');
% Train the network on the data
net = train(net, x, t);
% Simulate the trained network on the inputs
y = net(x);
```
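As a follow-up check, the trained network's mean squared error can be computed with `perform` (shown here as an illustrative addition to the example above):

```
perf = perform(net, t, y)  % mse of the trained network on the training data
```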

## Limitations

This function uses the Jacobian for calculations, which assumes that performance is a mean or sum of squared errors. Therefore, networks trained with this function must use either the `mse` or `sse` performance function.
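For instance, a network's performance function can be inspected or set before training (a minimal sketch):

```
net = feedforwardnet(10, 'trainlm');
net.performFcn           % inspect the current performance function
net.performFcn = 'mse';  % mse (the default) and sse are compatible with trainlm
```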

## More About


### Levenberg-Marquardt Algorithm

Like the quasi-Newton methods, the Levenberg-Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares (as is typical in training feedforward networks), then the Hessian matrix can be approximated as

`$H = J^T J$`

and the gradient can be computed as

`$g = J^T e$`

where J is the Jacobian matrix that contains first derivatives of the network errors with respect to the weights and biases, and e is a vector of network errors. The Jacobian matrix can be computed through a standard backpropagation technique (see [HaMe94]) that is much less complex than computing the Hessian matrix.

The Levenberg-Marquardt algorithm uses this approximation to the Hessian matrix in the following Newton-like update:

`$x_{k+1} = x_k - \left[J^T J + \mu I\right]^{-1} J^T e$`

When the scalar µ is zero, this is just Newton’s method, using the approximate Hessian matrix. When µ is large, this becomes gradient descent with a small step size. Newton’s method is faster and more accurate near an error minimum, so the aim is to shift toward Newton’s method as quickly as possible. Thus, µ is decreased after each successful step (reduction in performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm.
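As an illustration of this update rule outside the toolbox, here is a minimal sketch of Levenberg-Marquardt applied to a toy least-squares fit of y = a·exp(b·x). The variable names and the finite-difference Jacobian are assumptions for illustration; this is not the toolbox implementation:

```
% Toy Levenberg-Marquardt sketch: fit y = a*exp(b*x) by least squares
xd = linspace(0, 1, 20);                  % sample inputs
yd = 2 * exp(-3 * xd);                    % targets from true a = 2, b = -3
r  = @(p) (p(1) * exp(p(2) * xd) - yd)';  % residual (error) vector e
p  = [1; -1];                             % initial parameter guess
mu = 0.001;                               % initial mu, as in trainlm's default
for k = 1:50
    % Finite-difference Jacobian of the residuals (d e / d p)
    J = zeros(numel(xd), 2);
    h = 1e-6;
    for j = 1:2
        dp = zeros(2, 1); dp(j) = h;
        J(:, j) = (r(p + dp) - r(p)) / h;
    end
    e = r(p);
    % Newton-like step: dp = -(J'J + mu*I) \ J'e
    step = -(J' * J + mu * eye(2)) \ (J' * e);
    if sum(r(p + step).^2) < sum(e.^2)
        p = p + step;  % successful step: accept it and decrease mu
        mu = mu * 0.1;
    else
        mu = mu * 10;  % failed step: increase mu and try again next pass
    end
end
p  % should approach the true parameters [2; -3]
```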

The original description of the Levenberg-Marquardt algorithm is given in [Marq63]. The application of Levenberg-Marquardt to neural network training is described in [HaMe94] and starting on page 12-19 of [HDB96]. This algorithm appears to be the fastest method for training moderate-sized feedforward neural networks (up to several hundred weights). It also has an efficient implementation in MATLAB® software, because the solution of the matrix equation is a built-in function, so its attributes become even more pronounced in a MATLAB environment.

Try the Neural Network Design demonstration `nnd12m` [HDB96] for an illustration of the performance of the batch Levenberg-Marquardt algorithm.

## Algorithms

`trainlm` supports training with validation and test vectors if the network’s `NET.divideFcn` property is set to a data division function. Validation vectors are used to stop training early if the network performance on the validation vectors fails to improve or remains the same for `max_fail` epochs in a row. Test vectors are used as a further check that the network is generalizing well, but do not have any effect on training.
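For example, random data division can be enabled with `dividerand` and illustrative 70/15/15 ratios (a minimal sketch):

```
net = feedforwardnet(10, 'trainlm');
net.divideFcn = 'dividerand';       % split the data at random
net.divideParam.trainRatio = 0.70;  % illustrative ratios
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```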

`trainlm` can train any network as long as its weight, net input, and transfer functions have derivative functions.

Backpropagation is used to calculate the Jacobian `jX` of performance `perf` with respect to the weight and bias variables `X`. Each variable is adjusted according to Levenberg-Marquardt,

```
jj = jX * jX
je = jX * E
dX = -(jj+I*mu) \ je
```

where `E` is all errors and `I` is the identity matrix.

The adaptive value `mu` is increased by `mu_inc` until the change above results in a reduced performance value. The change is then made to the network and `mu` is decreased by `mu_dec`.
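In outline, that adaptation logic looks like the following sketch, continuing the variables from the code above, with `performance()` standing in as a hypothetical evaluation of network performance (this is not the toolbox source):

```
% One trainlm-style step: grow mu until the step reduces performance
while true
    dX = -(jj + I * mu) \ je;           % candidate weight change
    if performance(X + dX) < performance(X)
        X = X + dX;                     % accept the step
        mu = mu * mu_dec;               % e.g., mu_dec = 0.1
        break
    else
        mu = mu * mu_inc;               % e.g., mu_inc = 10
        if mu > mu_max, break, end      % give up when mu exceeds mu_max
    end
end
```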

Training stops when any of these conditions occurs:

• The maximum number of `epochs` (repetitions) is reached.

• The maximum amount of `time` is exceeded.

• Performance is minimized to the `goal`.

• The performance gradient falls below `min_grad`.

• `mu` exceeds `mu_max`.

• Validation performance has increased more than `max_fail` times since the last time it decreased (when using validation).
