
# trainFasterRCNNObjectDetector

Train a Faster R-CNN deep learning object detector

## Syntax

``trainedDetector = trainFasterRCNNObjectDetector(trainingData,network,options)``
``trainedDetector = trainFasterRCNNObjectDetector(trainingData,checkpoint,options)``
``trainedDetector = trainFasterRCNNObjectDetector(trainingData,detector,options)``
``trainedDetector = trainFasterRCNNObjectDetector(___,Name,Value)``
``[trainedDetector,info] = trainFasterRCNNObjectDetector(___)``

## Description

`trainedDetector = trainFasterRCNNObjectDetector(trainingData,network,options)` trains a Faster R-CNN (regions with convolutional neural networks) object detector using the four-step alternating training method in deep learning [1]. You can train a Faster R-CNN detector to detect multiple object classes. This function requires that you have Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher.

`trainedDetector = trainFasterRCNNObjectDetector(trainingData,checkpoint,options)` resumes training from a detector checkpoint.

`trainedDetector = trainFasterRCNNObjectDetector(trainingData,detector,options)` continues training a Faster R-CNN object detector. Use this syntax to fine-tune a detector.

`trainedDetector = trainFasterRCNNObjectDetector(___,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments and any of the previous inputs.

`[trainedDetector,info] = trainFasterRCNNObjectDetector(___)` also returns information on the training progress, such as the training loss and accuracy, for each iteration.

## Examples


Load training data.

```
data = load('fasterRCNNVehicleTrainingData.mat');
trainingData = data.vehicleTrainingData;

trainingData.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...
    trainingData.imageFilename);
```

Set up the network layers.

```
layers = data.layers
```

```
layers = 

  11x1 Layer array with layers:

     1   ''   Image Input             32x32x3 images with 'zerocenter' normalization
     2   ''   Convolution             32 3x3 convolutions with stride [1 1] and padding [1 1 1 1]
     3   ''   ReLU                    ReLU
     4   ''   Convolution             32 3x3 convolutions with stride [1 1] and padding [1 1 1 1]
     5   ''   ReLU                    ReLU
     6   ''   Max Pooling             3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     7   ''   Fully Connected         64 fully connected layer
     8   ''   ReLU                    ReLU
     9   ''   Fully Connected         2 fully connected layer
    10   ''   Softmax                 softmax
    11   ''   Classification Output   crossentropyex
```

Configure training options.

• Lower the `InitialLearnRate` to reduce the rate at which network parameters are changed.

• Set the `CheckpointPath` to save detector checkpoints to a temporary directory. Change this to another location if required.

• Set `MaxEpochs` to 5 to reduce example training time. Increase this value for proper training.

```
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 1, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 5, ...
    'VerboseFrequency', 200, ...
    'CheckpointPath', tempdir);
```

Train detector. Training will take a few minutes.

```detector = trainFasterRCNNObjectDetector(trainingData, layers, options) ```
```
Starting parallel pool (parpool) using the 'local' profile ... connected to 12 workers.
*************************************************************************
Training a Faster R-CNN Object Detector for the following object classes:

* vehicle

Step 1 of 4: Training a Region Proposal Network (RPN).
Training on single GPU.
|=======================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |      Rate       |
|=======================================================================================================|
|       1 |           1 |       00:00:06 |       1.5273 |       53.91% |         0.92 |         0.0010  |
|       1 |         200 |       00:00:27 |       1.6777 |       50.00% |         0.83 |         0.0010  |
|       2 |         400 |       00:00:48 |       1.1392 |      100.00% |         1.05 |         0.0010  |
|       3 |         600 |       00:01:08 |       1.8571 |      100.00% |         1.50 |         0.0010  |
|       3 |         800 |       00:01:27 |       2.4457 |      100.00% |         1.82 |         0.0010  |
|       4 |        1000 |       00:01:48 |       0.5591 |      100.00% |         0.66 |         0.0010  |
|       5 |        1200 |       00:02:11 |       2.4903 |      100.00% |         1.93 |         0.0010  |
|       5 |        1400 |       00:02:30 |       0.7697 |      100.00% |         0.84 |         0.0010  |
|       5 |        1475 |       00:02:37 |       0.5513 |      100.00% |         0.68 |         0.0010  |
|=======================================================================================================|

Step 2 of 4: Training a Fast R-CNN Network using the RPN from step 1.
*******************************************************************
Training a Fast R-CNN Object Detector for the following object classes:

* vehicle

--> Extracting region proposals from 295 training images...done.

Training on single GPU.
|=======================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |      Rate       |
|=======================================================================================================|
|       1 |           1 |       00:00:02 |       0.9051 |       75.78% |         0.93 |         0.0010  |
|       1 |         200 |       00:00:19 |       0.2377 |       92.31% |         0.71 |         0.0010  |
|       2 |         400 |       00:00:37 |       0.2268 |       92.45% |         0.53 |         0.0010  |
|       3 |         600 |       00:00:54 |       0.3148 |       89.92% |         0.70 |         0.0010  |
|       3 |         800 |       00:01:11 |       0.2093 |       91.41% |         0.56 |         0.0010  |
|       4 |        1000 |       00:01:27 |       0.1125 |       97.66% |         1.02 |         0.0010  |
|       5 |        1200 |       00:01:46 |       0.4125 |       91.41% |         0.82 |         0.0010  |
|       5 |        1400 |       00:02:03 |       0.2403 |       91.41% |         0.64 |         0.0010  |
|       5 |        1445 |       00:02:07 |       0.9817 |       76.56% |         0.82 |         0.0010  |
|=======================================================================================================|

Step 3 of 4: Re-training RPN using weight sharing with Fast R-CNN.
Training on single GPU.
|=======================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |      Rate       |
|=======================================================================================================|
|       1 |           1 |       00:00:00 |       1.0772 |      100.00% |         1.01 |         0.0010  |
|       1 |         200 |       00:00:18 |       2.4481 |      100.00% |         1.86 |         0.0010  |
|       2 |         400 |       00:00:36 |       1.3111 |       50.78% |         0.72 |         0.0010  |
|       3 |         600 |       00:00:54 |       0.5687 |      100.00% |         0.71 |         0.0010  |
|       3 |         800 |       00:01:12 |       0.7452 |       97.66% |         0.81 |         0.0010  |
|       4 |        1000 |       00:01:30 |       0.8767 |       97.66% |         0.82 |         0.0010  |
|       5 |        1200 |       00:01:49 |       1.2515 |       94.53% |         1.15 |         0.0010  |
|       5 |        1400 |       00:02:07 |       0.6098 |       98.44% |         0.73 |         0.0010  |
|       5 |        1475 |       00:02:14 |       0.5851 |      100.00% |         0.73 |         0.0010  |
|=======================================================================================================|

Step 4 of 4: Re-training Fast R-CNN using updated RPN.
*******************************************************************
Training a Fast R-CNN Object Detector for the following object classes:

* vehicle

--> Extracting region proposals from 295 training images...done.

Training on single GPU.
|=======================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     Loss     |   Accuracy   |     RMSE     |      Rate       |
|=======================================================================================================|
|       1 |           1 |       00:00:00 |       0.1679 |       96.88% |         0.51 |         0.0010  |
|       1 |         200 |       00:00:15 |       0.1168 |       96.40% |         0.64 |         0.0010  |
|       2 |         400 |       00:00:31 |       0.1058 |       97.66% |         0.57 |         0.0010  |
|       3 |         600 |       00:00:47 |       0.1568 |       95.31% |         0.45 |         0.0010  |
|       3 |         800 |       00:01:03 |       0.0710 |       99.22% |         0.65 |         0.0010  |
|       4 |        1000 |       00:01:18 |       0.1159 |       93.75% |         0.55 |         0.0010  |
|       5 |        1200 |       00:01:36 |       0.0874 |       98.44% |         0.59 |         0.0010  |
|       5 |        1400 |       00:01:51 |       0.0827 |       99.22% |         0.69 |         0.0010  |
|       5 |        1470 |       00:01:57 |       0.0778 |       99.22% |         0.43 |         0.0010  |
|=======================================================================================================|

Detector training complete.
*******************************************************************

detector = 

  fasterRCNNObjectDetector with properties:

        ModelName: 'vehicle'
          Network: [1×1 DAGNetwork]
      AnchorBoxes: [5×2 double]
       ClassNames: {'vehicle'  'Background'}
    MinObjectSize: [1 1]
```

Test the Faster R-CNN detector on a test image.

```img = imread('highway.png'); ```

Run detector.

```[bbox, score, label] = detect(detector, img); ```

Display detection results.

```
detectedImg = insertShape(img, 'Rectangle', bbox);
figure
imshow(detectedImg)
```
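
The `detect` call also returns detection scores and labels. As an optional extension to this example, you can overlay them on the image by using `insertObjectAnnotation`; the label formatting below is just one possible choice.

```
if ~isempty(bbox)
    % Build annotation strings such as "vehicle: 0.92" from the labels and scores.
    annotations = string(label) + ": " + string(round(score,2));
    annotatedImg = insertObjectAnnotation(img, 'rectangle', bbox, cellstr(annotations));
    figure
    imshow(annotatedImg)
end
```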

## Input Arguments


Labeled ground truth images, specified as a table with two or more columns. The first column must contain paths and file names to grayscale or truecolor (RGB) images. The remaining columns must contain bounding boxes related to the corresponding image. Each column represents a single object class, such as a car, dog, flower, or stop sign.

Each bounding box must be in the format [x y width height]. The format specifies the upper-left corner location and size of the object in the corresponding image. The table variable name defines the object class name. To create the ground truth table, use the Image Labeler or Video Labeler app.
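
As a minimal sketch, a table in this format can also be constructed manually; the image names, the `vehicle` variable name, and the box coordinates below are hypothetical, and in practice you would export the table from the labeling apps.

```
% Hypothetical ground truth table with two images and one object class ("vehicle").
imageFilename = {'image1.jpg'; 'image2.jpg'};
vehicle = {[100 50 60 40]; [30 20 50 35; 120 80 45 30]};   % [x y width height] per box
trainingData = table(imageFilename, vehicle);
```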

Network, specified as a `SeriesNetwork`, an array of `Layer` objects, a `layerGraph` object, or by the network name. The network is trained to classify the object classes defined in the `trainingData` table. The `SeriesNetwork`, `Layer`, and `layerGraph` objects are available in the Deep Learning Toolbox.

• When you specify the network as a `SeriesNetwork`, an array of `Layer` objects, or by the network name, the network is automatically transformed into a Faster R-CNN network by adding a region proposal network (RPN), an ROI max pooling layer, and new classification and regression layers to support object detection. Additionally, the `GridSize` property of the ROI max pooling layer is set to the output size of the last max pooling layer in the network.

• The array of `Layer` objects must contain a classification layer that supports the number of object classes, plus a background class. Use this input type to customize the learning rates of each layer. An example of an array of `Layer` objects:

```
layers = [imageInputLayer([28 28 3])
          convolution2dLayer([5 5],10)
          reluLayer()
          fullyConnectedLayer(10)
          softmaxLayer()
          classificationLayer()];
```

• When you specify the network as a `SeriesNetwork` object, a `Layer` array, or by network name, the weights for the additional convolution and fully connected layers are initialized to `'narrow-normal'`. The function adds these weights to create the network.

• The network name must be one of the following valid network names. You must also install the corresponding Add-On.

| Network Name | Feature Extraction Layer Name | ROI Pooling Layer OutputSize | Description |
| --- | --- | --- | --- |
| `alexnet` | `'relu5'` | [6 6] | Last max pooling layer is replaced by ROI max pooling layer. |
| `vgg16` | `'relu5_3'` | [7 7] | |
| `vgg19` | `'relu5_4'` | | |
| `squeezenet` | `'fire5-concat'` | [14 14] | |
| `resnet18` | `'res4b_relu'` | | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet50` | `'activation_40_relu'` | | |
| `resnet101` | `'res4b22_relu'` | | |
| `googlenet` | `'inception_4d-output'` | | |
| `mobilenetv2` | `'block_13_expand_relu'` | | |
| `inceptionv3` | `'mixed7'` | [17 17] | |
| `inceptionresnetv2` | `'block17_20_ac'` | | |

• The `LayerGraph` object must be a valid Faster R-CNN object detection network. You can also use a `LayerGraph` object to train a custom Faster R-CNN network.

### Tip

If your network is a `DAGNetwork`, use the `layerGraph` function to convert the network to a `LayerGraph` object. Then, create a custom Faster R-CNN network as described by the Create Faster R-CNN Object Detection Network example.
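
For instance, a minimal sketch of extracting a layer graph from a pretrained DAG network (here ResNet-50, which requires the corresponding Add-On) looks like this; modifying it into a complete Faster R-CNN network is covered in the example referenced above.

```
net = resnet50;              % pretrained DAGNetwork (requires the ResNet-50 Add-On)
lgraph = layerGraph(net);    % LayerGraph object that you can modify into a Faster R-CNN network
```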

See R-CNN, Fast R-CNN, and Faster R-CNN Basics to learn more about how to create a Faster R-CNN network.

Training options, specified as an object returned by the `trainingOptions` function from Deep Learning Toolbox. To specify the solver and other options for network training, use `trainingOptions`.

### Note

`trainFasterRCNNObjectDetector` does not support these training options:

• The `Plots` value: `'training-progress'`

• The `ValidationData`, `ValidationFrequency`, or `ValidationPatience` options

• The `OutputFcn` option.

Saved detector checkpoint, specified as a `fasterRCNNObjectDetector` object. To save the detector after every epoch, set the `'CheckpointPath'` property when using the `trainingOptions` function. Saving a checkpoint after every epoch is recommended because network training can take a few hours.

To load a checkpoint for a previously trained detector, load the MAT-file from the checkpoint path. For example, if the `'CheckpointPath'` property of `options` is `'/tmp'`, load a checkpoint MAT-file using:

`data = load('/tmp/faster_rcnn_checkpoint__105__2016_11_18__14_25_08.mat');`

The name of the MAT-file includes the iteration number and timestamp of when the detector checkpoint was saved. The detector is saved in the `detector` variable of the file. Pass this file back into the `trainFasterRCNNObjectDetector` function:

```frcnn = trainFasterRCNNObjectDetector(stopSigns,... data.detector,options);```

Previously trained Faster R-CNN object detector, specified as a `fasterRCNNObjectDetector` object. Use this syntax to continue training a detector with additional training data or to perform more training iterations to improve detector accuracy.
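
A minimal sketch of this workflow, assuming you already have a trained `detector` and a new ground truth table (`moreTrainingData` is a hypothetical variable name):

```
% Continue training an existing detector on additional labeled data.
detector = trainFasterRCNNObjectDetector(moreTrainingData, detector, options);
```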

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'PositiveOverlapRange',[0.75 1]`

Bounding box overlap ratios for positive training samples, specified as the comma-separated pair consisting of `'PositiveOverlapRange'` and one of the following:

• A two-element vector that specifies an identical overlap ratio for all four training stages.

• A 4-by-2 matrix, where each row specifies the overlap ratio for each of the four training stages.

Values are in the range [0,1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio used for both the `PositiveOverlapRange` and `NegativeOverlapRange` is defined as:

`$\frac{area\left(A\cap B\right)}{area\left(A\cup B\right)}$`

A and B are bounding boxes.
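
The same intersection-over-union measure is available as the `bboxOverlapRatio` function, which you can use to sanity-check candidate overlap ranges; the boxes below are made-up examples.

```
boxA = [10 10 50 50];                  % [x y width height]
boxB = [30 30 50 50];
iou = bboxOverlapRatio(boxA, boxB)     % ratio of intersection area to union area
```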

Bounding box overlap ratios for negative training samples, specified as the comma-separated pair consisting of `'NegativeOverlapRange'` and one of the following:

• A two-element vector that specifies an identical overlap ratio for all four training stages.

• A 4-by-2 matrix, where each row specifies the overlap ratio for each of the four training stages.

Values are in the range [0,1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

The overlap ratio used for both the `PositiveOverlapRange` and `NegativeOverlapRange` is defined as:

`$\frac{area\left(A\cap B\right)}{area\left(A\cup B\right)}$`

A and B are bounding boxes.
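
As an illustrative sketch (not a recommendation of specific values), different overlap ranges can be specified for each of the four training stages by passing 4-by-2 matrices:

```
% One row per training stage: [lower upper] overlap ratio.
positiveRanges = [0.6 1; 0.6 1; 0.6 1; 0.7 1];
negativeRanges = [0.1 0.6; 0.1 0.6; 0.1 0.6; 0.1 0.7];

detector = trainFasterRCNNObjectDetector(trainingData, network, options, ...
    'PositiveOverlapRange', positiveRanges, ...
    'NegativeOverlapRange', negativeRanges);
```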

Maximum number of strongest region proposals to use for generating training samples, specified as the comma-separated pair consisting of `'NumStrongestRegions'` and a positive integer. Reduce this value to speed up processing time at the cost of training accuracy. To use all region proposals, set this value to `Inf`.

Number of region proposals to randomly sample from each training image, specified as the comma-separated pair consisting of `'NumRegionsToSample'` and a positive integer. Reduce the number of regions to sample to reduce memory usage and speed up training. Reducing the value can also decrease training accuracy.

Length of smallest image dimension, either width or height, specified as the comma-separated pair consisting of `'SmallestImageDimension'` and a positive integer. Training images are resized such that the length of the shortest dimension is equal to the specified integer. By default, training images are not resized. Resizing training images helps reduce computational costs and memory used when training images are large. Typical values range from 400–600 pixels.
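
As a rough sketch, these memory-related options can be combined in one training call; the specific values below are illustrative only.

```
detector = trainFasterRCNNObjectDetector(trainingData, network, options, ...
    'NumStrongestRegions', 1000, ...     % limit the strongest region proposals per image
    'NumRegionsToSample', 128, ...       % sample fewer regions to reduce memory usage
    'SmallestImageDimension', 400);      % resize so the shortest image side is 400 pixels
```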

Minimum anchor box sizes used to build the anchor box pyramid of the region proposal network (RPN), specified as the comma-separated pair consisting of `'MinBoxSizes'` and an m-by-2 matrix. Each row defines the [height width] of an anchor box.

The default `'auto'` setting uses the minimum size and the median aspect ratio from the bounding boxes for each class in the ground truth data. To remove redundant box sizes, the function keeps boxes that have an intersection-over-union that is less than or equal to 0.5. This behavior ensures that the minimum number of anchor boxes are used to cover all the object sizes and aspect ratios.

When anchor boxes are computed based on `MinBoxSizes`, the ith anchor box size is:

`round(MinBoxSizes(i,:) .* BoxPyramidScale .^ (0:NumBoxPyramidLevels-1)')`
You cannot use this property if you set the network to a `LayerGraph` object or if you resume training from a detector checkpoint.
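
As a worked sketch of that formula, with hypothetical values for `MinBoxSizes`, `BoxPyramidScale`, and `NumBoxPyramidLevels`:

```
MinBoxSizes = [32 32; 64 48];    % [height width] per class (hypothetical values)
BoxPyramidScale = 2;
NumBoxPyramidLevels = 3;

% Anchor box sizes generated from the first row of MinBoxSizes:
i = 1;
anchorSizes = round(MinBoxSizes(i,:) .* BoxPyramidScale .^ (0:NumBoxPyramidLevels-1)')
% anchorSizes =
%     32    32
%     64    64
%    128   128
```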

Anchor box pyramid scale factor used to successively upscale anchor box sizes, specified as the comma-separated pair consisting of `'BoxPyramidScale'` and a scalar. Recommended values are from 1 through 2. Increase this value for faster results. Decrease the number for greater accuracy.

Number of levels in an anchor box pyramid, specified as the comma-separated pair consisting of `'NumBoxPyramidLevels'` and a scalar. Select a value that ensures that the multiscale anchor boxes are comparable in size to the size of objects in the ground truth data.

The default setting, `'auto'`, selects the number of levels based on the size of objects within the ground truth data. The number of levels is selected such that it covers the range of object sizes.

Frozen batch normalization during training, specified as the comma-separated pair consisting of `'FreezeBatchNormalization'` and `true` or `false`. The value indicates whether the input layers to the network are frozen during training. Set this value to `true` if you are training with a small mini-batch size. Small batch sizes result in poor estimates of the batch mean and variance that are required for effective batch normalization.

If you do not specify a value for '`FreezeBatchNormalization`', the function sets the property to

• `true` if the '`MiniBatchSize`' name-value argument for the `trainingOptions` function is less than `8`.

• `false` if the '`MiniBatchSize`' name-value argument for the `trainingOptions` function is greater than or equal to `8`.

You must specify a value for `'FreezeBatchNormalization'` to override this default behavior.
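
For example, a sketch of explicitly overriding the default behavior (the mini-batch size below is illustrative):

```
options = trainingOptions('sgdm', 'MiniBatchSize', 4);   % small mini-batch

% Without the override, the function would freeze batch normalization
% because MiniBatchSize is less than 8.
detector = trainFasterRCNNObjectDetector(trainingData, network, options, ...
    'FreezeBatchNormalization', false);
```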

## Output Arguments


Trained Faster R-CNN object detector, returned as a `fasterRCNNObjectDetector` object.

Training information, returned as a structure array with four elements. Each element corresponds to a stage of training Faster R-CNN and has the following fields. Each field is a numeric vector with one element per training iteration. Values that have not been calculated at a specific iteration are represented by `NaN`. A sketch of inspecting this output follows the list.

• `TrainingLoss` — Training loss at each iteration. This is the combination of the classification and regression loss used to train the Faster R-CNN network.

• `TrainingAccuracy` — Training set accuracy at each iteration

• `TrainingRMSE` — Training root mean square error (RMSE) for the box regression layer

• `BaseLearnRate` — Learning rate at each iteration
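
As a minimal sketch, assuming the detector was trained with the two-output syntax, the recorded loss for the first training stage (the RPN) can be plotted per iteration:

```
[detector, info] = trainFasterRCNNObjectDetector(trainingData, layers, options);

% Plot the training loss recorded during stage 1 (RPN training).
figure
plot(info(1).TrainingLoss)
xlabel('Iteration')
ylabel('Training loss')
```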

## Tips

• To accelerate data preprocessing for training, `trainFasterRCNNObjectDetector` automatically creates and uses a parallel pool based on your parallel preference settings. For more details about setting these preferences, see parallel preference settings. Using parallel computing preferences requires Parallel Computing Toolbox.

• VGG-16, VGG-19, ResNet-101, and Inception-ResNet-v2 are large models. Training with large images can produce "Out of Memory" errors. To mitigate these errors, try one or more of these options:

• Resize the training images by using the `'SmallestImageDimension'` name-value argument.

• Decrease the value of the `'NumRegionsToSample'` name-value argument.

• This function supports transfer learning. When you input a `network` by name, such as `'resnet50'`, then the function automatically transforms the network into a valid Faster R-CNN network model based on the pretrained `resnet50` model. Alternatively, manually specify a custom Faster R-CNN network by using the `LayerGraph` extracted from a pretrained DAG network. For more details, see Create Faster R-CNN Object Detection Network.

• This table describes how to transform each named network into a Faster R-CNN network. The feature extraction layer name specifies which layer is processed by the ROI pooling layer. The ROI output size specifies the size of the feature maps output by the ROI pooling layer.

| Network Name | Feature Extraction Layer Name | ROI Pooling Layer OutputSize | Description |
| --- | --- | --- | --- |
| `alexnet` | `'relu5'` | [6 6] | Last max pooling layer is replaced by ROI max pooling layer. |
| `vgg16` | `'relu5_3'` | [7 7] | |
| `vgg19` | `'relu5_4'` | | |
| `squeezenet` | `'fire5-concat'` | [14 14] | |
| `resnet18` | `'res4b_relu'` | | ROI pooling layer is inserted after the feature extraction layer. |
| `resnet50` | `'activation_40_relu'` | | |
| `resnet101` | `'res4b22_relu'` | | |
| `googlenet` | `'inception_4d-output'` | | |
| `mobilenetv2` | `'block_13_expand_relu'` | | |
| `inceptionv3` | `'mixed7'` | [17 17] | |
| `inceptionresnetv2` | `'block17_20_ac'` | | |

If you want to modify how a network is transformed into a Faster R-CNN network, see Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model.

• During training, multiple image regions are processed from each training image. The number of image regions per image is controlled by the `NumRegionsToSample` property. The `PositiveOverlapRange` and `NegativeOverlapRange` properties control which image regions are used for training, based on the bounding box intersection-over-union (IoU) metric. Choose values for these properties by testing the trained detector on a validation set. For example:

| Overlap Values | Description |
| --- | --- |
| `PositiveOverlapRange` set to `[0.6 1]` | Positive training samples are set equal to the samples that overlap with the ground truth boxes by 0.6 to 1.0, measured by the bounding box IoU metric. |
| `NegativeOverlapRange` set to `[0 0.3]` | Negative training samples are set equal to the samples that overlap with the ground truth boxes by 0 to 0.3. |


• Use the `trainingOptions` function to enable or disable verbose printing.

## Algorithms

The `trainFasterRCNNObjectDetector` function trains the Faster R-CNN object detector in four stages with alternating optimization [1].

## References

[1] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." Advances in Neural Information Processing Systems. Vol. 28, 2015.

[2] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.

[3] Girshick, R., J. Donahue, T. Darrell, and J. Malik. "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation." CVPR '14 Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Pages 580-587. 2014.

[4] Zitnick, C. Lawrence, and P. Dollar. "Edge Boxes: Locating Object Proposals from Edges." Computer Vision-ECCV. Springer International Publishing. Pages 391-405. 2014.