trainMaskRCNN

Train Mask R-CNN network to perform instance segmentation

Since R2022a

Syntax

trainedDetector = trainMaskRCNN(trainingData,network,options)

trainedDetector = trainMaskRCNN(trainingData,network,options,Name=Value)

[trainedDetector,info] = trainMaskRCNN(trainingData,network,options)

Description

trainedDetector = trainMaskRCNN(trainingData,network,options) trains a Mask R-CNN network. A trained Mask R-CNN network object can perform instance segmentation to detect and segment multiple object classes. This syntax supports transfer learning on a pretrained Mask R-CNN network and training an uninitialized Mask R-CNN network.

This function requires that you have Deep Learning Toolbox™. It is recommended that you also have Parallel Computing Toolbox™ to use with a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).

trainedDetector = trainMaskRCNN(trainingData,network,options,Name=Value) uses additional options specified by one or more name-value arguments.

[trainedDetector,info] = trainMaskRCNN(trainingData,network,options) also returns information on the training progress, such as training loss and accuracy, for each iteration.
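
As a rough illustration of the three-input syntax with both outputs, the following sketch wraps the call in a helper function. The function name `trainMaskRCNNSketch`, the inputs `ds` and `classNames`, and the solver settings are assumptions made for illustration. The `"resnet50-coco"` pretrained network may require a separate support package download.

```matlab
function [trainedDetector,info] = trainMaskRCNNSketch(ds,classNames)
% ds         - datastore whose read output is {image, boxes, labels, masks}
%              (assumed to be prepared by the caller)
% classNames - string array of object class names (assumed)

% Start from the COCO-pretrained network for transfer learning.
network = maskrcnn("resnet50-coco",classNames);

% Illustrative solver settings; ResetInputNormalization must be false.
options = trainingOptions("sgdm", ...
    InitialLearnRate=0.001, ...
    MaxEpochs=10, ...
    MiniBatchSize=2, ...
    ResetInputNormalization=false);

% Train and also return the per-iteration training information.
[trainedDetector,info] = trainMaskRCNN(ds,network,options);
end
```

The learning rate, epoch count, and mini-batch size shown here are placeholders, not recommended values.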

Input Arguments

`trainingData` — Labeled ground truth
datastore

Labeled ground truth training data, specified as a datastore. Your data must be set up so that calling the datastore with the read and readall functions returns a cell array with four columns. This table describes the format of each column.

| data | boxes | labels | masks |
| --- | --- | --- | --- |
| RGB image that serves as a network input, specified as an H-by-W-by-3 numeric array. | Bounding boxes, specified as an M-by-4 matrix, where M is the number of objects within the image. Each bounding box has the format [x y width height], where [x, y] represents the top-left corner of the bounding box. | Object class names, specified as an M-by-1 categorical vector. All categorical data returned by the datastore must contain the same categories. | Binary masks, specified as a logical array of size H-by-W-by-M. Each mask is the segmentation of one instance in the image. |

You can create a datastore that returns data in the required format using these steps:

Create an imageDatastore that returns RGB image data
Create a boxLabelDatastore that returns bounding box data and instance labels as a two-element cell array
Create an imageDatastore and specify a custom read function that returns mask data as a binary matrix
Combine the three datastores using the combine function

For more information, see Getting Started with Mask R-CNN for Instance Segmentation.
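
The four steps above can be sketched as follows. The helper names `buildMaskRCNNDatastore` and `readMasks`, the folder arguments, and the storage layout (one MAT file per image containing a variable named `masks`) are hypothetical choices made for illustration. Only `imageDatastore`, `boxLabelDatastore`, and `combine` come from the steps themselves.

```matlab
function ds = buildMaskRCNNDatastore(imageFolder,maskFolder,boxTable)
% imageFolder - folder of RGB training images (hypothetical layout)
% maskFolder  - folder of MAT files, one per image, each holding an
%               H-by-W-by-M logical variable named "masks" (assumed layout)
% boxTable    - table with one row per image: first column M-by-4 boxes,
%               second column M-by-1 categorical labels

% Step 1: images.
imds = imageDatastore(imageFolder);

% Step 2: boxes and labels; each read returns a {boxes, labels} cell array.
blds = boxLabelDatastore(boxTable);

% Step 3: masks, read through a custom read function.
mds = imageDatastore(maskFolder,FileExtensions=".mat",ReadFcn=@readMasks);

% Step 4: one read of the combined datastore returns
% {image, boxes, labels, masks}.
ds = combine(imds,blds,mds);
end

function masks = readMasks(filename)
% Load the mask stack from the MAT file and make sure it is logical.
data = load(filename);
masks = logical(data.masks);
end
```

This sketch relies on the image files, mask files, and table rows being listed in the same order, so that each read pairs an image with its own boxes, labels, and masks.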

`network` — Mask R-CNN network to train
`maskrcnn` object

Mask R-CNN network to train, specified as a maskrcnn object.

`options` — Training options
`TrainingOptionsSGDM` object | `TrainingOptionsRMSProp` object | `TrainingOptionsADAM` object

Training options, specified as a TrainingOptionsSGDM, TrainingOptionsRMSProp, or TrainingOptionsADAM object returned by the trainingOptions (Deep Learning Toolbox) function. To specify the solver name and other options for network training, use the trainingOptions function. You must set the ResetInputNormalization property to false.

Note

If you specify an output function using the OutputFcn (Deep Learning Toolbox) name-value argument, the function must use a per-epoch info structure with these fields:

Epoch
Iteration
TimeElapsed
LearnRate
TrainingLoss
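
As a minimal sketch, an output function that reads only the fields listed above might look like this. The handle name `stopIfDiverged` is hypothetical, and the calling convention (one info structure in, one logical stop flag out) is assumed to match the one used by other Deep Learning Toolbox training functions.

```matlab
% Request that training stop when the reported loss is no longer finite.
% Assumption: the function receives the info structure and returns a
% logical stop flag. An empty TrainingLoss yields false.
stopIfDiverged = @(info) any(~isfinite(info.TrainingLoss(:)));
```

To use the handle, pass it to trainingOptions, for example `trainingOptions("sgdm",OutputFcn=stopIfDiverged,ResetInputNormalization=false)`.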

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: trainedDetector = trainMaskRCNN(trainingData,network,options,NumRegionsToSample=64) samples 64 region proposals from each training image

`PositiveOverlapRange` — Bounding box overlap ratios for positive training samples
`[0.5 1]` (default) | two-element numeric vector

Bounding box overlap ratios for positive training samples, specified as a two-element numeric vector with values in the range [0, 1]. Region proposals that overlap with ground truth bounding boxes within the specified range are used as positive training samples.

The overlap ratio for bounding boxes A and B is:

$\frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$

`NegativeOverlapRange` — Bounding box overlap ratios for negative training samples
`[0.1 0.5]` (default) | two-element numeric vector

Bounding box overlap ratios for negative training samples, specified as a two-element numeric vector with values in the range [0, 1]. Region proposals that overlap with the ground truth bounding boxes within the specified range are used as negative training samples.

The overlap ratio for bounding boxes A and B is:

$\frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$
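
The overlap ratio can be worked through by hand for two boxes in [x y width height] format. The box values below are illustrative. Treating both ends of each range as inclusive is an assumption, because the boundary behavior is not stated here.

```matlab
% Boxes in [x y width height] format (illustrative values).
A = [10 10 40 40];
B = [30 30 40 40];

interArea = rectint(A,B);                       % area of A intersect B
unionArea = A(3)*A(4) + B(3)*B(4) - interArea;  % area of A union B
overlap = interArea/unionArea;                  % 400/2800 = 1/7

% Compare against the default ranges (inclusive bounds assumed).
isPositive = overlap >= 0.5 && overlap <= 1;    % PositiveOverlapRange [0.5 1]
isNegative = overlap >= 0.1 && overlap <= 0.5;  % NegativeOverlapRange [0.1 0.5]
```

With these values the ratio is 1/7, approximately 0.143, so a proposal B against a ground truth box A would fall in the default negative range. Computer Vision Toolbox also provides the bboxOverlapRatio function, which computes this ratio for sets of boxes.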

`NumStrongestRegions` — Maximum number of strongest region proposals
`1000` (default) | positive integer | `Inf`

Maximum number of strongest region proposals to use for generating training samples, specified as a positive integer or Inf. Reduce this value to shorten processing time at the cost of training accuracy. To use all region proposals, set this value to Inf.

`NumRegionsToSample` — Number of region proposals
`128` (default) | positive integer

Number of region proposals to randomly sample from each training image, specified as a positive integer. Reduce the number of regions to sample to reduce memory usage and speed up training. Reducing the value can also decrease training accuracy.

`FreezeSubNetwork` — Subnetworks to freeze
`"none"` (default) | `"backbone"` | `"rpn"` | `["backbone" "rpn"]`

Subnetworks to freeze during training, specified as one of these values:

"none" — Do not freeze subnetworks
"backbone" — Freeze the feature extraction subnetwork, including the layers following the ROI align layer
"rpn" — Freeze the region proposal subnetwork
["backbone" "rpn"] — Freeze both the feature extraction and the region proposal subnetworks

The weights of layers in frozen subnetworks do not change during training.
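
A brief sketch of freezing both subnetworks during transfer learning follows. The wrapper name `fineTuneHeads` is hypothetical, and `ds`, `network`, and `options` are assumed to be prepared as described for the trainingData, network, and options arguments.

```matlab
function trainedDetector = fineTuneHeads(ds,network,options)
% ds, network, and options are assumed to be prepared by the caller.

% Freeze the feature extraction and region proposal subnetworks so that
% only the remaining layers are updated during training.
trainedDetector = trainMaskRCNN(ds,network,options, ...
    FreezeSubNetwork=["backbone" "rpn"]);
end
```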

`ExperimentManager` — Training experiment monitor
`"none"` (default) | `experiments.Monitor` object

Training experiment monitor, specified as an experiments.Monitor (Deep Learning Toolbox) object for use with the Experiment Manager (Deep Learning Toolbox) app. You can use this object to track the progress of training, update information fields in the training results table, record values of the metrics used by the training, and produce training plots.

Information monitored during training:

Training loss at each iteration
Training accuracy at each iteration
Training root mean square error (RMSE) for the box regression layer
Training loss for the mask segmentation branch
Learning rate at each iteration

Validation information when the training options input contains validation data:

Validation loss at each iteration
Validation accuracy at each iteration
Validation RMSE at each iteration
Validation loss for the mask segmentation branch

Output Arguments

`trainedDetector` — Trained Mask R-CNN network
`maskrcnn` object

Trained Mask R-CNN network, returned as a maskrcnn object.

`info` — Training progress information
structure

Training progress information, returned as a structure. Each field corresponds to a stage of training.

TrainingLoss — Training loss at each iteration. The loss is the combination of the region proposal network (RPN), classification, regression, and mask losses used to train the Mask R-CNN network.
TrainingRPNLoss — Total RPN loss at the end of each iteration.
TrainingRMSE — Training root mean squared error (RMSE) for the box regression layer at the end of each iteration.
TrainingMaskLoss — Training cross-entropy loss for the mask segmentation branch at the end of each iteration.
LearnRate — Learning rate at each iteration.
ValidationLoss — Validation loss at each iteration.
ValidationRPNLoss — Validation RPN loss at each iteration.
ValidationRMSE — Validation RMSE at each iteration.
ValidationMaskLoss — Validation cross-entropy loss for the mask segmentation branch at each iteration.

Each field is a numeric vector with one element per training iteration. Values that are not calculated at a specific iteration are set to NaN. The structure contains the ValidationLoss, ValidationRPNLoss, ValidationRMSE, and ValidationMaskLoss fields only when options specifies validation data.
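
One way to inspect the returned structure is sketched below. The function name `plotTrainingInfo` is hypothetical. It uses only the TrainingLoss and ValidationLoss fields described above, and it skips the NaN entries that mark iterations without a validation pass.

```matlab
function valIdx = plotTrainingInfo(info)
% Plot the training loss and, when present, the validation loss.
% Returns the iterations at which a validation loss was calculated.
iterations = 1:numel(info.TrainingLoss);
figure
plot(iterations,info.TrainingLoss)
xlabel("Iteration")
ylabel("Loss")

valIdx = [];
if isfield(info,"ValidationLoss")
    % Skip the NaN placeholders for iterations without validation.
    valIdx = find(~isnan(info.ValidationLoss));
    hold on
    plot(valIdx,info.ValidationLoss(valIdx),"o-")
    hold off
    legend("Training","Validation")
else
    legend("Training")
end
end
```

Checking for the field with isfield keeps the function usable when no validation data was specified.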

Tips

The trainMaskRCNN function has a high GPU memory requirement. It is recommended to train a Mask R-CNN network with at least 12 GB of available GPU memory.
To reduce the training memory consumption, try reducing the InputSize property of the network argument or the NumRegionsToSample name-value argument.
When you want to perform transfer learning on a data set with similar content to the COCO data set, freezing the feature extraction and region proposal subnetworks can help the network training converge faster.
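
The memory-related tip can be sketched as follows. The function name `trainLowMemorySketch`, the input size of [512 512 3], and the value of 64 sampled regions are illustrative assumptions, not recommendations.

```matlab
function trainedDetector = trainLowMemorySketch(ds,classNames,options)
% ds, classNames, and options are assumed to be prepared by the caller.

% Use a smaller network input; [512 512 3] is an illustrative value.
network = maskrcnn("resnet50-coco",classNames,InputSize=[512 512 3]);

% Sample fewer region proposals per image than the default of 128.
trainedDetector = trainMaskRCNN(ds,network,options, ...
    NumRegionsToSample=64);
end
```

The training images, boxes, and masks may need to be resized to match the chosen input size. That preprocessing is not shown here.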

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU acceleration.

By default, the trainMaskRCNN function uses a GPU if one is available. You can specify the hardware that the trainMaskRCNN function uses by setting the ExecutionEnvironment (Deep Learning Toolbox) training option using the trainingOptions (Deep Learning Toolbox) function.

For more information, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud (Deep Learning Toolbox).
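
As a small sketch of selecting the hardware explicitly, the following code picks the GPU when one is usable and falls back to the CPU otherwise. The variable name `env` and the choice of the `"sgdm"` solver are illustrative.

```matlab
% Prefer the GPU when a supported one is available, otherwise use the CPU.
if canUseGPU
    env = "gpu";
else
    env = "cpu";
end

% ResetInputNormalization must be false for use with trainMaskRCNN.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment=env, ...
    ResetInputNormalization=false);
```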

Version History

Introduced in R2022a

R2025a: MATLAB Compiler support will be removed

Support for using MATLAB® Compiler™ will be removed in a future release.

R2022b: Supports training options with plots

When you specify options, the Plots property of the trainingOptions (Deep Learning Toolbox) object can now have a value other than "none". Previously, trainMaskRCNN supported only the value "none".
