# pixelLabelImageDatastore

Datastore for semantic segmentation networks

## Description

Use pixelLabelImageDatastore to create a datastore for training a semantic segmentation network using deep learning.

## Creation

### Description

pximds = pixelLabelImageDatastore(gTruth) returns a datastore for training a semantic segmentation network based on the input groundTruth object or array of groundTruth objects. Use the output pixelLabelImageDatastore object with the Deep Learning Toolbox™ function trainNetwork to train convolutional neural networks for semantic segmentation.

pximds = pixelLabelImageDatastore(imds,pxds) returns a datastore based on the input image datastore and the pixel label datastore objects. imds is an ImageDatastore object that represents the training input to the network. pxds is a PixelLabelDatastore object that represents the required network output.

pximds = pixelLabelImageDatastore(___,Name,Value) additionally uses name-value pairs to set the DispatchInBackground and OutputSizeMode properties. For 2-D data, you can also use name-value pairs to specify the ColorPreprocessing, DataAugmentation, and OutputSize augmentation properties. You can specify multiple name-value pairs. Enclose each property name in quotes.

For example, pixelLabelImageDatastore(imds,pxds,'OutputSize',[32 32]) creates a pixel label image datastore that adjusts each output image to 32-by-32 pixels.

### Input Arguments

gTruth: Ground truth data, specified as a groundTruth object or as an array of groundTruth objects. Each groundTruth object contains information about the data source, the list of label definitions, and all marked labels for a set of ground truth labels.

imds: Collection of images, specified as an ImageDatastore object.

pxds: Collection of pixel-labeled images, specified as a PixelLabelDatastore object. The object contains the pixel-labeled images for each image contained in the imds input object.

## Properties

Images: Image file names used as the source for ground truth images, specified as a character vector or a cell array of character vectors.

PixelLabelData: Pixel label data file names used as the source for ground truth label images, specified as a character vector or a cell array of character vectors.

ClassNames: Class names, specified as a cell array of character vectors.

ColorPreprocessing: Color channel preprocessing for 2-D data, specified as 'none', 'gray2rgb', or 'rgb2gray'. Use this property when the image data created by the data source must be only color or only grayscale, but the training set includes both. For example, suppose you need to train a network that expects color images, but some of your training images are grayscale. Set ColorPreprocessing to 'gray2rgb' to replicate the single channel of each grayscale image across three color channels. Using the 'gray2rgb' option creates M-by-N-by-3 output images.

The ColorPreprocessing property is not supported for 3-D data. To perform color channel preprocessing of 3-D data, use the transform function.
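
The channel replication that 'gray2rgb' describes can be illustrated with base MATLAB. This is a minimal sketch of the idea, not the datastore's internal implementation; the image grayImage and its values are hypothetical.

```matlab
% Hypothetical M-by-N grayscale image (values are arbitrary).
grayImage = uint8(reshape(1:20,[4 5]));

% Replicate the single channel three times along the third dimension,
% which yields an M-by-N-by-3 array with identical channels.
rgbImage = repmat(grayImage,[1 1 3]);

disp(size(rgbImage))
```

Because every channel is a copy of the original, the result carries no new color information; it only gives the array the shape that a color network expects.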

DataAugmentation: Preprocessing applied to input images, specified as an imageDataAugmenter object or 'none'. When DataAugmentation is 'none', no preprocessing is applied to input images. When you specify an imageDataAugmenter object, the training data is augmented in real time during training.

The DataAugmentation property is not supported for 3-D data. To preprocess 3-D data, use the transform function.

DispatchInBackground: Dispatch observations in the background during training, prediction, and classification, specified as false or true. To use background dispatching, you must have Parallel Computing Toolbox™. If DispatchInBackground is true and you have Parallel Computing Toolbox, then pixelLabelImageDatastore asynchronously reads image and pixel label pairs and queues them for use.

MiniBatchSize: Number of observations that are returned in each batch. The default value is equal to the ReadSize of image datastore imds. You can change the value of MiniBatchSize only after you create the datastore. For training, prediction, or classification, the MiniBatchSize property is set to the mini-batch size defined in trainingOptions.

NumObservations: Total number of observations in the pixel label image datastore. The number of observations is the length of one training epoch.

OutputSize: Size of output images, specified as a vector of two positive integers. The first element specifies the number of rows in the output images, and the second element specifies the number of columns. When you specify OutputSize, image sizes are adjusted as necessary. By default, this property is empty, which means that the images are not adjusted.

The OutputSize property is not supported for 3-D data. To set the output size of 3-D data, use the transform function.

OutputSizeMode: Method used to resize output images, specified as one of the following values. This property applies only when you set OutputSize to a value other than [].

• 'resize' — Scale the image to fit the output size. For more information, see imresize.

• 'centercrop' — Take a crop from the center of the training image. The crop has the same size as the output size.

• 'randcrop' — Take a random crop from the training image. The random crop has the same size as the output size.

Data Types: char | string
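
To make the 'centercrop' option concrete, the following base MATLAB sketch computes a centered crop window. The rounding rule (floor) and the variable names are assumptions for illustration; the datastore may choose the window differently when the size difference is odd.

```matlab
% Hypothetical 6-by-8 input whose values encode their linear index.
inputImage = reshape(1:48,[6 8]);
outputSize = [4 4];

% Top-left corner of a window centered in the input (floor is an assumption).
inputSize = size(inputImage);
topLeft = floor((inputSize(1:2) - outputSize)/2) + 1;

% Row and column ranges of the crop window.
rows = topLeft(1):(topLeft(1) + outputSize(1) - 1);
cols = topLeft(2):(topLeft(2) + outputSize(2) - 1);

centerCrop = inputImage(rows,cols);
disp(size(centerCrop))
```

For segmentation data, the same rows and cols would have to be applied to the label image so that each pixel keeps its label.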

## Object Functions

• combine: Combine data from multiple datastores

• countEachLabel: Count occurrence of pixel or box labels

• hasdata: Determine if data is available to read

• partitionByIndex: Partition pixelLabelImageDatastore according to indices

• preview: Subset of data in datastore

• read: Read data from a datastore

• readall: Read all data in datastore

• readByIndex: Read data specified by index from pixelLabelImageDatastore

• reset: Reset datastore to initial state

• shuffle: Shuffle data in pixelLabelImageDatastore

• transform: Transform datastore

## Examples

Load the training data.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore for the images.

imds = imageDatastore(imageDir);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle","background"];
labelIDs   = [255 0];
pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Visualize training images and ground truth pixel labels.

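% Read the first image and its categorical label matrix (assumed inputs).
I = readimage(imds,1);
C = readimage(pxds,1);
I = imresize(I,5);
L = imresize(uint8(C),5);   % convert labels to numeric values for display
imshowpair(I,L,'montage')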

Create a semantic segmentation network. This example uses a simple semantic segmentation network based on a downsampling and upsampling design.

numFilters = 64;
filterSize = 3;
numClasses = 2;
layers = [
imageInputLayer([32 32 1])
convolution2dLayer(filterSize,numFilters,'Padding',1)
reluLayer()
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(filterSize,numFilters,'Padding',1)
reluLayer()
transposedConv2dLayer(4,numFilters,'Stride',2,'Cropping',1)
convolution2dLayer(1,numClasses)
softmaxLayer()
pixelClassificationLayer()
]
layers =
10x1 Layer array with layers:

1   ''   Image Input                  32x32x1 images with 'zerocenter' normalization
2   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
3   ''   ReLU                         ReLU
4   ''   Max Pooling                  2x2 max pooling with stride [2  2] and padding [0  0  0  0]
5   ''   Convolution                  64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
6   ''   ReLU                         ReLU
7   ''   Transposed Convolution       64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
8   ''   Convolution                  2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
9   ''   Softmax                      softmax
10   ''   Pixel Classification Layer   Cross-entropy loss

Set up the training options.

opts = trainingOptions('sgdm', ...
'InitialLearnRate',1e-3, ...
'MaxEpochs',100, ...
'MiniBatchSize',64);

Create a pixel label image datastore that contains training data.

trainingData = pixelLabelImageDatastore(imds,pxds);

Train the network.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       31.86% |       0.6934 |          0.0010 |
|      17 |          50 |       00:00:03 |       94.52% |       0.5564 |          0.0010 |
|      34 |         100 |       00:00:07 |       95.25% |       0.4415 |          0.0010 |
|      50 |         150 |       00:00:11 |       95.14% |       0.3722 |          0.0010 |
|      67 |         200 |       00:00:14 |       94.52% |       0.3336 |          0.0010 |
|      84 |         250 |       00:00:18 |       95.25% |       0.2931 |          0.0010 |
|     100 |         300 |       00:00:21 |       95.14% |       0.2708 |          0.0010 |
|========================================================================================|
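
The iteration counts in this log follow from the training options. The sketch below reproduces them, assuming the 200 training images in this data set and the default trainNetwork behavior of discarding the final incomplete mini-batch in each epoch; the variable names are illustrative.

```matlab
numObservations = 200;   % training images in the triangle data set
miniBatchSize = 64;      % 'MiniBatchSize' in the training options
maxEpochs = 100;         % 'MaxEpochs' in the training options

% Only complete mini-batches count, so 200/64 rounds down to 3.
iterationsPerEpoch = floor(numObservations/miniBatchSize);

% Total iterations across all epochs.
totalIterations = iterationsPerEpoch*maxEpochs;

fprintf('%d iterations per epoch, %d in total\n', ...
    iterationsPerEpoch,totalIterations);
```

This matches the last row of the log, where epoch 100 ends at iteration 300.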

Read and display a test image.

testImage = imread('triangleTest.jpg');
imshow(testImage)

Segment the test image and display the results.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Improve the results

The network failed to segment the triangles and classified every pixel as "background". The training appeared to be going well with training accuracies greater than 90%. However, the network only learned to classify the background class. To understand why this happened, you can count the occurrence of each pixel label across the dataset.

tbl = countEachLabel(trainingData)
tbl=2×3 table
Name        PixelCount    ImagePixelCount
____________    __________    _______________

'triangle'           10326       2.048e+05
'background'    1.9447e+05       2.048e+05

The majority of pixel labels are for the background. The poor results are due to the class imbalance. Class imbalance biases the learning process in favor of the dominant class. That's why every pixel is classified as "background". To fix this, use class weighting to balance the classes. There are several methods for computing class weights. One common method is inverse frequency weighting where the class weights are the inverse of the class frequencies. This increases weight given to under-represented classes.

totalNumberOfPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / totalNumberOfPixels;
classWeights = 1./frequency
classWeights = 2×1

19.8334
1.0531
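
As a check on these values, inverse frequency weighting reduces to the ratio of the total pixel count to each class count. Assuming every pixel of the 200 images of 32-by-32 pixels is labeled, the total is 204800 pixels and the rounded background count in the table corresponds to 204800 - 10326 = 194474 pixels, so the weights work out as

$w_{\mathrm{triangle}} = \frac{204800}{10326} \approx 19.8334, \qquad w_{\mathrm{background}} = \frac{204800}{194474} \approx 1.0531$

The rare triangle class therefore contributes roughly 19 times more per pixel to the loss than the background class.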

Class weights can be specified using the pixelClassificationLayer. Update the last layer to use a pixelClassificationLayer with inverse class weights.

layers(end) = pixelClassificationLayer('Classes',tbl.Name,'ClassWeights',classWeights);

Train the network again.

net = trainNetwork(trainingData,layers,opts);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       47.50% |       0.6925 |          0.0010 |
|      17 |          50 |       00:00:04 |       19.67% |       0.6837 |          0.0010 |
|      34 |         100 |       00:00:08 |       75.77% |       0.4433 |          0.0010 |
|      50 |         150 |       00:00:12 |       85.00% |       0.4018 |          0.0010 |
|      67 |         200 |       00:00:16 |       87.00% |       0.3568 |          0.0010 |
|      84 |         250 |       00:00:20 |       88.03% |       0.3153 |          0.0010 |
|     100 |         300 |       00:00:24 |       90.42% |       0.2890 |          0.0010 |
|========================================================================================|

Try to segment the test image again.

C = semanticseg(testImage,net);
B = labeloverlay(testImage,C);
imshow(B)

Using class weighting to balance the classes produced a better segmentation result. Additional steps to improve the results include increasing the number of epochs used for training, adding more training data, or modifying the network.

Configure a pixel label image datastore to augment data while training.

Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an imageDatastore object to hold the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle","background"];
labelIDs   = [255 0];

Create a pixelLabelDatastore object to hold the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Create an imageDataAugmenter object to randomly rotate and mirror image data.

augmenter = imageDataAugmenter('RandRotation',[-10 10],'RandXReflection',true)
augmenter =
imageDataAugmenter with properties:

FillValue: 0
RandXReflection: 1
RandYReflection: 0
RandRotation: [-10 10]
RandScale: [1 1]
RandXScale: [1 1]
RandYScale: [1 1]
RandXShear: [0 0]
RandYShear: [0 0]
RandXTranslation: [0 0]
RandYTranslation: [0 0]

Create a pixelLabelImageDatastore object to train the network with augmented data.

plimds = pixelLabelImageDatastore(imds,pxds,'DataAugmentation',augmenter)
plimds =
pixelLabelImageDatastore with properties:

Images: {200x1 cell}
PixelLabelData: {200x1 cell}
ClassNames: {2x1 cell}
DataAugmentation: [1x1 imageDataAugmenter]
ColorPreprocessing: 'none'
OutputSize: []
OutputSizeMode: 'resize'
MiniBatchSize: 1
NumObservations: 200
DispatchInBackground: 0

Define and create a custom pixel classification layer that uses Dice loss.

You can use this layer to train semantic segmentation networks. To learn more about creating custom deep learning layers, see Define Custom Deep Learning Layers (Deep Learning Toolbox).

Dice Loss

The Dice loss is based on the Sørensen-Dice similarity coefficient for measuring the overlap between two segmented images. The generalized Dice loss [1,2], $L$, between one image $Y$ and the corresponding ground truth $T$ is given by

$L = 1 - \frac{2\sum_{k=1}^{K} w_k \sum_{m=1}^{M} Y_{km} T_{km}}{\sum_{k=1}^{K} w_k \sum_{m=1}^{M} \left(Y_{km}^{2} + T_{km}^{2}\right)}$ ,

where $K$ is the number of classes, $M$ is the number of elements along the first two dimensions of $Y$, and $w_k$ is a class-specific weighting factor that controls the contribution each class makes to the loss. $w_k$ is typically the inverse of the squared area of the expected region:

$w_k = \frac{1}{\left(\sum_{m=1}^{M} T_{km}\right)^{2}}$

This weighting helps counter the influence of larger regions on the Dice score and makes it easier for the network to learn how to segment smaller regions.
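
The loss can be evaluated by hand on a tiny example to see its range. The sketch below uses base MATLAB only; the 2-by-2 ground truth T and the small constant epsilon (added here to avoid division by zero) are assumptions for illustration.

```matlab
% Hypothetical 2-by-2 ground truth with K = 2 classes, one-hot along dim 3.
% Class 1 covers one pixel and class 2 covers three pixels.
T = cat(3,[1 0; 0 0],[0 1; 1 1]);
epsilon = 1e-8;   % assumed small constant to avoid division by zero

% Class weights: inverse of the squared region size, one value per class.
W = 1 ./ sum(sum(T,1),2).^2;

% Generalized Dice loss as a function of the prediction Y.
diceLoss = @(Y) 1 - (2*sum(W.*sum(sum(Y.*T,1),2),3) + epsilon) ./ ...
    (sum(W.*sum(sum(Y.^2 + T.^2,1),2),3) + epsilon);

lossPerfect = diceLoss(T);      % prediction equals the ground truth
lossWrong = diceLoss(1 - T);    % prediction has no overlap with it

fprintf('perfect: %.4f, no overlap: %.4f\n',lossPerfect,lossWrong);
```

A perfect prediction gives a loss near 0 and a prediction with no overlap gives a loss near 1, regardless of how small the class-1 region is.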

Classification Layer Template

Copy the classification layer template into a new file in MATLAB®. This template outlines the structure of a classification layer and includes the functions that define the layer behavior. The rest of the example shows how to complete the dicePixelClassificationLayer.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer

properties
% Optional properties
end

methods

function loss = forwardLoss(layer, Y, T)
% Layer forward loss function goes here.
end

function dLdY = backwardLoss(layer, Y, T)
% Layer backward loss function goes here.
end
end
end

Declare Layer Properties

By default, custom output layers have the following properties:

• Name — Layer name, specified as a character vector or a string scalar. To include this layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with this layer and Name is set to '', then the software automatically assigns a name at training time.

• Description — One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name.

• Type — Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays 'Classification layer' or 'Regression layer'.

Custom classification layers also have the following property:

• Classes — Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. If Classes is 'auto', then the software automatically sets the classes at training time. If you specify a string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str,str). The default value is 'auto'.

If the layer has no other properties, then you can omit the properties section.

The Dice loss requires a small constant value to prevent division by zero. Specify the property, Epsilon, to hold this value.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer

properties(Constant)
% Small constant to prevent division by zero.
Epsilon = 1e-8;

end

...
end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

Specify an input argument name to assign to the Name property at creation.

function layer = dicePixelClassificationLayer(name)
% layer =  dicePixelClassificationLayer(name) creates a Dice
% pixel classification layer with the specified name.

% Set layer name.
layer.Name = name;

% Set layer description.
layer.Description = 'Dice loss';
end

Create Forward Loss Function

Create a function named forwardLoss that returns the Dice loss between the predictions made by the network and the training targets. The syntax for forwardLoss is loss = forwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.

For semantic segmentation problems, the dimensions of T match the dimensions of Y. Y is a 4-D array of size H-by-W-by-K-by-N, where H and W are the height and width of the output, K is the number of classes, and N is the mini-batch size.

The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K or a convolutional layer with K filters followed by a softmax layer before the output layer.

function loss = forwardLoss(layer, Y, T)
% loss = forwardLoss(layer, Y, T) returns the Dice loss between
% the predictions Y and the training targets T.

% Weights by inverse of region size.
W = 1 ./ sum(sum(T,1),2).^2;

intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);

numer = 2*sum(W.*intersection,3) + layer.Epsilon;
denom = sum(W.*union,3) + layer.Epsilon;

% Compute Dice score.
dice = numer./denom;

% Return average Dice loss.
N = size(Y,4);
loss = sum((1-dice))/N;

end

Create Backward Loss Function

Create the backward loss function that returns the derivatives of the Dice loss with respect to the predictions Y. The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.

The dimensions of Y and T are the same as the inputs in forwardLoss.

function dLdY = backwardLoss(layer, Y, T)
% dLdY = backwardLoss(layer, Y, T) returns the derivatives of
% the Dice loss with respect to the predictions Y.

% Weights by inverse of region size.
W = 1 ./ sum(sum(T,1),2).^2;

intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);

numer = 2*sum(W.*intersection,3) + layer.Epsilon;
denom = sum(W.*union,3) + layer.Epsilon;

N = size(Y,4);

dLdY = (2*W.*Y.*numer./denom.^2 - 2*W.*T./denom)./N;
end
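
One way to gain confidence in the derivative expression is to compare it with central finite differences. The following base MATLAB sketch does this for a single observation (N = 1); the data, the step size h, and the helper names numerFcn, denomFcn, and lossFcn are assumptions for illustration and are not part of the layer.

```matlab
rng(0);                                  % reproducible hypothetical data
T = cat(3,[1 0; 0 0],[0 1; 1 1]);        % one-hot targets, K = 2
Y = rand(2,2,2);
Y = Y ./ sum(Y,3);                       % scores sum to 1 across classes
epsilon = 1e-8;

% Same quantities as in forwardLoss, written as functions of Y.
W = 1 ./ sum(sum(T,1),2).^2;
numerFcn = @(Y) 2*sum(W.*sum(sum(Y.*T,1),2),3) + epsilon;
denomFcn = @(Y) sum(W.*sum(sum(Y.^2 + T.^2,1),2),3) + epsilon;
lossFcn = @(Y) 1 - numerFcn(Y)./denomFcn(Y);

% Analytic derivative, matching the backwardLoss expression with N = 1.
analytic = 2*W.*Y.*numerFcn(Y)./denomFcn(Y).^2 - 2*W.*T./denomFcn(Y);

% Central finite differences, one element of Y at a time.
h = 1e-6;
numeric = zeros(size(Y));
for idx = 1:numel(Y)
    Yplus = Y;   Yplus(idx) = Yplus(idx) + h;
    Yminus = Y;  Yminus(idx) = Yminus(idx) - h;
    numeric(idx) = (lossFcn(Yplus) - lossFcn(Yminus))/(2*h);
end

maxError = max(abs(analytic(:) - numeric(:)));
fprintf('largest difference: %.2e\n',maxError);
```

The two results should agree to within the accuracy of the finite-difference approximation.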

Completed Layer

The completed layer is provided in dicePixelClassificationLayer.m.

classdef dicePixelClassificationLayer < nnet.layer.ClassificationLayer
% This layer implements the generalized Dice loss function for training
% semantic segmentation networks.

properties(Constant)
% Small constant to prevent division by zero.
Epsilon = 1e-8;
end

methods

function layer = dicePixelClassificationLayer(name)
% layer =  dicePixelClassificationLayer(name) creates a Dice
% pixel classification layer with the specified name.

% Set layer name.
layer.Name = name;

% Set layer description.
layer.Description = 'Dice loss';
end

function loss = forwardLoss(layer, Y, T)
% loss = forwardLoss(layer, Y, T) returns the Dice loss between
% the predictions Y and the training targets T.

% Weights by inverse of region size.
W = 1 ./ sum(sum(T,1),2).^2;

intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);

numer = 2*sum(W.*intersection,3) + layer.Epsilon;
denom = sum(W.*union,3) + layer.Epsilon;

% Compute Dice score.
dice = numer./denom;

% Return average Dice loss.
N = size(Y,4);
loss = sum((1-dice))/N;

end

function dLdY = backwardLoss(layer, Y, T)
% dLdY = backwardLoss(layer, Y, T) returns the derivatives of
% the Dice loss with respect to the predictions Y.

% Weights by inverse of region size.
W = 1 ./ sum(sum(T,1),2).^2;

intersection = sum(sum(Y.*T,1),2);
union = sum(sum(Y.^2 + T.^2, 1),2);

numer = 2*sum(W.*intersection,3) + layer.Epsilon;
denom = sum(W.*union,3) + layer.Epsilon;

N = size(Y,4);

dLdY = (2*W.*Y.*numer./denom.^2 - 2*W.*T./denom)./N;
end
end
end

GPU Compatibility

For GPU compatibility, the layer functions must support inputs and return outputs of type gpuArray. Any other functions used by the layer must do the same.

The MATLAB functions used in forwardLoss and backwardLoss in dicePixelClassificationLayer all support gpuArray inputs, so the layer is GPU compatible.

Check Output Layer Validity

Create an instance of the layer.

layer = dicePixelClassificationLayer('dice');

Check the validity of the layer using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects H-by-W-by-K-by-N array inputs, where K is the number of classes and N is the number of observations in the mini-batch.

numClasses = 2;
validInputSize = [4 4 numClasses];
checkLayer(layer,validInputSize, 'ObservationDimension',4)
Running nnet.checklayer.OutputLayerTestCase
.......... .......
Done nnet.checklayer.OutputLayerTestCase
__________

Test Summary:
17 Passed, 0 Failed, 0 Incomplete, 0 Skipped.
Time elapsed: 1.6227 seconds.

The test summary reports the number of passed, failed, incomplete, and skipped tests.

Use Custom Layer in Semantic Segmentation Network

Create a semantic segmentation network that uses the dicePixelClassificationLayer.

layers = [
imageInputLayer([32 32 1])
convolution2dLayer(3,64,'Padding',1)
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,64,'Padding',1)
reluLayer
transposedConv2dLayer(4,64,'Stride',2,'Cropping',1)
convolution2dLayer(1,2)
softmaxLayer
dicePixelClassificationLayer('dice')]
layers =
10x1 Layer array with layers:

1   ''       Image Input              32x32x1 images with 'zerocenter' normalization
2   ''       Convolution              64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
3   ''       ReLU                     ReLU
4   ''       Max Pooling              2x2 max pooling with stride [2  2] and padding [0  0  0  0]
5   ''       Convolution              64 3x3 convolutions with stride [1  1] and padding [1  1  1  1]
6   ''       ReLU                     ReLU
7   ''       Transposed Convolution   64 4x4 transposed convolutions with stride [2  2] and output cropping [1  1]
8   ''       Convolution              2 1x1 convolutions with stride [1  1] and padding [0  0  0  0]
9   ''       Softmax                  softmax
10   'dice'   Classification Output    Dice loss

Load training data for semantic segmentation using imageDatastore and pixelLabelDatastore.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

imds = imageDatastore(imageDir);

classNames = ["triangle" "background"];
labelIDs = [255 0];
pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);

Associate the image and pixel label data using pixelLabelImageDatastore.

ds = pixelLabelImageDatastore(imds,pxds);

Set the training options and train the network.

options = trainingOptions('sgdm', ...
'InitialLearnRate',1e-2, ...
'MaxEpochs',100, ...
'LearnRateDropFactor',1e-1, ...
'LearnRateDropPeriod',50, ...
'LearnRateSchedule','piecewise', ...
'MiniBatchSize',128);

net = trainNetwork(ds,layers,options);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:03 |       27.89% |       0.8346 |          0.0100 |
|      50 |          50 |       00:00:34 |       89.67% |       0.6384 |          0.0100 |
|     100 |         100 |       00:01:09 |       94.35% |       0.5024 |          0.0010 |
|========================================================================================|

Evaluate the trained network by segmenting a test image and displaying the segmentation result.

I = imread('triangleTest.jpg');
[C,scores] = semanticseg(I,net);

B = labeloverlay(I,C);
figure
imshow(imtile({I,B}))

Train a semantic segmentation network using dilated convolutions.

A semantic segmentation network classifies every pixel in an image, resulting in an image that is segmented by class. Applications for semantic segmentation include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis. To learn more, see Getting Started With Semantic Segmentation Using Deep Learning.

Semantic segmentation networks like DeepLab [1] make extensive use of dilated convolutions (also known as atrous convolutions) because they can increase the receptive field of the layer (the area of the input which the layers can see) without increasing the number of parameters or computations.
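
The effect of dilation on the receptive field can be worked out with simple arithmetic. The sketch below assumes three stacked 3-by-3 convolutions with stride 1 and dilation factors 1, 2, and 4; these factors are hypothetical values chosen for illustration.

```matlab
filterSize = 3;
dilationFactors = [1 2 4];   % hypothetical increasing dilation factors

% A dilated filter spans (filterSize-1)*dilation + 1 input pixels.
effectiveFilterSize = (filterSize - 1).*dilationFactors + 1;

% With stride 1, each layer widens the receptive field by its span minus 1.
receptiveField = 1 + sum(effectiveFilterSize - 1);

% The number of weights per filter channel does not depend on the dilation.
weightsPerFilter = filterSize^2;

fprintf('receptive field: %d pixels, weights per filter: %d\n', ...
    receptiveField,weightsPerFilter);
```

Under these assumptions the stack sees a 15-by-15 region, whereas three undilated 3-by-3 convolutions would see only 7-by-7 with the same number of weights.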

The example uses a simple dataset of 32-by-32 triangle images for illustration purposes. The dataset includes accompanying pixel label ground truth data. Load the training data using an imageDatastore and a pixelLabelDatastore.

dataFolder = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageFolderTrain = fullfile(dataFolder,'trainingImages');
labelFolderTrain = fullfile(dataFolder,'trainingLabels');

Create an imageDatastore for the images.

imdsTrain = imageDatastore(imageFolderTrain);

Create a pixelLabelDatastore for the ground truth pixel labels.

classNames = ["triangle" "background"];
labels = [255 0];
pxdsTrain = pixelLabelDatastore(labelFolderTrain,classNames,labels)
pxdsTrain =
PixelLabelDatastore with properties:

Files: {200×1 cell}
ClassNames: {2×1 cell}
AlternateFileSystemRoots: {}

Create Semantic Segmentation Network

This example uses a simple semantic segmentation network based on dilated convolutions.

Create a data source for training data and get the pixel counts for each label.

pximdsTrain = pixelLabelImageDatastore(imdsTrain,pxdsTrain);
tbl = countEachLabel(pximdsTrain)
tbl=2×3 table
Name        PixelCount    ImagePixelCount
____________    __________    _______________

'triangle'           10326       2.048e+05
'background'    1.9447e+05       2.048e+05

The majority of pixel labels are for the background. This class imbalance biases the learning process in favor of the dominant class. To fix this, use class weighting to balance the classes. You can use several methods to compute class weights. One common method is inverse frequency weighting, where the class weights are the inverse of the class frequencies. This method increases the weight given to underrepresented classes. Calculate the class weights using inverse frequency weighting.

numberPixels = sum(tbl.PixelCount);
frequency = tbl.PixelCount / numberPixels;
classWeights = 1 ./ frequency;

Create a network for pixel classification by using an image input layer with an input size corresponding to the size of the input images. Next, specify three blocks of convolution, batch normalization, and ReLU layers. For each convolutional layer, specify 32 3-by-3 filters with increasing dilation factors and pad the inputs so they are the same size as the outputs by setting the 'Padding' option to 'same'. To classify the pixels, include a convolutional layer with K 1-by-1 convolutions, where K is the number of classes, followed by a softmax layer and a pixelClassificationLayer with the inverse class weights.

inputSize = [32 32 1];
filterSize = 3;
numFilters = 32;
numClasses = numel(classNames);

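layers = [
imageInputLayer(inputSize)

% Dilation factors 1, 2, and 4 are assumed; the text states only that they increase.
convolution2dLayer(filterSize,numFilters,'DilationFactor',1,'Padding','same')
batchNormalizationLayer
reluLayer

convolution2dLayer(filterSize,numFilters,'DilationFactor',2,'Padding','same')
batchNormalizationLayer
reluLayer

convolution2dLayer(filterSize,numFilters,'DilationFactor',4,'Padding','same')
batchNormalizationLayer
reluLayer

convolution2dLayer(1,numClasses)
softmaxLayer
pixelClassificationLayer('Classes',classNames,'ClassWeights',classWeights)];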

Train Network

Specify the training options.

options = trainingOptions('sgdm', ...
'MaxEpochs', 100, ...
'MiniBatchSize', 64, ...
'InitialLearnRate', 1e-3);

Train the network using trainNetwork.

net = trainNetwork(pximdsTrain,layers,options);
Training on single GPU.
Initializing image normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       67.54% |       0.7098 |          0.0010 |
|      17 |          50 |       00:00:03 |       84.60% |       0.3851 |          0.0010 |
|      34 |         100 |       00:00:06 |       89.85% |       0.2536 |          0.0010 |
|      50 |         150 |       00:00:09 |       93.39% |       0.1959 |          0.0010 |
|      67 |         200 |       00:00:11 |       95.89% |       0.1559 |          0.0010 |
|      84 |         250 |       00:00:14 |       97.29% |       0.1188 |          0.0010 |
|     100 |         300 |       00:00:18 |       98.28% |       0.0970 |          0.0010 |
|========================================================================================|

Test Network

Load the test data. Create an imageDatastore for the images. Create a pixelLabelDatastore for the ground truth pixel labels.

imageFolderTest = fullfile(dataFolder,'testImages');
imdsTest = imageDatastore(imageFolderTest);
labelFolderTest = fullfile(dataFolder,'testLabels');
pxdsTest = pixelLabelDatastore(labelFolderTest,classNames,labels);

Make predictions using the test data and trained network.

pxdsPred = semanticseg(imdsTest,net,'WriteLocation',tempdir);
Running semantic segmentation network
-------------------------------------
* Processing 100 images.
* Progress: 100.00%

Evaluate the prediction accuracy using evaluateSemanticSegmentation.

metrics = evaluateSemanticSegmentation(pxdsPred,pxdsTest);
Evaluating semantic segmentation results
----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processing 100 images...
[==================================================] 100%
Elapsed time: 00:00:00
Estimated time remaining: 00:00:00
* Finalizing... Done.
* Data set metrics:

GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
______________    ____________    _______    ___________    ___________

0.98334          0.99107       0.85869      0.97109        0.68197

Segment New Image

Read and display the test image triangleTest.jpg.

imgTest = imread('triangleTest.jpg');
figure
imshow(imgTest)

Segment the test image using semanticseg and display the results using labeloverlay.

C = semanticseg(imgTest,net);
B = labeloverlay(imgTest,C);
figure
imshow(B)

## Tips

• The pixelLabelDatastore pxds and the imageDatastore imds store files that are located in a folder in lexicographical order. For example, if you have twelve files named 'file1.jpg', 'file2.jpg', … , 'file11.jpg', and 'file12.jpg', then the files are stored in this order:

'file1.jpg'
'file10.jpg'
'file11.jpg'
'file12.jpg'
'file2.jpg'
'file3.jpg'
...
'file9.jpg'
Files that are stored in a cell array are read in the same order as they are stored.

If the order of files in pxds and imds is not the same, then you may encounter a mismatch when you read a ground truth image and corresponding label data using a pixelLabelImageDatastore. If this occurs, then rename the pixel label files so that they have the correct order. For example, rename 'file1.jpg', … , 'file9.jpg' to 'file01.jpg', …, 'file09.jpg'.

• To extract semantic segmentation data from a groundTruth object generated by the Video Labeler or Ground Truth Labeler, use the pixelLabelTrainingData function.
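
The file-ordering tip above can be checked with base MATLAB. The sketch below sorts a hypothetical list of names the way a lexicographical listing would, then builds zero-padded names that sort in numeric order. It only computes the new names; renaming the files on disk (for example with movefile) is left out.

```matlab
% Hypothetical file names, listed in the intended numeric order.
fileNames = {'file1.jpg','file2.jpg','file9.jpg', ...
    'file10.jpg','file11.jpg','file12.jpg'};

% Lexicographical sorting places file10 to file12 before file2.
sortedNames = sort(fileNames);

% Build zero-padded names so that text order matches numeric order.
paddedNames = cell(size(fileNames));
for k = 1:numel(fileNames)
    fileNumber = sscanf(fileNames{k},'file%d.jpg');
    paddedNames{k} = sprintf('file%02d.jpg',fileNumber);
end
sortedPadded = sort(paddedNames);

disp(sortedNames)
disp(sortedPadded)
```

Applying the same padding scheme to both the image files and the label files keeps the two datastores aligned.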