Multiclass Object Detection Using YOLO v2 Deep Learning

Since R2024b

This example uses:

This example shows how to perform multiclass object detection on a custom data set.

Overview

Deep learning is a powerful machine learning technique that you can use to train robust multiclass object detectors such as YOLO v2, YOLO v4, YOLOX, SSD, and Faster R-CNN. This example trains a YOLO v2 multiclass object detector using the trainYOLOv2ObjectDetector function. The trained object detector is able to detect and identify multiple indoor objects. For more information about training other multiclass object detectors, such as YOLOX, YOLO v4, SSD, and Faster R-CNN, see Get Started with Object Detection Using Deep Learning and Choose an Object Detector.

This example first shows you how to detect multiple objects in an image using a pretrained YOLO v2 object detector. Then, you can optionally download a data set and train YOLO v2 on a custom data set using transfer learning.

Load Pretrained Object Detector

Download a pretrained YOLO v2 object detector, and load it into the workspace.

pretrainedURL = "https://www.mathworks.com/supportfiles/vision/data/yolov2IndoorObjectDetector23b.zip";
pretrainedFolder = fullfile(tempdir,"pretrainedNetwork");
pretrainedNetworkZip = fullfile(pretrainedFolder,"yolov2IndoorObjectDetector23b.zip"); 

if ~exist(pretrainedNetworkZip,"file")
    mkdir(pretrainedFolder)
    disp("Downloading pretrained network (6 MB)...")
    websave(pretrainedNetworkZip,pretrainedURL)
end

unzip(pretrainedNetworkZip,pretrainedFolder)

trainedNetwork = fullfile(pretrainedFolder,"yolov2IndoorObjectDetector.mat");
trainedNetwork = load(trainedNetwork);
trainedDetector = trainedNetwork.detector;

Detect Multiple Indoor Objects

Read a test image that contains objects of the target classes, run the object detector on the image, and display an image annotated with the detection results.

I = imread("indoorTest.jpg");
[bbox,score,label] = detect(trainedDetector,I);

LabelScoreStr = compose("%s-%.2f",label,score); 
annotatedImage = insertObjectAnnotation(I,"rectangle",bbox,LabelScoreStr,LineWidth=4,FontSize=24);
figure
imshow(annotatedImage)

Load Data for Training

This example uses the Indoor Object Detection Dataset created by Bishwo Adhikari [1]. The data set consists of 2213 labeled images collected from indoor scenes and contains 7 classes: fire extinguisher, chair, clock, trash bin, screen, and printer. Each image contains one or more labeled instances of these classes. Check whether the data set has already been downloaded and, if it is not, use websave to download it.

dsURL = "https://zenodo.org/record/2654485/files/Indoor%20Object%20Detection%20Dataset.zip?download=1"; 
outputFolder = fullfile(tempdir,"indoorObjectDetection"); 
imagesZip = fullfile(outputFolder,"indoor.zip");

if ~exist(imagesZip,"file")   
    mkdir(outputFolder)       
    disp("Downloading 401 MB Indoor Objects Dataset images...") 
    websave(imagesZip,dsURL)
    unzip(imagesZip,fullfile(outputFolder))  
end

Create an imageDatastore object to store the images from the data set.

datapath = fullfile(outputFolder,"Indoor Object Detection Dataset");
imds = imageDatastore(datapath,IncludeSubfolders=true, FileExtensions=".jpg");

The annotationsIndoor.mat file contains annotations for each of the images in the data, as well as vectors that specify the indices of the data set images to use for the training, validation, and test sets. Load the file into the workspace, and extract annotations and the indices corresponding to the training, validation, and test sets from the data variable. The indices specify 2207 images in total, instead of 2213 images, as 6 images have no labels associated with them. Use the indices of the images that contain labels to remove these 6 images from the image and annotations datastores.

data = load("annotationsIndoor.mat");
blds = data.BBstore;
trainingIdx = data.trainingIdx;
validationIdx = data.validationIdx;
testIdx = data.testIdx;
cleanIdx = data.idxs;

% Remove the 6 images with no labels.
imds = subset(imds,cleanIdx);
blds = subset(blds,cleanIdx);

Analyze Training Data

Analyze the distribution of object class labels and sizes to understand the data better. This analysis is critical because it helps you determine how to prepare the training data and how to configure an object detector for this specific data set.

Analyze Class Distribution

Measure the distribution of bounding box class labels in the data set by using the countEachLabel function.

tbl = countEachLabel(blds)

tbl=7×3 table
         Label          Count    ImageCount
    ________________    _____    __________

    exit                 545        504    
    fireextinguisher    1684        818    
    chair               1662        850    
    clock                280        277    
    trashbin             228        170    
    screen               115         94    
    printer               81         81

Visualize the counts by class.

bar(tbl.Label,tbl.Count)
ylabel("Frequency")

The classes in this data set are unbalanced. This imbalance can be detrimental to the learning process because the process is biased in favor of the dominant classes. To address the imbalance, use one or more of these complementary techniques: add more data, oversample the underrepresented classes, modify the loss function, or apply data augmentation. Regardless of which approach you use, you must perform empirical analysis to determine the optimal solution for your data set. In this example, you will use data augmentation to reduce bias in the learning process.

Analyze Object Sizes and Choose Object Detector

Read all the bounding boxes and labels within the data set, and calculate the diagonal length of the bounding box.

data = readall(blds);
bboxes = vertcat(data{:,1});
labels = vertcat(data{:,2});
diagonalLength = hypot(bboxes(:,3),bboxes(:,4));

Group the object lengths by class.

G = findgroups(labels);
groupedDiagonalLength = splitapply(@(x){x},diagonalLength,G);

Visualize the distribution of object lengths for each class.

figure
classes = tbl.Label;
numClasses = numel(classes);
for i = 1:numClasses
    len = groupedDiagonalLength{i};
    x = repelem(i,numel(len),1);
    plot(x,len,"o")
    hold on
end
hold off
ylabel("Object extent (pixels)")

xticks(1:numClasses)
xticklabels(classes)

This visualization highlights the important data set attributes that help you determine which type of object detector to configure:

The object size variance within each class
The object size variance across classes

This data set has a good amount of overlap between the size ranges across classes. In addition, the size variation within each class is not very large. This means that you can train one multiclass detector to handle a range of object sizes. If the size ranges do not overlap, or if the range of object sizes differs by an order of magnitude, then it is more practical to train multiple detectors for different size ranges.

You can determine which object detector to train based on the size variance. When size variance within each class is small, use a single-scale object detector such as YOLO v2. If each class contains large variance, choose a multi-scale object detector such as YOLO v4 or SSD. Since the object sizes in this data set are within the same order of magnitude, use YOLO v2 to start. Although advanced multi-scale detectors might perform better, training them can take more time and resources than YOLO v2. Use more advanced detectors when simpler solutions do not meet your performance requirements.

Use the size distribution information to select the training image size.A fixed size enables batch processing during training. The training image size dictates how large the batch size can be based on the resource constraints of your training environment, such as GPU memory. Process larger batches of data to improve throughput and reduce training time, especially when using a GPU. However, the training image size can impact the resolution of objects if you drastically resize the original data to a smaller size.

In the following section, configure a YOLO v2 object detector using the size analysis information for this data set.

Define YOLO v2 Object Detector Architecture

Configure a YOLO v2 object detector using these steps:

Choose a pretrained detector for transfer learning.
Choose a training image size.
Select which network features to use for predicting object locations and classes.
Estimate anchor boxes from the preprocessed data used to train the object detector.

Select a pretrained Tiny YOLO v2 detector for transfer learning. Tiny YOLO v2 is a lightweight network trained on COCO [2], a large object detection data set. Transfer learning using a pretrained object detector reduces training time compared to training a network from scratch. Alternatively, you can use the larger Darknet-19 YOLO v2 pretrained detector, but consider starting with a simpler network to establish a performance baseline before experimenting with a larger network. Using the Tiny or Darknet-19 YOLO v2 pretrained detector requires the Computer Vision Toolbox™ Model for YOLO v2 Object Detection.

pretrainedDetector = yolov2ObjectDetector("tiny-yolov2-coco");

Next, choose the size of the training images for YOLO v2. When choosing the training image size, consider these size parameters:

The distribution of object sizes in the images, and the impact of resizing the images on the object sizes.
The computational resources required to batch process data at the selected size.
The minimum input size required by the network.

Determine the input size of the pretrained Tiny YOLO v2 network.

pretrainedDetector.Network.Layers(1).InputSize

ans = 1×3

   416   416     3

The size of the images within the Indoor Object Detection Dataset is [720 1024 3]. Based on your object analysis, the smallest objects are approximately 20-by-20 pixels.

To maintain a balance between accuracy and the computational cost of running the example, specify a size of [720 720 3]. This size ensures that resizing each image does not drastically affect the spatial resolution of objects in this data set. If you adapt this example for your own data set, you must change the training image size based on your data. Determining the optimal input size requires empirical analysis.

inputSize = [720 720 3];

Combine the image and bounding box datastores.

ds = combine(imds,blds);

Use transform to apply a preprocessing function that resizes images and their corresponding bounding boxes. The function also sanitizes the bounding boxes to convert them to a valid shape.

preprocessedData = transform(ds,@(data)resizeImageAndLabel(data,inputSize));

Display one of the preprocessed images and its bounding box labels to verify that the objects in the resized images still have visible features.

data = preview(preprocessedData);
I = data{1};
bbox = data{2};
label = data{3};
imshow(I)
showShape("rectangle",bbox,Label=label)

YOLO v2 is a single-scale detector because it uses features extracted from one network layer to predict the location and class of objects in the image. The feature extraction layer is an important hyperparameter for deep learning based object detectors. When selecting the feature extraction layer, choose a layer that outputs features at a spatial resolution that is suitable for the range of object sizes in the data set.

Most networks used in object detection spatially downsample features by powers of two as the data flows through the network. For example, starting from the specified input size, networks can have layers that produce feature maps downsampled spatially by 4x, 8x, 16x, and 32x. If object sizes in the data set are small (for example, less than 10-by-10 pixels), feature maps downsampled by 16x and 32x might not have sufficient spatial resolution to locate the objects precisely. Conversely, if the objects are large, feature maps downsampled by 4x or 8x might not encode enough global context for those objects.

For this data set, specify the "leaky_relu_5" layer of the Tiny YOLO v2 network, which outputs feature maps downsampled by 16x. This amount of downsampling is a good trade-off between spatial resolution and the strength of the extracted features, as features extracted further down the network encode stronger image features at the cost of spatial resolution.

featureLayer = "leaky_relu_5";

You can use the analyzeNetwork (Deep Learning Toolbox) function to visualize the Tiny YOLO v2 network and determine the name of the layer that outputs features downsampled by 16x.

Next, use estimateAnchorBoxes to estimate anchor boxes from the training data. Estimating anchor boxes from the preprocessed data enables you to get an estimate based on the selected training image size. You can use the procedure defined in the Estimate Anchor Boxes From Training Data example to determine the number of anchor boxes suitable for the data set. Based on this procedure, five anchor boxes is a good trade-off between computational cost and accuracy. As with any other hyperparameter, you must optimize the number of anchor boxes for your data using empirical analysis.

numAnchors = 5;
aboxes = estimateAnchorBoxes(preprocessedData,numAnchors);

Finally, configure the YOLO v2 network for transfer learning on seven classes with the selected training image size, and estimated anchor boxes.

pretrainedNet = pretrainedDetector.Network;
classes = {'exit','fireextinguisher','chair','clock','trashbin','screen','printer'};

detector = yolov2ObjectDetector(pretrainedNet,classes,aboxes, ...
    DetectionNetworkSource=featureLayer,InputSize= inputSize);

You can visualize the network using the analyzeNetwork (Deep Learning Toolbox) function or Deep Network Designer (Deep Learning Toolbox) app.

Prepare Training Data

Initialize the random number generator with a seed of 0 using rng, and shuffle the data set for reproducibility using the shuffle function.

rng(0);
preprocessedData = shuffle(preprocessedData);

Split the data set into training, test, and validation subsets using the subset function.

dsTrain = subset(preprocessedData,trainingIdx);
dsVal = subset(preprocessedData,validationIdx);
dsTest = subset(preprocessedData,testIdx);

Data Augmentation

Use data augmentation to improve network accuracy by randomly transforming the original data during training. Data augmentation enables you to add more variety to the training data without increasing the number of labeled training samples. Use transform to augment the training data using these steps:

Randomly flip the image and associated bounding box labels horizontally.
Randomly scale the image and associated bounding box labels.
Jitter the image color.

augmentedTrainingData = transform(dsTrain,@augmentData);

Display one of the training images and box labels.

data = read(augmentedTrainingData);
I = data{1};
bbox = data{2};
label = data{3};
imshow(I)
showShape("rectangle",bbox,Label=label)

Train YOLOv2 Object Detector

Specify the network training options using the trainingOptions (Deep Learning Toolbox) function.

opts = trainingOptions("rmsprop", ...
        InitialLearnRate=0.001, ...
        MiniBatchSize=8, ...
        MaxEpochs=10, ...
        LearnRateSchedule="piecewise", ...
        LearnRateDropPeriod=5, ...
        VerboseFrequency=30, ...
        L2Regularization=0.001, ...
        ValidationData=dsVal, ...
        ValidationFrequency=50, ...
        OutputNetwork="best-validation-loss");

These training options have been selected using Experiment Manager. For more information on using Experiment Manager for hyperparameter tuning, see Train Object Detectors in Experiment Manager.

To use the trainYOLOv2ObjectDetector function to train a YOLO v2 object detector, set doTraining is set to true.

doTraining = false;
if doTraining
    [detector,info] = trainYOLOv2ObjectDetector(augmentedTrainingData,detector,opts);
else
    detector = trainedDetector;
end

This example was verified on an NVIDIA™ GeForce RTX 3090 Ti GPU with 24 GB of memory, which required approximately 45 minutes to complete training. Training time varies depending on the hardware you use. If your GPU has less memory, you may run out of memory. To use less memory, specify a lower MiniBatchSize value when using the trainingOptions (Deep Learning Toolbox) function.

Evaluate Object Detector

Evaluate the trained object detector on test images to measure the detector performance. The Computer Vision Toolbox™ provides an object detector evaluation function (evaluateObjectDetection) to measure common metrics such as average precision and precision recall, with an option to specify the overlap, or intersection-over-union (IoU), thresholds at which to compute the metrics.

Run the detector on the test data set using the detect object function. To evaluate the detector precision across the full range of recall values, set the detection threshold to a low value to detect as many objects as possible.

detectionThreshold = 0.01;
results = detect(detector,dsTest,MiniBatchSize=8,Threshold=detectionThreshold);

Compute Metrics at Specified Overlap Thresholds

Compute the object detection metrics at specified overlap, or IoU, thresholds using the evaluateObjectDetection function. The overlap threshold defines the amount of overlap required between a predicted bounding box and a ground truth bounding box for the predicted bounding box to count as a true positive. For example, an overlap threshold of 0.5 considers an overlap of 50% between boxes as a correct match, while an overlap threshold of 0.9 is stricter and requires the predicted bounding box to almost exactly coincide with the ground truth bounding box. Specify three overlap thresholds at which to compute metrics using the iouThresholds variable.

iouThresholds = [0.5 0.75 0.9];
metrics = evaluateObjectDetection(results,dsTest,iouThresholds);

Evaluate Object Detection Metrics Summary

Evaluate the summarized detector performance at the overall dataset level and at the individual class level using the summarize object function.

[datasetSummary,classSummary] = summarize(metrics)

datasetSummary=1×5 table
    NumObjects    mAPOverlapAvg    mAP0.5     mAP0.75     mAP0.9 
    __________    _____________    _______    _______    ________

       397           0.42971       0.77532    0.4759     0.037926

classSummary=7×5 table
                        NumObjects    APOverlapAvg     AP0.5     AP0.75      AP0.9  
                        __________    ____________    _______    _______    ________

    exit                    42          0.54586       0.97619      0.631    0.030382
    fireextinguisher       123          0.62289       0.98041    0.80168    0.086584
    chair                  167          0.58042       0.93661    0.72323    0.081403
    clock                   26          0.54996       0.96154    0.62122    0.067113
    trashbin                20           0.3004       0.79942    0.10177           0
    screen                  12          0.14779       0.27671    0.16667           0
    printer                  7          0.26068       0.49634    0.28571           0

By default, the summarize object function returns summarized data set and class metrics at all overlap thresholds. Detections with a higher IoU value correspond to a lower average precision (AP) or mean average precision (mAP) value. The AP metric provides a single number that incorporates the ability of the detector to make correct classifications (precision) and the ability of the detector to find all relevant objects (recall). The mAP is the average of the AP calculated for all the classes..

To evaluate the detector performance at the data set level, consider the mAP averaged over all overlap thresholds, returned in the mAPOverlapAvg column of the datasetSummary output. At the overlap threshold of 0.5, the mAP is 0.77, which indicates that the detector is able to find most objects without making too many spurious predictions. At higher overlap thresholds, such as 0.75 and 0.9, the mAP is lower since the matching condition between predicted and ground truth boxes is now stricter.

Similarly, to evaluate the detector performance at the class level, consider the mAP averaged over all IoU thresholds, returned in the mAPOverlapAvg column of the classSummary output.

Compute Average Precision

Compute the AP at each of the specified overlap thresholds for all classes using the averagePrecision object function. To visualize how the AP value at the specified thresholds values varies across all the classes in the data set, plot a bar plot.

figure
classAP = averagePrecision(metrics);
bar(classAP)
xticklabels(metrics.ClassNames)
ylabel("AP")
legend(string(iouThresholds))

The plot demonstrates that the detector performed poorly on three classes (printer, screen, and trash bin), which had fewer samples compared to the other classes in the training data. The detector performance also degraded at higher overlap thresholds. To improve performance, consider the class imbalance problem identified in the Analyze Class Distribution section. To address class imbalance, add more images that contain the underrepresented classes or replicate images with these classes and use data augmentation.

Compute Precision and Recall Metrics

Compute the precision and recall metrics using the precisionRecall object function. Plot the precision-recall (PR) curve and the detection confidence scores side-by-side. The PR curve highlights how precise a detector is at varying levels of recall for each class. By plotting the detector scores next to the PR curve, you can choose a detection threshold that achieves the precision and recall you require for your application.

Precision and Recall for a Single Class

Select a class, extract the precision and recall metrics for the class at the specified overlap thresholds, and plot the PR curves.

classes = metrics.ClassNames;
class = classes(3);

% Extract precision and recall values.
[precision,recall,scores] = precisionRecall(metrics,ClassName=class);

% Plot precision-recall curves.
figure
tiledlayout(1,3)
nexttile
plot(cat(1,recall{:})',cat(1,precision{:})')
ylim([0 1])
xlim([0 1])
xlabel("Recall")
ylabel("Precision")
grid on
axis square
title(class + " Precision/Recall")
legend(string(iouThresholds) + " IoU",Location="southoutside")

Plot the confidence scores for precision and recall at the specified overlap thresholds, to the right of the PR curve.

nexttile
plot(scores{:},cat(1,recall{:})')
ylim([0 1])
xlim([0 1])
ylabel("Recall")
xlabel("Score")
grid on
axis square
title(class + " Recall/Scores")
legend(string(iouThresholds) + " IoU",Location="southoutside")

nexttile
plot(scores{:},cat(1,precision{:})')
ylim([0 1])
xlim([0 1])
ylabel("Precision")
xlabel("Score")
grid on
axis square
title(class + " Precision/Scores")
legend(string(iouThresholds) + " IoU",Location="southoutside")

Fine-tuning the detection threshold enables you to trade precision for recall. As the value of the detection score increases, the corresponding precision also increases at the cost of reduced recall, as shown Precision/Scores plot and the Recall/Scores plots.

Select a detection threshold that produces the precision-recall characteristics best suited for your application. For example, at an overlap threshold of 0.5, specify the detection threshold as 0.4 to tune the precision to 0.9 at a recall level of 0.9 for the chair class. Before choosing a final detection threshold for your object detector, analyze the PR curves for all the classes, because the precision and recall characteristics might vary for each class.

Precision and Recall at a Single Overlap Threshold

Select an overlap threshold, extract the precision and recall metrics for a set of selected classes at the specified overlap threshold, and then plot the PR curves. To determine a final detection score threshold, use these plots to analyze the performance of a selected detection score threshold across multiple classes.

overlapThresholds = metrics.OverlapThreshold;
iou = overlapThresholds(1);
selectedClasses = ["chair","clock","trashbin"];

% Extract precision and recall values
[precision,recall,scores] = precisionRecall(metrics,OverlapThreshold=iou,ClassName=selectedClasses);

% Plot precision-recall curves.
figure
tiledlayout(1,3)
nexttile
% plot P-R curves for all classes at that IoU
for c = 1:length(selectedClasses)
    plot(recall{c},precision{c})
    hold on
end
ylim([0 1])
xlim([0 1])
xlabel("Recall")
ylabel("Precision")
grid on
axis square
title("Precision/Recall")
legend(selectedClasses,Location="southoutside")

Plot the confidence scores for precision and recall for the specified classes, to the right of the PR curve.

nexttile
for c = 1:length(selectedClasses)
    plot(scores{c},recall{c})
    hold on
end
ylim([0 1])
xlim([0 1])
ylabel("Recall")
xlabel("Score")
grid on
axis square
title("Recall/Scores")
legend(selectedClasses,Location="southoutside")

nexttile
for c = 1:length(selectedClasses)
    plot(scores{c},precision{c})
    hold on
end
ylim([0 1])
xlim([0 1])
ylabel("Precision")
xlabel("Score")
grid on
axis square
title("Precision/Scores")
legend(selectedClasses,Location="southoutside")

The figure shows the performance of the selected classes at each precision and recall value, and can help determine the detector threshold that results in sufficiently high precision while maintaining a minimum recall of 0.8. In this case, a detection score threshold of 0.5 is suitable for the chair and clock classes, but causes the recall for the trash bin class to fall below the desired limit, which you can see in the Recall/Scores plot. A lower threshold of 0.4 satisfies these conditions for the selected classes.

Evaluate Detector Errors Using Confusion Matrix

The confusion matrix enables you to quantify how well the object detector performs across different classes by providing a detailed breakdown of the detection errors. Investigate the types of classification errors made by the detector at a selected detection score threshold by using the confusionMatrix object function. Use the detection score threshold you determine from the precision recall analysis to discard predictions below that threshold value.

iou = overlapThresholds(1);
detectionThresh = 0.4;

% Compute the confusion matrix at a specified score and IoU threshold
[confMat,confusionClassNames] = confusionMatrix(metrics,scoreThreshold=detectionThresh,overlapThreshold=iou);

% Display the confusion matrix as a confusion chart
figure
confusionchart(confMat{1},confusionClassNames)

The confusion chart shows misclassification errors on the off-diagonal positions, and the "unmatched" row and column values correspond to the predictions and ground truth bounding boxes that were not matched correctly at the specified overlap threshold.

The confusion chart result plotted in the image above shows that the detector does not confuse two foreground classes at all. Rather, detection errors occur when the detector is not able to detect an object (false negative) or detects an object where it doesn't exist (false positive). For example, ten objects of the screen class are missed, creating ten false negatives. For the chair class, 56 detections are classified as chairs but are actually background regions, creating 56 false positives.

The confusion matrix results agree with the earlier analysis of per-class average precision, demonstrating that the detector performs poorly on three classes (printer, screen, and trash bin), which have fewer training samples compared to the other classes.

Evaluate Object Size Impact on Detector Performance

Investigate the impact of object size on detector performance by using the metricsByArea object function, which computes detector metrics for specific object size ranges. You can define the object size range based on a predefined set of size ranges for your application, or use the estimated anchor boxes as in this example. The anchor box estimation method automatically clusters the object sizes and provides a set of size ranges based on the input data.

Extract the anchor boxes from the detector, calculate their areas, and sort the areas.

areas = prod(detector.AnchorBoxes,2);
areas = sort(areas);

Define area range limits using the calculated areas. The upper limit for the last range is set to three times the size of the largest area, which is sufficient for the objects in this data set.

lowerLimit = [0; areas];
upperLimit = [areas; 3*areas(end)];
areaRanges = [lowerLimit upperLimit]

areaRanges = 6×2

           0        2774
        2774        9177
        9177       15916
       15916       47799
       47799      124716
      124716      374148

Evaluate the object detection metrics across the defined size ranges for the chair class by using the metricsByArea function. You can specify other class names to evaluate the object detection metrics for those classes interactively.

classes = string(detector.ClassNames);
areaMetrics = metricsByArea(metrics,areaRanges,ClassName=classes(1))

areaMetrics=6×6 table
           AreaRange            NumObjects    APOverlapAvg         AP            Precision           Recall     
    ________________________    __________    ____________    ____________    _______________    _______________

             0          2774        15          0.48468       {3×1 double}    {3×2321 double}    {3×2321 double}
          2774          9177        18           0.5569       {3×1 double}    {3×652  double}    {3×652  double}
          9177         15916         5          0.64667       {3×1 double}    {3×123  double}    {3×123  double}
         15916         47799         4          0.66667       {3×1 double}    {3×159  double}    {3×159  double}
         47799    1.2472e+05         0                0       {3×1 double}    {3×30   double}    {3×30   double}
    1.2472e+05    3.7415e+05         0              NaN       {3×1 double}    {3×1    double}    {3×1    double}

The NumObjects column shows how many objects in the test data set fall within the area range. Although the detector performed well on the chair class overall, there is a size range where the detector has low average precision compared to the other size ranges. The range where the detector does not perform well has only 11 samples. To improve the performance in this size range, add more samples of this size to the training data, or use data augmentation to create more samples across the set of size ranges.

You can examine the other classes for more insight into how to improve detector performance.

Deployment

Once the detector has been trained and evaluated, you can optionally generate code and deploy the yolov2ObjectDetector using GPU Coder™. For more information, see Code Generation for Object Detection by Using YOLO v2 (GPU Coder) example.

Summary

This example shows how to train and evaluate a multiclass object detector. When adapting this example to your own data, carefully assess the object class and size distribution in your data set. Your data might require using different hyperparameters or a different object detector, such as YOLO v4 or YOLOX, for optimal results.

Supporting Functions

function B = augmentData(A)
% Apply random horizontal flipping, and random X/Y scaling, and jitter image color.
% The function clips boxes scaled outside the bounds if the overlap is above 0.25.
B = cell(size(A));

I = A{1};
sz = size(I);
if numel(sz)==3 && sz(3) == 3
    I = jitterColorHSV(I, ...
        Contrast=0.2, ...
        Hue=0, ...
        Saturation=0.1, ...
        Brightness=0.2);
end

% Randomly flip and scale image.
tform = randomAffine2d(XReflection=true,Scale=[1 1.1]);  
rout = affineOutputView(sz,tform,BoundsStyle="CenterOutput");    
B{1} = imwarp(I,tform,OutputView=rout);

% Sanitize boxes, if needed. This helper function is attached to the example as a
% supporting file. Open the example in MATLAB to use this function.
A{2} = helperSanitizeBoxes(A{2});
    
% Apply same transform to boxes.
[B{2},indices] = bboxwarp(A{2},tform,rout,OverlapThreshold=0.25);    
B{3} = A{3}(indices);
    
% Return original data only when all boxes have been removed by warping.
if isempty(indices)
    B = A;
end
end

function data = resizeImageAndLabel(data,targetSize)
% Resize the images, and scale the corresponding bounding boxes.

    scale = (targetSize(1:2))./size(data{1},[1 2]);
    data{1} = imresize(data{1},targetSize(1:2));
    data{2} = bboxresize(data{2},scale);

    data{2} = floor(data{2});
    imageSize = targetSize(1:2);
    boxes = data{2};
    % Set boxes with negative values to have value 1.
    boxes(boxes <= 0) = 1;
    
    % Validate if bounding box in within image boundary.
    boxes(:,3) = min(boxes(:,3),imageSize(2) - boxes(:,1) - 1);
    boxes(:,4) = min(boxes(:,4),imageSize(1) - boxes(:,2) - 1);
    
    data{2} = boxes; 

end

References

[1] Adhikari, Bishwo; Peltomaki, Jukka; Huttunen, Heikki. (2019). Indoor Object Detection Dataset [Data set]. 7th European Workshop on Visual Information Processing 2018 (EUVIP), Tampere, Finland.

[2] Lin, Tsung-Yi, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. “Microsoft COCO: Common Objects in Context,” May 1, 2014. https://arxiv.org/abs/1405.0312v3.