
DataStore setup when using trainFasterRCNNObjectDetector for multiclass bbox detection problem

I have read through the documentation and wish to try to do a simple example where I detect and classify a few simple objects in images. There are three classes for my problem and there can be multiple instances of each class within a single image.
The example MathWorks supplies for trainFasterRCNNObjectDetector uses a single class, and it trains on a table that looks like this.
       imageFilename                 vehicle
____________________________    ________________
{'vehicles/image_00001.jpg'}    {[126 78 20 16]}
{'vehicles/image_00002.jpg'}    {[100 72 35 26]}
The trainFasterRCNNObjectDetector documentation does say "You can train a Faster R-CNN detector to detect multiple object classes."
The documentation says that the trainingData input variable can be a datastore or a table. It also says "When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data."
I have a large amount of data so I am going to use a data store. I constructed a data store that uses a table that looks like this.
               File_Location                        bboxs             categories
____________________________________________    ______________    __________________
{'F:\COCO\val2017\val2017\000000397133.jpg'}    { 2×4 double}     { 2×1 categorical}
{'F:\COCO\val2017\val2017\000000252219.jpg'}    { 3×4 double}     { 3×1 categorical}
{'F:\COCO\val2017\val2017\000000087038.jpg'}    {14×4 double}     {14×1 categorical}
{'F:\COCO\val2017\val2017\000000480985.jpg'}    { 8×4 double}     { 8×1 categorical}
So the table has 3 columns. The first column is the file location, the second column is a collection of bounding boxes, and the third column holds the item labels associated with the bounding boxes. For example, looking at row one, the image 000000397133.jpg has two bounding boxes given by the 2×4 double entry, and they are labeled by the {2×1 categorical} entry. I believe this is a valid table based upon the documentation, which says
"The first column must be images.
The second column must be M-by-4 matrices of bounding boxes of the form [x, y, width, height], where [x,y] represent the top-left coordinates of the bounding box.
The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories."
As I understand it, the value M can change from row to row because each image contains a different number of object instances. However, the last sentence in the documentation confuses me, the one that says "All categorical data returned by the datastore must contain the same categories." What does that mean?
Assuming my table is OK, I create an augmentedImageDatastore from the table using
dataStore = augmentedImageDatastore(inputSize,dataTable,ColorPreprocessing="gray2rgb")
Here is where things go wrong. If I set
dataStore.MiniBatchSize = 1;
and do a read of the dataStore
cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);
I get
input response
_________________ ____________
{224×224×3 uint8} {1×1×2 cell}
What has happened is we have the image as the input, and the response is the bbox and the labels packed into a {1x1x2 cell}. Why is it not returning three columns? trainFasterRCNNObjectDetector is very unhappy with this output from the datastore read. It is expecting to see three values returned as it looks for the labels to be in the third column. Deep in the bowels of checkGroundTruthDatastore.m there is a check
% Check whether data has enough columns for labels
ncols = size(sampleData,2);
if ~any(ncols == 3)
error( message('vision:ObjectDetector:readOutputNotCellTable'));
end
hasLabels = true;
This check throws an error when I try to use my dataStore in a call to trainFasterRCNNObjectDetector:
Error using vision.internal.inputValidation.checkGroundTruthDatastore
The read method of the training input datastore must return an M-by-3
cell or table.
So I am confused why my datastore has squashed the labels and the bounding boxes together.
Another odd thing, likely related, is that if I increase dataStore.MiniBatchSize to something larger than 1, I get an error when doing a read from the dataStore. For example,
dataStore.MiniBatchSize = 8;
cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);
Throws an error:
Error using table
All table variables must have the same number of rows.
Error in augmentedImageDatastore>datastoreDataToTable (line 778)
data = table(input,response);
Error in augmentedImageDatastore/read (line 317)
[data,info] = datastoreDataToTable(input,info);
This is a more fundamental error and clearly related to how I have constructed my table.
So my question is, what is it that I am misunderstanding and doing wrong? Thank you for the responses!
Michael
  1 Comment
Michael Vrhel on 27 May 2022
% Sample code showing the issue with the data store read
% Create a simple table for data store as described for multi-class object
% classification documentation for trainFasterRCNNObjectDetector. See
% the input description for the input argument
% trainingData — Labeled ground truth datastore | table
% Per the documentation the data is
% "The first column must be images.
% The second column must be M-by-4 matrices of bounding boxes of the form
% [x, y, width, height], where [x,y] represent the top-left coordinates
% of the bounding box.
% The third column must be a cell array that contains M-by-1 categorical
% vectors containing object class names. All categorical data returned by
% the datastore must contain the same categories."
% So let me construct such an object
clearvars
inputSize = [224,224,3];
% Use 6 images in the database
numImages = 6;
% Let's say I have 3 classes consisting of person, dog, and cat, and each
% class could occur multiple times on each image. Just use one image
% that comes with MATLAB for the example.
peppers = char(fullfile(matlabroot,"toolbox","matlab","imagesci","peppers.png"));
% The bounding boxes and labels
images = cell(numImages, 1);
bboxes = cell(numImages, 1);
labels = cell(numImages, 1);
% First image has two objects
images{1} = peppers;
bboxes{1} = ...
[388.6600 109.4100 0 62.1600
69.9200 277.6200 262.8100 36.7700];
labels{1} = {"person"; "dog"};
% Second image has three objects
images{2} = peppers;
bboxes{2} = ...
[326.2800 197.2500 121.9400 171.2700
174.5600 9.7900 226.4500 123.6600
71.2400 167.0600 510.4400 215.7600];
labels{2} = {"person"; "cat"; "dog"};
% Third image has 14 objects
images{3} = peppers;
bboxes{3} = ...
[226.0400 28.2200 239.7200 17.1200
229.3100 51.1200 225.3800 34.9700
11.5900 98.4000 10.6400 204.1400
30.4100 234.2800 33.0600 229.0200
257.8500 19.5200 167.0200 7.3300
224.4800 46.4600 234.0000 34.9600
44.1300 326.8600 15.7800 195.3200
97.0000 223.4600 37.4600 228.0600
68.1800 13.1100 209.6800 10.6500
238.1900 38.6700 231.0800 37.1800
16.1800 345.4100 9.1500 1.0000
42.8800 173.4100 34.5300 190.0000
79.1600 72.9400 408.2900 638.0000
232.2600 185.4100 231.2500 101.0000];
labels{3} ={"person";"person";"cat";"person";"person";"person";"dog";...
"person";"person";"person";"person";"cat";"person";"person"};
% Fourth image has eight objects
images{4} = peppers;
bboxes{4} = ...
[47.1900 320.1600 266.3700 290.0300
296.1200 275.0500 293.1300 299.7900
28.3000 27.0600 23.9700 15.2400
33.1700 104.5300 88.9600 19.8700
32.7500 10.0500 369.5000 302.2000
298.9400 302.9600 278.5200 298.2200
16.5200 13.7000 5.5000 12.7300
29.2200 25.6900 45.6500 18.7300];
labels{4} ={"person";"person";"cat";"person";"person";"person";"dog";...
"person"};
% Fifth image has 13 objects
images{5} = peppers;
bboxes{5} = ...
[322.5700 270.5900 18.8100 17.7800
290.8100 107.4700 29.0600 556.2800
65.0900 129.5400 120.8600 309.3600
127.6200 259.2700 271.1200 32.0900
273.6400 281.0000 16.2800 46.1200
292.1100 26.0200 25.4000 494.9300
50.9500 35.1000 257.1400 276.5400
129.6500 281.0600 281.3200 92.0800
1.9200 276.4700 12.8000 126.0700
266.8800 15.5100 42.0200 300.0000
114.4500 41.3200 269.0000 280.0000
155.7900 104.7300 274.4500 25.0000
424.1200 267.5500 8.8900 54.0000];
labels{5} ={"person";"cat";"person";"person";"person";"dog";...
"person";"person";"person";"person";"cat";"person";"person"};
% Sixth image has 1 object
images{6} = peppers;
bboxes{6} = ...
[210.2700 143.2900 219.8200 276.1500];
labels{6} = {"cat"};
% Create the table
dataValTable = table(Size=[numImages 3], ...
VariableTypes=["string" "cell" "categorical"], ...
VariableNames=["data" "boxes" "labels"]);
dataValTable.data = images;
dataValTable.boxes = bboxes;
dataValTable.labels = labels;
% Create the data store from the table with a minibatch size of 2
valDataStore = augmentedImageDatastore(inputSize,dataValTable);
valDataStore.MiniBatchSize = 2;
% Perform a test read
cpy = copy(valDataStore);
reset(cpy);
sampleData = read(cpy);
% This read fails. In datastoreDataToTable we see
%
%function [data,info] = datastoreDataToTable(input,info)
%
%response = info.Response;
%info = rmfield(info,'Response');
%if isempty(response)
% data = table(input);
%else
% response = convert4DArrayToCell(response);
% data = table(input,response);
%end
% What happens is that the input to data = table(input, response) has
% a 2x1 cell array for the 2 images for the minibatch. The responses
% though are of size 1 x 1 x 2 x 2 cell array. That is not going to work
% for a table creation since the number of rows are different. It is like
% the response data is transposed (permuted) from what it should be. I
% have created it though in the manner described in the documentation. Any
% help would be appreciated


Answers (1)

Vidip on 23 Jan 2024
I understand that you are having trouble structuring the datastore that ‘trainFasterRCNNObjectDetector’ requires for multiclass bounding box training.
The error message you're receiving indicates that the ‘augmentedImageDatastore’ is not returning data in the format expected by trainFasterRCNNObjectDetector. Specifically, it expects the data to be in an M-by-3 format, where each row corresponds to an image, its associated bounding boxes, and its labels.
The sentence "All categorical data returned by the datastore must contain the same categories" means that the set of possible class labels (categories) must be identical for every categorical vector the datastore returns. It does not mean that each image must contain instances of all classes; rather, every label vector should be defined over the full set of classes that could appear in any image.
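As a minimal sketch of what a shared category set looks like, the snippet below builds two label vectors over the same three classes from the question. The variable names (classNames, labels1, labels2) are made up for illustration. Passing the class list as the second argument to categorical is one way to do this; without it, each vector would only contain the categories that happen to appear in its own data.

```matlab
% All three class names, declared once and reused for every image.
classNames = ["person" "dog" "cat"];

% Image 1 contains a person and a dog; image 2 contains only a cat.
% The second argument fixes the category set for each vector.
labels1 = categorical(["person"; "dog"], classNames);
labels2 = categorical("cat", classNames);

% Both vectors carry the full category set, even for classes that are absent.
sameCategories = isequal(categories(labels1), categories(labels2));
disp(sameCategories)
```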
To resolve this issue, you may need a datastore that returns the data in the format object detection training expects. MATLAB provides ‘boxLabelDatastore’ to handle the bounding box and label data, which you can combine with an ‘imageDatastore’ using the ‘combine’ function to create a suitable datastore for ‘trainFasterRCNNObjectDetector’.
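A rough sketch of that approach follows, assuming the Computer Vision Toolbox is installed. The two-image data set, the box coordinates, and the variable names are placeholders for illustration and are not taken from the original data; the peppers.png path is the one used in the comment above. Each read of the combined datastore is expected to return one row with three columns (image, boxes, labels), which is the layout the error message asks for.

```matlab
% Hypothetical two-image data set; boxes and labels are placeholder values.
imageFile = char(fullfile(matlabroot,"toolbox","matlab","imagesci","peppers.png"));
files = {imageFile; imageFile};
classNames = ["person" "dog" "cat"];

% One M-by-4 box matrix and one M-by-1 categorical vector per image.
boxes = {[100 50 60 80; 200 120 90 40]; [210 143 219 176]};
labels = {categorical(["person"; "dog"], classNames); ...
          categorical("cat", classNames)};

% Images and box labels live in separate datastores that are read in lockstep.
imds = imageDatastore(files);
blds = boxLabelDatastore(table(boxes, labels));
trainingData = combine(imds, blds);

% Inspect one read before passing trainingData to trainFasterRCNNObjectDetector.
sample = read(trainingData);
disp(size(sample))
```

If the images need resizing to the network input size, that step would have to be applied to the images and boxes together (for example with transform and bboxresize) rather than through augmentedImageDatastore, which is not aware of the bounding boxes.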
For further information, refer to the documentation link below:
