Faster R-CNN image input size & validation
Hi there,
I'm trying to train Faster R-CNN on my own dataset. My images are 440x440x3 and the objects that I'm trying to detect are fairly small (16x16x3).
I based my code on this example (https://uk.mathworks.com/help/vision/examples/object-detection-using-faster-r-cnn-deep-learning.html). I don't really understand why the image input layer takes images of a size much smaller than the original image (in the example the input is 32x32x3 while the whole image is about 200x200x3). That seems different from the original paper by Ren et al. (https://arxiv.org/abs/1506.01497), where the whole image is passed through VGG16 first and the proposals are then extracted from the resulting feature maps.
The explanation in the documentation is not very clear about how Faster R-CNN (the RPN, specifically) finds those smaller patches. Other implementations that I looked at (Caffe, TensorFlow) pass the whole image through VGG16 (or ZF), whose input size matches the whole image, but the MATLAB documentation says:
"Start with the imageInputLayer function, which defines the type and size of the input layer. For classification tasks, the input size is typically the size of the training images. For detection tasks, the CNN needs to analyze smaller sections of the image, so the input size must be similar in size to the smallest object in the data set. In this data set all the objects are larger than [16 16], so select an input size of [32 32]. This input size is a balance between processing time and the amount of spatial detail the CNN needs to resolve."
I also looked at the following entry on MATLAB Answers (https://uk.mathworks.com/matlabcentral/answers/332757-faster-rcnn-code-in-matlab?s_tid=answers_rc1-1_p1_MLT) but I still don't understand how the RPN finds those smaller 32x32x3 regions (as in the example) without passing the whole image through the network. Could someone give some more insight into what's going on with the RPN in this case if the ImageInputLayer takes images of size 32x32x3 instead of the size of the original image?
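For reference, the network in that example is built around a tiny input layer, roughly like this (my reconstruction from the example page, so the exact filter counts may differ):
% Rough reconstruction of the layer definition from the linked example
% (filter counts are approximate; see the example page for the exact values).
inputLayer = imageInputLayer([32 32 3]);        % far smaller than the full images
middleLayers = [
    convolution2dLayer([3 3], 32, 'Padding', 1)
    reluLayer()
    convolution2dLayer([3 3], 32, 'Padding', 1)
    reluLayer()
    maxPooling2dLayer(3, 'Stride', 2)
    ];
finalLayers = [
    fullyConnectedLayer(64)
    reluLayer()
    fullyConnectedLayer(2)                      % in the example this is width(vehicleDataset)
    softmaxLayer()
    classificationLayer()
    ];
layers = [inputLayer; middleLayers; finalLayers];
So the whole 200x200x3 (or, in my case, 440x440x3) image never seems to match the input layer, which is exactly what confuses me.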
----------------------
And just a quick question about using the validation while training fasterRCNNObjectDetector. The documentation says (https://uk.mathworks.com/help/vision/ref/trainfasterrcnnobjectdetector.html#inputarg_options):
"trainFasterRCNNObjectDetector does not support these training options:
- The ExecutionEnvironment values: 'multi-gpu' or 'parallel'
- The Plots value: 'training-progress'
- The ValidationData, ValidationFrequency, or ValidationPatience options"
Is there any way to assess when our network is overfitting without using the validation set?
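The only workaround I can think of is to hold out a validation split myself and run it through the trained detector afterwards, along these lines (vehicleDataset, layers and options stand in for my own ground truth table and training configuration):
% Manual 80/20 split of my own ground truth table (names are placeholders).
rng(0);
shuffledIdx = randperm(height(vehicleDataset));
splitIdx = floor(0.8 * numel(shuffledIdx));
trainingData   = vehicleDataset(shuffledIdx(1:splitIdx), :);
validationData = vehicleDataset(shuffledIdx(splitIdx+1:end), :);
detector = trainFasterRCNNObjectDetector(trainingData, layers, options);
% Run the detector on the held-out images and collect the results.
numVal  = height(validationData);
results = table(cell(numVal, 1), cell(numVal, 1), 'VariableNames', {'Boxes', 'Scores'});
for i = 1:numVal
    I = imread(validationData.imageFilename{i});
    [bboxes, scores] = detect(detector, I);
    results.Boxes{i}  = bboxes;
    results.Scores{i} = scores;
end
% Average precision on the held-out set; comparing it against the training
% set is the closest thing I have to spotting overfitting.
ap = evaluateDetectionPrecision(results, validationData(:, 2:end));
But that only tells me about overfitting after the fact, not during training.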
1 Comment
Keke Zhang
on 24 Jun 2019
Has anyone solved this problem? trainFasterRCNNObjectDetector cannot use a validation set.
Answers (4)
Tunai Marques
on 28 Oct 2019
Hi all,
I have been playing with Faster R-CNN in MATLAB for a couple of weeks now. The very first step is really to resize the training images to the required input size of your feature extractor (e.g., usually 224x224 for ResNet-50). However, if you have more processing capabilities (unfortunately I don't), scale it up to 448x448, and so forth.
That is extremely influential in the training process for a couple of reasons:
(1) The object you want to detect must have a considerable size when compared to the images you want to recognize it in (at least in the training phase). In other words, the object/imageSize ratio in the training image must be considerable (i.e., >0.3)
(2) The size of the anchors that are going to be trained/used in detection cannot exceed the size of the resized training image. Therefore, if you downscale the training images too much, your anchors will be small when used during deployment.
In my case, for example, I deal with the worst-case scenario: I have to detect objects that range from 24x24 up to 2kx1k (!!!) in 3kx5k images. Therefore, if I were simply to resize the original 3kx5k images and their respective bounding boxes, the small objects (i.e., 24x24) would just disappear in the first few layers of the feature extractor (in my case, ResNet-50).
That problem is not so present in the example they use (vehicle detection) or in the original Faster R-CNN (ImageNet), because in those datasets the object/imageSize ratio for the training images is quite high.
The solution I came up with is to do a pre-processing step first. I take my original bounding boxes, then add an "extra" of a given percentage (50%, 200%, etc.), creating images that contain just the object to be recognized and a little bit of background. I then resize these images to whatever size the feature extractor network requires (as I said, 224x224 in my case), and now the objects to be recognized, whether super big or particularly small, all occupy a considerable portion of the training image (good object/imageSize ratio). That led to decent detection results.
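A minimal sketch of that pre-processing step (the margin value and the table variable names are just placeholders for my own data):
margin = 0.5;                                   % 50% extra context around the box
I   = imread(trainingData.imageFilename{1});
box = trainingData.object{1};                   % [x y w h]
% Grow the box by the margin and clip it to the image borders.
pad     = margin * box(3:4);
cropBox = [box(1:2) - pad/2, box(3:4) + pad];
cropBox(1:2) = max(cropBox(1:2), 1);
cropBox(3:4) = min(cropBox(3:4), [size(I,2) size(I,1)] - cropBox(1:2));
patch = imcrop(I, cropBox);                     % object plus a bit of background
patch = imresize(patch, [224 224]);             % the feature extractor's input size
Whatever the original object size, it now fills most of the resized patch, which keeps the object/imageSize ratio high.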
The main drawback of this approach: my bounding boxes are very small (a maximum of 224x224), while in the original image I have some pretty gigantic objects to recognize. However, the detector does a good job of identifying sub-regions of these objects, and you can then merge them together using a clustering algorithm (k-NN, DBSCAN, etc.).
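A rough sketch of that merging step, here using dbscan on the detected box centres (the neighbourhood radius is a placeholder, and dbscan needs the Statistics and Machine Learning Toolbox, R2019a or newer):
[bboxes, scores] = detect(detector, I);         % [x y w h] per detection
centres = bboxes(:, 1:2) + bboxes(:, 3:4) / 2;
labels  = dbscan(centres, 100, 1);              % group detections within ~100 px
merged = zeros(max(labels), 4);
for k = 1:max(labels)
    b  = bboxes(labels == k, :);
    x1 = min(b(:,1));           y1 = min(b(:,2));
    x2 = max(b(:,1) + b(:,3));  y2 = max(b(:,2) + b(:,4));
    merged(k, :) = [x1, y1, x2 - x1, y2 - y1];  % union box of the cluster
end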
Hope that helps.
1 Comment
Hayat Bouchkouk
on 19 Feb 2020
Hi, I want to build and train a new Faster R-CNN from scratch. Can anyone give me ideas about that, please?
numi khnax
on 7 Nov 2018
Were you able to solve your problem?
1 Comment
michel hillen
on 7 Dec 2018
I am also interested in knowing the answer to these questions. Anything you can share?
Serge Mooyman
on 9 Aug 2019
Edited: Serge Mooyman
on 20 Oct 2019
Hi, I am also puzzled about what size the input images for Faster R-CNN object detection should be.
For classification CNNs it is pretty easy: the image should contain as much as possible of each example of an object class, and the object should fill at least 60% of the space. From the original question above I get the impression that the input images (which should contain labelled bounding boxes around the class objects, I believe) should be much smaller than the biggest object and no bigger than twice the width and height (so four times the area) of the smallest class object. However, from my supervisor I understand that these input images for training and validating a Faster R-CNN object detector can be bigger than the bounding boxes / class examples within them and can contain several bounding boxes with class examples, even from different classes.
Is there anyone who can clarify this? I searched and asked a lot, but did not find clear guidelines yet.
Serge
0 Comments
Yopu
on 11 Oct 2019
I have a sneaking suspicion that it may be handled when passing the ground truth table. There, the whole image is loaded from the image datastore. Chopping the image up into smaller fragments must be done implicitly somewhere between loading the image from the datastore and deploying the R-CNN network.
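In other words, the trainer only ever sees a table of whole images plus their boxes, something like this (variable names assumed):
% One row per full-size image; boxes are [x y w h] in original image coordinates.
imageFilename = {'images/img001.png'; 'images/img002.png'};
vehicle = {[100 120 16 16]; [50 60 16 16; 200 210 18 18]};
trainingData = table(imageFilename, vehicle);
% trainFasterRCNNObjectDetector(trainingData, layers, options) receives the
% whole images, so any chopping into 32x32 regions would have to happen inside it.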
0 Comments