
How can I make my neural network support any size of image input?

There are three levels of code writing for a vision-related deep learning task.
Highest level: build a complete layerGraph and train with the trainNetwork function.
Middle level: build a layerGraph without a loss layer; instead, calculate the loss and gradient in an eval function. You can also specify a custom learning-rate schedule. This level allows some customization while still exploiting the easy-to-use highest-level features.
Lowest level: this level has no concept of a layer. Coders have to manage the parameters themselves. Building and training a network this way is really messy and time-consuming.
My question is: the highest and middle levels both require a fixed input size, i.e., an imageInputLayer, but imageInputLayer only supports a fixed image size. I do not want to trouble myself with lowest-level coding, so how can I make my NN take inputs of any size?
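For reference, the "highest level" workflow described above might look like the following sketch (layer sizes, the datastore, and the training options are illustrative, not from this thread); note how imageInputLayer pins the spatial size:

```matlab
% Hypothetical example: imds and options would be defined elsewhere.
layers = [
    imageInputLayer([224 224 3])                  % input size is fixed here
    convolution2dLayer(3, 16, 'Padding', 'same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
% net = trainNetwork(imds, layers, options);
```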

Accepted Answer

Ryan Comeau
Ryan Comeau on 10 May 2020
I wish it were possible to just dump in images of multiple sizes as well. Unfortunately, each image size would yield convolution maps of a different size and number. How would it then make sense to pass these into a fully connected layer and fit those convolution maps? It would be like sorting oranges by size when half of your input oranges are apples; it would be a strange task.
There is, however, a solution to this problem: rescale your input images to your network's input size. This is one of the important preprocessing steps. Here is some code that could resize one of your images:
number_rows=200; % depends on the input size of your network
number_cols=300; % depends on the input size of your network
rescaled_image=imresize(img,[number_rows number_cols]); % img is your input image; avoid the name "image", which shadows a MATLAB built-in
It may seem unintuitive, but computers don't see the same way we do and the scale of things doesn't always matter.
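To apply this resizing across a whole training set without resizing files on disk, one option (a sketch assuming the Deep Learning Toolbox; the folder path is hypothetical) is an augmentedImageDatastore, which resizes each image on the fly as it is read:

```matlab
% 'pathToImages' is a placeholder for your image folder.
imds = imageDatastore('pathToImages', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
inputSize = [200 300];  % match your network's imageInputLayer size
augimds = augmentedImageDatastore(inputSize, imds);
% net = trainNetwork(augimds, layers, options);  % layers/options defined elsewhere
```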
Hope this helps,
  1 Comment
Jacques Boutet de Monvel
Jacques Boutet de Monvel on 31 May 2022
If it is true that there is no way to feed an image of unprescribed size to a fully convolutional network, this is too bad! It misses one of the most attractive and elegant features of FCNs: the ability to process an image of any size in a seamless, translation-invariant way at prediction time, even when the network has been trained on much smaller image patches. This is very useful, and even crucial, for segmentation applications.
Why not implement this feature, at least to give users the choice? This is one thing that could (still) make MatConvNet more attractive than the MATLAB DL toolbox, despite all the latter's impressive features.
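As this comment notes, convolution itself has no fixed-size requirement. At the toolbox's lower dlarray level, the same convolution runs on any spatial size (a sketch; all sizes here are illustrative):

```matlab
% 3x3 filters, 1 input channel, 8 filters; bias has one entry per filter.
w = dlarray(randn(3,3,1,8));
b = dlarray(zeros(8,1));
imgSmall = dlarray(randn(64,64,1), 'SSC');   % spatial, spatial, channel
imgLarge = dlarray(randn(250,317,1), 'SSC');
ySmall = dlconv(imgSmall, w, b, 'Padding', 'same');
yLarge = dlconv(imgLarge, w, b, 'Padding', 'same');
% Both calls succeed; the output spatial size follows the input size.
```

This is essentially the "lowest level" the question hoped to avoid, which is why the fixed-size imageInputLayer constraint bites at the higher levels.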


More Answers (0)
