MATLAB Answers

My convolutional network doesn't learn properly. Can someone explain why and give me some advice?

Joaquin Carrasco on 4 Jan 2018
Answered: lukas mattheuwsen on 25 May 2020 at 14:51
Hi everyone, I'm pretty new to neural networks, and I'm trying to get my CNN to learn. From a set of about 15000 images I made a training set (70%, using random images) and used the remaining 30% as a validation set. The problem is I usually get this kind of graph:
[Figure: training progress plot]
As you can see, the validation curve starts to diverge and the network can't learn anything more. I tried various training options. Currently I'm using Max Epochs at 70, validation frequency at 100 (I also tried 30 and 50), and an initial learn rate of 0.01 with a learn rate drop factor of 0.1. The learn rate drop period is about 15 (I tried different settings: 3, 5, 7, 10, ...) and my mini-batch size is 60. I tried different layer configurations, but I'm showing you the current one. The num_clases value is 120 because that is the number of different classes I'm trying to classify. Can anyone give me a hint about what I'm doing wrong, or which changes I should make to get it to work properly?
layers = [
imageInputLayer([128 128 3])
convolution2dLayer(5,64,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(5,128,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,64,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,128,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,256,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,512,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(1,64,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(1,128,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(1,256,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(1,512,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
dropoutLayer
fullyConnectedLayer(2000)
dropoutLayer
fullyConnectedLayer(2000)
fullyConnectedLayer(num_clases)
softmaxLayer
classificationLayer];


Answers (2)

Don Mathis on 19 Feb 2019
This question is over a year old but I'll post an answer anyway: It's a classic case of overfitting.
Your training accuracy is higher than your validation accuracy because your network is learning details of the training data that don't generalize to the validation data. Similarly, the training loss is lower than the validation loss.
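As a rough illustration of how the training setup could be adjusted against overfitting, here is a minimal sketch that keeps the settings described in the question and adds three common mitigations: data augmentation, a larger L2 regularization factor, and early stopping on the validation loss. The folder path `dataDir`, the augmentation ranges, the regularization value, and the patience value are all assumptions chosen for illustration, not tuned recommendations. It also assumes the images are already 128x128x3 and a Deep Learning Toolbox release that supports `augmentedImageDatastore` and `'ValidationPatience'`.

```matlab
% A sketch only: dataDir, the augmentation ranges, the L2 factor, and the
% patience value are illustrative assumptions, not tuned settings.
dataDir = fullfile('path', 'to', 'images');   % hypothetical: one subfolder per class
imds = imageDatastore(dataDir, ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% 70/30 random split within each class, similar to the split in the question.
[imdsTrain, imdsValidation] = splitEachLabel(imds, 0.7, 'randomized');

% Random flips and small shifts so the network sees varied training images.
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandXTranslation', [-8 8], ...
    'RandYTranslation', [-8 8]);
augTrain = augmentedImageDatastore([128 128], imdsTrain, ...
    'DataAugmentation', augmenter);

options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 15, ...
    'MaxEpochs', 70, ...
    'MiniBatchSize', 60, ...
    'L2Regularization', 1e-3, ...      % larger than the 1e-4 default
    'ValidationData', imdsValidation, ...
    'ValidationFrequency', 100, ...
    'ValidationPatience', 5, ...       % stop after 5 validations with no new best loss
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');

% net = trainNetwork(augTrain, layers, options);   % layers as posted in the question
```

Whether any of these settings closes the gap between the training and validation curves depends on the data, so they are starting points to experiment with rather than a fix.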



lukas mattheuwsen on 25 May 2020 at 14:51
While previous answers suggested that this was overfitting, I think the bigger problem is to do with the network he is using.
I suspect he came up with the network structure himself, as there are a few strange things in the network. You have to understand that adding convolutional layers can help your network learn higher-level features as it goes deeper. A good example of this is VGG, which performs better than AlexNet: both use the same general structure (repeating convolutional layers, ReLU layers, and a pooling layer), but VGG simply goes deeper. However, as with so many things, there is a limit to how deep you can go before the results become useless. Fortunately, there are ways to go deeper, as is done in networks such as Inception and ResNet. It is possible that this network is just too big. I would suggest trying a smaller network, or checking whether a pre-trained network with a well-known architecture does the job, since you won't need to retrain the whole model (especially the first few layers, which learn basic features such as edges, lines, and corners, and which need a lot of new data to retrain from scratch).
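To make the pre-trained suggestion concrete, here is a minimal transfer-learning sketch. It assumes the AlexNet support package for Deep Learning Toolbox is installed and takes the class count of 120 from the question. The learn-rate factors and the commented-out training settings are illustrative assumptions, and `imdsTrain` is a hypothetical image datastore.

```matlab
% Sketch under assumptions: requires the AlexNet support package for
% Deep Learning Toolbox; numClasses = 120 comes from the question.
net = alexnet;                          % network pretrained on ImageNet
inputSize = net.Layers(1).InputSize;    % image size the pretrained network expects

numClasses = 120;
layersTransfer = net.Layers(1:end-3);   % drop the last FC, softmax, and output layers
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor', 10, ...   % let the new layer learn faster
        'BiasLearnRateFactor', 10)         % than the transferred layers
    softmaxLayer
    classificationLayer];

% Hypothetical datastore imdsTrain; images are resized to the pretrained input size:
% augTrain    = augmentedImageDatastore(inputSize(1:2), imdsTrain);
% options     = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, 'MaxEpochs', 10);
% netTransfer = trainNetwork(augTrain, layers, options);
```

Only the final fully connected layer is new here, so the early layers keep the edge and corner detectors they already learned.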
Additionally, this network just doesn't really make sense to me. It looks like you put convolutional and pooling layers one after another 10 times in a row because 10 is a nice round number. You actually have to look at what the network is doing. You start with a 128x128x3 image, which gets convolved by the different filters. In general this reduces the width and height of the image but increases the depth. After 1 convolutional bundle (convolutional layer, ReLU layer, and pooling layer) you get a feature map of 63x63x64, which is nothing unusual.
However, in your network you keep convolving. After 4 convolutional bundles you have a 7x7x128 feature map, which most other networks would then pass to fully connected layers that learn from it in order to predict the class. You instead continue to convolve until, after the sixth bundle, you have a 1x1x512 vector. That could maybe be used as input to your fully connected layers, but you convolve even further, and from there on it does not really make sense. After layer 29 you only have 64 features left in your feature map. The following layers then enlarge the feature map again until you have the 1x1x512 feature map that goes into the fully connected layers.
It makes no sense to go from a 7x7x128 feature map to a 1x1x64 feature map, throwing out almost all your useful features, and then to create around 450 features from nothing as you enlarge the feature map again. With this approach I think this network is doomed to never work, and I would suggest reducing the size of the network.
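The size bookkeeping can be checked with a few lines of plain MATLAB. This sketch assumes the usual output-size rules: a stride-1 convolution with filter size f and padding p maps a side length n to n - f + 2p + 1, and 2x2 max pooling with stride 2 and no padding maps n to floor((n - 2)/2) + 1. The `convSpec` table is copied from the layers in the question, and the variable names are my own.

```matlab
% Trace the feature-map size through the ten conv -> batchnorm -> relu ->
% maxpool bundles from the question. Batch normalization and ReLU do not
% change the size, so only the convolution and pooling steps are modeled.
% Columns: [filterSize numFilters padding], copied from the posted layers.
convSpec = [5  64 1
            5 128 1
            3  64 1
            3 128 1
            3 256 1
            3 512 1
            1  64 1
            1 128 1
            1 256 1
            1 512 1];
poolSize   = 2;     % maxPooling2dLayer(2,'Stride',2)
poolStride = 2;

side  = 128;                             % input image is 128x128x3
sizes = zeros(size(convSpec, 1), 2);     % [side depth] after each bundle
for k = 1:size(convSpec, 1)
    f = convSpec(k, 1);
    p = convSpec(k, 3);
    side = side - f + 2*p + 1;                        % convolution, stride 1
    side = floor((side - poolSize)/poolStride) + 1;   % max pooling, no padding
    sizes(k, :) = [side, convSpec(k, 2)];
    fprintf('bundle %2d: %dx%dx%d\n', k, side, side, convSpec(k, 2));
end
```

Under these rules the map is 63x63x64 after the first bundle, 7x7x128 after the fourth, and 1x1x512 after the sixth. The last four bundles then only change the depth of a 1x1 map: 64, 128, 256, and 512.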
Just putting convolutional layers one after another will rarely lead to anything useful if you go as deep as in this example.
