Why doesn't overfitting happen?

Fereshteh.... on 20 Jul 2015
Edited: Fereshteh.... on 30 Jul 2015
I wrote code for classification using a patternnet neural network to classify a 2D two-spiral dataset. The dataset has 40 samples in two classes, 20 per class. I manually split it at random into two parts: part one for training and validation (32 of the 40 samples) and part two for testing (the remaining 8). I give part one to the net, 90% for training and 10% for validation:
net = patternnet(70);              % 70 hidden neurons
net.divideParam.trainRatio = 0.9;
net.divideParam.valRatio   = 0.1;
net.divideParam.testRatio  = 0;    % testing uses the held-out part two
Then I use the trained net to test the data of part two. My problem is that no matter how many neurons I use, the classification result is perfect and overfitting doesn't happen, even with 70 neurons. How is such a thing possible?
I need to work with neural networks that overfit easily. I use small datasets distributed in complex patterns, such as spirals or banana-shaped datasets. I would also like to have them in a higher-dimensional space, but unfortunately I could not generate them in higher dimensions.
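For reference, a minimal sketch of how such a two-spiral set might be generated (the radius law and noise level here are illustrative assumptions, not taken from the original code):

N = 20;                                % points per class, as in the question
t = linspace(0, 2*pi, N)';             % spiral parameter
r = 0.5 + t;                           % radius grows along the spiral
x1 = [r.*cos(t),    r.*sin(t)];        % class 1 spiral
x2 = [r.*cos(t+pi), r.*sin(t+pi)];     % class 2: same spiral rotated by pi
X  = [x1; x2] + 0.05*randn(2*N, 2);    % small Gaussian jitter
inputs  = X';                          % patternnet expects samples in columns
targets = [ones(1,N), zeros(1,N); zeros(1,N), ones(1,N)];  % one-hot labels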

Accepted Answer

Greg Heath on 21 Jul 2015
Edited: Greg Heath on 21 Jul 2015
You misunderstand. The sin is not overfitting. It is overtraining a net that is overfit.
Overfitting happens when there are more unknown weights, Nw, to estimate than there are training equations, Ntrneq.
This typically means that Nw-Ntrneq weights can have ARBITRARY values and the remaining Ntrneq weights can be chosen to EXACTLY solve the training equations.
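These counts can be checked directly in MATLAB. Here is a hedged sketch using the standard single-hidden-layer weight count, with values assumed from the question above (2-D inputs, 2 classes, H = 70, roughly 29 training cases):

% Counting weights vs. training equations for a single-hidden-layer net.
% I = inputs, H = hidden neurons, O = outputs, Ntrn = training cases.
I = 2;  H = 70;  O = 2;  Ntrn = 29;
Nw     = (I+1)*H + (H+1)*O   % unknown weights, including biases (= 352)
Ntrneq = Ntrn*O              % training equations (= 58)
% Nw > Ntrneq here, so the net is overfit in the sense defined above.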
The problem with this is that, for serious work, nets must be designed to perform well on unseen operational data. However, that usually does not happen when some of the weights are arbitrary.
There are three methods you can use, separately or in combination, in MATLAB to mitigate this problem: non-overfitting, validation stopping and regularization (a combined sketch follows the list):
1. Non-overfitting: Make sure that Ntrneq >= Nw
2. Validation stopping (a default data division that prevents overtraining an overfit net):
total = design + test
design = training + validation
non-design = test
total = training + non-training
non-training = validation + testing
Use a non-training validation subset that stops training-set weight estimation when performance on that validation subset fails to improve for a fixed number (default = 6) of epochs.
3. Regularization: Minimize the weighted sum of mean-square-error and mean-squared-weights via either
net.performParam.regularization = a;  % 0 < a < 1
or
net.trainFcn = 'trainbr';
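A minimal sketch combining the three methods for a patternnet (the ratios and regularization value are illustrative choices, and inputs/targets are assumed to be already defined):

H = 10;                                    % 1. small H, aiming for Ntrneq >= Nw
net = patternnet(H);
net.divideParam.trainRatio = 0.70;         % 2. keep a validation subset so
net.divideParam.valRatio   = 0.15;         %    training stops when validation
net.divideParam.testRatio  = 0.15;         %    performance stalls
net.trainParam.max_fail    = 6;            %    validation patience (default 6)
net.performParam.regularization = 0.1;     % 3. penalize large weights
[net, tr] = train(net, inputs, targets);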
Details can be obtained by searching for
a. Documentation on Validation stopping and regularization
b. Relevant posts in both the NEWSGROUP and ANSWERS. Suitable search words are overfitting, overtraining, regularization, trainbr and validation stopping
Hope this helps
Thank you for formally accepting my answer
Greg
  9 Comments
Fereshteh.... on 29 Jul 2015
Edited: Fereshteh.... on 29 Jul 2015
thanks again Greg :)
>I do not have time for a detailed answer now. However, it looks like you are still postulating that overfitting (defined as Nw > Ntrneq), itself is the big problem. It isn't. The problem is dealing with a training set that does not adequately characterize the salient properties of nontraining (seen and unseen) data. This happens most often with a small training set.
No, I have been saying that I am looking for datasets that cannot be adequately characterized, but I couldn't find them. That is why I ended up using two spirals, assuming that their complex distribution pattern would make them hard to characterize (unfortunately, with your help, I found that assumption was wrong). I am also looking for high-dimensional artificial datasets, again on the assumption that they will be hard to characterize.
>You are using training data to design a net which has to work well on nontraining data. If training stops because of the default Valstop condition, and the resulting design (trn+val) performance is unsatisfactory, then discard the design, reinitialize the weights and try again. There is a high probability that training beyond the default Valstop condition is a waste of time.
So you agree that we should discard nets which are not trained well. I wonder how you do this if you don't write a while loop; moreover, how do you do cross-validation? I suppose the only way is writing a loop to check, then discard or accept, the nets!
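A minimal sketch of such a loop, assuming inputs and targets are already defined (this is a hedged construction, not Greg's exact procedure): retrain with fresh random weights and keep the net whose validation performance at the stopping point is best.

Ntrials  = 10;
bestPerf = Inf;
for k = 1:Ntrials
    net = patternnet(10);                 % new net => fresh random weights
    [net, tr] = train(net, inputs, targets);
    if tr.best_vperf < bestPerf           % validation perf at the stop epoch
        bestPerf = tr.best_vperf;
        bestNet  = net;
    end
end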
>>1. We have small datasets (I call these the actual datasets) => these data are not enough for a large neural network => overfitting happens => we decide to generate synthetic data
>No, No, No! Overfitting happens when you have many more weights than necessary. For stable solutions that generalize well to nontraining data you need Ntrneq >> Nw. That is why you are synthesizing more data.
I guess you misunderstood me. By a "large" neural network I meant we deliberately create a situation where Nw >> Ntrneq; we make sure our Nw is much bigger than Ntrneq. As you saw, I wrote my code so that exactly this happened: Nw = 281 while my Ntrneq = 29.
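For what it's worth, those figures are consistent with the standard single-hidden-layer weight count, assuming 2 inputs, 70 hidden neurons and 1 output (an inferred reading of the code, not stated explicitly): Nw = (I+1)*H + (H+1)*O = 3*70 + 71*1 = 281, and with Ntrn = 29 training cases and O = 1, Ntrneq = Ntrn*O = 29.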
>For classification data set examples use the command doc nndatasets
I have used them, such as the thyroid and breast cancer datasets, but those are real datasets; I want ARTIFICIAL datasets.
Fereshteh.... on 30 Jul 2015
Edited: Fereshteh.... on 30 Jul 2015
Greg, one further question. I said to myself: okay, if my dataset can be adequately characterized, let's try the datasets in the MATLAB archive. I used "nnstart" to launch the pattern recognition app and imported the example datasets one by one. At first I trained the net with 10 neurons in the hidden layer, looked at the confusion plot, and wrote down the "test" confusion matrix results.
Then I changed the number of neurons to 1000. By my calculation, in all those datasets except the thyroid dataset, this number of neurons makes Nw notably greater than Ntrneq, so we should expect overfitting to happen and to hurt the test results. But when I checked the confusion plot and wrote down the test confusion matrix results, they were not bad at all; they were actually good!
I know you have told me several times that "When overfitting occurs, there is the POTENTIAL for large error when the nontraining data is sufficiently different from the training data." My question is: does this mean that in all those datasets the nontraining data is not sufficiently different from the training data? I find that so odd. If yes, where can I find a dataset whose nontraining data is sufficiently different from its training data? I am getting so frustrated with neural networks :/
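One hedged way to sanity-check that calculation on a built-in set, using cancer_dataset from nndatasets (9 inputs, 2 outputs, 699 cases) and an assumed 0.7 training share:

[x, t] = cancer_dataset;
[I, N] = size(x);   O = size(t, 1);   H = 1000;
Ntrn   = round(0.7*N);          % assumed training share of the data
Nw     = (I+1)*H + (H+1)*O      % 10*1000 + 1001*2 = 12002
Ntrneq = Ntrn*O                 % ~ 489*2 = 978  =>  Nw >> Ntrneq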
