Why doesn't overfitting happen?
I wrote code for classification, using a "patternnet" neural network to classify a 2D two-spiral dataset. All my data were 40 points in two classes, 20 per class. I manually separated them into two parts: part one for training and validation, and part two for testing, so 32 of the 40 points go to the training and validation phase and 8 to testing. The separation is random. I give the part-one data to the net, 90% for training and 10% for validation.
net = patternnet(70);
net.divideParam.trainRatio=0.9
net.divideParam.valRatio=0.1
net.divideParam.testRatio=0
Then I use the trained net to test the part-two data. My problem is that no matter how many neurons I use, the classification result is perfect and overfitting doesn't happen, even with 70 neurons. How is such a thing possible?
I need to work with neural networks which are capable of overfitting easily, so I use small data sets distributed in a complex pattern, like spirals or banana-shaped data sets. I would like to have them in a higher-dimensional space, but unfortunately I couldn't generate them there.
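(For reference, a minimal sketch of the setup described above, assuming a standard parametric spiral generator; the generator, the variable names and the one-hot targets are illustrative, not the original code.)
% Illustrative two-spiral generator: 20 points per class, 2D
n  = 20;
th = linspace(pi/4, 3*pi, n);              % spiral angle
x1 = [ th.*cos(th);  th.*sin(th)];         % class 1 spiral
x2 = [-th.*cos(th); -th.*sin(th)];         % class 2 spiral, rotated 180 degrees
inputs  = [x1 x2];                         % 2 x 40 input matrix
targets = [ones(1,n) zeros(1,n);           % one-hot targets (the original post
           zeros(1,n) ones(1,n)];          % apparently used a single output unit)
% Random split: 32 samples for design (train+val), 8 held out for testing
idx = randperm(2*n);
des = idx(1:32);
tst = idx(33:end);
net = patternnet(70);
net.divideParam.trainRatio = 0.9;
net.divideParam.valRatio   = 0.1;
net.divideParam.testRatio  = 0;
[net, tr] = train(net, inputs(:,des), targets(:,des));
tstout = net(inputs(:,tst));               % outputs on the 8 unseen test samples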
Accepted Answer
Greg Heath
on 21 Jul 2015
Edited: Greg Heath
on 21 Jul 2015
You misunderstand. The sin is not overfitting. It is overtraining a net that is overfit.
Overfitting happens when there are more unknown weights, Nw, to estimate than there are training equations, Ntrneq.
This typically means that Nw-Ntrneq weights can have ARBITRARY values and the remaining Ntrneq weights can be chosen to EXACTLY solve the training equations.
The problem with this is that, for serious work, nets must be designed to perform well on unseen operational data. However, that usually does not happen when some of the weights are arbitrary.
There are three methods you can use, separately or in combination, in MATLAB to mitigate this problem: Non-overfitting, Validation Stopping and Regularization:
1. Non-overfitting: Make sure that Ntrneq >= Nw
2. Validation stopping (a default data division that prevents overtraining an overfit net):
total = design + test
design = training + validation
non-design = test
total = training + non-training
non-training = validation + testing
Use a non-training validation subset of the design set that stops training-set weight estimation when performance on that validation subset fails to improve for a fixed number (default max_fail = 6) of epochs.
3. Regularization: Minimize the weighted sum of mean-square-error and mean-squared-weights via either
net.performParam.regularization = a; % 0 < a < 1
or
net.trainFcn = 'trainbr';
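A rough sketch of options 2 and 3 side by side (illustrative only; x and t stand for whatever input and target matrices are in use):
net = patternnet(10);
% Option 2: validation stopping via the default data division;
% max_fail is the number of epochs the validation error may fail to improve
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net.trainParam.max_fail    = 6;
% Option 3: regularization, either as a performance-ratio weight ...
net.performParam.regularization = 0.1;   % 0 < a < 1
% ... or via Bayesian regularization instead of the default training function
% net.trainFcn = 'trainbr';
[net, tr] = train(net, x, t);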
Details can be obtained by searching for
a. Documentation on Validation stopping and regularization
b. Relevant posts in both the NEWSGROUP and ANSWERS. Suitable search words are overfitting, overtraining, regularization, trainbr and validation stopping
Hope this helps
Thank you for formally accepting my answer
Greg
9 Comments
Fereshteh....
on 21 Jul 2015
Edited: Fereshteh....
on 21 Jul 2015
Greg, thank you very much for the answer. I want to say that in my code I did separate the design data from the test data, and I did use the design data for training and validation. I put a condition on the error of training and validation, then I tested my test data. The thing is, the population of the design data is only 32, they are 2D, and if I use 70 neurons in the hidden layer and just one neuron in the output layer, that means 2*70+70*1 = 210 weights in my network. 32 compared to 210 is a small number, so I supposed it should make overfitting happen. After testing with my test data I even did something new: I made 400 new data points (which my network has never seen) using the two-spiral function and used them for testing too. The result was again 100% classification for both classes. Why should it work this well?
In my thesis I want a network which is overfit due to insufficient data set size, and then I want to generate new synthetic data which can help my network get trained correctly and overcome the overfitting. (As you surely know, in many problems where we have insufficient data we cannot use a neural network as a classification tool and have to use less powerful methods like kNN.) That is why I am intentionally using small data sets for designing. I would rather work with data sets with a complex pattern and in high dimension, but I couldn't find good artificial data sets with those traits, so I am using two-spiral data sets.
Greg Heath
on 23 Jul 2015
> Greg, thank you very much for the answer. I want to say that in my code I did separate the design data from the test data, and I did use the design data for training and validation
My point is that it is totally unnecessary for you to manually separate the data. MATLAB does it automatically.
> I put a condition on the error of training and validation,
I assume you meant that you set a nondefault MSE goal for the training, but what did you mean about an error condition for validation?
... that you set a nondefault max_fail value for Validation Stopping ?
> then I tested my test data. The thing is, the population of the design data is only 32, they are 2D, and if we assume I used 70 neurons in the hidden layer and just one neuron in the output layer, that means 2*70+70*1 = 210 weights in my network,
You forgot the biases
Nw = (I+1)*H+(H+1)*O = 3*70+71*1 = 281
whereas
N = 40, Ntst = 8,
Ndes = N - Ntst = 32
Nval = round(0.1*Ndes) = 3
Ntrn = Ndes -Nval = 29
Ntrneq = Ntrn*O = 29
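The same bookkeeping written out as a runnable sketch, using the symbols above:
I = 2;  H = 70;  O = 1;                 % inputs, hidden units, outputs
Nw = (I+1)*H + (H+1)*O                  % = 281 unknown weights and biases
N    = 40;  Ntst = 8;
Ndes = N - Ntst;                        % 32 design (train + val) samples
Nval = round(0.1*Ndes);                 % 3
Ntrn = Ndes - Nval;                     % 29
Ntrneq = Ntrn*O                         % = 29 training equations, far fewer than Nw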
> a small number, so I suppose it should make overfitting happen,
No "suppose":
Regardless of anything else:
Overfitting happens when Nw > Ntrneq.
> even after testing with my testing data, I did something new: I made 400 new data points (which my network has never seen) using the two-spiral function and used them for testing too; the result was again 100% classification for both classes. Why should it work this well?
It works well because your training data was sufficient to characterize the general distribution.
If you have a zillion, zillion data points, you only need 2 training data points if the distribution is a straight line.
> in my thesis I want a network which is overfit due to insufficient data set size
Sometimes it is good to think of a net as overfit if it has more weights than necessary to characterize the input/output transformation.
> and then I want to generate new synthetic data which can help my network get trained correctly and overcome the overfitting,
The typical approach to mitigate overfitting is to use Validation Stopping and/or regularization
Regardless of size, the training data has to sufficiently characterize the important characteristics of the general data population
(Slightly off-topic: I was able to design a much better classifier for the BioData dataset by generating synthetic data to balance class sizes. https://www.mathworks.com/matlabcentral/newsreader/view_thread/247790)
> (as you surely know, in many problems where we have insufficient data we cannot use a neural network as a classification tool for such a data set and have to use less powerful methods like kNN)
The KNN is a neural network! The more popular MLP is known to have universal approximation capabilities, but so do the EBF (elliptical basis function), the specialized RBF and other configurations.
> that is why I am intentionally using small data sets for designing; I would rather work with data sets with a complex pattern and in high dimension, but I couldn't find good artificial data sets with those traits, so I am using two-spiral data sets.
I understand what you are trying to do. However, the importance is not so much data set size, but whether or not the training data adequately characterizes the I/O relation for nontraining data.
That is the purpose of the nontraining validation design subset.
The purpose of the nondesign test set is to obtain a totally unbiased estimate of performance on nontraining (especially, unseen) data.
Hope this helps.
Greg
Fereshteh....
on 23 Jul 2015
Edited: Fereshteh....
on 23 Jul 2015
Thank you very much again, Greg, for answering :)
>My point is that it is totally unnecessary for you to manually separate the data. MATLAB does it automatically.
I have to do it because I am trying to introduce algorithms for generating synthetic data to overcome the problems of small (actual) data sets, and then I want to show that these synthetic data sets can improve the classification results. So I have to compare the results, and for a valid comparison I have to use the same test data set for both the actual data sets and the synthetic data sets. That is why I separate the test set from the design set manually, so that I know what my test set is every time I run the program and can use it for testing my synthetic data sets too.
>I assume you meant that you set a nondefault MSE goal for the training, but what did you mean about an error condition for validation?
My prof says the only time we can actually accept that a network is correctly trained is when the error of training and validation is smaller than an error we have defined before training starts! He says that if a network stops training due to one of MATLAB's default stopping conditions, it doesn't mean the net is certainly correctly trained. So he wanted me to write a "while" loop that only stops when we reach those defined errors for training and validation; if that condition is not met, the net is trained again, and this repeats again and again until the net reaches the defined errors. So I did this:
count=0;
while count~=1 % the "count" turns from 0 to 1 when the training
%and validation's error conditions meet
%net=patternnet(10);
net = patternnet(70);
net.divideParam.trainRatio=0.9
net.divideParam.valRatio=0.1
net.divideParam.testRatio=0
net.trainParam.max_fail=30;
%LEARNING PHASE
[net,tr]=train(net,desinput,designtarget);
net.trainparam.epochs=1000;
output=net(desinput);
%calculating the mse of training data------
trout=output(:,tr.trainInd); % training outputs
trtar=designtarget(:,tr.trainInd);% training targets
etrain=mse(trout-trtar);% mse of training data
%calculating the mse of validation data---
valout=output(:,tr.valInd);
valtar=designtarget(:,tr.valInd);
evaladation=mse(valout-valtar);% mse of validation data
% TESTING PHASE
if (etrain<= 1e-10 & evaladation< 1e-10) % only if these conditions meet then
%i will accept that the network is well trained
count=1;
tstoutput=net(tstinput);
etst=mse(tstoutput-testtarget)
g1c=size(find(tstoutput(1,1:tesn1)<0.5 ),2)/tesn1
% g1c=classification result of class 1
g2c=size(find(tstoutput(1,tesn1+1:tesn1+tesn2)>0.5 ) ,2)/tesn2
%g2c=classification result of class 2
tstoutput=tstoutput';
testtarget=testtarget';
designtarget=designtarget';
end
end
>Overfitting happens when Nw > Ntrneq.
I have always thought that when overfitting happens we should expect big test errors, because the network has learned only the exact training data and cannot recognize data it has never seen, and the network will also have a big validation error. But my networks have very small validation errors! Why is that? How is it possible?
%calculating the mse of training data------
trout=output(:,tr.trainInd); % training outputs
trtar=designtarget(:,tr.trainInd);% training targets
etrain=mse(trout-trtar);% mse of training data
%calculating the mse of validation data---
valout=output(:,tr.valInd);
valtar=designtarget(:,tr.valInd);
evaladation=mse(valout-valtar);% mse of validation data
Error of validation = 5.528418144299875e-13
Error of training = 5.376670402341206e-12
So if my network is overfit, I cannot understand why it can recognize all my data sets. Also, I thought data sets such as two spirals, half kernels and banana-shaped data sets have complex patterns and are not so easy for neural networks to classify.
> I understand what you are trying to do. However, the importance is not so much data set size, but whether or not the training data adequately characterizes the I/O relation for nontraining data.
I know what you mean, but my thesis is about solving problems where small data size is the reason they cannot be classified correctly. We "assumed" that the features we have extracted would be enough for classification if we had a good-sized data set.
Greg Heath
on 24 Jul 2015
>>I assume you meant that you set a nondefault MSE goal for the training, but what did you mean about an error condition for validation?
>My prof says the only time we can actually accept that a network is correctly trained is when the error of training and validation is smaller than an error we have defined before training starts! He says that if a network stops training due to one of MATLAB's default stopping conditions, it doesn't mean the net is certainly correctly trained. So he wanted me to write a "while" loop that only stops when we reach those defined errors for training and validation; if that condition is not met, the net is trained again, and this repeats again and again until the net reaches the defined errors.
GEH1: Validation Stopping ends training when the val subset error minimizes before the training subset performance reaches its goal. The purpose is to save time when it is clear that the net will not perform well on nontraining data.
Now your classifier keeps on training hoping that the val performance will reverse its trend. My immediate reaction is that it is a waste of time because the number of times that val performance reverses is very small. I'm probably in the majority on that assertion. Therefore I think that you should disprove that premise by keeping track of the instances where val set performance achieves a local min and record the post-minimum behaviour.
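One way to record that post-minimum behaviour is from the training record returned by train (a sketch; tr is the second output of train):
[bestVal, iBest] = min(tr.vperf);    % epoch with the best validation performance
postMin = tr.vperf(iBest:end);       % validation error after that minimum
plot(postMin)                        % inspect whether the trend ever reverses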
>So I did this:
count=0;
while count~=1 % the "count" turns from 0 to 1 when the training
               % and validation's error conditions meet
%net=patternnet(10);
net = patternnet(70);
net.divideParam.trainRatio=0.9
net.divideParam.valRatio=0.1
GEH2: probably should increase the val set size
net.divideParam.testRatio=0
net.trainParam.max_fail=30;
%LEARNING PHASE
[net,tr]=train(net,desinput,designtarget);
net.trainparam.epochs=1000;
GEH3: max epochs should be specified before the call to train
output=net(desinput);
%calculating the mse of training data------
trout=output(:,tr.trainInd); % training outputs
trtar=designtarget(:,tr.trainInd);% training targets
etrain=mse(trout-trtar);% mse of training data
%calculating the mse of validation data---
valout=output(:,tr.valInd);
valtar=designtarget(:,tr.valInd);
evaladation=mse(valout-valtar);% mse of validation data
GEH4: These are already tabulated in tr.perf and tr.vperf
% TESTING PHASE
if (etrain<= 1e-10 & evaladation< 1e-10) % only if these conditions meet then
%i will accept that the network is well trained
GEH5: Better to normalize by the respective target variances, e.g., etrain<= 0.01*var(trtar,1) & evalidation<= 0.01*var(valtar,1)
count=1;
tstoutput=net(tstinput);
etst=mse(tstoutput-testtarget)
g1c=size(find(tstoutput(1,1:tesn1)<0.5 ),2)/tesn1
% g1c=classification result of class 1
g2c=size(find(tstoutput(1,tesn1+1:tesn1+tesn2)>0.5 ) ,2)/tesn2
%g2c=classification result of class 2
GEH6: Can get a cleaner calculation using VEC2IND & IND2VEC
tstoutput=tstoutput';
testtarget=testtarget';
designtarget=designtarget';
end
end
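A sketch of the cleaner calculation GEH6 alludes to, applied to the untransposed outputs and targets and assuming one-hot (0/1) target columns; the variable names follow the code above:
predClass = vec2ind(tstoutput);                 % class index of the largest output per sample
trueClass = vec2ind(testtarget);
pctCorrect = 100*mean(predClass == trueClass)   % overall correct classification rate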
>>Overfitting happens when Nw > Ntrneq.
> I have always thought that when overfitting happens we should expect big test errors, because the network has learned only the exact training data and cannot recognize data it has never seen, and the network will also have a big validation error. But my networks have very small validation errors! Why is that? How is it possible?
GEH7: When overfitting occurs, there is the POTENTIAL for large error when the nontraining data is sufficiently different from the training data.
>%calculating the mse of training data------
trout=output(:,tr.trainInd); % training outputs
trtar=designtarget(:,tr.trainInd); % training targets
etrain=mse(trout-trtar); % mse of training data
%calculating the mse of validation data---
valout=output(:,tr.valInd);
valtar=designtarget(:,tr.valInd);
evaladation=mse(valout-valtar); % mse of validation data
Error of validation = 5.528418144299875e-13
Error of training = 5.376670402341206e-12
>So if my network is overfit, I cannot understand why it can recognize all my data sets. Also, I thought data sets such as two spirals, half kernels and banana-shaped data sets have complex patterns and are not so easy for neural networks to classify.
>> I understand what you are trying to do. However, the importance is not so much data set size, but whether or not the training data adequately characterizes the I/O relation for nontraining data.
>I know what you mean, but my thesis is about solving problems where small data size is the reason they cannot be classified correctly. We "assumed" that the features we have extracted would be enough for classification if we had a good-sized data set.
GEH8: Obviously, even though the training set was small, it was large enough to adequately characterize the salient properties of the nontraining data.
GEH9: Your demo would go much, much better if you start with the negative result using
1. Larger Nval/Ntrn and Ntst/Ntrn (0.34/0.33/0.33) w/o valstop or regularization (msereg or trainbr)
2. Then, compare your method with valstop, msereg and trainbr.
3. Keep track of training times
Hope this helps.
Greg
Fereshteh....
on 25 Jul 2015
Edited: Fereshteh....
on 25 Jul 2015
Thank you very much Greg , I really appreciate your help.
>GEH1: Validation Stopping ends training when the val subset error minimizes before the training subset performance reaches its goal. The purpose is to save time when it is clear that the net will not perform well on nontraining data. Now your classifier keeps on training hoping that the val performance will reverse its trend. My immediate reaction is that it is a waste of time because the number of times that val performance reverses is very small. I'm probably in the majority on that assertion. Therefore I think that you should disprove that premise by keeping track of the instances where val set performance achieves a local min and record the post-minimum behavior.
So I suppose you mean that I should not write a "while" loop because it would just waste time without adding any improvement. The thing is, I do 20-30 rounds of cross-validation using P.C.V (partial cross-validation). As far as I am concerned, in cross-validation we should only involve networks which are well trained (I mean networks which are truly trained). Some stopping conditions stop the net's training, but that doesn't mean the network is correctly trained; e.g. it might stop because it reached the maximum number of epochs but still has a big training error, so I guess we shouldn't use that network in cross-validation. I cannot manually check my networks one by one to see whether they are well trained before I put them into the cross-validation process, so I have to write a loop to check whether my net is well trained or not, and only when it is well trained will I involve it in cross-validation. Is this wrong?
>GEH4: These are already tabulated in tr.perf and tr.vperf
The performance function apparently is different from MSE; I compared the results and they were different, and that is why I used MSE.
>GEH5: Better to normalize by the respective target variances, e.g., etrain<= 0.01*var(trtar,1) & evalidation<= 0.01*var(valtar,1)
For choosing those numbers (for example 1e-10), I ran my program several times and saw the values of etrain and evalidation, then I chose a number close to those values. I am afraid that if I use, for example, 0.01*var(trtar,1), which I calculated in my network and got 0.01*0.2222 = 0.002222, this number would be much bigger than the errors my networks make. So does having errors as small as these (Error of validation = 5.528418144299875e-13, Error of training = 5.376670402341206e-12) mean my networks are screwed?
>GEH9: Your demo would go much, much better if you start with the negative result using
1. Larger Nval/Ntrn and Ntst/Ntrn (0.34/0.33/0.33) w/o valstop or regularization (msereg or trainbr)
2. Then, compare your method with valstop, msereg and trainbr.
3. Keep track of training times
I am afraid I didn't correctly understand what you mean by "negative result". When I have a data shortage, I guess that using only 34% of all the data for training my network would lead to underfitting. By saying "compare your method with valstop, msereg and trainbr", do you mean I shouldn't compare my method's result with the actual-data result? I guess I didn't understand you correctly, sorry.
As I said before, the goal of my thesis is to introduce algorithms which can generate synthetic data sets to overcome the problem of insufficient data set size (DATA SHORTAGE) in classification problems. We investigate two phases in my thesis.
Phase one
1. We have small data sets (I call these the actual data sets) => this amount of data is not enough for a large neural network => overfitting happens => we decide to generate synthetic data
2. Using one of our algorithms we generate synthetic data sets => synthetic data sets
3. "Actual data sets + synthetic data sets" are given to a large neural network => the network is well trained and overcomes the overfitting
Here we will show that our synthetic data sets can overcome the overfitting problem, so that if we face problems which, due to data shortage, cannot be classified by neural network classifiers, our algorithms can solve them.
Phase two
Showing that synthetic data sets could even improve the classification results
1. Actual data are classified => saving the results
2. Actual data sets + synthetic data sets are classified => saving the results
3. Comparing results of step 1 and 2.
Greg, I wonder if you have any suggestion for a high-dimensional data set which suits my purpose, a data set which is not easy to characterize. I have worked with real data sets like the thyroid data set, but we aim, at first, to show the power of all the algorithms on artificial data sets for which we know the generating functions and how they are distributed, and after that we will work on real data sets such as thyroid.
Thanks a million times.
Greg Heath
on 27 Jul 2015
I do not have time for a detailed answer now. However, it looks like you are still postulating that overfitting (defined as Nw > Ntrneq) itself is the big problem.
It isn't.
The problem is dealing with a training set that does not adequately characterize the salient properties of nontraining (seen and unseen) data. This happens most often with a small training set.
Greg Heath
on 27 Jul 2015
>Thank you very much Greg, I really appreciate your help.
>>GEH1: Validation Stopping ends training when the val subset error minimizes before the training subset performance reaches its goal. The purpose is to save time when it is clear that the net will not perform well on nontraining data. Now your classifier keeps on training hoping that the val performance will reverse its trend. My immediate reaction is that it is a waste of time because the number of times that val performance reverses is very small. I'm probably in the majority on that assertion. Therefore I think that you should disprove that premise by keeping track of the instances where val set performance achieves a local min and record the post-minimum behavior.
>So I suppose you mean that I should not write a "while" loop because it would just waste time without adding any improvement. The thing is, I do 20-30 rounds of cross-validation using P.C.V (partial cross-validation). As far as I am concerned, in cross-validation we should only involve networks which are well trained (I mean networks which are truly trained). Some stopping conditions stop the net's training, but that doesn't mean the network is correctly trained; e.g. it might stop because it reached the maximum number of epochs but still has a big training error, so I guess we shouldn't use that network in cross-validation. I cannot manually check my networks one by one to see whether they are well trained before I put them into the cross-validation process, so I have to write a loop to check whether my net is well trained or not, and only when it is well trained will I involve it in cross-validation. Is this wrong?
You are using training data to design a net which has to work well on nontraining data. If training stops because of the default Valstop condition, and the resulting design (trn+val) performance is unsatisfactory, then discard the design, reinitialize the weights and try again. There is a high probability that training beyond the default Valstop condition is a waste of time.
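A sketch of that discard-and-retry idea without an open-ended while loop (H, Ntrials, valGoal, x and t are illustrative placeholders, not values from this thread):
for trial = 1:Ntrials
    net = patternnet(H);                % fresh net with new random initial weights
    [net, tr] = train(net, x, t);       % default data division and validation stopping
    if tr.best_vperf <= valGoal         % design (validation) performance acceptable?
        break                           % keep this design
    end                                 % otherwise discard it and try again
end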
>>GEH4: These are already tabulated in tr.perf and tr.vperf
CORRECTION: tr.best_perf and tr.best_vperf
> The performance function apparently is different from MSE; I compared the results and they were different, and that is why I used MSE.
OH, RIGHT: The default performance function for classifiers is CROSSENTROPY!! Now the question is whether or not Valstop operates on MSE or CROSSENTROPY. You can figure this out if you wish OR just change the performance function to MSE. Remember, what you are really interested in is CORRECT CLASSIFICATION RATE. However, it is not a continuous function. Although crossentropy is, theoretically, the most appropriate error function for classification, mse with default validation stopping should work just fine.
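Changing the performance function is a one-line setting (a sketch):
net = patternnet(70);
net.performFcn = 'mse';   % replace the default 'crossentropy' with mean squared error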
>>GEH5: Better to normalize by the respective target variances, e.g., etrain<= 0.01*var(trtar,1) & evalidation<= 0.01*var(valtar,1)
>For choosing those numbers (for example 1e-10), I ran my program several times and saw the values of etrain and evalidation, then I chose a number close to those values. I am afraid that if I use, for example, 0.01*var(trtar,1), which I calculated in my network and got 0.01*0.2222 = 0.002222, this number would be much bigger than the errors my networks make. So does having errors as small as these (Error of validation = 5.528418144299875e-13, Error of training = 5.376670402341206e-12) mean my networks are screwed?
No. It just means you have wasted your time trying to make a sufficiently low error smaller. In general, you are trying to model the target variance. If you want to model 99 percent of it then choose MSEgoal = 0.01*var(t,1), 0.001 for 99.9 percent etc. This is easier to understand if you realize that near convergence mean(y) ~ mean(t). Therefore, since e=t-y, mean(e) ~ 0 and mse(e) ~ var(e) .
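As a sketch of that normalized goal (assuming a single-row target vector t, as in this thread with O = 1):
MSEgoal = 0.01*var(t,1);          % aim to model ~99 percent of the target variance
net.trainParam.goal = MSEgoal;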
>>GEH9: Your demo would go much, much better if you start with the negative result using
1. Larger Nval/Ntrn and Ntst/Ntrn (0.34/0.33/0.33) w/o valstop or regularization (msereg or trainbr)
2. Then, compare your method with valstop, msereg and trainbr.
3. Keep track of training times
>I am afraid I didn't correctly understand what you mean by "negative result".
Good performance on training data BUT lousy performance on nontraining data.
> when I have data shortage, I guess using only 34% of all data for training my network, would lead to underfitting.
No, No, No! ... Underfitting means you don't have enough weights!
>I wonder by saying” comparing my method with valstop, msereg and trainbr “, you mean I shouldn’t compare my method result with actual data result? I guess I didn’t get you correctly, sorry.
Except for using more training data, the three standard methods used to prevent overtraining an overfit net are valstop, msereg and trainbr. Therefore, you MUST compare your method with those.
>As I told before the goal in my thesis is to introduce algorithms which can generate synthetic data sets to overcome the problem of insufficient data set size (DATA SHORTAGE), in classification problems. we investigate two phases in my thesis.
>Phase one
>1. We have small data sets (I call these the actual data sets) => this amount of data is not enough for a large neural network => overfitting happens => we decide to generate synthetic data
No, No, No! Overfitting happens when you have many more weights than necessary. For stable solutions that generalize well to nontraining data you need Ntrneq >> Nw. That is why you are synthesizing more data.
>2. Using one of our algorithms we generate synthetic data sets => synthetic data sets
>3. "Actual data sets + synthetic data sets" are given to a large neural network => the network is well trained and overcomes the overfitting
>Here we will show that our synthetic data sets can overcome the overfitting problem, so that if we face problems which, due to data shortage, cannot be classified by neural network classifiers, our algorithms can solve them.
BUT you MUST compare your results with the standard methods I mentioned above!
> Phase two
Showing that synthetic data sets could even improve the classification results
1. Actual data are classified => saving the results
2. Actual data sets + synthetic data sets are classified => saving the results
3. Comparing results of step 1 and 2.
>Greg, I wonder if you have any suggestion for a high-dimensional data set which suits my purpose, a data set which is not easy to characterize. I have worked with real data sets like the thyroid data set, but we aim, at first, to show the power of all the algorithms on artificial data sets for which we know the generating functions and how they are distributed, and after that we will work on real data sets such as thyroid.
For classification data set examples use the commands
help nndatasets
doc nndatasets
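Each built-in set can be loaded directly; for example (thyroid_dataset is one of the sets mentioned elsewhere in this thread):
[x, t] = thyroid_dataset;   % inputs: 21 features per case; targets: 3 classes, one-hot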
Finally, do not forget to compare the synthesized data technique to val stopping, regularization and trainbr.
Hope this helps.
Greg
Fereshteh....
on 29 Jul 2015
Edited: Fereshteh....
on 29 Jul 2015
thanks again Greg :)
>I do not have time for a detailed answer now. However, it looks like you are still postulating that overfitting (defined as Nw > Ntrneq), itself is the big problem. It isn't. The problem is dealing with a training set that does not adequately characterize the salient properties of nontraining (seen and unseen) data. This happens most often with a small training set.
No, I have been saying that I am looking for data sets which cannot be adequately characterized, but I couldn't find them. That is why I ended up using two spirals (assuming that their complex distribution pattern would make them hard to characterize; unfortunately, with your help, I found that my assumption was wrong). I am also looking for high-dimensional artificial data sets, again on the assumption that they will be hard to characterize.
>You are using training data to design a net which has to work well on nontraining data. If training stops because of the default Valstop condition, and the resulting design (trn+val) performance is unsatisfactory, then discard the design, reinitialize the weights and try again. There is a high probability that training beyond the default Valstop condition is a waste of time.
So you agree that we should discard nets which are not trained well. I wonder how you do this if you don't write a while loop; moreover, how do you do cross-validation? I suppose the only way is writing a "while" loop to check, discard, or accept the nets!
>>1. We have small size of data sets (I call this data sets actual data sets) => these number of data sets are not enough for a large neural network => overfitting happens => we decide to generate synthetic data
>No, No, No! Overfitting happens when you have many more weights than necessary. For stable solutions that generalize well to nontraining data you need Ntrneq >> Nw. That is why you are synthesizing more data.
I guess you misunderstood me. By a "large" neural network I meant that we deliberately create a situation where Nw >> Ntrneq; we make sure our Nw is much bigger than Ntrneq. As you saw, I wrote my code so that this happens: Nw = 281 while my Ntrneq = 29.
>For classification data set examples use the commands doc nndatasets
I have used them, such as the thyroid or breast cancer data sets, but those are real data sets; I want ARTIFICIAL data sets.
Fereshteh....
on 30 Jul 2015
Edited: Fereshteh....
on 30 Jul 2015
Greg, one further question. I said to myself, okay, if my data set can be adequately characterized, let's try the data sets in the MATLAB archive. I used "nnstart" to launch the pattern recognition app and loaded the example data sets one by one. At first I trained the net with 10 neurons in the hidden layer, looked at the confusion plot, and wrote down the "test" confusion matrix results.
Then I changed the number of neurons to 1000. By my calculation, for all of those data sets except the thyroid data set, this number of neurons makes Nw notably greater than Ntrneq, so we should expect overfitting to happen and make the test results bad. But when I checked the confusion plot and wrote down the test confusion matrix results, they were not bad at all; they were actually good!
I know, you have told me several times that "When overfitting occurs, there is the POTENTIAL for large error when the nontraining data is sufficiently different from the training data." My question is: does this mean that in all those data sets the nontraining data is not sufficiently different from the training data? I find that so odd. If yes, where can I find a data set whose nontraining data is sufficiently different from its training data? I am getting so frustrated with neural networks :/