Matlab: Split and Plot the training data set and test data set.
Hi Everyone
Thanks for any Help
I need a solution for this exercise.
First, divide the data into two parts: training data (Train) and test data (Test). Use 30% of the data for the test set and 70% for the training set.
Second, the split should be completely random, without duplicate data. In other words, none of the randomly selected test data should appear in the training set, and the test set itself should contain no duplicates. (Use efficient MATLAB functions for this purpose.)
--> Perform polynomial regression for degrees 1 to 100 and display the results of these 100 experiments in a plot like the one below.
--> This example plot shows the MSE for each polynomial degree, for both the training set and the test set.
I generated the one-dimensional data using the following code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
4 Comments
Adam Danz
on 7 May 2021
Not sure what your question is. What part of the assignment are you stuck on? What specifically do you need help with?
arash Moha
on 7 May 2021
I need output like this picture, but I am new to MATLAB and have no idea how to produce it.
this pic:
Adam Danz
on 7 May 2021
Do you have any data to work with? Surely your assignment wasn't to just make up data that looks like those curves.
arash Moha
on 7 May 2021
This picture is just an example of the shape of the output I want.
I generated the one-dimensional data using the following code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Only this code may be used to generate the data for this exercise.
Answers (1)
Adam Danz
on 7 May 2021
I'll point you in the right direction toward the tools you need to complete your assignment.
>First, divide the data into two parts: training data (Train) and test data. Consider 30% of the data for the test set and 70% as the training set. Second, This segmentation should be completely random without duplicate data. In other words, none of the randomly selected test set data should be present in the training set and also the data in the test set should not be duplicate. (Use efficient functions in MATLAB for this purpose)
I suggest using cvpartition to break up the data into training and testing sets. It's not the most straightforward function, but hopefully the documentation page and the examples on that page will help you learn it.
Alternatively, you can use randperm to create a vector of indices that can be used to break up the data.
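As a rough illustration of the cvpartition route, here is a minimal sketch using the data-generation code from the question. It assumes the Statistics and Machine Learning Toolbox is installed (cvpartition lives there); the variable names idxTest, idxTrain, xTrain, yTrain, xTest and yTest are only illustrative choices.

```matlab
% Hold-out split with cvpartition (requires Statistics and Machine Learning Toolbox)
rng(914163506);                        % seed from the question, for repeatability
temp = 0:.15:2*pi;                     % x-values
x = sin(temp) + .2*randn(size(temp));  % noisy y-values

n = numel(x);                          % number of observations
c = cvpartition(n, 'HoldOut', 0.3);    % reserve roughly 30% for testing

idxTest  = test(c);                    % logical vector, true for test observations
idxTrain = training(c);                % logical vector, true for training observations

% Use the same indices on both vectors so the (x,y) pairs stay together
xTrain = temp(idxTrain);  yTrain = x(idxTrain);
xTest  = temp(idxTest);   yTest  = x(idxTest);
```

Because every observation is assigned to exactly one of the two sets, the test and training sets cannot overlap.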
> Perform the regression for the polynomial degree from degree 1 to degree 100 and display the results of these 100 experiments in the plot below.
22 Comments
arash Moha
on 10 May 2021
I used this code to split the data into training and test sets, but when I run the program it shows this error:
"" Error using cvpartition (line 160)
The number of observations must be a positive integer greater than one.
Error in Code (line 8)
c = cvpartition(size(x,1),'HoldOut',0.3); ""
What's the problem?
This is my code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
c = cvpartition(size(x,1),'HoldOut',0.3);
idx= c.test;
Train= x(~idx,:);
Test= x(idx,:);
Adam Danz
on 10 May 2021
x is a 1x42 vector so size(x,1) returns 1. You either need size(x,2) or numel(x).
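A quick way to see the difference, using the same temp vector as in the question (the noise term is left out here because it does not affect the size):

```matlab
temp = 0:.15:2*pi;   % 1x42 row vector
x = sin(temp);       % same size as temp

nRows = size(x,1);   % number of rows: 1, which is what cvpartition rejected
nCols = size(x,2);   % number of columns: 42
nAll  = numel(x);    % total number of elements: 42, regardless of orientation
```

numel(x) gives the same answer whether x is a row or a column vector, which makes it the safer choice here.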
arash Moha
on 11 May 2021
Edited: arash Moha
on 11 May 2021
Is this the right way to split the data, given this requirement: the split should be completely random, without duplicate data (no overlap)? In other words, none of the randomly selected test data should appear in the training set, and the test set itself should contain no duplicates.
If the code is incorrect, please post the corrected code.
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(size(x));
[i,j]=size(Data);
idx=randperm(j);
Train=Data(:,1:round(j*0.70));
Test=Data(:,1:round(j*0.70)+1:end);
arash Moha
on 11 May 2021
Edited: arash Moha
on 11 May 2021
And what should I do for the part you mentioned?
Given the code above, how do I plot the results for the Train and Test sets so that the x-axis is the polynomial degree from 1 to 100 and the y-axis is the MSE value?
Please help me with the code for this section, thank you very much.
Adam Danz
on 11 May 2021
No, these two lines are very wrong,
Train=Data(:,1:round(j*0.70));
Test=Data(:,1:round(j*0.70)+1:end);
Here's the concept you need to understand,
idx = randperm(numel(x));
In the line above, idx contains values 1 to n where n is the number of values in x. These are indices and they are randomized without repeating.
You can use this vector to split up the data. For example, the first 30% (approximate) is data(idx(1:floor(n*.3))) where n is the number of values in x and the remaining 70% is data(idx(floor(n*.3)+1:n)).
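The idea described above can be sketched as follows, using the data from the question. The names nTest, idxTest, idxTrain, isDisjoint and isComplete are only illustrative; the two checks at the end simply confirm that no index lands in both sets and that every index is used once.

```matlab
rng(914163506);
temp = 0:.15:2*pi;                     % x-values
x = sin(temp) + .2*randn(size(temp));  % y-values

n = numel(x);
idx = randperm(n);             % each index 1..n exactly once, in random order
nTest = floor(n*.3);           % size of the test set (approximately 30%)

idxTest  = idx(1:nTest);       % first ~30% of the shuffled indices
idxTrain = idx(nTest+1:n);     % remaining ~70%

% Index both vectors with the same indices so the pairs stay together
xTest  = temp(idxTest);   yTest  = x(idxTest);
xTrain = temp(idxTrain);  yTrain = x(idxTrain);

% Sanity checks: no shared indices, and every index used exactly once
isDisjoint = isempty(intersect(idxTest, idxTrain));
isComplete = isequal(sort([idxTest, idxTrain]), 1:n);
```

Since both index sets are cut from a single permutation at non-overlapping positions, the split is random and free of duplicates by construction.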
arash Moha
on 11 May 2021
Edited: arash Moha
on 11 May 2021
Should I use j instead of n in your example? Is this part of the code correct so far?
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(size(x));
[i,j]=size(Data);
idx=randperm(numel(x));
Train=Data(idx(1:floor(j*.3)+1:j));
Test=Data(idx(1:floor(j*.3)));
p=polyfit(temp,x,1);
k=p(1)*temp+p(2);
Thanks a lot.
What should I do for the part you mentioned?
Given the code above, how do I plot the results for the Train and Test sets so that the x-axis is the polynomial degree from 1 to 100 and the y-axis is the MSE value?
Please help me with the code for this section, thank you very much.
Adam Danz
on 11 May 2021
If x is a vector, numel(x) is better to use than size(x,2).
I don't see where you're using the Train and Test data. I'm guessing that your instructor wants you to fit the training data and then use the coefficient estimates to compute the error for both the training set and the test set. So you'll need to compute the mean squared error of the residuals for each partition of the data. That will give you 2 values for each polynomial degree.
The loop will look like this.
degrees = 1:100;
MSE = nan(numel(degrees),2); % 2 columns: one for training and one for testing set
for i = 1:numel(degrees)
    % 1) Fit the data here using polyfit, the polynomial degree
    %    should change on each iteration.
    % 2) Compute the MSE for the test set and training set
    MSE(i,1) = ___
    MSE(i,2) = ___
end
Now you'll have a 100x2 matrix of MSE. All you have to do is plot each column.
See your textbook or other online resources to remind yourself how to compute MSE, if needed. You'll use the coefficient estimates (1st output of polyfit) and the actual y-values for both data sets.
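One possible way to fill in the loop is sketched below. It is not the only reading of the assignment: it assumes a randperm-based 30/70 split, fits only the training data, and evaluates the same coefficients on both sets. Warnings are switched off temporarily because polyfit reports badly conditioned or non-unique fits once the degree approaches or exceeds the number of training points, so the MSE values at high degrees should be treated with caution.

```matlab
rng(914163506);
temp = 0:.15:2*pi;                     % x-values
x = sin(temp) + .2*randn(size(temp));  % y-values

% Random, non-overlapping 30/70 split (assumed here; any disjoint split works)
n = numel(x);
idx = randperm(n);
nTest = floor(n*.3);
idxTest  = idx(1:nTest);
idxTrain = idx(nTest+1:n);
xTrain = temp(idxTrain);  yTrain = x(idxTrain);
xTest  = temp(idxTest);   yTest  = x(idxTest);

degrees = 1:100;
MSE = nan(numel(degrees),2);           % column 1: training, column 2: test

warnState = warning('off','all');      % silence conditioning warnings for high degrees
for i = 1:numel(degrees)
    % 1) Fit the training data only; the degree changes on each iteration
    p = polyfit(xTrain, yTrain, degrees(i));
    % 2) Evaluate the same coefficients on both sets and compare with the actual y-values
    MSE(i,1) = mean((yTrain - polyval(p, xTrain)).^2);
    MSE(i,2) = mean((yTest  - polyval(p, xTest )).^2);
end
warning(warnState);                    % restore the previous warning settings
```

Each row of MSE then holds the training and test error for one polynomial degree.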
arash Moha
on 11 May 2021
Edited: arash Moha
on 11 May 2021
I'm so sorry, I am new to MATLAB.
Is this part of the code wrong for fitting the data with polyfit and computing the MSE?
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(numel(x));
[i,j]=size(Data);
idx=randperm(numel(x));
Train=Data(idx(1:floor(j*.3)+1:end));
y_Train=sin(Train + .2*randn(size(Train)));
Test=Data(idx(1:floor(j*.3)));
y_Test=sin(Test + .2*randn(size(Test)));
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(Train,y_Train,1);
pval=polyval(p,Train);
MSE(i,1) = mean((Train - pval).^2);
MSE(i,2) = mean((Train - pval).^2);
end
Adam Danz
on 11 May 2021
In the 3 lines below, I assume temp holds the x-values and x holds the y-values, but I don't know what Data is supposed to be.
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(numel(x));
Then, in these 2 lines below you're generating even more random data so I'm really lost.
Train=Data(idx(1:floor(j*.3)+1:end));
y_Train=sin(Train + .2*randn(size(Train)));
You're not understanding the logic behind this. idx should be used to extract data from the x and y variables so they remain paired. For example, let's say your raw data are (x,y) coordinates stored in variables X and Y,
n = numel(X);
idxTrain = idx(1:floor(n*.7));
xTrain = X(idxTrain);
yTrain = Y(idxTrain);
% Then repeat that process for the test data, using the remaining indices.
This line needs to use the loop variable to change the polynomial degree. Right now you're always using 1! Also, make sure the first two inputs are all from the training data.
p=polyfit(Train,y_Train,1);
arash Moha
on 11 May 2021
Edited: arash Moha
on 11 May 2021
Thank you very much for your effort. Finally, since I have already bothered you a lot: how do I plot the test error and train error so that it looks like the following figure, i.e. the x-axis is the polynomial degree from 1 to 100 and the y-axis is the MSE?
I think this is the correct code according to the explanation you kindly provided, but what should I put instead of 1 in polyfit? I tried i and Degrees, but it gives an error.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
idx=randperm(numel(x));
n = numel(x);
idxTrain = idx(1:floor(n*.7));
xTrain = temp(idxTrain);
yTrain = x(idxTrain);
idxTest = idx(1:floor(n*.3));
xTest = temp(idxTest);
yTest = x(idxTest);
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,1);
pval=polyval(p,xTrain);
MSE(i,1) = mean((xTrain - pval).^2);
MSE(i,2) = mean((xTrain - pval).^2);
hold on
p=polyfit(xTest,yTest,1);
pval=polyval(p,xTest);
MSE(i,1) = mean((xTest - pval).^2);
MSE(i,2) = mean((xTest - pval).^2);
end
Adam Danz
on 11 May 2021
> how is the plot test error and train error to look like the following figure
With the command plot(x,y), x will be the degrees vector and y will be the vector of MSE values.
arash Moha
on 11 May 2021
Exactly, this state and shape is impossible.
Is the code above correct?
Thank you so much for your help.
Adam Danz
on 11 May 2021
Edited: Adam Danz
on 11 May 2021
You're fitting the polynomial using both the training and the test data sets (see the section of your code below).
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,1); % Fit training data
pval=polyval(p,xTrain);
MSE(i,1) = mean((xTrain - pval).^2);
MSE(i,2) = mean((xTrain - pval).^2);
hold on % why is this here?
p=polyfit(xTest,yTest,1); % Fit testing data
pval=polyval(p,xTest);
MSE(i,1) = mean((xTest - pval).^2);
MSE(i,2) = mean((xTest - pval).^2);
end
I believe your assignment is to fit the data with the training set and to compute the MSE on both the training and test sets using the same coefficients returned by the training-fit. That's what cross validation is. If you fit the test set separately, that's not cross validation.
arash Moha
on 12 May 2021
Edited: arash Moha
on 12 May 2021
Exactly, you are right: 30% of the data should be randomly separated as the test set, and the fit should use the remaining 70% (the training set).
This is all the code I wrote, but I do not know exactly where the problem is. I would be grateful if you could tell me what the correct code is and how the MSE and polyfit should be computed here. I'm very confused, please help.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
idx=randperm(numel(x));
n = numel(x);
idxTrain = idx(1:floor(n*.7));
xTrain = temp(idxTrain);
yTrain = x(idxTrain);
idxTest = idx(1:floor(n*.3));
xTest = temp(idxTest);
yTest = x(idxTest);
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,i);
pval=polyval(p,xTrain);
MSE(i,1) = immse(pval,xTrain);
%MSE(i,2) = mean((idxTest - pval).^2);
end
plot(Degrees,MSE(i,1), 'b.-', 'LineWidth', 3);
hold on
%plot(Degrees,MSE(i,2), 'r.-', 'LineWidth', 3);
%title('x as a function of index', 'FontSize', 18);
xlabel('Regression', 'FontSize', 15);
ylabel('MSE', 'FontSize', 15);
grid on;
arash Moha
on 12 May 2021
Edited: arash Moha
on 12 May 2021
Please help me, I have very little time left to finish the exercise. Thanks a lot.
Adam Danz
on 13 May 2021
1. Is this the actual data you're supposed to be fitting?
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
plot(temp,x, '-o')
idxTrain = idx(1:floor(n*.7));
idxTest = idx(1:floor(n*.3));
3. This is not correct, although it's close. It shows that you don't understand the concept of mean squared error. The error is the difference between the estimated y-values and the actual y-values.
MSE(i,1) = immse(pval,xTrain);
4. You also need the mean squared error for the test set, using the same polynomial fit values you used with the training data (p with the x-test values).
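To make points 3 and 4 concrete, here is a tiny worked example with made-up numbers (not the assignment data). The training points lie exactly on y = 2x + 1, so a degree-1 fit reproduces them; the second test point is off that line by 1, so the test errors are 0 and 1 and the test MSE is (0^2 + 1^2)/2 = 0.5.

```matlab
% Hypothetical toy data, chosen so the arithmetic is easy to follow
xTrain = [0 1 2];   yTrain = [1 3 5];   % exactly y = 2x + 1
xTest  = [3 4];     yTest  = [7 10];    % second point is 1 above the line

p = polyfit(xTrain, yTrain, 1);         % fit the training data only

% MSE compares estimated y-values with actual y-values (never with x-values)
mseTrain = mean((yTrain - polyval(p, xTrain)).^2);   % essentially zero
mseTest  = mean((yTest  - polyval(p, xTest )).^2);   % (0^2 + 1^2)/2
```

Note that the test error reuses p from the training fit; the test data are never passed to polyfit.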
Lastly, here is what the first 28 polynomial fits look like when I fit the entire data set (not the training/test sets). Note how the fits become better at first, then start adapting too much to the data, and at the end become really bad. The text label shows the loop number.
The plot was created using
plot(temp, x, 'bo')
hold on
And then within the loop,
plot(temp, pval); % after fitting *all* of the data
arash Moha
on 13 May 2021
Edited: arash Moha
on 13 May 2021
- No, I generated the data in one dimension; you should just plot x without temp. x is my actual data.
- What is the correct form of this code? You mentioned in a few previous messages that it should be written like this.
- What is the correct form of this code?
- How do I evaluate the test set using the same polynomial fit? Do you mean this: pvall = polyval(p,xTest);
Note: The generated data is one-dimensional; x alone is my actual data, without temp.
Note: None of the randomly selected test data should be present in the training set, and the test set should contain no duplicates.
arash Moha
on 13 May 2021
Note: In this exercise, we intend to examine the effect of the polynomial degree on regression and on overfitting to the training samples.
arash Moha
on 13 May 2021
Thank you very much for your time. Please send me the changes needed to correct the code I posted; I do not have much time left to submit the exercise. Please help, I beg you.
Adam Danz
on 13 May 2021
Everything looks ok but you're not plotting the results correctly.
plot(Degrees,MSE(i,1), 'b.-', 'LineWidth', 3);
%                ^
This is only plotting the last row of results. Instead, you want MSE(:,1), and then repeat for the second column.
The results will not look like the example in your image. Those lines must be from a different dataset.
arash Moha
on 13 May 2021
Is this code correct? If there is a problem somewhere in the code, please tell me the correct code. Thank you very much.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
Data = sin(temp)+.2*randn(size(temp));
[i,j]=size(Data);
p=0.30;
idx=randperm(j);
Data_Trainset=Data(:,idx(1:round(p*j)));
Data_Testset=Data(:,idx(round(p*j)+1:end));
Data_Trainset_y=randn(size(Data_Trainset));
Data_Testset_y=randn(size(Data_Testset));
l1=[];
l2=[];
for i1=1:100
p=polyfit(Data_Trainset,Data_Trainset_y,i1);
pval= polyval(p,Data_Trainset_y);
pvall=polyval(p,Data_Testset_y);
MSE_Trainset=immse(pval,Data_Trainset);
MSE_Testset=immse(pvall,Data_Testset);
g=inv(MSE_Trainset);
g1=inv(MSE_Testset);
l1=[l1,g];
l2=[l2,g1];
plot(i1,g,'b.-', 'LineWidth', 3);
legend('Train error','Test error');
hold on
plot(i1,g1,'r.-', 'LineWidth', 3);
legend('Train error','Test error');
hold on
end
xlabel('Regression', 'FontSize', 15);
ylabel('MSE', 'FontSize', 15);
grid on;
Adam Danz
on 13 May 2021
No, the plotting should stay out of the loop.
To plot a column z of matrix m, plot(m(:,z))
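For instance, plotting both columns of a 100x2 MSE matrix after the loop has finished might look like the sketch below. The matrix here is filled with random placeholder values purely so the snippet runs on its own; they are not real results. The handle names hTrain and hTest are illustrative.

```matlab
degrees = 1:100;
MSE = rand(numel(degrees), 2);   % placeholder values only, NOT real results

figure;
hTrain = plot(degrees, MSE(:,1), 'b.-', 'LineWidth', 3);   % column 1: training error
hold on
hTest  = plot(degrees, MSE(:,2), 'r.-', 'LineWidth', 3);   % column 2: test error
hold off
legend('Train error', 'Test error');
xlabel('Polynomial degree', 'FontSize', 15);
ylabel('MSE', 'FontSize', 15);
grid on;
```

Each plot call draws a whole column at once, so only two calls are needed and none of them belongs inside the fitting loop.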