Removing unwanted data from a data set
Hey there everyone
I have a set of eye-diagram data (the .mat file is attached) and have clustered it by color, as shown below. However, some of the lines are disturbing and out of range, which makes the diagram too cluttered. I intend to remove them (I marked them with a black highlighter in the second figure), but I do not know how to do it. I have also attached the code that produces the main output; my problem is removing those unwanted lines so that the diagram looks clean.
Can anyone help me do this?
Accepted Answer
Alex Pedcenko on 4 Jul 2020 (edited on 4 Jul 2020)
This happens because the curves do not all have exactly the same number of data points, and MATLAB pads the "missing" points with zeros.
Look at your initial data: each curve is separated by a NaN value, right? That is what I use to split the data into multicolumn matrices. However, the first curve has one data point fewer than the others, as do the last couple of curves (see the contents of the xdata and ydata matrices). When a column has fewer data points than the others, MATLAB fills the missing cells with zeros. If we get rid of those last two curves and add one extra point at the very beginning of the first curve, it seems to work, so the problem lies in your initial data or in whatever you do to it in the meantime.
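A tiny hypothetical example (the numbers are made up, not taken from data.mat) shows the padding effect: running the same splitting loop on two NaN-separated curves of equal length leaves a padded zero at the bottom of the first column, because the NaN separator is stored as row 1 of every later column.

```matlab
% Hypothetical data: two curves of two points each, separated by a NaN row
data = [1 1; 2 2; NaN NaN; 1 3; 2 4];

n = size(data,1);
col = 1;
j = 1;
xdata = [];
ydata = [];
for i = 1:n
    if isnan(data(i,1))   % the NaN row becomes row 1 of the next column
        col = col + 1;
        j = 1;
    end
    xdata(j,col) = data(i,1);   % growing the matrix pads unfilled cells with 0
    ydata(j,col) = data(i,2);
    j = j + 1;
end
disp(xdata)   % the 0 at the bottom of column 1 is padding, not data
```

Both curves hold two points, yet column 1 ends up one row shorter than column 2, so MATLAB fills the gap with 0.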
I added the first line below, plus a couple of lines at the end of the code to delete the last two curves:
%% filtering
data(2:end+1,:) = data(1:end,:);   % duplicate the first row: one extra point in the 1st curve (correcting the missing point)
n = size(data,1);
col = 1;
j = 1;
xdata = [];
ydata = [];
for i = 1:n
    if isnan(data(i,1))   % a NaN row marks the start of a new curve (column)
        col = col + 1;
        j = 1;
    end
    xdata(j,col) = data(i,1);   % cells left unfilled are padded with zeros by MATLAB
    ydata(j,col) = data(i,2);
    j = j + 1;
end
[~, cols] = find(ydata < -0.06);   % columns (curves) with any y-value below the threshold
cols_to_delete = unique(cols);
xdata(:,cols_to_delete) = [];
ydata(:,cols_to_delete) = [];
xdata(:,end) = []; ydata(:,end) = [];   % getting rid of the bogus last curve
xdata(:,end) = []; ydata(:,end) = [];   % getting rid of another bogus last curve
data = [xdata(:), ydata(:)];   % back to a single [n x 2] array, column by column
n = size(data,1);   % size has now changed
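Rather than assuming that the last two columns are the bogus ones, it may help to check the curve lengths first. Below is a minimal sketch, assuming (as in data.mat) that single NaN rows separate the curves. The data and the names sep, bounds, lens, typicalLen and isBogus are hypothetical, and treating the most common length as the "correct" one is an assumption, not something stated in the answer.

```matlab
% Hypothetical data: three curves separated by NaN rows; the last one is short
data = [0 0.1; 1 0.2; 2 0.3; NaN NaN; 0 0.0; 1 0.1; 2 0.2; NaN NaN; 0 0.05];

n = size(data,1);
sep = find(isnan(data(:,1)));   % rows that separate the curves
bounds = [0; sep; n+1];         % virtual separators before the first and after the last row
lens = diff(bounds) - 1;        % number of points in each curve (separators excluded)
typicalLen = mode(lens);        % assumption: the most common length is the expected one
isBogus = lens ~= typicalLen;   % curves shorter (or longer) than usual

% Report the unusual curves: curve index and its number of points
fprintf('Curve %d has %d point(s)\n', [find(isBogus)'; lens(isBogus)']);
```

This only reports which curves have an unusual length; it deletes nothing, so you can confirm that the curves you are about to remove really are the short ones.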
More Answers (1)
Alex Pedcenko
on 3 Jul 2020
I think you need to split your [n x 2] data array into separate curves, then use find on all of them to locate, e.g., y-values that exceed a threshold, identify which curve each found value belongs to, and eliminate that curve.
For example, at the beginning of your first function:
clc; clear; close;
d = load('data.mat');
data = d.data;
%% filtering
n = size(data,1);
col = 1;
j = 0;
xdata = [];
ydata = [];
for i = 1:n   % if every curve has the same number of points, reshape can replace this loop
    j = j + 1;
    if isnan(data(i,1))   % a NaN row starts a new curve (column)
        col = col + 1;
        j = 1;
    end
    xdata(j,col) = data(i,1);
    ydata(j,col) = data(i,2);
end
[~, cols] = find(ydata < -0.06);   % here you set your threshold for y-values
cols_to_delete = unique(cols);     % each column is one curve
xdata(:,cols_to_delete) = [];
ydata(:,cols_to_delete) = [];
data = [xdata(:), ydata(:)];       % back to a single [n x 2] array
%% filtering ends
...
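The find/unique pair in the code above can also be written with any, which flags every column (curve) containing at least one y-value outside the allowed range. A minimal sketch: the lower limit -0.06 comes from the answer, while the upper limit yHigh and the example matrices are hypothetical.

```matlab
% Hypothetical example: each column of ydata is one curve
ydata = [0.10 -0.10  0.02 0.10;
         0.20 -0.02  0.03 0.40;
         0.24  0.00 -0.07 0.10];
xdata = repmat((0:2)', 1, 4);

yLow  = -0.06;   % threshold used in the answer
yHigh =  0.25;   % hypothetical upper threshold
% true for each column with at least one y-value below yLow or above yHigh
isOutlier = any(ydata < yLow | ydata > yHigh, 1);
xdata(:, isOutlier) = [];
ydata(:, isOutlier) = [];
```

Comparisons with NaN are always false, so NaN separators never flag a curve; zero padding is not flagged either as long as 0 lies inside the allowed range.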
Alex Pedcenko on 4 Jul 2020 (edited on 4 Jul 2020)
That happens where one curve ends and the next begins: these lines seem to connect the end of one curve with the beginning of the next. Check your raw data and the matrices xdata and ydata; you may want to adjust how the original two columns are chopped into individual curves. When I looked, I wasn't sure whether the zeros are genuine data points or not. You may want to replace them with NaN, or adjust the second part of the code.
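Following the suggestion to use NaN instead of zeros, here is a minimal sketch, assuming (as in data.mat) that NaN rows separate the curves. It preallocates xdata and ydata with NaN, so short curves are padded with NaN rather than zeros, and then rebuilds the two-column array with a NaN row after every curve. Because plot leaves a gap at NaN points, padded cells no longer add spurious (0, 0) points and curves are not joined to each other. The example data and the names curveId and rowInCurve are hypothetical.

```matlab
% Hypothetical data: two curves (2 and 3 points) separated by a NaN row
data = [0 0.1; 1 0.2; NaN NaN; 0 -0.1; 1 -0.2; 2 -0.3];

isSep = isnan(data(:,1));        % NaN rows separate the curves
curveId = cumsum(isSep) + 1;     % curve number for every row
keep = ~isSep;                   % drop the separator rows themselves
curveId = curveId(keep);
x = data(keep,1);
y = data(keep,2);

% Position of each point within its own curve
rowInCurve = zeros(size(curveId));
for c = unique(curveId)'
    rowInCurve(curveId == c) = 1:nnz(curveId == c);
end

nCurves = max(curveId);
maxLen = max(rowInCurve);
xdata = NaN(maxLen, nCurves);    % pad short curves with NaN, not zeros
ydata = NaN(maxLen, nCurves);
idx = sub2ind([maxLen nCurves], rowInCurve, curveId);
xdata(idx) = x;
ydata(idx) = y;

% ... filter columns of xdata/ydata here ...

% Back to a two-column array with a NaN row after every curve
sepRow = NaN(1, size(xdata,2));
xcol = [xdata; sepRow];
ycol = [ydata; sepRow];
data = [xcol(:), ycol(:)];
```

The column-per-curve layout is the same as in the answers above, so the threshold filtering can be applied unchanged between the two halves.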