# Removing Unwanted data from a bunch of data

Hey there everyone

I have a data set for an eye diagram (the .mat file is attached) and have clustered it by color, as shown below. However, some of the data (lines) are out of range and make the diagram look cluttered. I would like to remove them (I marked them with a black highlighter in the second figure), but I do not know how. I have also attached the code that produces the main output; my problem is removing the unwanted data so that the diagram looks clean.

Can anyone help me do this?

### Accepted Answer

Alex Pedcenko
on 4 Jul 2020

Edited: Alex Pedcenko
on 4 Jul 2020

This happens because the number of data points is not exactly the same in all the curves; the "missing" points are padded with zeros.

E.g. look at your initial data. Each "curve" is separated by a NaN value, right? That is what I use to split them into multicolumn matrices. But the 1st curve has one fewer data point than the others, as do the last couple of curves (see the contents of the xdata and ydata matrices). If a column has fewer data points than the others, MATLAB fills the missing cells with zeros. If we get rid of those last 2 curves and add one extra point at the very beginning of the 1st curve, it seems to work OK, so the problem is with your initial data or with whatever you are doing to it in the meantime.
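The zero-padding behaviour described above can be reproduced on a toy matrix. This is only a minimal sketch with made-up values; `A` and `B` are illustrative names, not variables from the original script. The second half shows one possible workaround (preallocating with NaN, which `plot` does not draw), offered as an assumption about what might suit this data rather than as part of the answer's method.

```matlab
% Toy example (values are made up): two "curves" of unequal length.
A = [];
A(1:3, 1) = [1; 2; 3];   % first column gets 3 points
A(1:2, 2) = [4; 5];      % second column gets only 2 points
disp(A)                  % A(3,2) was never assigned, so MATLAB fills it with 0

% One possible workaround: preallocate with NaN, which plot() skips.
B = nan(3, 2);
B(1:3, 1) = [1; 2; 3];
B(1:2, 2) = [4; 5];
disp(B)                  % B(3,2) stays NaN instead of becoming 0
```

With the zero-filled version, a plotted curve would include a spurious point at (0, 0); with the NaN-filled version the missing cell simply leaves a gap.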

I added the 1st line below, and a couple of lines at the end of the code, to delete the last two curves:

```matlab
%% filtering
data(2:end+1,:) = data(1:end,:); % one extra point in 1st curve (correcting missing data)
n = size(data,1);
col = 1;
j = 1;
xdata = [];
ydata = [];
for i = 1:n
    if isnan(data(i,1))   % a NaN row marks the start of the next curve
        col = col + 1;
        j = 1;
    end
    xdata(j,col) = data(i,1);
    ydata(j,col) = data(i,2);
    j = j + 1;
end
[~, j] = find(ydata < -0.06);   % column indices of curves below the y threshold
cols_to_delete = unique(j);
xdata(:,cols_to_delete) = [];
ydata(:,cols_to_delete) = [];
xdata(:,end) = []; ydata(:,end) = []; % getting rid of bogus last curve
xdata(:,end) = []; ydata(:,end) = []; % getting rid of another bogus last curve
data = [reshape(xdata,[numel(xdata),1]), reshape(ydata,[numel(xdata),1])];
n = size(data,1); % size has now changed
```
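As an alternative to the padded matrices used above, the curves can be filtered without ever forcing them to a common length, which sidesteps the zero-padding issue. The sketch below is not the answer's method; it assumes the same layout as the question (an [n x 2] array with NaN rows between curves) and uses made-up toy values and hypothetical variable names (`sepRows`, `starts`, `stops`, `kept`).

```matlab
% Toy data in the assumed layout: [x y] rows, NaN rows between curves.
data = [0 0.01; 1 0.02; NaN NaN; 0 0.01; 1 -0.10; 2 0.03; NaN NaN; 0 0.00; 1 0.02];
threshold = -0.06;                          % same threshold as in the answer

sepRows = find(isnan(data(:, 1)));          % rows that separate curves
starts  = [1; sepRows + 1];                 % first row of each curve
stops   = [sepRows - 1; size(data, 1)];     % last row of each curve

kept = zeros(0, 2);
for k = 1:numel(starts)
    curve = data(starts(k):stops(k), :);
    if isempty(curve) || any(curve(:, 2) < threshold)
        continue                            % skip empty or out-of-range curves
    end
    kept = [kept; curve; NaN NaN];          %#ok<AGROW> keep curve plus separator
end
disp(kept)
```

Here the second toy curve contains a y-value below the threshold and is dropped; the other two are kept, each followed by a NaN separator row so the result can be plotted the same way as the original array.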

### More Answers (1)

Alex Pedcenko
on 3 Jul 2020

I think you need to split your [n x 2] data array into separate curves, then run find over all the curves to locate points where, e.g., the y-value exceeds a threshold, identify which curve each such point belongs to, and eliminate that curve.

E.g. at the beginning of your 1st function:

```matlab
clc; clear; close;
d = load('data.mat');
data = d.data;
%% filtering
n = size(data,1);
col = 1;
j = 0;
xdata = [];
ydata = [];
for i = 1:n % if number of points in each curve is the same can just use reshape instead of this
    j = j + 1;
    if isnan(data(i,1))   % a NaN row marks the start of the next curve
        col = col + 1;
        j = 1;
    end
    xdata(j,col) = data(i,1);
    ydata(j,col) = data(i,2);
end
[~, j] = find(ydata < -0.06); % here you set your threshold for y-values
cols_to_delete = unique(j);
xdata(:,cols_to_delete) = [];
ydata(:,cols_to_delete) = [];
data = [reshape(xdata,[numel(xdata),1]), reshape(ydata,[numel(xdata),1])];
%% filtering ends
```

...
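The column-selection step (finding which curves cross the threshold and deleting them) can also be written with a logical mask instead of `find` plus `unique`. This is a minimal sketch on a made-up matrix with one curve per column; `badCols` and `threshold` are illustrative names, not taken from the original code.

```matlab
% Toy matrix: one curve per column (values are made up).
ydata = [ 0.01  0.02  0.00;
          0.03 -0.10  0.01;
          0.02  0.04  0.02];
xdata = repmat((1:3)', 1, 3);
threshold = -0.06;

badCols = any(ydata < threshold, 1);   % logical row vector, true for curves to drop
xdata(:, badCols) = [];
ydata(:, badCols) = [];
disp(size(ydata))
```

Comparisons with NaN evaluate to false, so NaN separator entries in a column should not by themselves mark a curve for deletion.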


