MATLAB Answers


Data Analysis advance filtering

Asked by Devansh Patel on 19 Feb 2019
Latest activity Edited by Sean de Wolski
on 19 Feb 2019
I have table which has 200 variables. 4 different types. A1,A2...A50,B1,B2...B50,C1,C2....C50,D1,D2.....D50. "THESE ARE THE COLUMN NAMES." I want to apply multiple filters to multiple columns.
(( B1,B2... 50 > 3) & (A1, A2....A50 > 0.1) & (C1,C2....C50 < 1))
And I want to remove each row which does not fit the criteria from the table.
Please help.


Sign in to comment.




2 Answers

Answer by Bob Nbob
on 19 Feb 2019

You can do this with logic indexing.
data = ... % Your main data block
conditions = data(:,1)>0.1 & data(:,2)>3 & data(:,3)<1;
data2 = dat(conditions,:);


Show 1 older comment
T2 = T1(T1{:,7:8}>3 & T1{:,2:3}<0.1,:);
I tried this for filtering multiple columns and am stuck here. Please help.
Error - Row index exceeds table dimensions.
I'm a bit confused on what you mean by 'columns.' Do you have multiple T variables? You're T1 variable already contains multiple rows and columns which are being filtered, so I'm going to assume you're referring to different tables, rather than different columns within a table. If this is incorrect, please feel free to say so and explain where I am wrong.
If your data is in multiple variables, is it possible to put it into a larger matrix? Possibly using a cell matrix, where each cell is a variable? This will make it possible to index through the different variables, and then you are able to use a for loop to filter all of the data.
T = {T1,T3};
for i = 1:size(T,2)
T2{i} = T{i}(T{i}{:,2}>3 & T{i}{:,3}<0.1,:);
If you do actually mean 'columns' of a single variable, and all of your data is just doubles, I would strongly advise turning your cell array into a regular matrix.
T1 = cell2mat(T1);
This will make the logical operations much easier, and you will be less likely to run into issues with concatonating that may occur with cells.
I think that you will be better off keeping your logic checks as single rows, and using more checks for a single statement. This is my personal opinion, but I think it is better because while the statements themselves do get longer and more convoluted it is easier to make appropriate adjustments to columns or conditions to be met.
T2 = T1(T1(:,7)>3 & T1(:,8)>3 & T1(:,2)<0.1 & T1(:,3)<0.1,:);
All the columns are in the same table. A1...A2... D49,D50 are all columns. What I'm trying to do is filter A1,A2...A50 with same conditions like
(( B1,B2... 50 > 3) & (A1, A2....A50 > 0.1) & (C1,C2....C50 < 1))
And filter out all row of a table. Even if single criteria does not match.

Sign in to comment.

Answer by Sean de Wolski
on 19 Feb 2019
Edited by Sean de Wolski
on 19 Feb 2019

Tested in R2018b. May need to convert strings to cellstrs in older releases.
%% Sample data
Data = array2table(rand(2,200));
names = ["A";"B";"C";"D"]+(1:50);
Data.Properties.VariableNames = names(:);
Thresh = array2table(rand(1,4), 'VariableNames',["A"; "B"; "C"; "D"]);
%% engine
rix = false(height(Data), 1); % Remove rows
for ii = 1:width(Thresh)
% Corresponding variables
vars = startsWith(Data.Properties.VariableNames, Thresh.Properties.VariableNames{ii});
% previously set to remove or matches criteria here.
rix = rix | any(Data{:, vars} > Data{1, ii}, 2);
Data(rix,:) = [] %Remove them
Here I use > for every operation. You could use an operation function handle as well if necessary. Store the operations in a cell array {@le, @ge, @lt} and then use the corresponding one on that iteration of the loop.


Sign in to comment.