Find index with multiple conditions, using find function

Hi all,
I'm stuck again; I searched for solutions but found nothing that helps. I have large CSV files (millions of rows, 200 columns of text and numbers) that I can open with "datastore" (for now I work only on the first chunk). I want to create a new file containing the complete rows that satisfy some conditions. The conditions look at only 4 columns: each column must lie between the matching elements of two 4-element vectors, min_range and max_range. So I wrote this:
ds = datastore(file_r);
new_data = table;
index = [];
while hasdata(ds)
    datachunk = read(ds);
    index= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
    new_data = [new_data; datachunk(index,:)];
end
On the line that assigns index I get this error message:
Error using &
Inputs must have the same size.
Each condition matches a different number of rows, and I'm looking for the intersection of the 4. Because I need an index, I used "find" to get the rows that satisfy all 4 conditions. How can I fix this?
If I split it up:
z1= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1));
z2= find (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2));
z3= find (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) ;
z4= find (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
z5=intersect(z4,intersect(intersect(z1,z2),z3))
it works, but then I have to reset the values on each run, which does not seem like a good way to do it.
Any help with this will be appreciated :)
  4 Comments
Shayma on 22 Sep 2016
@George you are right! Now it works. @Stephen I managed to do it without this function, but thanks for your reply.
chakradhar Reddy Vardhireddy
@shayma, could you share the method you used that avoids the above function? I have a similar issue, and your method may be helpful.


Accepted Answer

George on 22 Sep 2016
The first thing I would try is to be more liberal with your use of parentheses. In your statement:
index = find(datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
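you're closing the find after datachunk.lip_acc<max_range(1), and then logically ANDing its result with the other statements. find returns a vector of matching row indices, whose length generally differs from the number of rows in the chunk, which is why & complains about the sizes. I think you want the entire statement enclosed in find().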
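A minimal sketch of that fix, using a made-up three-row table and only two of the four columns to keep it short. The data and the bounds in min_range and max_range are hypothetical; only the placement of the parentheses is the point.

```matlab
% Made-up data standing in for one chunk read from the datastore.
datachunk = table([1; 5; 9], [2; 3; 8], 'VariableNames', {'lip_acc', 'lip_don'});
min_range = [0 0];   % hypothetical lower bounds
max_range = [6 4];   % hypothetical upper bounds

% Every condition sits inside the single call to find, so each operand
% of & is a logical vector with one element per row of the table.
index = find((datachunk.lip_acc > min_range(1) & datachunk.lip_acc < max_range(1)) & ...
             (datachunk.lip_don > min_range(2) & datachunk.lip_don < max_range(2)));

% Keep the complete rows that satisfy all conditions.
selected = datachunk(index, :);
```

Here only the first two rows pass both tests, so index holds rows 1 and 2.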

More Answers (1)

Steven Lord on 21 Sep 2016
I would avoid using find here. Write each of your conditions as separate logical arrays. When you need to index, combine those individual conditions with and, or, not, etc. This way if you encounter unexpected results you can set a breakpoint on the line where you perform the indexing and examine each individual condition to determine whether or not that logical array matches the rows you expect in your array.
M = magic(100);
largeEnough = M >= 40;
smallEnough = M <= 70;
result1 = M(largeEnough & smallEnough)
Once you have debugged your code, you may want to comment out the definition of those individual logical arrays and assemble the conditions all in one statement. If you do, I would consider splitting them among multiple lines for readability. In this example that's probably overkill because the conditions are so simple, but it's a good habit to develop for when your conditions aren't so simple.
result2 = M((M >= 40) & ...
            (M <= 70))
isequal(result1, result2)
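The same idea could be applied to the table columns from the question. This is only a sketch under assumptions: the CSV contents and the bounds are made up, the names accOK, donOK, logpOK, weightOK, keep, and parts are hypothetical, and the matching rows are collected in a cell array rather than by growing a table inside the loop.

```matlab
% Write a small made-up CSV so the sketch is self-contained.
T = table([1; 5; 9; 3], [2; 3; 8; 1], [1.5; 2.5; 0.5; 7.0], [300; 450; 200; 350], ...
    'VariableNames', {'lip_acc', 'lip_don', 'logP_o_w_', 'Weight'});
file_r = [tempname '.csv'];
writetable(T, file_r);

min_range = [0 0 1 250];   % hypothetical lower bounds
max_range = [6 4 5 500];   % hypothetical upper bounds

ds = datastore(file_r);
parts = {};                % matching rows from each chunk
while hasdata(ds)
    datachunk = read(ds);
    % One logical vector per condition, each with one element per row,
    % so they can be inspected individually while debugging.
    accOK    = datachunk.lip_acc   > min_range(1) & datachunk.lip_acc   < max_range(1);
    donOK    = datachunk.lip_don   > min_range(2) & datachunk.lip_don   < max_range(2);
    logpOK   = datachunk.logP_o_w_ > min_range(3) & datachunk.logP_o_w_ < max_range(3);
    weightOK = datachunk.Weight    > min_range(4) & datachunk.Weight    < max_range(4);
    keep = accOK & donOK & logpOK & weightOK;
    parts{end+1} = datachunk(keep, :); %#ok<AGROW>
end
new_data = vertcat(parts{:});   % all matching rows from all chunks
delete(file_r);                 % remove the temporary file
```

Because keep is a logical mask with the same height as the chunk, no call to find or intersect is needed before indexing.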
