Find indices with multiple conditions using the find function
Shayma
on 21 Sep 2016
Commented: chakradhar Reddy Vardhireddy
on 25 Sep 2018
Hi all,
I'm stuck again; I searched for solutions but found nothing that helped. I have large CSV files (millions of rows, 200 columns of text and numbers) that I can open with "datastore" (for now I work only on the first chunk). I want to create a new file containing the complete rows that satisfy some conditions (comparing only 4 columns, with <= and >=, against two 4-element vectors, min_range and max_range), so I wrote this:
ds= datastore(file_r);
new_data=table;
index=[];
while hasdata (ds)
datachunk= read (ds);
index= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
new_data=[new_data;datachunk(index,:)];
end
The index line gives the error message: Error using & Inputs must have the same size.
Each condition matches a different number of elements, and I'm looking for the intersection of the 4. Because I need an index, I used "find" to look for the rows that match all 4 conditions. How can I fix that?
If I split it:
z1= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1));
z2= find (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2));
z3= find (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) ;
z4= find (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
z5=intersect(z4,intersect(intersect(z1,z2),z3))
it works, but then I have to reset the values on each run, which doesn't seem like an efficient way to do it.
Any help with that will be appreciated :)
4 Comments
chakradhar Reddy Vardhireddy
on 25 Sep 2018
@shayma, could you share the method you used that avoids the above function? I have a similar issue, so your method may be helpful.
Accepted Answer
George
on 22 Sep 2016
The first thing I would try is to be more liberal with your use of parentheses. In your statement:
index = find(datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
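you're closing the find after datachunk.lip_acc<max_range(1) and then logically ANDing its result with the other conditions. That result is a vector of row indices, which generally has a different length from the logical vectors on the other side of the &, hence the size error. I think you want the entire statement encapsulated in find().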
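As a minimal sketch of that fix, the statement below moves the closing parenthesis of find() to the very end. The small table and the bounds are made-up stand-ins for one chunk from read(ds) and for min_range and max_range; only the column names come from the question.

```matlab
% Hypothetical stand-in for one chunk returned by read(ds).
lip_acc   = [1; 5; 3; 8];
lip_don   = [2; 2; 9; 2];
logP_o_w_ = [0.5; 1.5; 1.0; 1.2];
Weight    = [300; 350; 320; 900];
datachunk = table(lip_acc, lip_don, logP_o_w_, Weight);

% Hypothetical bounds, one element per tested column.
min_range = [0 0 0 100];
max_range = [6 5 2 500];

% find() wraps the complete expression, so every & combines logical
% vectors that have one element per row of datachunk.
index = find( ...
    (datachunk.lip_acc   > min_range(1) & datachunk.lip_acc   < max_range(1)) & ...
    (datachunk.lip_don   > min_range(2) & datachunk.lip_don   < max_range(2)) & ...
    (datachunk.logP_o_w_ > min_range(3) & datachunk.logP_o_w_ < max_range(3)) & ...
    (datachunk.Weight    > min_range(4) & datachunk.Weight    < max_range(4)));

% Keep only the rows that satisfy all four conditions.
matching_rows = datachunk(index, :);
```

With this sample data only the first two rows pass all four tests. The comparisons are strict (< and >) as in the question's code; switch to <= and >= if the bounds should be inclusive.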
More Answers (1)
Steven Lord
on 21 Sep 2016
I would avoid using find here. Write each of your conditions as separate logical arrays. When you need to index, combine those individual conditions with and, or, not, etc. This way if you encounter unexpected results you can set a breakpoint on the line where you perform the indexing and examine each individual condition to determine whether or not that logical array matches the rows you expect in your array.
M = magic(100);
largeEnough = M >= 40;
smallEnough = M <= 70;
result1 = M(largeEnough & smallEnough)
Once you have debugged your code, you may want to comment out the definition of those individual logical arrays and assemble the conditions all in one statement. If you do, I would consider splitting them among multiple lines for readability. In this example that's probably overkill because the conditions are so simple, but it's a good habit to develop for when your conditions aren't so simple.
result2 = M((M >= 40) & ...
(M <= 70))
isequal(result1, result2)
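Applied to the table in the question, the same idea might look like the sketch below. The two small tables are made-up stand-ins for chunks returned by read(ds), the bounds are invented, and the mask names (accOK, donOK, and so on) are hypothetical; only the column names come from the question.

```matlab
% Hypothetical stand-ins for two chunks returned by read(ds).
names  = {'lip_acc', 'lip_don', 'logP_o_w_', 'Weight'};
chunk1 = table([1; 5; 3], [2; 2; 9], [0.5; 1.5; 1.0], [300; 350; 320], ...
    'VariableNames', names);
chunk2 = table([8; 2], [2; 1], [1.2; 0.7], [900; 410], ...
    'VariableNames', names);
chunks = {chunk1, chunk2};

% Hypothetical bounds, one element per tested column.
min_range = [0 0 0 100];
max_range = [6 5 2 500];

parts = cell(size(chunks));
for c = 1:numel(chunks)
    datachunk = chunks{c};

    % One named logical mask per condition; each can be inspected
    % separately at a breakpoint.
    accOK    = datachunk.lip_acc   > min_range(1) & datachunk.lip_acc   < max_range(1);
    donOK    = datachunk.lip_don   > min_range(2) & datachunk.lip_don   < max_range(2);
    logPOK   = datachunk.logP_o_w_ > min_range(3) & datachunk.logP_o_w_ < max_range(3);
    weightOK = datachunk.Weight    > min_range(4) & datachunk.Weight    < max_range(4);

    % Logical indexing selects the rows directly; no find or intersect.
    keep = accOK & donOK & logPOK & weightOK;
    parts{c} = datachunk(keep, :);
end

% Stack the matching rows from all chunks into one table.
new_data = vertcat(parts{:});
```

Because the masks are recomputed for every chunk, nothing has to be reset between iterations. Collecting the pieces in a cell array and concatenating once at the end is a design choice here; growing new_data inside the loop, as in the question, should also work.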