Select columns from a matrix by threshold value
Hi,
I have some data stored in a 7200-by-600 matrix (the 600 could also be 300 or 1000, or something in that range). Somewhere in the middle there are about 20 relevant columns that contain higher values than the others. I would like to extract these columns for some calculations (row mean over only these columns, etc.). So far I do this by:
M = mean(data);
B = (M > min(M)*1.1); % Threshold of 110 %
data_new = B.*data;
data_new(:,all(data_new == 0))=[]; % Removes columns if the entire column is zero
Then I can calculate the mean like this:
data_new_mean = mean(data_new(:,3:end),2); % First two columns are suspect data
This works fine so far, but I'm looking for a way to get the indices and work with the original matrix instead of building a new one. In the next step I have to extract the data in the following columns (if the relevant columns before were 200, 210, 220, ..., for example, I need 201, 211, 221, ... in the next step), and that's why the first approach isn't convenient anymore. Do you have any ideas?
3 Comments
Walter Roberson
on 29 Aug 2021
We need to know that data does not have negative values in it -- or at least that the mean() of each column is definitely positive.
If there are any columns with mean() that is negative, then min(M) is going to be negative, and negative times 1.1 is more negative, which is probably not what you want.
If not, then if there are any columns of data that are all zero, their mean() is 0, and 0*1.1 is 0, which is probably not what you want.
I am trying to work out whether your B.*data can zero out a column "accidentally", knocking out positive values but leaving zeros. I believe the answer to that is NO. But it is possible for an entire non-zero column to be knocked out. For example, if the column were all 50, mean() of it would be 50, that might happen to be the min(), the threshold would then be 55, but 50 < 55 in all positions, so the column could get knocked out.
... A bunch of this is trying to figure out the consequences if there are negative values or 0 already in the data, which is something we are not told.
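The sign and zero cases described above can be checked with a few made-up numbers. This is only a minimal sketch: the column means are hypothetical, and the offset-based threshold is an assumption for illustration, not something proposed in the thread.

```matlab
% Hypothetical column means, one of them negative.
M = [-2 0.5 3 4];

% Multiplicative threshold from the question: 110 % of the smallest mean.
thr_mult = min(M) * 1.1;      % about -2.2, which lies BELOW the minimum
B_mult = M > thr_mult;        % every column passes, including the minimum

% One possible alternative (an assumption): offset the minimum by 10 %
% of its magnitude, so the threshold lies above a non-zero minimum.
thr_off = min(M) + 0.1 * abs(min(M));   % about -1.8
B_off = M > thr_off;          % the column with the smallest mean is excluded

% Zero minimum: 0 * 1.1 is still 0, so any strictly positive mean passes.
M0 = [0 0 2];
B_zero = M0 > min(M0) * 1.1;
```

The offset variant does not help when the smallest mean is exactly zero, since the threshold is then zero in both cases.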
Esma Hadu
on 30 Aug 2021
Walter Roberson
on 30 Aug 2021
M = mean(data);
B = (M > min(M, [], 2)*1.1); % min along dimension 2; min(M, 1) would compare each element with the scalar 1
mask = any(B, 1);
mask can now be used as a column index.
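As a rough illustration of using such a mask against the original matrix, here is a small sketch on toy data. The matrix values and the name row_mean are made up for the example, and it assumes all column means are positive, as discussed in the comments above.

```matlab
% Toy data: 4 rows, 6 columns; columns 3 and 4 hold the larger values.
data = [1 1 5 6 1 1;
        1 1 5 6 1 1;
        1 1 5 6 1 1;
        1 1 5 6 1 1];

M = mean(data);                 % 1x6 row vector of column means
mask = M > min(M) * 1.1;        % logical row vector, true for columns 3 and 4

% Row means over the selected columns only, taken from the original matrix.
row_mean = mean(data(:, mask), 2);   % 4x1 column vector
```

Because mask is logical, data(:, mask) picks the columns without multiplying by zero or deleting anything, so the original matrix stays untouched.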
Accepted Answer
More Answers (1)
find( any(M > min(M)*1.1 ,1) )
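Since find returns numeric column indices, the "following columns" step from the question can be sketched by shifting those indices. The toy data and the names cols, next_cols and next_mean are hypothetical, and the bounds guard is an assumption about how an index past the last column should be handled.

```matlab
% Toy data: columns 2 and 5 are the relevant ones.
data = [1 5 1 1 6 1;
        1 5 1 1 6 1];

M = mean(data);
cols = find(M > min(M) * 1.1);  % numeric indices of the relevant columns

% Shift by one to address the columns that follow the relevant ones.
next_cols = cols + 1;

% Guard (an assumption): drop indices that would run past the last column.
next_cols = next_cols(next_cols <= size(data, 2));

% Row means over the shifted columns, taken from the original matrix.
next_mean = mean(data(:, next_cols), 2);
```

If the first two selected columns are suspect, as in the question, cols(3:end) could be used in place of cols before shifting.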
2 Comments
Walter Roberson
on 30 Aug 2021
That would find the columns that contain only data less than the threshold; the user wanted to find columns that contain at least one datapoint greater than the threshold.
Matt J
on 30 Aug 2021
Yep. I fixed it.