Get indices of any quantile of a column

A. Goeh on 26 Aug 2016
Commented: Image Analyst on 27 Aug 2016
Hello everybody,
I am currently trying to sort a large (101x1168) matrix. I always sort by the first column, on which the following three columns depend. I want to be able to get the indices of, for example, the top 10 percent of the values, or of the values between the 0.3 and 0.4 quantiles of the first column, so that I can address those rows with a function. So far I have used several sortrows() calls, but that takes a long time to run. It is important to know that the length of the columns may vary (some of the columns have more NaNs than others), so it would be great to have a function that ignores NaNs (maybe a combination of quantile() and find()?).
Here is an example of what I need:
Col. 1   Col. 2   Col. 3   Col. 4
15       18       12       32
14       23       19       12
10       7        18       12
9        34       12       13
11       19       3        17
I now want to know the index and the value of the top 20% of the values in the first column. In this case that would be index 1 and value 15. If implemented correctly, I would be able to get a vector output with all the data.
Any help is truly appreciated! Many thanks and kind regards, A.Goe

Answers (1)

Image Analyst on 26 Aug 2016
If you have the Statistics and Machine Learning Toolbox, there is prctile(). Would that help?
Y = prctile(X,p) returns percentiles of the values in a data vector or matrix X for the percentages p in the interval [0,100]. If X is a vector, then Y is a scalar or a vector with the same length as the number of percentiles required (length(p)). Y(i) contains the p(i) percentile.
If X is a matrix, then Y is a row vector or a matrix, where the number of rows of Y is equal to the number of percentiles required (length(p)). The ith row of Y contains the p(i) percentiles of each column of X.
For multidimensional arrays, prctile operates along the first nonsingleton dimension of X.
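As a minimal sketch of how the percentile values can be turned into row indices: the percentile itself does not have to occur in the data, because it can serve as a threshold in a logical comparison. The matrix below is the example from the question; the variable names (lo, hi, bandIdx, bandRows, topIdx) are made up for illustration, and prctile() is assumed to be available from the Statistics and Machine Learning Toolbox.

```matlab
% Example data from the question (column 1 plus three dependent columns).
data = [15 18 12 32;
        14 23 19 12;
        10  7 18 12;
         9 34 12 13;
        11 19  3 17];
col1 = data(:, 1);

% Thresholds for the 0.3 and 0.4 quantiles; prctile treats NaNs as missing.
lo = prctile(col1, 30);
hi = prctile(col1, 40);

% Row indices whose column-1 value lies inside that band. Comparisons
% with NaN are false, so NaN rows are never selected.
bandIdx  = find(col1 >= lo & col1 <= hi);
bandRows = data(bandIdx, :);   % the matching rows, all four columns

% Top 20 % of column 1: everything at or above the 80th percentile.
topIdx = find(col1 >= prctile(col1, 80));
```

Because find() is applied to the unsorted column, the indices refer to the original rows and no sortrows() call is needed.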
  2 Comments
A. Goeh on 27 Aug 2016
Hello, first of all thank you for your answer. I tried prctile(); the problem is that the results don't necessarily have to be values that can be found in the original dataset, so I can't search for the indices of the results... I'm thinking about splitting the vector (column) into pieces of equal length and searching for the first and last index, although not very successfully, to be honest.
Image Analyst on 27 Aug 2016
If the values must be in your data, then you can use cumsum() to create the cdf, then use find to find the value. Untested code:
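[col1, sortOrder] = sort(data(:, 1), 'ascend'); % NaNs are sorted to the end
numValid = nnz(~isnan(col1)); % Number of non-NaN values
cdf = cumsum(col1(1:numValid)); % Cumulative sum of the sorted values
cdf = cdf/cdf(end); % Normalize so the last element is 1
% Find the first sorted position where the normalized sum reaches 0.8
index = find(cdf >= 0.8, 1, 'first');
dataValue = col1(index);
rowIndex = sortOrder(index); % Row of dataValue in the original data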
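The snippet above accumulates the values themselves, so its cut-off is a share of the column's total rather than a share of the row count. If the quantile band should instead be defined by rank (the top 20 % of the rows), one possible sketch uses the second output of sort() to keep the original indices. The sample column, the variable names, and the rounding convention for the band edges are all assumptions for illustration, not part of the original answer.

```matlab
col1 = [15; 14; NaN; 10; 9; 11];      % hypothetical column containing a NaN

validIdx = find(~isnan(col1));        % original row numbers of non-NaN entries
[sortedVals, order] = sort(col1(validIdx), 'ascend');
sortedIdx = validIdx(order);          % original row numbers, ascending by value

n = numel(sortedVals);                % number of non-NaN values
qLow  = 0.8;                          % lower edge of the band (top 20 %)
qHigh = 1.0;                          % upper edge of the band

% Assumed convention: band edges are rounded to whole sorted positions.
firstPos = round(qLow * n) + 1;
lastPos  = round(qHigh * n);

bandIdx    = sortedIdx(firstPos:lastPos);   % indices into the original column
bandValues = sortedVals(firstPos:lastPos);  % the corresponding values
```

Setting qLow = 0.3 and qHigh = 0.4 selects the band between the 0.3 and 0.4 quantiles in the same way; if the band contains no whole position, both outputs are empty.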
