How to replace values above a given percentile by nans
3 views (last 30 days)
Show older comments
Hi,
I am trying to replace values above the 99th percentile in a table by nans by using Star Strider excellent little function (https://www.mathworks.com/matlabcentral/answers/127944-how-to-pick-the-j-th-percentile-of-a-vector). An example of my code looks like so:
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
The problem of course is that there are nan values scattered across multiple variables in the table and I therefore receive the following error message: Warning: Columns of data containing NaN values have been ignored during interpolation.
Does anyone has an idea as to how I could replace values above the 99th percentile in a table by nans in a table where there are already multiple nans ? Please note that I do not have the Statistic toolbox.
0 Comments
Accepted Answer
Adam Danz
on 15 Aug 2019
Edited: Adam Danz
on 15 Aug 2019
Here is an alternative function that ignores NaN values. It's not a nice one-liner like Star Striders but it does produce the same output as the prctile function in the stats toolbox.
function pctl = percentile(v,p)
% v is a vector of data; example v = [8 9 nan 13 15 11 nan 3 5 7];
% p is a percentile; example p = 75;
% pctl is the exact p_th percentile of v.
vsort = reshape(sort(v(~isnan(v))),1,[]); %row vector
pd = p/100 * numel(vsort);
idx = floor(pd + 0.5) + [0,1];
md = pd - idx(1);
idx(idx<1) = 1;
idx(idx>numel(vsort)) = numel(vsort);
pctl = sum(vsort(idx) .* (0.5+[-md,md]));
end % <-- only needed if this function is within a script
Comparison
v = [8 9 nan 13 15 11 nan 3 5 7];
p = 82;
percentile(v,p) % = 13.120000 this function
prctile(v,p) % = 13.120000 stat's toolbox
pct(v(~isnan(v)),p) % = 13.115995 the one-liner approximation
To apply it to your data,
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
t.Var1(t.Var1 > percentile(t.Var1, 99)) = nan;
t.Var2(t.Var2 > percentile(t.Var2, 99)) = nan;
More Answers (1)
per isakson
on 15 Aug 2019
Edited: per isakson
on 15 Aug 2019
Ignore the NaNs explicitely when calculating the percentile. Try
%%
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
function z = pct(v,p)
v( isnan(v) ) = [];
z = interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))' ...
, sort(v), p*0.01, 'spline');
end
1 Comment
Adam Danz
on 15 Aug 2019
Edited: Adam Danz
on 15 Aug 2019
If you're going to use the one-liner appoximation, you can keep the function handle and explicitly ignore the NaNs from within the call to the function.
t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1(~isnan(t.Var1)), 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2(~isnan(t.Var2)), 99)) = nan;
% {_____here____}
See Also
Categories
Find more on Descriptive Statistics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!