MATLAB Answers

Calculating Percentile from a pdf

Asked by Rohit Goel on 5 Jun 2019
Latest activity Commented on by Rohit Goel on 7 Jun 2019
Accepted Answer by dpb
Hi,
I have data with two columns: Column 1 is the variable, and Column 2 is the probability density. I am pasting a sample of the data; over the full data set, cumsum(Column2) reaches 100, as it should.
[Attached image: Snap4.png]
The question is: how do I get the 5th percentile of Column 1, given the probabilities associated with each number? I have tried a number of things but keep hitting a dead end. Apologies in advance in case it's too naive.


Release: R2017b

2 Answers

Answer by dpb
on 6 Jun 2019
 Accepted Answer

If I interpret the request correctly, let z and v be your two columns; then:
ecfn=ecdf(v); % empirical cumulative distribution function values
N=fix(numel(v)/2); % first half--assume symmetric distribution
P=0.05; % desired percentile (less 50th percentile)
z05=interp1(v(1:N),z(1:N),P); % find the Pth percentile

  3 Comments

Thank you for the reply. The Excel file is attached.
The issue with the solution is that an empirical cumulative distribution doesn't always fit the actual distribution, right? It smooths it out, which distorts the result. Since I already have the probability densities in the 'v' column, is there a fast way to just do a rolling cumulative sum throughout? That will be my CDF.
Thanks again.
That is what ECDF does, just in a convenient wrapper. The interp1 call is just the prepackaged lookup for the location of the actual P requested rather than the nearest grid point.
If that's all you're looking for, then sure, just find cumsum()>P, except that you'll still have to build the summation vector first to find the location, as MATLAB doesn't support syntax to search a temporary result within an expression.
Thank you for your help. It's done.
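A minimal sketch of that cumulative-sum lookup, assuming z and v are column vectors holding the variable and its probabilities, with v on the 0-100 scale mentioned in the question. The data here are made-up stand-ins, not the contents of the attached file.

```matlab
% Hypothetical stand-in data: z is the variable grid, v the probability
% at each grid point, scaled so that sum(v) is 100 as in the question.
z = (-10:0.5:10)';
v = exp(-0.5*(z/3).^2);
v = 100*v/sum(v);

P = 5;                    % requested percentile, on the same 0-100 scale as v
c = cumsum(v);            % summation vector: the discrete CDF in percent
idx = find(c >= P, 1);    % first grid point where the CDF reaches P
z05 = z(idx);             % grid-point estimate of the 5th percentile
```

This returns the first grid point at or above the requested level; interpolating between c(idx-1) and c(idx) would give a value between grid points instead.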



Answer by John D'Errico
on 5 Jun 2019
Edited by John D'Errico
on 5 Jun 2019

Pretty simple actually, though it would be far easier to give you an example if you had posted your actual data rather than a blasted picture of numbers. A picture of numbers is not worth a thousand words. Sorry, but I refuse to type in numbers from a picture.
But do this:
  1. Set the point at -9.42 to be zero.
  2. Use cumsum.
  3. Normalize the sum to 1.
  4. Interpolate (actually, reverse interpolate) at 0.05. You can do that using interp1, where x will be the cumulative probability and y is the column 1 variable. Linear interpolation seems right.
You could also use the 'pchip' or 'makima' options in interp1 to interpolate. Do NOT use 'spline'.
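The four steps above can be sketched as follows. The data are made-up stand-ins (the real columns were posted only as a picture), and the names z, v, c, cu, and iu are illustrative. The unique call is an added safeguard, not part of the steps as written: interp1 requires distinct x values, and a density with zeros in its tails would produce repeated CDF values.

```matlab
% Hypothetical stand-in data; the real columns come from the poster's file.
z = linspace(-9.42, 9.42, 41)';     % column 1: the variable
v = exp(-0.5*(z/3).^2);             % column 2: density values (made up)

v(1) = 0;                           % step 1: zero the first point so the CDF starts at 0
c = cumsum(v);                      % step 2: running sum
c = c/c(end);                       % step 3: normalize so the CDF ends at 1

% Step 4: reverse interpolation, with the CDF as x and the variable as y.
[cu, iu] = unique(c);               % keep distinct CDF values only
z05_lin   = interp1(cu, z(iu), 0.05);            % linear (the default method)
z05_pchip = interp1(cu, z(iu), 0.05, 'pchip');   % shape-preserving cubic
```

Both lookups land in the same grid cell, because linear and 'pchip' interpolation each preserve the monotonicity of the CDF.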

  1 Comment

Thank you for the reply. The Excel file is attached.
The issue with the solution is that it's not a linear interpolation, right? This is a skew-t distribution, so the middle elements need to be given much more weight. Sorry if I am missing something.
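One way to probe this concern is a small experiment. In the cumulative-sum approach, each grid point contributes in proportion to its density, so the weighting enters through the CDF itself; the interpolation only fills in between two adjacent CDF values. The sketch below uses a made-up right-skewed density (a gamma-like shape, not the skew-t data from the attached file) and compares linear and 'pchip' lookups at several percentiles.

```matlab
% Made-up right-skewed density on a grid (a stand-in, not the poster's data).
z = linspace(0, 20, 201)';
v = z.^2 .* exp(-z);                % gamma-like shape with a long right tail

c = cumsum(v);
c = c/c(end);                       % discrete CDF, ends at 1
[cu, iu] = unique(c);               % interp1 needs distinct x values

p = [0.05 0.50 0.95];
q_lin   = interp1(cu, z(iu), p);            % linear reverse lookup
q_pchip = interp1(cu, z(iu), p, 'pchip');   % shape-preserving cubic

gap = max(abs(q_lin - q_pchip));    % cannot exceed the grid spacing (0.1 here)
```

The resulting percentiles are asymmetric around the median, reflecting the skew, while the choice of interpolation method moves each value by less than one grid step.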
