Function 'pdf' doesn't return pdf values
I have a problem with the function pdf. I have this code:
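estim_KDE = fitdist(data, 'kernel');     % kernel density fit to the sample in data
x = linspace(low, high, obs);             % obs evenly spaced points from low to high
y = pdf(estim_KDE, x);                    % evaluate the fitted PDF at x
plot(x, y, 'r'), xlabel('xxx'), ylabel('yyy'), ...
    title('title'), legend('xyz');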
but pdf returns values that make no sense to me: they are neither between 0 and 1 nor numbers between 0 and 1 multiplied by the length of x (I expected one of these two). For example, it gives me values like 20.something or 5.something, with length(x) = 1000 or more. This happens for every distribution I have tried (always fitted with fitdist). I only noticed the problem because I plotted a histogram of the frequencies against the kernel density estimate.
Can someone help me, please?
Answers (2)
John D'Errico
on 6 Feb 2015
I think you are under a common misperception about the PDF of a random variable. My guess is that the P in PDF is what confuses people; after all, it is called a Probability Density Function.
The thing is, it does not actually return a probability. Consider a PDF with a very narrow spread, here a Gaussian with mean 0 and standard deviation 0.001:
normpdf(0,0,.001)
ans =
398.94
The PDF at 0 is 398.94, vastly larger than 1.
What matters is that the PDF integrates to 1: the integral of the function over its whole domain is 1.
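A quick numerical check of that claim, sketched under the assumption that the Statistics and Machine Learning Toolbox is available for normpdf: the peak of the narrow Gaussian is far above 1, yet the area under the curve (estimated with trapz) comes out at 1.

```matlab
% Narrow Gaussian from the example above: mean 0, standard deviation 0.001.
mu = 0;
sigma = 0.001;

% +/- 10 standard deviations covers essentially all of the probability mass.
x = linspace(mu - 10*sigma, mu + 10*sigma, 10001);
f = normpdf(x, mu, sigma);     % density values; the peak is about 398.94

% Trapezoidal estimate of the integral of the PDF over the grid.
area = trapz(x, f);
fprintf('max(f) = %.2f, integral = %.6f\n', max(f), area);
```

The grid width of 10 standard deviations is an arbitrary choice; anything comfortably wider than a few standard deviations gives the same answer to many digits.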
It is the CDF that actually returns something you can interpret as a probability. Alternatively, you can integrate the PDF over an interval to compute the probability of that interval, which is exactly what the difference of two CDF values gives you.
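To illustrate that equivalence, here is a small sketch (again assuming the Statistics and Machine Learning Toolbox for normcdf and normpdf) computing the probability of landing within one standard deviation of the mean in two ways: as a difference of CDF values, and as a numerical integral of the PDF with MATLAB's integral function.

```matlab
% Probability that X ~ N(0, 0.001^2) falls in [-0.001, 0.001].
mu = 0;
sigma = 0.001;
a = -0.001;
b = 0.001;

% Difference of CDF values at the interval endpoints.
pCdf = normcdf(b, mu, sigma) - normcdf(a, mu, sigma);

% Numerical integral of the PDF over the same interval.
pInt = integral(@(t) normpdf(t, mu, sigma), a, b);

fprintf('CDF difference: %.4f, integral of PDF: %.4f\n', pCdf, pInt);
```

Both numbers are about 0.6827, the familiar one-sigma probability, and each is a genuine probability between 0 and 1 even though the PDF itself peaks near 399.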
John D'Errico
on 10 Feb 2015
A plot of the PDF IS a graph of the relative frequency, to the extent that this makes any sense. Why do you care about the y-axis scaling? If that is what bothers you, then just turn off the y-axis labels.
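If the goal is to compare a histogram against the kernel estimate, as in the question, one option is to normalize the histogram as a density rather than as counts, so both curves share the same y-axis scale. This is a sketch assuming the Statistics and Machine Learning Toolbox; the `data` vector below is made up for illustration and should be replaced by your own sample.

```matlab
% Hypothetical sample data with a narrow spread, so density values exceed 1.
rng(0);                                   % reproducible random numbers
data = 0.05 * randn(1000, 1);

pd = fitdist(data, 'kernel');             % kernel density estimate
x = linspace(min(data), max(data), 500);
y = pdf(pd, x);                           % fitted density values

figure
histogram(data, 'Normalization', 'pdf')   % bar areas sum to 1, same scale as the PDF
hold on
plot(x, y, 'r', 'LineWidth', 1.5)
hold off
xlabel('x'), ylabel('density')
legend('histogram (pdf normalization)', 'kernel PDF')
```

With 'pdf' normalization each bar height is count / (N * bin width), which is why the bars line up with the kernel curve even when both rise well above 1.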
The fact is, you CAN create a histogram of the frequency in each "bin". You would do this either by integrating the PDF over that sub-interval or by subtracting successive values of the CDF, which gives the relative fraction expected to fall in that bin.
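The CDF-difference approach can be sketched as follows, assuming the Statistics and Machine Learning Toolbox and hypothetical sample data. The model's bin probabilities come from diff(cdf(pd, edges)) and are plotted against the observed fraction of the sample in each bin.

```matlab
% Hypothetical sample data; replace with your own column vector.
rng(0);
data = 0.05 * randn(1000, 1);
pd = fitdist(data, 'kernel');

edges = linspace(min(data), max(data), 21);   % 20 equal-width bins
centers = edges(1:end-1) + diff(edges) / 2;

% Model probability of each bin: successive differences of the CDF.
pBin = diff(cdf(pd, edges));

% Observed fraction of the sample falling in each bin, for comparison.
fracBin = histcounts(data, edges, 'Normalization', 'probability');

bar(centers, fracBin, 1)          % observed fractions
hold on
plot(centers, pBin, 'ro-')        % model fractions from the CDF
hold off
xlabel('x'), ylabel('fraction per bin')
```

The model fractions sum to slightly less than 1 here, because the kernel estimate puts a little probability outside the range of the observed data.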
If you used a tiny enough bin width, the curve would look very nice and smooth. But the probability of a point falling in any single tiny bin would be vanishingly small, so the y-axis values would all be tiny. This reflects the fact that any single number has probability ZERO of arising.
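To see the bin probabilities vanish while the density stays put, here is a small sketch using a standard normal (chosen only for illustration; normcdf and normpdf are from the Statistics and Machine Learning Toolbox). As the bin width h shrinks, the probability of the bin centred on 0 shrinks roughly in proportion to h, while that probability divided by h approaches the PDF value at 0.

```matlab
% Bins of decreasing width h centred on 0, for X ~ N(0, 1).
h = [1 0.1 0.01 0.001];

% Exact bin probability from the CDF.
pBin = normcdf(h/2) - normcdf(-h/2);

% Density at the bin centre times the bin width.
approx = normpdf(0) * h;

% Rows: bin width, exact probability, density-times-width approximation.
disp([h; pBin; approx])
```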
So, just plot the PDF, and don't worry about the y-axis, or turn it off completely.
Rob Keeton
on 3 Sep 2019
Multiply the PDF values by the kernel bandwidth:
y = pdf(estim_KDE, x) * estim_KDE.BandWidth;