Normalization of probability distribution function

I'm trying to obtain a probability distribution curve with an area equal to 1. I have a dataset of about 3 million values ranging from 0 to 3.5, although this range changes depending on other input parameters that are irrelevant to my question. I want to assign probabilities between 0 and 1 based on experimental data so that I can later run a Monte Carlo simulation. However, the area under my distribution curve appears to be much larger than 1, and I can't find a way to normalize it. Here is the code relevant to the issue and a snippet of the figure I obtain. Thank you so much in advance to anyone who helps.
EDIT: The value of the variable "area" is actually 1; I just can't see how the probabilities on the y-axis add up to 1.
% Compute the kernel density estimation for electron energy distribution
% from main ion
[f,xi] = ksdensity(Electron_info(:,2));
% Compute the area under the curve
area = trapz(xi,f);
% Normalize the kernel density estimate by the area
f_norm = f/area;
% plot the KDE curve
figure(1)
plot(xi, f_norm,'r')
hold on
% Set plot properties
legend('Ion excitation')
xlabel('Electron Energy (eV)')
ylabel('Probability Density')
title('Electron Energy Distribution')
xlim([0 3.5])
set(gcf, 'Color','w')

Accepted Answer

the cyclist on 25 Apr 2023
It might help if you uploaded your data, so that we can run your code.
To me, eyeballing your curve does look like it has area 1. It's a little bit of guesswork to try to figure out why you don't perceive that. Is it because you have a peak near 0.8? Remember, that peak is sharp and narrow -- running over a range of x from about 2.6 to 3 (and not all the y values there are as high as 0.8). That peak contributes perhaps about 0.25 to the area.
The broad, flattish region contributes about 0.3*(2.5-0.5) = 0.6.
The left peak contributes about 0.3*0.5 = 0.15.
Those three pieces add up to roughly 0.25 + 0.6 + 0.15 = 1, so I see no problem.
  2 Comments
Jorge Fernandez on 25 Apr 2023
First off, thank you for the answer; second, you are correct. What I was trying to do was obtain a function where the y values correspond to the probability of finding a given value of x, i.e. if I look at x = 5 and see where that value intersects the function, the corresponding y would be the probability.
the cyclist on 25 Apr 2023
Edited: the cyclist on 25 Apr 2023
For continuous distributions, the probability of getting any exact, individual point (e.g. x=5) is zero. This can be a tricky point to grasp at first. It might help to realize that there are infinitely many x values, so if each one had a nonzero probability, the total probability would be infinite.
Instead, you use the probability density function (which is what you have) and estimate the probability of a range of points by taking the area under the density over that range.
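As a minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox is available: the probability of landing in an interval [a, b] is approximately the area under the density between a and b. The sample `x` below is synthetic (a stand-in for `Electron_info(:,2)`), and the interval endpoints are arbitrary illustrative values.

```matlab
% Synthetic stand-in for the real data (assumption: not the poster's dataset)
rng(0);
x = 1.5 + 0.5*randn(1e4,1);

% Interval of interest (arbitrary illustrative values)
a = 1.0;
b = 2.0;

% Evaluate the kernel density estimate on a grid spanning exactly [a, b]
pts = linspace(a, b, 201);
f = ksdensity(x, pts);

% Probability of a <= X <= b is the area under the density on that interval
pInterval = trapz(pts, f);
fprintf('P(%.1f <= X <= %.1f) is approximately %.3f\n', a, b, pInterval);
```

Passing the grid `pts` to `ksdensity` explicitly makes the integration limits land exactly on a and b, which the default 100-point grid would not guarantee.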
If you have a discrete distribution, then you could plot the probability itself, such as
x = [1 2 3];
p = [0.2 0.5 0.3];
bar(x,p)
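One way to get a plot whose y values are actual probabilities from continuous data is to bin it, which turns it into a discrete distribution like the one above. The following is only a sketch: the sample `x` is synthetic, and the bin count `nBins` is an arbitrary choice, not something taken from the original question.

```matlab
% Synthetic stand-in for the real data (assumption: not the poster's dataset)
rng(0);
x = 1.5 + 0.5*randn(1e4,1);

% Bin the data; each count is divided by the total number of samples
nBins = 20;                                   % arbitrary illustrative choice
[p, edges] = histcounts(x, nBins, 'Normalization', 'probability');

% Bin centers for plotting
centers = (edges(1:end-1) + edges(2:end)) / 2;

bar(centers, p, 1)
xlabel('x')
ylabel('Probability of falling in bin')

% The bar heights are probabilities, so they sum to 1
total = sum(p);
```

Note that each bar height is the probability of falling in that bin, so the heights change if the bin width changes. The density curve does not have this dependence, which is why its y values are not probabilities.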


More Answers (1)

Torsten on 25 Apr 2023
Edited: Torsten on 25 Apr 2023
I think the kernel density is already normalized ...
From the documentation:
[f,xi] = ksdensity(x) returns a probability density estimate, f, for the sample data in the vector or two-column matrix x. The estimate is based on a normal kernel function, and is evaluated at equally-spaced points, xi, that cover the range of the data in x. ksdensity estimates the density at 100 points for univariate data, or 900 points for bivariate data.
If you want to see the cumulated area under the curve, use
[f,xi] = ksdensity(Electron_info(:,2),'Function','cdf');
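Both points can be checked with a short sketch. It uses a synthetic sample `x` in place of `Electron_info(:,2)` and arbitrary interval endpoints `a` and `b`. It first confirms that the default density estimate integrates to about 1, then uses the estimated CDF to read off an interval probability directly.

```matlab
% Synthetic stand-in for the real data (assumption: not the poster's dataset)
rng(0);
x = 1.5 + 0.5*randn(1e4,1);

% The default density estimate should already integrate to about 1
[f, xi] = ksdensity(x);
areaPdf = trapz(xi, f);

% Estimated CDF evaluated at the two endpoints of an interval
a = 1.0;                                      % arbitrary illustrative values
b = 2.0;
F = ksdensity(x, [a b], 'Function', 'cdf');

% P(a < X <= b) = F(b) - F(a)
pInterval = F(2) - F(1);
fprintf('Area under pdf: %.3f, P(a < X <= b): %.3f\n', areaPdf, pInterval);
```

Unlike the density, the CDF does have y values between 0 and 1 that are probabilities, namely the probability of observing a value less than or equal to x.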
  2 Comments
Jorge Fernandez on 25 Apr 2023
First off, thanks for the answer. You are indeed correct; however, what I'm trying to obtain (if possible) is a function where the y values correspond to the probability of finding a given value of x, i.e. if I look at x = 5 and see where that value intersects the function, the corresponding y would be the probability.
Torsten on 25 Apr 2023
The probability density function gives information about the probability for an interval of x-values. The probability of getting a single x-value for a continuous distribution is always 0.
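Since the stated goal is a Monte Carlo simulation, one common approach that avoids point probabilities altogether is inverse-CDF sampling: draw uniform random numbers and map them through the estimated inverse CDF. The sketch below is not from the thread. It assumes a synthetic sample `x` in place of `Electron_info(:,2)`, and `nDraws` is a hypothetical parameter.

```matlab
% Synthetic stand-in for the real data (assumption: not the poster's dataset)
rng(0);
x = 1.5 + 0.5*randn(2000,1);

% Uniform random numbers in (0,1) drive the Monte Carlo draw
nDraws = 500;                                 % hypothetical number of draws
u = rand(nDraws,1);

% Map each uniform value through the estimated inverse CDF
samples = ksdensity(x, u, 'Function', 'icdf');

fprintf('Mean of data: %.3f, mean of draws: %.3f\n', mean(x), mean(samples));
```

Values drawn this way fall more often where the density is high, so the density curve determines how frequently each region is sampled even though no single x has a probability of its own.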

