get parameters of gaussian distributions from ksdensity function

Marc Laub on 29 Sep 2019
Hey there,
I am looking for a possibility to get an analytical solution for the distribution of my numbers. Since my numbers are generated by a simulation, I can't say for sure which distribution would describe them best at any time.
The best results I got to describe my data are with the ksdensity function, but the results from ksdensity are only the x and y points of a curve that fits the data.
Is there a possibility to get the parameters of the Gaussian distributions from the ksdensity function? Like in the first example here:
where you can clearly see the bimodality of the data. Would it be possible to get the parameters of the two Gaussian distributions that are superimposed here?

Answers (1)

Edited: Thiago Henrique Gomes Lobato on 29 Sep 2019
ksdensity uses a nonparametric representation to calculate the probabilities, so there are no parameters to get from the function itself. If, however, you know which distribution may underlie the data (or can make a good visual estimate), you can run a parametric optimization afterwards to get the parameters. An example based on the two Gaussians that you mentioned:
rng('default') % For reproducibility
data = [randn(30,1); 5+randn(30,1)];
[f,xi] = ksdensity(data);
% Model: weighted sum of two Gaussians with parameter vector
% xx = [sigma1, mu1, sigma2, mu2, weight1, weight2].
% fun returns the rms of the error between the model and the ksdensity values
fun = @(xx,t,y)rms(y-(xx(5)*1./sqrt(xx(1)^2*2*pi).*exp(-(t-xx(2)).^2/(2*xx(1)^2))+...
xx(6)*1./sqrt(xx(3)^2*2*pi).*exp(-(t-xx(4)).^2/(2*xx(3)^2)) ) );
% Get the parameters with the minimum error. To improve convergence, choose reasonable initial values
[x,fval] = fminsearch(@(r)fun(r,xi,f),[2,0.5,2,4,0.5,0.5]);
% Make sure sigmas are positive (the model only depends on sigma^2)
x([1,3]) = abs(x([1,3]));
% Generate the parametric distributions
pd1 = makedist('Normal','mu',x(2),'sigma',x(1));
pd2 = makedist('Normal','mu',x(4),'sigma',x(3));
% Get the probability values
y1 = pdf(pd1,xi)*x(5); % x(5) is the weight of pd1
y2 = pdf(pd2,xi)*x(6); % x(6) is the weight of pd2
% Plot the ksdensity curve, both weighted components, and their sum
figure
plot(xi,f);
hold on;
plot(xi,y1);
plot(xi,y2);
plot(xi,y1+y2);
legend({'ksdensity',['\mu : ',num2str(x(2)),'. \sigma :',num2str(x(1))],...
['\mu : ',num2str(x(4)),'. \sigma :',num2str(x(3))],'pdf1+pdf2'})
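As an alternative to fitting the ksdensity curve, and not part of the approach above: if Statistics and Machine Learning Toolbox is available, fitgmdist fits a Gaussian mixture directly to the raw samples by maximum likelihood. The sketch below assumes the same synthetic sample and that two components are appropriate; the variable names are only illustrative.

```matlab
rng('default')                              % for reproducibility
data = [randn(30,1); 5+randn(30,1)];        % same synthetic sample as in the example above

% Fit a two-component Gaussian mixture directly to the samples.
% 'Replicates' reruns the EM algorithm from several starts and keeps the best fit.
gm = fitgmdist(data, 2, 'Replicates', 5);

mu     = gm.mu;                             % 2-by-1 component means
sigma  = sqrt(squeeze(gm.Sigma));           % Sigma stores variances, so take the square root
weight = gm.ComponentProportion(:);         % mixing proportions, which sum to 1

for k = 1:2
    fprintf('Component %d: mu = %.3f, sigma = %.3f, weight = %.3f\n', ...
        k, mu(k), sigma(k), weight(k));
end
```

Because this works on the samples rather than on the smoothed curve, the estimated sigmas are not widened by the ksdensity bandwidth, so they may come out smaller than those from the curve fit.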
You can see a list of the distributions available in MATLAB under the parameter 'name' here:
A good approach for you might then be:
  1. Plot the distribution of the data with ksdensity
  2. Check what it looks like and search for the distribution that most resembles it
  3. Do a parametric fit to the data as shown above
If you want to fully automate this, you can write optimization functions for multiple distributions and then choose the one with the lowest fit error. I hope this helps; if something is not clear, feel free to ask.
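The automated variant could be sketched roughly as follows. This is only one way to do it and rests on several assumptions: the sample is hypothetical (strictly positive, so that all candidates are valid), the candidate list is arbitrary, each candidate is fitted with fitdist (maximum likelihood) rather than with a hand-written optimization, and the fit error is the RMS difference to the ksdensity curve.

```matlab
rng('default')                                % for reproducibility
data = exp(0.3*randn(200,1));                 % hypothetical positive sample; use your simulation output instead
[f,xi] = ksdensity(data);                     % nonparametric reference curve

candidates = {'Normal','Lognormal','Gamma','Weibull'};   % assumed candidate set
fitErr = zeros(size(candidates));

for k = 1:numel(candidates)
    pd = fitdist(data, candidates{k});        % maximum-likelihood fit of candidate k
    % RMS difference between the fitted pdf and the ksdensity values
    fitErr(k) = sqrt(mean((f - pdf(pd,xi)).^2));
end

[~,best] = min(fitErr);                       % candidate with the lowest fit error
bestPd = fitdist(data, candidates{best});     % refit the winner to read off its parameters
fprintf('Best match: %s (RMS error %.4f)\n', candidates{best}, fitErr(best));
disp(bestPd)
```

The RMS error against ksdensity depends on the kernel bandwidth, so it is a convenient ranking rather than a formal goodness-of-fit test.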
Thiago Henrique Gomes Lobato
You can adjust your optimization as you get new data. If you know from experience that the data will always be 2-4 superimposed Gaussians, you don't need to test Poisson distributions, for example. One idea would be to use findpeaks to get initial guesses for the means and then perform the optimization based on the number of peaks you find:
[PeakValues,meanGuesses] = findpeaks(f);
NOfGaussians = length(meanGuesses);
meanGuesses = xi(meanGuesses);
[x,fval] = fminsearch(@(r)fun(r,xi,f,NOfGaussians),[2,meanGuesses(1),2,meanGuesses(2),0.5,0.5]);
Then just make sure the function does the right thing with the number-of-Gaussians parameter, and adjust the initial values vector to match.
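One possible way to make the objective handle an arbitrary number of Gaussians is sketched below. The parameter layout [sigmas, mus, weights] is an assumption of this sketch and differs from the interleaved order used in the two-Gaussian example; the names gaussSum, col and p0 are made up for illustration, and sqrt(mean(...)) stands in for rms. findpeaks requires Signal Processing Toolbox.

```matlab
rng('default')                               % for reproducibility
data = [randn(30,1); 5+randn(30,1)];         % same synthetic sample as in the answer
[f,xi] = ksdensity(data);

% Peak locations of the density serve as initial guesses for the means
[~,peakIdx] = findpeaks(f);
meanGuesses = xi(peakIdx);
N = numel(meanGuesses);                      % number of Gaussians to fit

% Assumed parameter layout: p = [sigma_1..sigma_N, mu_1..mu_N, w_1..w_N]
col = @(v) v(:);                             % helper: force a column vector
gaussSum = @(p,t,N) sum( col(p(2*N+1:3*N)) .* ...
    exp(-(t(:).' - col(p(N+1:2*N))).^2 ./ (2*col(p(1:N)).^2)) ./ ...
    (abs(col(p(1:N)))*sqrt(2*pi)), 1);       % N-by-M matrix of components, summed over rows

% RMS error between the ksdensity values and the N-Gaussian model
fun = @(p,t,y,N) sqrt(mean((y(:).' - gaussSum(p,t,N)).^2));

% Initial values: unit sigmas, peak positions as means, equal weights
p0 = [ones(1,N), meanGuesses(:).', ones(1,N)/N];
[p,fval] = fminsearch(@(r) fun(r,xi,f,N), p0);

sigmaFit = abs(p(1:N));                      % the model only depends on sigma^2
muFit    = p(N+1:2*N);
wFit     = p(2*N+1:3*N);
```

With more than two or three components, fminsearch has many parameters to search and may stop at its iteration limit, so it is worth checking fval and the plotted fit before trusting the result.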
