Calculating mean and SD of histogram

14 views (last 30 days)
Tim Fulcher
Tim Fulcher on 31 Jul 2020
Edited: Adam Danz on 5 Aug 2020
Hi All,
I have a histogram in CSV form consisting of 1000 bars and representing energy (GeV). The vertical scale is the number of particles in the energy range of a bar. The expected distibution is Gaussian and I need to determine the mean and SD? I tried using fitdist but it only accepts one vector and I should like to preserve both axes. How might I do this?
Regards
Tim
  7 Comments
Tim Fulcher
Tim Fulcher on 2 Aug 2020
Hi Image Analyst,
in this case the fitted Gaussian.
Thanks and regards
Tim
Adam Danz
Adam Danz on 2 Aug 2020
Edited: Adam Danz on 5 Aug 2020
The csv file contains 3 columns of 100 numeric values.
What are we supposed to do with that? What's MeV?
When you say the 3rd col is error, what does that mean?

Sign in to comment.

Answers (1)

Adam Danz
Adam Danz on 2 Aug 2020
Edited: Adam Danz on 3 Aug 2020
Guessing that column 1 of the data are x-values to the bar plot and column 2 of the data are the bar heights, you can fit a guassian distribution to the (x,y) data with three parameters: mean (mu), standard deviation (sigma), and amplitude.
The solution below uses a bar plot but depending on what the x-values mean, a histogram may be more appropriate.
It produces a plot containing the original distribution, the fit, and the fit parameters in the title.
See inline comments for details.
% Read in the 1000x3 matrix
data = readmatrix('200_10.0_A-150_8mm_Steps_1-analysis_Primary_Energy.csv');
% Define the (x,y) values to fit
x = data(:,1); % must match x var in gausFcn
y = data(:,2); % must match y var in gausFcn
% Define the guassian function with 3 params: [mean, std, amplitude]
% The variables "x" and "y" MUST be defined with those names prior
% to these two lines!
gausFcn = @(p)p(3)*exp(-(((x-p(1)).^2)/(2*p(2).^2)));
gausFitFcn = @(p)gausFcn(p)-y; % function to fit
% Initial guesses
initGuess = [mean(data(:,1)),std(data(:,2)), max(data(:,2))];
% Fit the curve. See documentation for lsqnonlin for additonal constraints.
b = lsqnonlin(gausFitFcn, initGuess);
% Plot results.
clf()
h = bar(data(:,1),data(:,2));
yHat = gausFcn(b);
hold on
plot(x, yHat, 'r--', 'LineWidth', 2)
title(sprintf('\\mu=%.3f \\sigma=%.3f amp=%.3f', b))
Cautionary word #1: This assumes your x values are the centers of the bins and that assumption matters. If the x values are bin edges, then the estimate of the mean contains an error equal to or less than +/-0.00025 which is half of the bin widths ( diff(x(:,1))/2 ). If the x values are bin edges and x(1) is the left edge of the first bin, then half the bin width must be added to each x-value to approximate the bin centers.
Cautionary word #2: the fits of the distribution do not equal the fits to your raw data. Bin sizes affect bar heights and spread, so what you're fitting does not tell you the parameters of the underlying population of data.
The plot below shows 5 sets of histograms at 5 different bin-widths using the same raw data (dots).

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!