hist3 with the 'Edges' option returns one more bin than expected

I am using hist3 to histogram 2D scattered data. I intended to use the 'Edges' option the same way as in histogram, but it does not behave the same. I defined a cell array of edges to histogram my data in 40x40 bins, using something like {0:1/40:1 0:1/40:1}. This gives 41 edges on each axis and thus should give 40x40 bins, but I get 41x41. If I pass the same cell as centers with the 'Ctrs' option I also get 41x41, which is correct in that case. Something is fishy here: the 'Edges' option doesn't seem to work right. Is that a bug or am I doing something wrong?
The code looks something like this:
%define bin edges
%limits = [xmin xmax ymin ymax]
%bins = [binNumX binNumY]
binSizeX = (limits(2)-limits(1))/bins(1);
binSizeY = (limits(4)-limits(3))/bins(2);
edgsX = limits(1):binSizeX:limits(2);
edgsY = limits(3):binSizeY:limits(4);
edgs = {edgsX edgsY}; %what I wanted to use in hist3, but it doesn't work as expected: one bin too many
%create histograms
%2D histogram
%define bin centers (edges option does some crap and they are needed anyway)
ctrsX = edgsX-binSizeX/2; ctrsX(1) = [];
ctrsY = edgsY-binSizeY/2; ctrsY(1) = [];
%add two bins to be removed afterwards (unwanted open bins)
ctrs = {[ctrsX(1)-binSizeX ctrsX ctrsX(length(ctrsX))+binSizeX] [ctrsY(1)-binSizeY ctrsY ctrsY(length(ctrsY))+binSizeY]};
%scattered data points = [Ax1 Ax2]
%Hist2Dbin = hist3([Ax1 Ax2],'Edges',edgs)'; %gives one bin too many
Hist2Dbin = hist3([Ax1 Ax2],'Ctrs',ctrs)';
%remove first and last bins (open bins, because of 'Ctrs' option?)
Hist2Dbin = Hist2Dbin(2:size(Hist2Dbin,1)-1,2:size(Hist2Dbin,2)-1);
Ax1bin = [ctrsX; sum(Hist2Dbin,1)]';
Ax2bin = [ctrsY' sum(Hist2Dbin,2)];
%1D histograms
Ax1binTotal = [ctrsX; histcounts(Ax1,edgsX)]';
Ax2binTotal = [ctrsY; histcounts(Ax2,edgsY)]';
  8 Comments
Adam Danz on 16 Aug 2018
I see. When I asked for a section of your code, I was assuming it included the lines that were problematic to you. I'll provide an answer below.
Johann Thurn on 16 Aug 2018
I included them. They are just commented out. I have received an explanation for this behaviour in the meantime. I will just copy the mail here:
"I understand that you are using 'hist3' with the 'Edges' option and are observing some discrepancies compared to the 'edges' of 'histogram'.
This behavior arises because the "hist3" function creates an additional bin for data points that fall on the upper bound of the supplied edges. So the binning is as follows for points strictly inside the range:
edges{1}(i) <= X(k,1) < edges{1}(i+1)
edges{2}(j) <= X(k,2) < edges{2}(j+1)
and a new bin is created for points that fall on the last edge:
X(k,1) = edges{1}(end) or X(k,2) = edges{2}(end).
The "histcounts" function, on the other hand, behaves differently. For this function, the binning is as follows for points strictly inside the range:
edges(i) <= X(k) < edges(i+1)
and as follows for points that fall on the upper bound, which are counted in the last regular bin:
edges(end-1) <= X(k) <= edges(end)
The two functions are part of different toolboxes: "histcounts" is part of the MATLAB toolbox, while "hist3" is part of the Statistics toolbox, which might explain the different choice of implementation. Our development team is aware of this discrepancy and may consider changing it in a future release of MATLAB.
As a workaround and to ensure that you get consistent behavior between the functions, you can set the upper limit of your edges to be the maximum value in your data set plus eps. The "eps" function will add a small value to the upper edge so that points that fall on the upper edge are correctly binned."
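The difference described in the mail can be checked with a small sketch. The sample points in X are hypothetical and chosen so that one of them lies exactly on the last edge in both dimensions; hist3 needs the Statistics and Machine Learning Toolbox, histcounts2 is part of base MATLAB.

```matlab
% Hypothetical sample data: the last row lies exactly on the upper edges
X = [0.10 0.20; 0.50 0.50; 1.00 1.00];
edgs = {0:1/40:1, 0:1/40:1};          % 41 edges per axis

% hist3 with 'Edges': one bin per edge, the last one holds points on the last edge
N3 = hist3(X, 'Edges', edgs);

% histcounts2: one bin per pair of adjacent edges, the last bin is closed on both sides
N2 = histcounts2(X(:,1), X(:,2), edgs{1}, edgs{2});

size(N3)     % 41 41
size(N2)     % 40 40
N3(41,41)    % 1, the point (1,1) sits in the extra bin
N2(40,40)    % 1, histcounts2 counts it in the last regular bin
```

Both calls count the same three points; only the placement of the point on the upper edge differs.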


Accepted Answer

Adam Danz on 16 Aug 2018
Edited: Adam Danz on 6 Feb 2020
The 2nd output of hist3() contains one extra value because, with the 'Edges' option, hist3 adds a final bin for data that fall exactly on the last edge. This is explained in the documentation.
Look at the values of your 'edges' and the values of the output bins.
edges = 0 : 0.025 : 1;
length(edges)
ans = 41
[N, c] = hist3(data, 'Edges', {edges edges}); % data is n-by-2, c is a 1-by-2 cell
length(c{1})
ans = 41
c{1}
ans = [0.0125 : 0.025 : 1.0125];
Notice that the last bin center is greater than 1, which was your last edge.
Read more about this in the link I provided under 'edges'.
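If hist3 has to stay in the code, one possible way to get back to a 40x40 result is to fold the extra last row and column into their neighbours and then drop them. This is only a sketch: the data in X are hypothetical, and the folding step is an assumption about what the asker wants (points on the upper edge counted in the last regular bin, as histcounts2 does), not something taken from the documentation.

```matlab
% Hypothetical sample data: two points lie on an upper edge
X = [0.10 0.20; 0.50 1.00; 1.00 1.00];
edgs = {0:1/40:1, 0:1/40:1};              % 41 edges per axis

N = hist3(X, 'Edges', edgs);              % 41-by-41, last row/column = points on last edge

% Fold the extra row and column into the last regular bins, then trim
N(end-1,:) = N(end-1,:) + N(end,:);
N(:,end-1) = N(:,end-1) + N(:,end);
N = N(1:end-1, 1:end-1);                  % now 40-by-40

% Compare with histcounts2, which closes the last bin on both sides
Nref = histcounts2(X(:,1), X(:,2), edgs{1}, edgs{2});
isequal(N, Nref)
```

Folding the row first and the column second moves the corner bin (41,41) into (40,40) in two steps, so no count is lost or duplicated.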
  1 Comment
Johann Thurn on 16 Aug 2018
Ah, this is consistent with what I got from MATLAB staff in the meantime. I copied the mail above. Now I get it. I just assumed the behaviour was the same as in histogram or histcounts. My mistake.
Thanks a lot :)


More Answers (1)

Steven Lord on 16 Aug 2018
Instead of using hist3 I recommend you use histogram2.
  3 Comments
Steven Lord on 16 Aug 2018
Ah, if you just want the binned data and not the figure then use histcounts2 instead.
Johann Thurn on 16 Aug 2018
Oh, I must have overlooked that one. Thanks! Nevertheless, I still believe hist3 with edges is faulty.
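For reference, the histcounts2 suggestion could replace the 'Ctrs' workaround from the question roughly as follows. This is a sketch under assumptions: the values of limits, bins, Ax1 and Ax2 are hypothetical stand-ins for the asker's variables, and linspace is used here in place of the colon expression to build the edges.

```matlab
% Hypothetical stand-ins for the variables in the question
limits = [0 1 0 1];            % [xmin xmax ymin ymax]
bins   = [40 40];              % [binNumX binNumY]
Ax1 = [0.10; 0.50; 1.00];      % sample x values (assumed)
Ax2 = [0.20; 0.50; 1.00];      % sample y values (assumed)

% bins(k)+1 edges per axis give bins(k) bins
edgsX = linspace(limits(1), limits(2), bins(1)+1);
edgsY = linspace(limits(3), limits(4), bins(2)+1);

% histcounts2 returns a bins(1)-by-bins(2) matrix: rows follow x, columns follow y
N = histcounts2(Ax1, Ax2, edgsX, edgsY);
Hist2Dbin = N';                % transposed as in the question: rows follow y

% Bin centers are the midpoints of adjacent edges
ctrsX = edgsX(1:end-1) + diff(edgsX)/2;
ctrsY = edgsY(1:end-1) + diff(edgsY)/2;

Ax1bin = [ctrsX; sum(Hist2Dbin,1)]';   % x marginal: [center count]
Ax2bin = [ctrsY' sum(Hist2Dbin,2)];    % y marginal: [center count]
```

With this version there are no open outer bins to remove afterwards, so the padding of ctrs and the trimming of Hist2Dbin from the question are not needed.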


Release

R2017b