histcounts
Histogram bin counts
Syntax
Description
N = histcounts(C,Categories) counts only the elements in C whose value is equal to the subset of categories specified by Categories.
[N,Categories] = histcounts(___) also returns the categories that correspond to each count in N, using either of the previous syntaxes for categorical arrays.
[___] = histcounts(___,Name,Value) specifies additional parameters using one or more name-value arguments. For example, you can specify 'BinWidth' and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify 'Normalization' and either 'count', 'countdensity', 'probability', 'pdf', 'cumcount', or 'cdf'.
Examples
Bin Counts and Bin Edges
Distribute 100 random values into bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data.
X = randn(100,1);
[N,edges] = histcounts(X)
N = 1×7
2 17 28 32 16 3 2
edges = 1×8
-3 -2 -1 0 1 2 3 4
Specify Number of Bins
Distribute 10 numbers into 6 equally spaced bins.
X = [2 3 5 7 11 13 17 19 23 29]; [N,edges] = histcounts(X,6)
N = 1×6
2 2 2 2 1 1
edges = 1×7
0 4.9000 9.8000 14.7000 19.6000 24.5000 29.4000
Specify Bin Edges
Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.
X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)
N = 1×10
0 24 149 142 195 200 154 111 25 0
Normalized Bin Counts
Distribute all of the prime numbers less than 100 into bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.
X = primes(100); [N,edges] = histcounts(X, 'Normalization', 'probability')
N = 1×4
0.4000 0.2800 0.2800 0.0400
edges = 1×5
0 30 60 90 120
Determine Bin Placement
Distribute 100 random integers between -5 and 5 into bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector representing the bin indices of the data.
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');
Find the bin count for the third bin by counting the occurrences of the number 3 in the bin index vector, bin. The result is the same as N(3).
count = nnz(bin==3)
count = 8
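The same idea extends to every bin at once. The following is a minimal sketch, assuming a fixed random seed (rng(0) is added here only for repeatability and is not part of the original example), that rebuilds the whole count vector from the bin indices with accumarray.

```matlab
rng(0)                               % assumption: fixed seed so reruns give the same data
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');

% Every value lies inside the integer-centered bins, so bin has no zeros
% and can be used directly as a subscript vector.
counts = accumarray(bin,1,[numel(N) 1])';

% counts is a row vector with one entry per bin, matching N.
disp(isequal(counts,N))
```

Counting with nnz(bin==k) answers the question for one bin; accumarray tallies all bins in a single pass.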
Categorical Bin Counts
Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1]; C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
C = 1x27 categorical
no no yes yes yes no no no no undecided undecided yes no no no yes no yes no yes no no no yes yes yes yes
Determine the number of elements that fall into each category.
[N,Categories] = histcounts(C)
N = 1×3
11 14 2
Categories = 1x3 cell
{'yes'} {'no'} {'undecided'}
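To count only some of the categories, pass the subset as a second input. This sketch reuses the vote data above and restricts the count to 'yes' and 'no'; the variable names Nsub and subCats are illustrative only.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'});

% Count only the elements whose category is 'yes' or 'no'.
[Nsub,subCats] = histcounts(C,{'yes','no'});

% The 'undecided' votes are left out of the result.
disp(Nsub)
disp(subCats)
```

The counts for the selected categories match the corresponding entries of the full count vector, 11 and 14.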
Input Arguments
X
— Data to distribute among bins
vector | matrix | multidimensional array
Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If X is not a vector, then histcounts treats it as a single column vector, X(:).
histcounts ignores all NaN values. Similarly, histcounts ignores Inf and -Inf values unless the bin edges explicitly specify Inf or -Inf as a bin edge.
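As a small illustration of this rule, the sketch below bins the same data twice, once with finite edges and once with infinite outer edges. The data and edge values are made up for the example.

```matlab
X = [-Inf 1 2 NaN 3 Inf];

% Finite edges: NaN, -Inf, and Inf are all left uncounted.
Nfinite = histcounts(X,[0 2 4]);        % bins [0,2) and [2,4]

% Infinite outer edges: -Inf and Inf now land in the first and last bins.
% NaN is still ignored.
Ninf = histcounts(X,[-Inf 2 Inf]);      % bins [-Inf,2) and [2,Inf]

disp(Nfinite)
disp(Ninf)
```

With finite edges only 1, 2, and 3 are counted. With infinite edges the five non-NaN values are counted.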
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | datetime | duration
C
— Categorical data
categorical array
Categorical data, specified as a categorical array. histcounts ignores undefined categorical values.
Data Types: categorical
nbins
— Number of bins
positive integer
Number of bins, specified as a positive integer. If you do not specify nbins, then histcounts automatically calculates how many bins to use based on the values in X.
Example: [N,edges] = histcounts(X,15) uses 15 bins.
edges
— Bin edges
vector
Bin edges, specified as a vector. The first vector element specifies the leading edge of the first bin. The last element specifies the trailing edge of the last bin. The trailing edge is only included for the last bin.
For datetime and duration data, edges must be a datetime or duration vector in monotonically increasing order.
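The edge-inclusion rule is easiest to see with values that sit exactly on the edges. In this sketch (data chosen for illustration), each bin includes its leading edge, and only the last bin also includes its trailing edge.

```matlab
X = [0 1 1 2 3 3];
edges = [0 1 2 3];

% Bin 1 is [0,1), bin 2 is [1,2), and the last bin is [2,3].
N = histcounts(X,edges);

% The value 1 goes to bin 2, the value 2 goes to bin 3, and the value 3
% is kept because the last bin includes its trailing edge.
disp(N)
```

The result is N = [1 2 3]: one value in the first bin, two in the second, and three in the last.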
Categories
— Categories included in count
all categories (default) | string vector | cell vector of character vectors | pattern scalar | categorical vector
Categories included in count, specified as a string vector, cell vector of character vectors, pattern scalar, or categorical vector. By default, histcounts uses a bin for each category in categorical array C. Use Categories to specify a unique subset of the categories instead.
Example: h = histcounts(C,["Large","Small"]) counts only the categorical data in the categories Large and Small.
Example: h = histcounts(C,"Y" + wildcardPattern) counts categorical data in all the categories whose names begin with the letter Y.
Data Types: string | cell | pattern | categorical
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [N,edges] = histcounts(X,'Normalization','probability') normalizes the bin counts in N, such that sum(N) is 1.
BinWidth
— Width of bins
positive scalar
Width of bins, specified as a positive scalar. If you specify BinWidth, then histcounts can use a maximum of 65,536 bins (or 2^16). If the specified bin width requires more bins, then histcounts uses a larger bin width corresponding to the maximum number of bins.
For datetime and duration data, BinWidth can be a scalar duration or calendar duration.
If you specify BinWidth with BinMethod, NumBins, or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.
Example: histcounts(X,'BinWidth',5) uses bins with a width of 5.
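A short sketch of this option, using made-up data: request a width of 5 and check the spacing of the returned edges. Where the first edge falls is chosen by histcounts and is not assumed here.

```matlab
X = [1 4 6 12 18 23];

% Ask for bins that are 5 units wide and let histcounts place the edges.
[N,edges] = histcounts(X,'BinWidth',5);

% Consecutive edges differ by the requested width.
disp(diff(edges))

% No limits were imposed, so every value is counted.
disp(sum(N))
```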
BinEdges
— Edges of bins
numeric vector
Edges of bins, specified as a numeric vector. The first element specifies the leading edge of the first bin. The last element specifies the trailing edge of the last bin. The trailing edge is only included for the last bin.
If you do not specify the bin edges, then histcounts automatically determines the bin edges.
If you specify BinEdges with BinMethod, BinWidth, NumBins, or BinLimits, histcounts only honors BinEdges, and BinEdges must be specified last.
This option does not apply to categorical data.
BinLimits
— Bin limits
two-element vector
Bin limits, specified as a two-element vector, [bmin,bmax]. The first element indicates the first bin edge. The second element indicates the last bin edge.
This option computes using only the data that falls within the bin limits inclusively, X>=bmin & X<=bmax.
This option does not apply to categorical data.
Example: histcounts(X,'BinLimits',[1,10]) bins only the values in X that are between 1 and 10 inclusive.
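The following sketch, with data invented for the example, shows the effect of the limits: values outside [1,10] are dropped, and the values 1 and 10 themselves are kept.

```matlab
X = [0 1 5 10 11 20];

% Only 1, 5, and 10 satisfy X>=1 & X<=10, so only they are binned.
[N,edges] = histcounts(X,'BinLimits',[1,10]);

disp(sum(N))                 % number of values that were binned
disp(edges([1 end]))         % first and last bin edges
```

The first and last edges equal the requested limits, and the total count equals nnz(X>=1 & X<=10).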
BinMethod
— Binning algorithm
'auto' (default) | 'scott' | 'fd' | 'integers' | 'sturges' | 'sqrt' | ...
Binning algorithm, specified as one of the values in this table.
Value  Description
'auto'  The default 'auto' algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution.
'scott'  Scott's rule is optimal if the data is close to being normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin width of 3.5*std(X(:))*numel(X)^(-1/3).
'fd'  The Freedman-Diaconis rule is less sensitive to outliers in the data, and might be more suitable for data with heavy-tailed distributions. It uses a bin width of 2*iqr(X(:))*numel(X)^(-1/3).
'integers'  The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, you can use this rule to create a limit of 65536 bins (2^16). If the data range is greater than 65536, then the integer rule uses wider bins instead.
'sturges'  Sturges' rule is popular due to its simplicity. It chooses the number of bins to be ceil(1 + log2(numel(X))).
'sqrt'  The Square Root rule is widely used in other software packages. It chooses the number of bins to be ceil(sqrt(numel(X))).
histcounts adjusts the number of bins slightly so that the bin edges fall on "nice" numbers, rather than using these exact formulas.
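To get a feel for these rules, the sketch below evaluates them by hand on a simple data set. It assumes the commonly cited forms: a bin width of 3.5*std(X)*n^(-1/3) for Scott's rule, ceil(1 + log2(n)) bins for Sturges' rule, and ceil(sqrt(n)) bins for the Square Root rule. Because histcounts rounds to "nice" edges, the bin counts it returns can differ from the hand-computed values, so the code prints both instead of assuming they agree.

```matlab
X = (1:100)';                          % deterministic test data
n = numel(X);

scottWidth  = 3.5*std(X)*n^(-1/3);     % Scott's rule: bin width
sturgesBins = ceil(1 + log2(n));       % Sturges' rule: number of bins
sqrtBins    = ceil(sqrt(n));           % Square Root rule: number of bins

% Number of bins that histcounts actually uses for each method.
nSturges = numel(histcounts(X,'BinMethod','sturges'));
nSqrt    = numel(histcounts(X,'BinMethod','sqrt'));

fprintf('Scott width (formula): %.3f\n',scottWidth);
fprintf('Sturges: formula %d, histcounts %d\n',sturgesBins,nSturges);
fprintf('Sqrt:    formula %d, histcounts %d\n',sqrtBins,nSqrt);
```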
For datetime or duration data, specify the bin width as one of these units of time.
Value  Description  Data Type 

"second"  Each bin is 1 second.  datetime and duration 
"minute"  Each bin is 1 minute.  datetime and duration 
"hour"  Each bin is 1 hour.  datetime and duration 
"day"  Each bin is 1 calendar day. This value accounts for daylight saving time shifts.  datetime and duration 
"week"  Each bin is 1 calendar week.  datetime only 
"month"  Each bin is 1 calendar month.  datetime only 
"quarter"  Each bin is 1 calendar quarter.  datetime only 
"year"  Each bin is 1 calendar year. This value accounts for leap days.  datetime and duration 
"decade"  Each bin is 1 decade (10 calendar years).  datetime only 
"century"  Each bin is 1 century (100 calendar years).  datetime only 
If you specify BinMethod for datetime or duration data, then histcounts can use a maximum of 65,536 bins (or 2^16). If the specified bin duration requires more bins, then histcounts uses a larger bin width corresponding to the maximum number of bins.
If you specify BinLimits, NumBins, BinEdges, or BinWidth, then BinMethod is set to 'manual'.
If you specify BinMethod with BinWidth, NumBins, or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.
Example: histcounts(X,'BinMethod','integers') centers the bins on integers.
Normalization
— Type of normalization
'count' (default) | 'probability' | 'percentage' | 'countdensity' | 'cumcount' | 'pdf' | 'cdf'
Type of normalization, specified as one of the values in this table. For each bin i:
$$v_i$$ is the bin value.
$$c_i$$ is the number of elements in the bin.
$$w_i$$ is the width of the bin.
$$N$$ is the number of elements in the input data. This value can be greater than the binned data if the data contains missing values, such as NaN, or if some of the data lies outside the bin limits.
Value  Bin Values  Notes
'count' (default)  $$v_i = c_i$$
'probability'  $$v_i = \frac{c_i}{N}$$
'percentage'  $$v_i = 100 \cdot \frac{c_i}{N}$$
'countdensity'  $$v_i = \frac{c_i}{w_i}$$
'cumcount'  $$v_i = \sum_{j=1}^{i} c_j$$
'pdf'  $$v_i = \frac{c_i}{N \cdot w_i}$$
'cdf'  $$v_i = \sum_{j=1}^{i} \frac{c_j}{N}$$
Example: histcounts(X,'Normalization','pdf') bins the data using an estimate of the probability density function.
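The normalization formulas can be checked against raw counts. This is a minimal sketch with made-up data and explicit edges, chosen so that every value falls inside the bins and N equals numel(X). The 'percentage' option is left out so that the code also runs in releases before R2023b.

```matlab
X = [1 2 2 3 3 3 7 8];
edges = [0 2 4 8];

c = histcounts(X,edges);        % raw counts per bin
w = diff(edges);                % bin widths
n = numel(X);                   % all values lie inside the edges

% Hand-computed bin values, one field per normalization type.
manual.probability  = c/n;
manual.countdensity = c./w;
manual.cumcount     = cumsum(c);
manual.pdf          = c./(n*w);
manual.cdf          = cumsum(c)/n;

% Compare each hand-computed vector with the histcounts result.
names = fieldnames(manual);
maxErr = zeros(numel(names),1);
for k = 1:numel(names)
    v = histcounts(X,edges,'Normalization',names{k});
    maxErr(k) = max(abs(v - manual.(names{k})));
end
disp(max(maxErr))               % differences, if any, are round-off only
```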
NumBins
— Number of bins
positive integer
Number of bins, specified as a positive integer. If you do not specify NumBins, then histcounts automatically calculates how many bins to use based on the input data.
If you specify NumBins with BinMethod, BinWidth, or BinEdges, histcounts only honors the last parameter.
This option does not apply to categorical data.
Output Arguments
N
— Bin counts
row vector
Bin counts, returned as a row vector.
edges
— Bin edges
vector
Bin edges, returned as a vector. The first element is the leading edge of the first bin. The last element is the trailing edge of the last bin.
bin
— Bin indices
array
Bin indices, returned as an array of the same size as X. Each element in bin describes which numbered bin contains the corresponding element in X.
A value of 0 in bin indicates an element which does not belong to any of the bins (for example, a NaN value).
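For instance, in this small sketch (data chosen for illustration) both a NaN and a value beyond the last edge receive bin index 0.

```matlab
X = [1 NaN 2 10];

% Bins are [0,1.5) and [1.5,3]; NaN and 10 belong to neither.
[N,edges,bin] = histcounts(X,[0 1.5 3]);

disp(bin)                    % 0 marks elements that were not binned
disp(nnz(bin > 0))           % number of binned elements, equal to sum(N)
```

Here bin is [1 0 2 0] and N is [1 1].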
Categories
— Categories included in count
cell vector of character vectors
Categories included in count, returned as a cell vector of character vectors. Categories contains the categories in C that correspond to each count in N.
Tips
The behavior of histcounts is similar to that of the discretize function. Use histcounts to find the number of elements in each bin. On the other hand, use discretize to find which bin each element belongs to (without counting).
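A side-by-side sketch of the two functions on the same data and edges (values invented for the example):

```matlab
X = [1 3 5 7 9 20];
edges = [0 4 8 12];

% histcounts answers "how many elements are in each bin?"
N = histcounts(X,edges);

% discretize answers "which bin is each element in?"
Y = discretize(X,edges);

disp(N)
disp(Y)
```

Here N is [2 2 1] and Y is [1 1 2 2 3 NaN]. The value 20 lies outside the edges, so discretize marks it with NaN, whereas the bin output of histcounts uses 0 for such elements.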
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
Some input options are not supported. The allowed options are:
'BinWidth'
'BinLimits'
'Normalization'
'BinMethod': The 'auto' and 'scott' bin methods are the same. The 'fd' bin method is not supported.
The Categories input argument does not support pattern expressions.
For more information, see Tall Arrays.
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The Categories input argument does not support pattern expressions.
The Normalization name-value argument does not support the 'percentage' option.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
Usage notes and limitations:
Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The Categories input argument does not support pattern expressions.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
64-bit integers are not supported.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2014b
R2023b: Normalize using percentages
You can normalize histogram values as percentages by specifying the Normalization name-value argument as 'percentage'.
R2023a: Improved performance with small numeric and logical input data
The histcounts function shows improved performance for numeric and logical data due to faster input parsing. The performance improvement is more significant when input parsing is a greater portion of the computation time. This situation occurs when the size of the data to distribute among bins is smaller than 2000 elements.
For example, this code calculates histogram bin counts for a 1000-element vector. The code is about 3x faster than in the previous release.
function timingHistcounts
X = rand(1,1000);
for k = 1:3e3
    histcounts(X,"BinMethod","auto");
end
end
The approximate execution times are:
R2022b: 0.62 s
R2023a: 0.21 s
The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system using the timeit function.
timeit(@timingHistcounts)
See Also
histogram | histogram2 | discretize | histcounts2 | kde