clustergram
Object containing hierarchical clustering analysis data
Description
The clustergram function creates a clustergram object. The object contains hierarchical clustering analysis data that you can view in a heatmap and dendrogram.
Creation
Description
cgObj = clustergram(data) performs hierarchical clustering analysis on the values in data. The returned clustergram object cgObj contains analysis data and displays a dendrogram and heatmap.
cgObj = clustergram(data,Name,Value) sets the object properties using name-value pairs. For example, clustergram(data,'Standardize','column') standardizes the values along the columns of data. You can specify multiple name-value pairs. Enclose each property name in quotes.
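As a minimal sketch of the two syntaxes, the following uses a hypothetical random matrix in place of real expression data. The variable names and the chosen property values are illustrative assumptions. The code requires Bioinformatics Toolbox and opens a clustergram figure for each call.

```matlab
% Hypothetical data: 20 rows (genes) by 6 columns (samples).
rng(0);                 % fix the random seed so the sketch is repeatable
data = randn(20,6);

% Syntax 1: cluster with the default property values.
cgDefault = clustergram(data);

% Syntax 2: set properties with name-value pairs.
cgCustom = clustergram(data,'Standardize','column','Linkage','complete');
```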
Input Arguments
data
— Source data
DataMatrix object | numeric matrix
Source data, specified as a DataMatrix object or numeric matrix. Typically, if the matrix contains gene expression data, each row corresponds to a gene and each column corresponds to a sample.
Use comma-separated name-value pair arguments to set the object properties. Enclose each property name in single quotes.
Example: cg = clustergram(data,'Colormap',redbluecmap,'Annotate',true)
Properties
Standardize
— Dimension for standardizing data values
'none' (default) | 'row' | 'column' | 3 | 2 | 1
Dimension for standardizing data values, specified as a character vector, string, or positive integer. Choices are:
- 'column' or 1: Standardize along the columns of data.
- 'row' or 2: Standardize along the rows of data.
- 'none' or 3: Do not standardize.
If you specify 'column' or 'row', the function transforms the values so that the mean is 0 and the standard deviation is 1 in the specified dimension.
Example: 'column'
Data Types: double | char | string
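To make the transformation concrete, here is a rough sketch of row-wise standardization written in core MATLAB. It is an illustration of "mean 0 and standard deviation 1 along the rows", not the function's internal code. The use of the sample standard deviation (N-1 normalization) is an assumption, and the matrix X is hypothetical.

```matlab
% Hypothetical 2-by-3 matrix: each row is one gene, each column one sample.
X = [1 2 3;
     4 6 8];

% Standardize along the rows: subtract each row mean, divide by each row
% standard deviation. std(X,0,2) uses N-1 normalization (an assumption here).
rowMean = mean(X,2);
rowStd  = std(X,0,2);
Zrow = (X - rowMean) ./ rowStd;   % implicit expansion, R2016b or later

disp(Zrow)
```

After this transformation each row of Zrow has mean 0 and standard deviation 1, which corresponds to the 'row' (or 2) choice.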
Symmetric
— Flag to make the heatmap color scale symmetric around zero
true (default) | false
Flag to make the heatmap color scale symmetric around zero, specified as true or false.
Example: false
Data Types: logical
ImputeFun
— Name of function or function handle to impute missing data
character vector | cell array
Name of a function or function handle to impute missing data, specified as a character vector or cell array. If you specify a cell array, the first element must be the name of a function or function handle, and the remaining elements must be name-value pairs used as inputs to the function. Missing data points are colored gray in the heatmap.
If data points are missing, use this property to impute the missing values. Otherwise, the clustergram function errors.
Example: 'func1'
Data Types: char
Colormap
— Heatmap colors
redgreencmap (default) | matrix | name of function handle
Heatmap colors, specified as a three-column (M-by-3) matrix of red-green-blue (RGB) values or the name of a function or function handle that returns a colormap, such as redgreencmap or redbluecmap.
The default colormap is redgreencmap, in which red represents values above the mean, black represents the mean, and green represents values below the mean of a row (gene) across all columns (samples).
Example: redbluecmap
Data Types: double | char
ColumnLabels
— Column labels
[1x0 double] (default) | string vector | cell array of character vectors | numeric vector
Column labels, specified as a string vector, cell array of character vectors, or numeric vector. The size of the vector must match the number of columns in the input data.
If the number of column labels is 200 or more, the labels do not appear in the clustergram plot.
Example: ["sample1","sample2","sample3"]
Data Types: double | string | cell
RowLabels
— Row labels
[] (default) | string vector | cell array of character vectors | numeric vector
Row labels, specified as a string vector, cell array of character vectors, or numeric vector. The size of the vector must match the number of rows in the input data.
If the number of row labels is 200 or more, the labels do not appear in the clustergram plot.
Example: ["gene1","gene2","gene3"]
Data Types: double | string | cell
ColumnLabelsRotate
— Orientation of column labels
90 (default) | numeric scalar
Orientation of column labels, specified as a numeric scalar. Specify the value of rotation in degrees (positive angles cause counterclockwise rotation).
Example: 30
Data Types: double
RowLabelsRotate
— Orientation of row labels
0 (default) | numeric scalar
Orientation of row labels, specified as a numeric scalar. Specify the value of rotation in degrees (positive angles cause counterclockwise rotation).
Example: 30
Data Types: double
Annotate
— Flag to display data values in heatmap
false (default) | true
Flag to display data values in the heatmap, specified as true or false.
Example: true
Data Types: logical
AnnotPrecision
— Display precision of data values
2 (default) | numeric scalar
Display precision of data values in the heatmap, specified as a numeric scalar. The default number of digits of precision is 2.
Example: 3
Data Types: double
LabelsWithMarkers
— Flag to display colored markers for row and column labels
false (default) | true
Flag to display colored markers instead of colored text for the row and column labels, specified as true or false.
Example: true
Data Types: logical
AnnotColor
— Text color of displayed data values
'w' (default) | character vector | string | three-element numeric vector
Text color of displayed data values in the heatmap, specified as a character vector, string, or three-element numeric vector. For example, to use cyan, you can enter [0 1 1], 'c', "c", "cyan", or 'cyan'. For details, see Color Options.
Example: 'red'
Data Types: char | string | double
DisplayRange
— Display range of standardized values
3 (default) | positive scalar
Display range of standardized values, specified as a positive scalar. The default value 3 means that there is a color variation for values between -3 and 3, but values greater than 3 are the same color as 3, and values less than -3 are the same color as -3.
For example, if you specify redgreencmap for the 'Colormap' property, pure red represents values greater than or equal to the specified display range value and pure green represents values less than or equal to the negative of the specified display range value.
Example: 3
Data Types: double
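The saturation behavior described for DisplayRange can be pictured as clipping the standardized values before they are mapped to colors. The sketch below is only an illustration in core MATLAB under that assumption. The vector z and the variable names are hypothetical, and the code is not the function's internal implementation.

```matlab
% Hypothetical standardized values, some outside the display range.
displayRange = 3;
z = [-4.2 -3 -1.5 0 2.7 3 5.1];

% Values beyond +/- displayRange are shown with the same color as the limit,
% which is equivalent to clipping them to [-displayRange, displayRange].
zShown = max(min(z,displayRange),-displayRange);

disp(zShown)
```

With a smaller DisplayRange, more values saturate at the end colors. With a larger value, the same data uses a narrower part of the colormap.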
ColumnLabelsColor
— Color information for column labels
[] (default) | structure | structure array
Warning
This property will be removed in a future release. Set LabelsWithMarkers to true for colored markers instead of colored text.
Color information for column labels, specified as a structure or structure array.
For a single structure, you must specify the following fields.
- Labels: Cell array of character vectors specifying column labels listed in the ColumnLabels property.
- Colors: Character vector or string specifying a color for the column labels. If this field is empty, the default color (black) is used.
For a structure array, you must specify a single element in each field for each structure.
- Labels: Character vector or string specifying a column label listed in the ColumnLabels property.
- Colors: Character vector or string specifying a color for the column labels. If this field is empty, the default color (black) is used.
For more information on specifying colors, see Color Options.
Data Types: struct
RowLabelsColor
— Color information for row labels
[] (default) | structure | structure array
Warning
This property will be removed in a future release. Set LabelsWithMarkers to true for colored markers instead of colored text.
Color information for row labels, specified as a structure or structure array.
For a single structure, you must specify the following fields.
- Labels: Cell array of character vectors specifying row labels listed in the RowLabels property.
- Colors: Character vector or string specifying a color for the row labels. If this field is empty, the default color (black) is used.
For a structure array, you must specify a single element in each field for each structure.
- Labels: Character vector or string specifying a row label listed in the RowLabels property.
- Colors: Character vector or string specifying a color for the row labels. If this field is empty, the default color (black) is used.
For more information on specifying colors, see Color Options.
Data Types: struct
Cluster
— Dimension for data clustering
'all' (default) | 1 | 2 | 3 | 'column' | 'row'
Dimension for data clustering, specified as a positive integer, character vector, or string. Choices are:
- 'column' or 1: Cluster along the columns of data only, which results in clustered rows.
- 'row' or 2: Cluster along the rows of data only, which results in clustered columns.
- 'all' or 3: Cluster along the columns of data, then cluster along the rows of row-clustered data.
Example: 2
Data Types: double | char | string
ColumnGroupMarker
— Information for annotating groups of columns
structure | structure array
Information for annotating groups of columns, specified as a structure or structure array.
If you specify a single structure, each field must contain a cell array of elements. If you specify a structure array, each structure must have a single element in each field.
The fields are:
- GroupNumber: Scalar specifying the column group number to annotate.
- Annotation: Character vector specifying text to annotate the column group.
- Color: Character vector or three-element vector of RGB values specifying a color to label the column group. For more information on specifying colors, see Color Options. If this field is empty, the default value is 'blue'.
Data Types: struct
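The two accepted shapes can be easy to confuse, so here is a small sketch of how each might be built in core MATLAB. The group numbers, annotations, and colors are hypothetical placeholders. Whether a given group number exists depends on the clustering of your own data.

```matlab
% Form 1: structure array with one element per annotated group.
% Passing cell arrays to struct() creates a 1-by-2 structure array.
cmArray = struct('GroupNumber',{4,5}, ...
                 'Annotation',{'Time1','Time2'}, ...
                 'Color',{'b','m'});

% Form 2: single structure whose fields each hold a cell array.
cmSingle.GroupNumber = {4,5};
cmSingle.Annotation  = {'Time1','Time2'};
cmSingle.Color       = {'b','m'};

fprintf('cmArray has %d elements; cmSingle has %d element.\n', ...
        numel(cmArray),numel(cmSingle));
```

Either variable could then be supplied as the value of the ColumnGroupMarker property.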
ColumnPDist
— Distance metric to pass to pdist function
'euclidean' (default) | character vector | cell array
Distance metric to pass to the pdist function to calculate the pairwise distances between columns, specified as a character vector or cell array. Specify a cell array if the distance metric requires extra arguments. For example, to use the Minkowski distance with an exponent p, specify {'minkowski',p}.
Example: 'jaccard'
Data Types: char | cell
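As a brief sketch of the cell array form, the following passes a Minkowski exponent for the column distances and a plain metric name for the row distances. The data matrix and the exponent value are hypothetical. The code assumes that Bioinformatics Toolbox and Statistics and Machine Learning Toolbox (for pdist) are installed.

```matlab
% Hypothetical data: 15 rows by 5 columns.
rng(1);
data = rand(15,5);

% Exponent for the Minkowski distance (illustrative value).
p = 3;

% A metric without extra arguments is given as a character vector;
% a metric with extra arguments is given as a cell array.
cgObj = clustergram(data,'RowPDist','cityblock', ...
                         'ColumnPDist',{'minkowski',p});
```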
Dendrogram
— Color threshold information to pass to dendrogram function
scalar | two-element numeric vector | character vector | cell array of character vectors
Color threshold information to pass to the dendrogram function to create a dendrogram plot, specified as a scalar, two-element numeric vector, character vector, or cell array of character vectors. This option sets the 'ColorThreshold' property of the dendrogram plot. If you specify a two-element numeric vector or cell array, the first element is for the rows, and the second element is for the columns.
Data Types: double | cell
DisplayRatio
— Ratio of space that row and column dendrograms occupy
1/5 (default) | scalar between 0 and 1 | two-element vector
Ratio of space that the row and column dendrograms occupy relative to the heatmap, specified as a scalar between 0 and 1 or two-element vector. If you specify a scalar, the function uses it as the ratio for both row and column dendrograms. If you specify a two-element vector, the function uses the first element for the ratio of the row dendrogram width to the heatmap width, and the second element for the ratio of the column dendrogram height to the heatmap height. The second element is ignored for one-dimensional clustergrams.
Example: 0.5
Data Types: double
Linkage
— Linkage method to create hierarchical cluster tree
'average' (default) | character vector | two-element cell array of character vectors
Linkage method passed to the linkage function to create the hierarchical cluster tree for rows and columns, specified as a character vector or two-element cell array of character vectors. If you specify a cell array, the function uses the first element for linkage between rows, and the second element for linkage between columns.
Example: 'centroid'
Data Types: char | cell
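For instance, a two-element cell array lets the rows and columns use different linkage methods. The sketch below uses hypothetical random data, and the choice of 'ward' for rows and 'average' for columns is purely illustrative. It requires Bioinformatics Toolbox and Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data: 25 rows by 8 columns.
rng(2);
data = randn(25,8);

% First element: linkage between rows. Second element: linkage between columns.
linkageMethods = {'ward','average'};
cgObj = clustergram(data,'Linkage',linkageMethods);
```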
LogTrans
— Flag to log2 transform data
false (default) | true
Flag to log2 transform the data from natural scale, specified as true or false.
Example: true
Data Types: logical
OptimalLeafOrder
— Flag to calculate optimal leaf order
true | false
Flag to calculate the optimal leaf order that maximizes the similarity between neighboring leaves, specified as true or false.
The default value depends on the size of the input data. If the number of rows or columns in data exceeds 1500, the default value is false. Otherwise, the default value is true.
Disabling the optimal leaf ordering calculation can be useful when working with large datasets because this calculation consumes a lot of memory and time.
Example: true
Data Types: logical
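The size-dependent default can be written out as a short check. This is a sketch of the rule as stated above, using a hypothetical random matrix, rather than the function's actual code. The final line shows how you might turn the calculation off explicitly. It requires Bioinformatics Toolbox.

```matlab
% Hypothetical large data set: 1600 rows by 7 columns.
rng(3);
data = randn(1600,7);

% Rule from the description: the default is false when the number of rows
% or columns exceeds 1500, and true otherwise.
exceedsLimit    = any(size(data) > 1500);
expectedDefault = ~exceedsLimit;

% Setting the property explicitly makes the behavior independent of data size.
cgObj = clustergram(data,'OptimalLeafOrder',false);
```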
RowGroupMarker
— Information for annotating groups of rows
structure | structure array
Information for annotating groups of rows, specified as a structure or structure array.
If you specify a single structure, each field must contain a cell array of elements. If you specify a structure array, each structure must have a single element in each field.
The fields are:
- GroupNumber: Scalar specifying the row group number to annotate.
- Annotation: Character vector specifying text to annotate the row group.
- Color: Character vector or three-element vector of RGB values specifying a color to label the row group. For more information on specifying colors, see Color Options. If this field is empty, the default value is 'blue'.
Data Types: struct
RowPDist
— Distance metric to pass to pdist function
'euclidean' (default) | character vector | cell array
Distance metric to pass to the pdist function to calculate the pairwise distances between rows, specified as a character vector or cell array. Specify a cell array if the distance metric requires extra arguments. For example, to use the Minkowski distance with an exponent p, specify {'minkowski',p}.
Example: 'jaccard'
Data Types: char | cell
ShowDendrogram
— Flag to show dendrogram tree diagrams with clustergram
'on' (default) | 'off'
Flag to show the dendrogram tree diagrams with the clustergram, specified as 'on' or 'off'.
Example: 'off'
Data Types: char
Object Functions
Examples
Perform Hierarchical Clustering on Gene Expression Data
Load microarray data containing gene expression levels of Saccharomyces cerevisiae (yeast) during the metabolic shift from fermentation to respiration [1].
load filteredyeastdata
This MAT file includes three variables, which are added to the MATLAB® workspace:
- yeastvalues: A matrix of gene expression data from Saccharomyces cerevisiae during the metabolic shift from fermentation to respiration
- genes: A cell array of GenBank® accession numbers for labeling the rows in yeastvalues
- times: A vector of time values for labeling the columns in yeastvalues
Create a clustergram object to display the heat map from the gene expression data in the first 30 rows of the yeastvalues matrix and standardize along the rows of data.
cgo = clustergram(yeastvalues(1:30,:),'Standardize','Row')
Clustergram object with 30 rows of nodes and 7 columns of nodes.
Use the set method and the genes and times vectors to add meaningful row and column labels to the clustergram.
set(cgo,'RowLabels',genes(1:30),'ColumnLabels',times)
Add a color bar to the clustergram by clicking the Insert Colorbar button on the toolbar.
View a data tip containing the intensity value, row label, and column label for a specific area of the heat map by clicking the Data Cursor button on the toolbar, then clicking an area in the heat map. To delete this data tip, right-click it, then select Delete Current Datatip.
Display intensity values for each area of the heat map by clicking the Annotate button on the toolbar. Click the Annotate button again to remove the intensity values.
Tip: If the amount of data is large enough, the cells within the clustergram are too small to display the intensity annotations. Zoom in to see the intensity annotations.
Remove the dendrogram tree diagrams from the figure by clicking the Show Dendrogram button on the toolbar. Click it again to display the dendrograms.
Use the get method to display the properties of the clustergram object, cgo.
get(cgo)
               Cluster: 'ALL'
              RowPDist: {'Euclidean'}
           ColumnPDist: {'Euclidean'}
               Linkage: {'Average'}
            Dendrogram: {}
      OptimalLeafOrder: 1
              LogTrans: 0
          DisplayRatio: [0.2000 0.2000]
        RowGroupMarker: []
     ColumnGroupMarker: []
        ShowDendrogram: 'on'
           Standardize: 'ROW'
             Symmetric: 1
          DisplayRange: 3
              Colormap: [11x3 double]
             ImputeFun: []
          ColumnLabels: {1x7 cell}
             RowLabels: {30x1 cell}
    ColumnLabelsRotate: 90
       RowLabelsRotate: 0
              Annotate: 'off'
        AnnotPrecision: 2
            AnnotColor: 'w'
     ColumnLabelsColor: []
        RowLabelsColor: []
     LabelsWithMarkers: 0
Change the clustering parameters by changing the linkage method and changing the color of the groups of nodes in the dendrogram whose linkage is less than a threshold of 3.
set(cgo,'Linkage','complete','Dendrogram',3)
Place the cursor on a branch node in the dendrogram to highlight (in blue) the group associated with it. Press and hold the mouse button to display a data tip listing the group number and the nodes (genes or samples) in the group.
Right-click a branch node in the dendrogram to display a menu of options.
The following options are available:
- Set Group Color: Change the cluster group color.
- Print Group to Figure: Print the group to a figure window.
- Copy Group to New Clustergram: Copy the group to a new clustergram window.
- Export Group to Workspace: Create a clustergram object of the group in the MATLAB workspace.
- Export Group Info to Workspace: Create a structure containing information about the group in the MATLAB workspace. The structure contains these fields:
  - GroupNames: Cell array of character vectors containing the names of the row or column groups.
  - RowNodeNames: Cell array of character vectors containing the names of the row nodes.
  - ColumnNodeNames: Cell array of character vectors containing the names of the column nodes.
  - ExprValues: An M-by-N matrix of intensity values, where M and N are the number of row nodes and of column nodes respectively. If the matrix contains gene expression data, typically each row corresponds to a gene and each column corresponds to a sample.
Create a clustergram object for Group 18 in the MATLAB workspace. Right-click Group 18, then select Export Group to Workspace. In the Export to Workspace dialog box, type Group18, then click OK.
Use the view method to view the clustergram object, Group18.
view(Group18)
View all the gene expression data using a diverging red and blue colormap and standardize along the rows of data.
cgo_all = clustergram(yeastvalues,'Colormap',redbluecmap,'Standardize','Row')
Clustergram object with 614 rows of nodes and 7 columns of nodes.
Create structure arrays to specify marker colors and annotations for two groups of rows (510 and 593) and two groups of columns (4 and 5).
rm = struct('GroupNumber',{510,593},'Annotation',{'A','B'},...
    'Color',{'b','m'});
cm = struct('GroupNumber',{4,5},'Annotation',{'Time1','Time2'},...
    'Color',{[1 1 0],[0.6 0.6 1]});
Use the RowGroupMarker and ColumnGroupMarker properties to add the color markers and annotations to the clustergram.
set(cgo_all,'RowGroupMarker',rm,'ColumnGroupMarker',cm)
More About
Color Options
The following lists the predefined colors and their RGB triplet equivalents. The short names and long names are character vectors that specify one of eight preset colors. The RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color; the intensities must be in the range [0 1].
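RGB Triplet | Short Name | Long Name |
---|---|---|
[1 1 0] | y | yellow |
[1 0 1] | m | magenta |
[0 1 1] | c | cyan |
[1 0 0] | r | red |
[0 1 0] | g | green |
[0 0 1] | b | blue |
[1 1 1] | w | white |
[0 0 0] | k | black |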
References
[1] DeRisi, J. L. “Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale.” Science 278, no. 5338 (October 24, 1997): 680–86.
Version History
Introduced before R2006a