Cluster Data
Cluster data using k-means algorithm in the Live Editor
Description
The Cluster Data Live Editor Task enables you to interactively perform k-means clustering. The task generates MATLAB® code for your live script and returns the resulting cluster indices and the cluster centroid locations to the MATLAB workspace.
You can:
Determine the optimal number of clusters for your data manually by selecting the number of clusters or automatically by specifying criteria such as gap values, silhouette values, Davies-Bouldin index values, and Calinski-Harabasz index values.
Customize the parameters for clustering your data, including the distance metric and the number of replicates.
Automatically visualize the clustered data.
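As a rough command-line counterpart to what the task sets up, the following minimal sketch calls kmeans directly. The data, the seed, and the chosen option values are illustrative assumptions, and the code that the task actually generates may differ.

```matlab
% Illustrative data: two well-separated 2-D groups (hypothetical, not from the task).
rng(1);                                   % fix the seed so the run is repeatable
X = [randn(50,2); randn(50,2) + 5];

k = 2;                                    % number of clusters, chosen manually
[idx, C] = kmeans(X, k, ...
    'Distance', 'sqeuclidean', ...        % distance metric
    'Replicates', 5);                     % rerun from new initial centroids, keep the best

% idx holds one cluster index per row of X; C holds one centroid per row.
disp(size(idx));
disp(size(C));
```

Here idx and C play the role of the cluster indices and centroid locations that the task returns to the workspace.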
For general information about Live Editor tasks, see Add Interactive Tasks to a Live Script.
Open the Task
To add the Cluster Data task to a live script:
On the Live Editor tab, select Task > Cluster Data.
In a code block in the live script, type a relevant keyword, such as clustering or kmeans. Select Cluster Data from the suggested command completions.
Parameters
Input data
— Data to cluster
numeric matrix
Specify the data to cluster by selecting a variable from the available workspace variables. The variable must be a numeric matrix to appear in the list.
Selection Method
— Cluster selection method
Manual (default) | Optimal
Specify the method for determining the optimal number of clusters for your data.
Manual — Specify the number of clusters to group your data into manually.
Optimal — Use the evalclusters function to find the optimal number of clusters based on criteria such as gap values, silhouette values, Davies-Bouldin index values, and Calinski-Harabasz index values.
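The Optimal method can be approximated at the command line as follows. This is a sketch under assumptions: the synthetic data, the silhouette criterion, and the variable names eva and bestK are chosen for illustration and are not taken from the task's generated code.

```matlab
% Illustrative data: three 2-D groups (hypothetical).
rng(1);
X = [randn(40,2); randn(40,2) + 4; randn(40,2) + [8 0]];

% Evaluate k-means solutions for 2 to 5 clusters using silhouette values.
% Other criterion names accepted by evalclusters include 'gap',
% 'DaviesBouldin', and 'CalinskiHarabasz'.
eva = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:5);

bestK = eva.OptimalK;                     % number of clusters the criterion prefers

% Cluster the data using the suggested number of clusters.
[idx, C] = kmeans(X, bestK, 'Replicates', 5);
```

Different criteria can suggest different numbers of clusters for the same data, so it can be worth comparing more than one.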
Range
— List of number of clusters to evaluate
2:5 (default) | min and max positive integer values
Specify the numbers of clusters to evaluate as a range consisting of a min value and a max value. For example, if you specify a min value of 2 and a max value of 6, the task evaluates 2, 3, 4, 5, and 6 clusters to determine the optimal number.
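To illustrate how a min and max value translate into the list of cluster counts, the sketch below builds the range 2 through 6 and passes it to evalclusters through the KList argument. The variable names minK, maxK, and kList, the data, and the Calinski-Harabasz criterion are assumptions made for this example.

```matlab
% Illustrative data: three 2-D groups (hypothetical).
rng(1);
X = [randn(40,2); randn(40,2) + 4; randn(40,2) + [8 0]];

minK = 2;                                 % min value of the range
maxK = 6;                                 % max value of the range
kList = minK:maxK;                        % evaluates 2, 3, 4, 5, and 6 clusters

eva = evalclusters(X, 'kmeans', 'CalinskiHarabasz', 'KList', kList);

% Tabulate the criterion value for each number of clusters evaluated.
results = table(eva.InspectedK(:), eva.CriterionValues(:), ...
    'VariableNames', {'NumClusters', 'CriterionValue'});
disp(results);
```

A wider range evaluates more candidate solutions and therefore takes longer to run.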
Plots to show
— Plots to show results with
check boxes
To display the clustered data, select from the available options:
Select 2D scatter plot (PCA) to display the principal components of the clustered data in a 2D scatter plot. The Cluster Data task uses the pca and gscatter functions to create the scatter plot.
Select Matrix of scatter plots to display the clustered data in a matrix of scatter plots. When you select Matrix of scatter plots, a list appears to the right of the check box. Each item in the list represents a column in the specified input data. Press the Ctrl key and select a maximum of four input data columns from the list. The Cluster Data task uses the gplotmatrix function to create the matrix of scatter plots from the selected columns.
The scatter plots in the matrix compare the selected input data columns across cluster indices. The diagonal plots in the matrix are histograms showing the distribution of the selected columns for each cluster index.
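Both plot types can be sketched at the command line as shown below. The data, the column selection, and the axis labels are illustrative assumptions, and the plots that the task produces may be styled differently.

```matlab
% Illustrative data: two groups in four dimensions (hypothetical).
rng(1);
X = [randn(50,4); randn(50,4) + 3];
idx = kmeans(X, 2, 'Replicates', 5);      % cluster index for each row of X

% 2D scatter plot of the first two principal component scores, grouped by cluster.
[~, score] = pca(X);
figure
gscatter(score(:,1), score(:,2), idx);
xlabel('PC 1');
ylabel('PC 2');

% Matrix of scatter plots for a subset of the input columns, grouped by cluster.
cols = [1 2 4];                           % assumed column selection (at most four)
figure
gplotmatrix(X(:,cols), [], idx);
```

Passing an empty second argument to gplotmatrix plots the selected columns against each other, with the diagonal showing the distribution of each column.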
Tips
By default, the Cluster Data task does not automatically run when you modify the task parameters. To have the task run automatically after any change, select the autorun button at the top-right of the task. If your dataset is large, do not enable this option.
Version History
Introduced in R2021b