Rank features for unsupervised learning using Laplacian scores
ranks features (variables) in idx
= fsulaplacian(X
)X
using the Laplacian scores. The
function returns idx
, which contains the indices of features ordered by
feature importance. You can use idx
to select important features for
unsupervised learning.
specifies additional options using one or more namevalue pair arguments. For example, you
can specify idx
= fsulaplacian(X
,Name,Value
)'NumNeighbors',10
to create a similarity graph using 10 nearest neighbors.
Load the sample data.
load ionosphere
Rank the features based on importance.
[idx,scores] = fsulaplacian(X);
Create a bar plot of the feature importance scores.
bar(scores(idx)) xlabel('Feature rank') ylabel('Feature importance score')
Select the top five most important features. Find the columns of these features in X
.
idx(1:5)
ans = 1×5
15 13 17 21 19
The 15th column of X
is the most important feature.
Compute a similarity matrix from Fisher's iris data set and rank the features using the similarity matrix.
Load Fisher's iris data set.
load fisheriris
Find the distance between each pair of observations in meas
by using the pdist
and squareform
functions with the default Euclidean distance metric.
D = pdist(meas); Z = squareform(D);
Construct the similarity matrix and confirm that it is symmetric.
S = exp(Z.^2); issymmetric(S)
ans = logical
1
Rank the features.
idx = fsulaplacian(meas,'Similarity',S)
idx = 1×4
3 4 1 2
Ranking using the similarity matrix S
is the same as ranking by specifying 'NumNeighbors'
as size(meas,1)
.
idx2 = fsulaplacian(meas,'NumNeighbors',size(meas,1))
idx2 = 1×4
3 4 1 2
X
— Input dataInput data, specified as an nbyp numeric
matrix. The rows of X
correspond to observations (or points), and the
columns correspond to features.
The software treats NaN
s in X
as missing
data and ignores any row of X
containing at least one
NaN
.
Data Types: single
 double
Specify optional
commaseparated pairs of Name,Value
arguments. Name
is
the argument name and Value
is the corresponding value.
Name
must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
'NumNeighbors',10,'KernelScale','auto'
specifies the number
of nearest neighbors as 10 and the kernel scale factor as
'auto'
.'Similarity'
— Similarity matrix[]
(empty matrix) (default)  symmetric matrixSimilarity matrix, specified as the commaseparated pair consisting of
'Similarity'
and an
nbyn symmetric matrix, where
n is the number of observations. The similarity matrix (or
adjacency matrix) represents the input data by modeling local neighborhood
relationships among the data points. The values in a similarity matrix represent the
edges (or connections) between nodes (data points) that are connected in a similarity graph. For more information,
see Similarity Matrix.
If you specify the 'Similarity'
value, then you cannot
specify any other namevalue pair argument. If you do not specify the
'Similarity'
value, then the software computes a similarity
matrix using the options specified by the other namevalue pair arguments.
Data Types: single
 double
'Distance'
— Distance metricDistance metric, specified as the commaseparated pair consisting of
'Distance'
and a character vector, string scalar, or function
handle, as described in this table.
Value  Description 

'euclidean'  Euclidean distance (default) 
'seuclidean'  Standardized Euclidean distance. Each coordinate difference between
observations is scaled by dividing by the corresponding element of the
standard deviation computed from 
'mahalanobis'  Mahalanobis distance using the sample covariance of

'cityblock'  City block distance 
'minkowski'  Minkowski distance. The default exponent is 2. Use the

'chebychev'  Chebychev distance (maximum coordinate difference) 
'cosine'  One minus the cosine of the included angle between observations (treated as vectors) 
'correlation'  One minus the sample correlation between observations (treated as sequences of values) 
'hamming'  Hamming distance, which is the percentage of coordinates that differ 
'jaccard'  One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ 
'spearman'  One minus the sample Spearman's rank correlation between observations (treated as sequences of values) 
@  Custom distance function handle. A distance function has the form function D2 = distfun(ZI,ZJ) % calculation of distance ...
If your data is not sparse, you can generally compute distance more quickly by using a builtin distance instead of a function handle. 
For more information, see Distance Metrics.
When you use the 'seuclidean'
, 'minkowski'
,
or 'mahalanobis'
distance metric, you can specify the additional
namevalue pair argument 'Scale'
, 'P'
, or
'Cov'
, respectively, to control the distance metrics.
Example: 'Distance','minkowski','P',3
specifies to use the
Minkowski distance metric with an exponent of 3
.
'P'
— Exponent for Minkowski distance metric2
(default)  positive scalarExponent for the Minkowski distance metric, specified as the commaseparated pair consisting of 'P'
and a positive scalar.
This argument is valid only if 'Distance'
is 'minkowski'
.
Example: 'P',3
Data Types: single
 double
'Cov'
— Covariance matrix for Mahalanobis distance metricnancov(X)
(default)  positive definite matrixCovariance matrix for the Mahalanobis distance metric, specified as the commaseparated pair
consisting of 'Cov'
and a positive definite matrix.
This argument is valid only if 'Distance'
is 'mahalanobis'
.
Example: 'Cov',eye(4)
Data Types: single
 double
'Scale'
— Scaling factors for standardized Euclidean distance metricnanstd(X)
(default)  numeric vector of nonnegative valuesScaling factors for the standardized Euclidean distance metric, specified as the
commaseparated pair consisting of 'Scale'
and a numeric vector of
nonnegative values.
Scale
has length p (the number of columns in
X
), because each dimension (column) of X
has a
corresponding value in Scale
. For each dimension of
X
, fsulaplacian
uses the corresponding value
in Scale
to standardize the difference between observations.
This argument is valid only if 'Distance'
is 'seuclidean'
.
Data Types: single
 double
'NumNeighbors'
— Number of nearest neighborslog(size(X,1))
(default)  positive integerNumber of nearest neighbors used to construct the similarity
graph, specified as the commaseparated pair consisting of 'NumNeighbors'
and a positive integer.
Example: 'NumNeighbors',10
Data Types: single
 double
'KernelScale'
— Scale factor1
(default)  'auto'
 positive scalarScale factor for the kernel, specified as the commaseparated pair consisting of 'KernelScale'
and 'auto'
or a positive scalar. The software uses the scale factor to transform distances to similarity measures. For more information, see Similarity Graph.
The 'auto'
option is supported only for the 'euclidean'
and 'seuclidean'
distance metrics.
If you specify 'auto'
, then the software selects an appropriate
scale factor using a heuristic procedure. This heuristic procedure uses subsampling,
so estimates can vary from one call to another. To reproduce results, set a random
number seed using rng
before calling
fsulaplacian
.
Example: 'KernelScale','auto'
idx
— Indices of features ordered by feature importanceIndices of the features in X
ordered by feature importance,
returned as a numeric vector. For example, if idx(3)
is
5
, then the third most important feature is the fifth column in
X
.
scores
— Feature scoresFeature scores, returned as a numeric vector. A large score value in
scores
indicates that the corresponding feature is important. The
values in scores
have the same order as the features in
X
.
A similarity graph models the local neighborhood relationships between data points in X
as an undirected graph. The nodes in the graph represent data points, and the edges, which are directionless, represent the connections between the data points.
If the pairwise distance Dist_{i,j} between any two nodes i and j is positive (or larger than a certain threshold), then the similarity graph connects the two nodes using an edge [2]. The edge between the two nodes is weighted by the pairwise similarity S_{i,j}, where $${S}_{i,j}=\mathrm{exp}\left({\left(\frac{Dis{t}_{i,j}}{\sigma}\right)}^{2}\right)$$, for a specified kernel scale σ value.
fsulaplacian
constructs a similarity graph using the nearest
neighbor method. The function connects points in X
that are nearest
neighbors. Use 'NumNeighbors'
to specify the number of nearest
neighbors.
A similarity matrix is a matrix representation of a similarity graph. The nbyn matrix $$S={({S}_{i,j})}_{i,j=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}n}$$ contains pairwise similarity values between connected nodes in the similarity graph. The similarity matrix of a graph is also called an adjacency matrix.
The similarity matrix is symmetric because the edges of the similarity graph are
directionless. A value of S_{i,j} =
0
means that nodes i and j of the
similarity graph are not connected.
A degree matrix D_{g} is an nbyn diagonal matrix obtained by summing the rows of the similarity matrix S. That is, the ith diagonal element of D_{g} is $${D}_{g}(i,i)={\displaystyle \sum _{j=1}^{n}{S}_{i,j}}.$$
A Laplacian matrix, which is one way of representing a similarity graph, is defined as the difference between the degree matrix D_{g} and the similarity matrix S.
$$L={D}_{g}S.$$
The fsulaplacian
function ranks features using Laplacian
scores[1] obtained from a nearest
neighbor similarity graph.
fsulaplacian
computes the values in scores
as follows:
For each data point in X
, define a local neighborhood using
the nearest neighbor method, and find pairwise distances $$Dis{t}_{i,j}$$ for all points i and j in the
neighborhood.
Convert the distances to the similarity matrix
S using the kernel transformation $${S}_{i,j}=\mathrm{exp}\left({\left(\frac{Dis{t}_{i,j}}{\sigma}\right)}^{2}\right)$$, where σ is the scale factor for the kernel as
specified by the 'KernelScale'
namevalue pair argument.
Center each feature by removing its mean.
$${\tilde{x}}_{r}={x}_{r}\frac{{x}_{r}^{T}{D}_{g}1}{{1}^{T}{D}_{g}1}1,$$
where x_{r} is the rth feature, D_{g} is the degree matrix, and $${1}^{T}={[1,\cdots ,1]}^{T}$$.
Compute the score s_{r} for each feature.
$${s}_{r}=\frac{{\tilde{x}}_{r}^{T}S{\tilde{x}}_{r}}{{\tilde{x}}_{r}^{T}{D}_{g}{\tilde{x}}_{r}}.$$
Note that [1] defines the Laplacian score as
$${L}_{r}=\frac{{\tilde{x}}_{r}^{T}L{\tilde{x}}_{r}}{{\tilde{x}}_{r}^{T}{D}_{g}{\tilde{x}}_{r}}=1\frac{{\tilde{x}}_{r}^{T}S{\tilde{x}}_{r}}{{\tilde{x}}_{r}^{T}{D}_{g}{\tilde{x}}_{r}},$$
where L is the Laplacian matrix, defined as
the difference between D_{g} and
S. The fsulaplacian
function uses only the second
term of this equation for the score value of scores
so that a large
score value indicates an important feature.
Selecting features using the Laplacian score is consistent with minimizing the value
$$\frac{{\displaystyle {\sum}_{i,j}{\left({x}_{ir}{x}_{jr}\right)}^{2}{S}_{i,j}}}{Var({x}_{r})},$$
where x_{ir} represents the ith observation of the rth feature. Minimizing this value implies that the algorithm prefers features with large variance. Also, the algorithm assumes that two data points of an important feature are close if and only if the similarity graph has an edge between the two data points.
[1] He, X., D. Cai, and P. Niyogi. "Laplacian Score for Feature Selection." NIPS Proceedings. 2005.
[2] Von Luxburg, U. “A Tutorial on Spectral Clustering.” Statistics and Computing Journal. Vol.17, Number 4, 2007, pp. 395–416.
A modified version of this example exists on your system. Do you want to open this version instead?
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
Select web siteYou can also select a web site from the following list:
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.