Rank importance of predictors using ReliefF or RReliefF algorithm

[idx,weights] = relieff(X,y,k) ranks predictors using either the ReliefF or RReliefF algorithm with k nearest neighbors. The input matrix X contains predictor variables, and the vector y contains a response vector. The function returns idx, which contains the indices of the most important predictors, and weights, which contains the weights of the predictors. If y is numeric, relieff performs RReliefF analysis for regression by default. Otherwise, relieff performs ReliefF analysis for classification using k nearest neighbors per class. For more information on ReliefF and RReliefF, see Algorithms.
Determine Important Predictors
Load the sample data.

load fisheriris
Find the important predictors using 10 nearest neighbors.
[idx,weights] = relieff(meas,species,10)

idx = 1×4

     4     3     1     2

weights = 1×4

    0.1399    0.1226    0.3590    0.3754
idx shows the predictor numbers listed according to their ranking. The fourth predictor is the most important, and the second predictor is the least important.
weights gives the weight values in the same order as the predictors. The first predictor has a weight of 0.1399, and the fourth predictor has a weight of 0.3754.
Rank Predictors by Importance
Load the sample data.

load ionosphere
Rank the predictors based on importance using 10 nearest neighbors.
[idx,weights] = relieff(X,Y,10);
Create a bar plot of predictor importance weights.
bar(weights(idx))
xlabel('Predictor rank')
ylabel('Predictor importance weight')
Select the top 5 most important predictors. Find the columns of these predictors in X.

idx(1:5)

ans = 1×5

    24     3     8     5    14
The 24th column of X is the most important predictor of Y.
Determine Important Categorical Predictors
Rank categorical predictors using relieff.
Load the sample data.

load carsmall
Convert the categorical predictor variables Mfg, Model, and Origin to numerical values, and combine them into an input matrix. Specify the response variable y as MPG.

X = [grp2idx(Mfg) grp2idx(Model) grp2idx(Origin)];
y = MPG;
Find the ranks and weights of the predictor variables using 10 nearest neighbors and treating the data in
X as categorical.
[idx,weights] = relieff(X,y,10,'categoricalx','on')

idx = 1×3

     2     3     1

weights = 1×3

   -0.0019    0.0501    0.0114
The Model predictor is the most important in predicting MPG. The Mfg variable has a negative weight, indicating that it is not a good predictor of MPG.
X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable.
y — Response data
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors
Response data, specified as a numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors.
k — Number of nearest neighbors
positive integer scalar
Number of nearest neighbors, specified as a positive integer scalar.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: relieff(X,y,5,'method','classification','categoricalx','on') specifies 5 nearest neighbors and treats the response variable and predictor data as categorical.
method — Method for computing weights
'regression' | 'classification'

Method for computing weights, specified as the comma-separated pair consisting of 'method' and either 'regression' or 'classification'. If y is numeric, 'regression' is the default method. Otherwise, 'classification' is the default.
prior — Prior probabilities for each class
'empirical' (default) | 'uniform' | numeric vector | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'prior' and a value in this table.

Value           | Description
'empirical'     | The class probabilities are determined from class frequencies in y.
'uniform'       | All class probabilities are equal.
numeric vector  | One value exists for each distinct group name.
updates — Number of observations for computing weights
'all' (default) | positive integer scalar
Number of observations to select at random for computing weights, specified as the comma-separated pair consisting of 'updates' and either 'all' or a positive integer scalar. By default, relieff uses all observations.
categoricalx — Categorical predictors flag
'off' (default) | 'on'
Categorical predictors flag, specified as the comma-separated pair consisting of 'categoricalx' and either 'on' or 'off'. The default is 'off'. If you specify 'categoricalx' as 'on', then relieff treats all predictors in X as categorical. Otherwise, it treats all predictors in X as numeric. You cannot mix numeric and categorical predictors.
sigma — Distance scaling factor
numeric positive scalar
Distance scaling factor, specified as the comma-separated pair consisting of 'sigma' and a numeric positive scalar.

For observation i, the influence on the predictor weight from its nearest neighbor j is multiplied by $e^{-(\operatorname{rank}(i,j)/\sigma)^2}$. rank(i,j) is the position of the jth observation among the nearest neighbors of the ith observation, sorted by distance. The default is Inf for classification (all nearest neighbors have the same influence) and 50 for regression.
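The effect of this scaling can be illustrated with a short computation (the values of sigma and the neighbor ranks below are made up for illustration):

```matlab
% Influence of the 10 nearest neighbors under the regression default sigma = 50
sigma = 50;
rank_ij = 1:10;                        % neighbor positions, sorted by distance
influence = exp(-(rank_ij ./ sigma).^2);
% Closer neighbors (smaller rank) get influence nearer to 1; with the
% classification default sigma = Inf, every term is exp(0) = 1, so all
% nearest neighbors carry the same influence.
```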
idx — Indices of predictors ordered by predictor importance
Indices of predictors in X ordered by predictor importance, returned as a numeric vector. For example, if idx(3) is 5, then the third most important predictor is the fifth column in X.
weights — Weights of predictors
Weights of the predictors, returned as a numeric vector. The values in weights have the same order as the predictors in X. The values in weights range from -1 to 1, with large positive weights assigned to important predictors.
Predictor ranks and weights usually depend on k. If you set k to 1, then the estimates can be unreliable for noisy data. If you set k to a value comparable with the number of observations (rows) in X, relieff can fail to find important predictors. You can start with k = 10 and investigate the stability and reliability of relieff ranks and weights for various values of k.

relieff removes observations with NaN values.
ReliefF finds the weights of predictors in the case where y is a multiclass categorical variable. The algorithm penalizes the predictors that give different values to neighbors of the same class, and rewards predictors that give different values to neighbors of different classes.
ReliefF first sets all predictor weights $W_j$ to 0. Then, the algorithm iteratively selects a random observation $x_r$, finds the k-nearest observations to $x_r$ for each class, and updates, for each nearest neighbor $x_q$, all the weights for the predictors $F_j$ as follows:

If $x_r$ and $x_q$ are in the same class,

$$W_j^i = W_j^{i-1} - \frac{\Delta_j(x_r,x_q)}{m} \cdot d_{rq}.$$

If $x_r$ and $x_q$ are in different classes,

$$W_j^i = W_j^{i-1} + \frac{p_{y_q}}{1 - p_{y_r}} \cdot \frac{\Delta_j(x_r,x_q)}{m} \cdot d_{rq}.$$
$W_j^i$ is the weight of the predictor $F_j$ at the ith iteration step.

$p_{y_r}$ is the prior probability of the class to which $x_r$ belongs, and $p_{y_q}$ is the prior probability of the class to which $x_q$ belongs.

m is the number of iterations specified by 'updates'.

$\Delta_j(x_r,x_q)$ is the difference in the value of the predictor $F_j$ between observations $x_r$ and $x_q$. Let $x_{rj}$ denote the value of the jth predictor for observation $x_r$, and let $x_{qj}$ denote the value of the jth predictor for observation $x_q$.

For discrete $F_j$,

$$\Delta_j(x_r,x_q) = \begin{cases} 0, & x_{rj} = x_{qj} \\ 1, & x_{rj} \neq x_{qj} \end{cases}$$

For continuous $F_j$,

$$\Delta_j(x_r,x_q) = \frac{|x_{rj} - x_{qj}|}{\max(F_j) - \min(F_j)}.$$

$d_{rq}$ is a distance function of the form

$$d_{rq} = \frac{\tilde{d}_{rq}}{\sum_{l=1}^{k} \tilde{d}_{rl}}.$$

The distance is subject to the scaling

$$\tilde{d}_{rq} = e^{-\left(\operatorname{rank}(r,q)/\sigma\right)^2},$$

where rank(r,q) is the position of the qth observation among the nearest neighbors of the rth observation, sorted by distance. k is the number of nearest neighbors, specified by k. You can change the scaling by specifying 'sigma'.
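The update rules above can be sketched in plain MATLAB. The following is a simplified teaching version, not the toolbox implementation: it assumes numeric predictors, numeric class labels in y, equal neighbor influence (sigma = Inf, so each of the k neighbors contributes d_rq = 1/k), and empirical class priors. The function name relieffSketch is made up for illustration.

```matlab
function W = relieffSketch(X, y, k, m)
% Simplified ReliefF weight update (illustrative sketch only)
    [n, p] = size(X);
    W = zeros(1, p);
    rangeX = max(X) - min(X);                      % per-predictor ranges
    classes = unique(y);
    prior = arrayfun(@(c) mean(y == c), classes);  % empirical class priors
    for i = 1:m
        r = randi(n);                              % random observation x_r
        pr = prior(classes == y(r));               % prior of the class of x_r
        for c = 1:numel(classes)
            % k nearest neighbors of x_r within class c (excluding x_r itself)
            members = find(y == classes(c));
            members(members == r) = [];
            d = vecnorm(X(members,:) - X(r,:), 2, 2);
            [~, ord] = sort(d);
            nbrs = members(ord(1:min(k, numel(members))));
            for q = nbrs(:)'
                deltaj = abs(X(r,:) - X(q,:)) ./ rangeX;   % continuous diff
                if y(q) == y(r)
                    % same class: penalize predictors that differ
                    W = W - deltaj / (m * k);
                else
                    % different class: reward, weighted by class priors
                    W = W + (prior(c) / (1 - pr)) * deltaj / (m * k);
                end
            end
        end
    end
end
```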
RReliefF works with continuous
y. Similar to ReliefF,
RReliefF also penalizes the predictors that give different values to neighbors with
the same response values, and rewards predictors that give different values to
neighbors with different response values. However, RReliefF uses intermediate
weights to compute the final predictor weights.
Given two nearest neighbors, assume the following:
Wdy is the weight of having different values for the response y.
Wdj is the weight of having different values for the predictor Fj.
$W_{dy \wedge dj}$ is the weight of having different response values and different values for the predictor $F_j$.
RReliefF first sets the weights $W_{dy}$, $W_{dj}$, $W_{dy \wedge dj}$, and $W_j$ equal to 0. Then, the algorithm iteratively selects a random observation $x_r$, finds the k-nearest observations to $x_r$, and updates, for each nearest neighbor $x_q$, all the intermediate weights as follows:

$$W_{dy}^i = W_{dy}^{i-1} + \Delta_y(x_r,x_q) \cdot d_{rq}$$

$$W_{dj}^i = W_{dj}^{i-1} + \Delta_j(x_r,x_q) \cdot d_{rq}$$

$$W_{dy \wedge dj}^i = W_{dy \wedge dj}^{i-1} + \Delta_y(x_r,x_q) \cdot \Delta_j(x_r,x_q) \cdot d_{rq}$$
The i and i-1 superscripts denote the iteration step number. m is the number of iterations specified by 'updates'.

$\Delta_y(x_r,x_q)$ is the difference in the value of the continuous response y between observations $x_r$ and $x_q$. Let $y_r$ denote the value of the response for observation $x_r$, and let $y_q$ denote the value of the response for observation $x_q$.

$$\Delta_y(x_r,x_q) = \frac{|y_r - y_q|}{\max(y) - \min(y)}.$$

The $\Delta_j$ and $d_{rq}$ functions are the same as for ReliefF.

RReliefF calculates the predictor weights $W_j$ after fully updating all the intermediate weights:

$$W_j = \frac{W_{dy \wedge dj}}{W_{dy}} - \frac{W_{dj} - W_{dy \wedge dj}}{m - W_{dy}}.$$
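The RReliefF updates can likewise be sketched in plain MATLAB. As with the ReliefF sketch, this is a simplified illustration (function name rrelieffSketch is made up) assuming numeric predictors, a continuous response, and equal neighbor influence, so each of the k neighbors contributes d_rq = 1/k; it is not the toolbox implementation.

```matlab
function W = rrelieffSketch(X, y, k, m)
% Simplified RReliefF weight computation (illustrative sketch only)
    [n, p] = size(X);
    rangeX = max(X) - min(X);
    rangeY = max(y) - min(y);
    Wdy = 0; Wdj = zeros(1, p); Wdydj = zeros(1, p);
    for i = 1:m
        r = randi(n);                          % random observation x_r
        d = vecnorm(X - X(r,:), 2, 2);
        d(r) = Inf;                            % exclude x_r itself
        [~, ord] = sort(d);
        for q = ord(1:k)'                      % k nearest neighbors
            drq    = 1 / k;                    % equal neighbor influence
            deltay = abs(y(r) - y(q)) / rangeY;        % response diff
            deltaj = abs(X(r,:) - X(q,:)) ./ rangeX;   % predictor diffs
            Wdy    = Wdy   + deltay * drq;
            Wdj    = Wdj   + deltaj * drq;
            Wdydj  = Wdydj + deltay .* deltaj * drq;
        end
    end
    % Final predictor weights from the fully updated intermediate weights
    W = Wdydj / Wdy - (Wdj - Wdydj) / (m - Wdy);
end
```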
For more information, see Robnik-Sikonja and Kononenko (2003).
Kononenko, I., E. Simec, and M. Robnik-Sikonja. (1997). "Overcoming the myopia of inductive learning algorithms with RELIEFF." Applied Intelligence, 7, 39–55. https://link.springer.com/article/10.1023/A:1008280620621

Robnik-Sikonja, M., and I. Kononenko. (1997). "An adaptation of Relief for attribute estimation in regression." https://www.semanticscholar.org/paper/An-adaptation-of-Relief-for-attribute-estimation-in-Robnik-Sikonja-Kononenko/9548674b6a3c601c13baa9a383d470067d40b896
 Robnik-Sikonja, M., and I. Kononenko. (2003). “Theoretical and empirical analysis of ReliefF and RReliefF.” Machine Learning, 53, 23–69.
Introduced in R2010b