# average

Compute performance metrics for average receiver operating characteristic (ROC) curve in multiclass problem

*Since R2022a*

## Description

`[FPR,TPR,Thresholds,AUC] = average(rocObj,type)` computes the averages of performance metrics stored in the `rocmetrics` object `rocObj` for a multiclass classification problem using the averaging method specified in `type`. The function returns the average false positive rate (`FPR`) and the average true positive rate (`TPR`) for each threshold value in `Thresholds`. The function also returns `AUC`, the area under the ROC curve composed of `FPR` and `TPR`.

## Examples

### Find Average ROC Curve

Compute the performance metrics for a multiclass classification problem by creating a `rocmetrics` object, and then compute the average values for the metrics by using the `average` function. Plot the average ROC curve using the outputs of `average`.

Load the `fisheriris` data set. The matrix `meas` contains flower measurements for 150 different flowers. The vector `species` lists the species for each flower. `species` contains three distinct flower names.

`load fisheriris`

Train a classification tree that classifies observations into one of the three labels. Cross-validate the model using 10-fold cross-validation.

    rng("default") % For reproducibility
    Mdl = fitctree(meas,species,Crossval="on");

Compute the classification scores for validation-fold observations.

    [~,Scores] = kfoldPredict(Mdl);
    size(Scores)

`ans = `*1×2*
150 3

The output `Scores` is a matrix of size `150`-by-`3`. The column order of `Scores` follows the class order in `Mdl`, stored in `Mdl.ClassNames`.

Create a `rocmetrics` object by using the true labels in `species` and the classification scores in `Scores`. Specify the column order of `Scores` using `Mdl.ClassNames`.

`rocObj = rocmetrics(species,Scores,Mdl.ClassNames);`

`rocmetrics` computes the FPR and TPR at different thresholds and finds the AUC value for each class.

Compute the average performance metric values, including the FPR and TPR at different thresholds and the AUC value, using the macro-averaging method.

`[FPR,TPR,Thresholds,AUC] = average(rocObj,"macro");`

Plot the average ROC curve and display the average AUC value. Prepend the point `(0,0)` to the FPR and TPR vectors so that the curve starts from the origin.

    plot([0;FPR],[0;TPR])
    xlabel("False Positive Rate")
    ylabel("True Positive Rate")
    title("Average ROC Curve")
    hold on
    plot([0,1],[0,1],"k--")
    legend(join(["Macro-average (AUC =",AUC,")"]), ...
        Location="southeast")
    axis padded
    hold off

Alternatively, you can create the average ROC curve by using the `plot` function. Specify `AverageROCType="macro"` to compute the metrics for the average ROC curve using the macro-averaging method.

`plot(rocObj,AverageROCType="macro",ClassNames=[])`
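The same `rocmetrics` object can be averaged with each of the three methods that `type` accepts. The following is a minimal sketch that repeats the setup from this example and compares the resulting AUC values. The variable names `types` and `aucValues` are illustrative, and the code assumes that Statistics and Machine Learning Toolbox is installed.

```matlab
% Rebuild the rocmetrics object from the example above.
load fisheriris
rng("default") % For reproducibility
Mdl = fitctree(meas,species,Crossval="on");
[~,Scores] = kfoldPredict(Mdl);
rocObj = rocmetrics(species,Scores,Mdl.ClassNames);

% Compute the average AUC for each averaging method.
types = ["micro","macro","weighted"];   % illustrative variable name
aucValues = zeros(size(types));         % illustrative variable name
for k = 1:numel(types)
    [~,~,~,aucValues(k)] = average(rocObj,types(k));
    fprintf("%s-average AUC = %.4f\n",types(k),aucValues(k))
end
```

Because the iris classes are balanced, the three values are likely to be close to each other; larger differences would be expected for imbalanced data.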

## Input Arguments

`rocObj`: Object evaluating classification performance

`rocmetrics` object

Object evaluating classification performance, specified as a `rocmetrics` object.

`type`: Averaging method

`"micro"` | `"macro"` | `"weighted"`

Averaging method, specified as `"micro"`, `"macro"`, or `"weighted"`.

- `"micro"` (micro-averaging): `average` finds the average performance metrics by treating all one-versus-all binary classification problems as one binary classification problem. The function computes the confusion matrix components for the combined binary classification problem, and then computes the average FPR and TPR using the values of the confusion matrix.
- `"macro"` (macro-averaging): `average` computes the average values for FPR and TPR by averaging the values of all one-versus-all binary classification problems.
- `"weighted"` (weighted macro-averaging): `average` computes the weighted average values for FPR and TPR using the macro-averaging method and using the prior class probabilities (the `Prior` property of `rocObj`) as weights.

The averaging method that you specify in `type` determines the length of the vectors for the output arguments (`FPR`, `TPR`, and `Thresholds`). For more details, see Average of Performance Metrics.
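To make the difference between the three methods concrete, the sketch below applies each one at a single, fixed threshold. The confusion matrix counts are made up for illustration, the priors are assumed to equal the empirical class frequencies, and the code does not attempt to reproduce how `average` combines thresholds across classes.

```matlab
% Hypothetical one-versus-all counts for three classes at one threshold.
% Class sizes are 100, 40, and 10 (150 observations in total).
TP = [90 30 5];    % true positives per class
FN = [10 10 5];    % false negatives per class
FP = [5 22 7];     % false positives per class
TN = [45 88 133];  % true negatives per class

% Per-class rates from the one-versus-all binary problems.
classTPR = TP ./ (TP + FN);
classFPR = FP ./ (TN + FP);

% Micro-averaging: pool the counts of all binary problems, then compute rates.
microTPR = sum(TP) / (sum(TP) + sum(FN));
microFPR = sum(FP) / (sum(TN) + sum(FP));

% Macro-averaging: average the per-class rates with equal weight.
macroTPR = mean(classTPR);
macroFPR = mean(classFPR);

% Weighted macro-averaging: weight the per-class rates by the priors.
% Assumption: the priors are the empirical class frequencies.
prior = (TP + FN) / sum(TP + FN);
weightedTPR = sum(prior .* classTPR);
weightedFPR = sum(prior .* classFPR);

fprintf("micro:    TPR = %.4f, FPR = %.4f\n",microTPR,microFPR)
fprintf("macro:    TPR = %.4f, FPR = %.4f\n",macroTPR,macroFPR)
fprintf("weighted: TPR = %.4f, FPR = %.4f\n",weightedTPR,weightedFPR)
```

With imbalanced classes like these, the pooled counts in micro-averaging are dominated by the largest class, whereas macro-averaging gives every class the same weight.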

**Data Types:** `char` | `string`

## Output Arguments

`FPR`: Average false positive rates

numeric vector

Average false positive rates, returned as a numeric vector.

`TPR`: Average true positive rates

numeric vector

Average true positive rates, returned as a numeric vector.

`AUC`: Area under average ROC curve

numeric scalar

Area under the average ROC curve composed of `FPR` and `TPR`, returned as a numeric scalar.

## More About

### Receiver Operating Characteristic (ROC) Curve

A ROC curve shows the true positive rate versus the false positive rate for different thresholds of classification scores.

The true positive rate and the false positive rate are defined as follows:

- True positive rate (TPR), also known as recall or sensitivity: `TP/(TP+FN)`, where TP is the number of true positives and FN is the number of false negatives
- False positive rate (FPR), also known as fallout or 1-specificity: `FP/(TN+FP)`, where FP is the number of false positives and TN is the number of true negatives

Each point on a ROC curve corresponds to a pair of TPR and FPR values for a specific threshold value. You can find different pairs of TPR and FPR values by varying the threshold value, and then create a ROC curve using the pairs. For each class, `rocmetrics` uses all distinct adjusted score values as threshold values to create a ROC curve.
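The threshold sweep described above can be sketched for a single binary problem as follows. The labels and scores are made up, and the rule that an observation is predicted positive when its score is greater than or equal to the threshold is an assumption made for this illustration.

```matlab
% Hypothetical binary problem: true labels and scores for the positive class.
isPositive = logical([1 1 0 1 0 0 1 0])';
scores     = [0.9 0.8 0.7 0.6 0.55 0.4 0.3 0.1]';

% Use every distinct score value as a threshold, from highest to lowest.
thresholds = sort(unique(scores),"descend");

TPR = zeros(size(thresholds));
FPR = zeros(size(thresholds));
for k = 1:numel(thresholds)
    % Assumption: predict positive when the score meets the threshold.
    predictedPositive = scores >= thresholds(k);
    TP = sum(predictedPositive & isPositive);
    FN = sum(~predictedPositive & isPositive);
    FP = sum(predictedPositive & ~isPositive);
    TN = sum(~predictedPositive & ~isPositive);
    TPR(k) = TP/(TP+FN);
    FPR(k) = FP/(TN+FP);
end

% Each row is one point on the ROC curve.
disp(table(thresholds,FPR,TPR))
```

Lowering the threshold can only add predicted positives, so both rates are nondecreasing down the table and reach 1 at the lowest threshold.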

For a multiclass classification problem, `rocmetrics` formulates a set of one-versus-all binary classification problems to have one binary problem for each class, and finds a ROC curve for each class using the corresponding binary problem. Each binary problem assumes one class as positive and the rest as negative.

For a binary classification problem, if you specify the classification scores as a matrix, `rocmetrics` formulates two one-versus-all binary classification problems. Each of these problems treats one class as a positive class and the other class as a negative class, and `rocmetrics` finds two ROC curves. Use one of the curves to evaluate the binary classification problem.

For more details, see ROC Curve and Performance Metrics.

### Area Under ROC Curve (AUC)

The area under a ROC curve (AUC) corresponds to the integral of a ROC curve (TPR values) with respect to FPR from `FPR` = `0` to `FPR` = `1`.

The AUC provides an aggregate performance measure across all possible thresholds. The AUC values are in the range `0` to `1`, and larger AUC values indicate better classifier performance.
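As a rough illustration of this integral, the sketch below approximates the area with the trapezoidal rule by using `trapz`. The FPR and TPR values are made up, and the use of trapezoidal integration is an assumption of this sketch rather than a statement about how `rocmetrics` computes the AUC.

```matlab
% Hypothetical ROC curve points, ordered from (0,0) to (1,1).
FPR = [0; 0; 0.25; 0.25; 0.5; 1];
TPR = [0; 0.5; 0.5; 1; 1; 1];

% Integrate TPR with respect to FPR from FPR = 0 to FPR = 1.
AUC = trapz(FPR,TPR);
fprintf("AUC = %.3f\n",AUC)
```

Vertical segments of the curve (repeated FPR values) contribute no area, so only the horizontal extent of each step matters.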

### One-Versus-All (OVA) Coding Design

The one-versus-all (OVA) coding design reduces a multiclass classification problem to a set of binary classification problems. In this coding design, each binary classification treats one class as positive and the rest of the classes as negative. `rocmetrics` uses the OVA coding design for multiclass classification and evaluates the performance on each class by using the binary classification in which that class is positive.

For example, the OVA coding design for three classes formulates three binary classifications:

$$\begin{array}{cccc}& \text{Binary 1}& \text{Binary 2}& \text{Binary 3}\\ \text{Class 1}& 1& -1& -1\\ \text{Class 2}& -1& 1& -1\\ \text{Class 3}& -1& -1& 1\end{array}$$

Each row corresponds to a class, and each column corresponds to a binary classification problem. The first binary classification assumes that class 1 is a positive class and the rest of the classes are negative. `rocmetrics` evaluates the performance on the first class by using the first binary classification problem.
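The coding matrix above can be generated for any number of classes, and the same idea can be used to relabel observations for each binary problem. The following is a minimal sketch; the class names and labels are hypothetical, and the variable names are illustrative.

```matlab
% OVA coding matrix for K classes: 1 on the diagonal, -1 elsewhere.
K = 3;
codingMatrix = 2*eye(K) - 1;
disp(codingMatrix)

% Hypothetical class names and true labels.
classNames = ["class1";"class2";"class3"];
labels = ["class1";"class3";"class2";"class1";"class3"];

% Column k holds the +1/-1 labels for the binary problem
% in which class k is the positive class.
binaryLabels = zeros(numel(labels),K);
for k = 1:K
    binaryLabels(:,k) = 2*(labels == classNames(k)) - 1;
end
disp(binaryLabels)
```

Each row of `binaryLabels` equals the row of `codingMatrix` that corresponds to the true class of that observation.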

## Algorithms

### Adjusted Scores for Multiclass Classification Problem

For each class, `rocmetrics` adjusts the classification scores (input argument `Scores` of `rocmetrics`) relative to the scores for the rest of the classes if you specify `Scores` as a matrix. Specifically, the adjusted score for a class given an observation is the difference between the score for the class and the maximum value of the scores for the rest of the classes.

For example, if you have $$[s_1, s_2, s_3]$$ in a row of `Scores` for a classification problem with three classes, the adjusted score values are

$$[s_1 - \max(s_2, s_3),\ s_2 - \max(s_1, s_3),\ s_3 - \max(s_1, s_2)].$$

`rocmetrics` computes the performance metrics using the adjusted score values for each class.
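The adjustment rule can be sketched directly from its definition. The score values below are made up, and the loop is an illustration of the formula rather than the internal implementation of `rocmetrics`.

```matlab
% Hypothetical classification scores: one row per observation,
% one column per class.
Scores = [0.7 0.2 0.1
          0.3 0.4 0.3];

K = size(Scores,2);
AdjustedScores = zeros(size(Scores));
for k = 1:K
    % Scores of all classes except class k.
    otherScores = Scores(:,[1:k-1, k+1:K]);
    % Adjusted score: score for class k minus the largest other score.
    AdjustedScores(:,k) = Scores(:,k) - max(otherScores,[],2);
end
disp(AdjustedScores)
```

An adjusted score is positive only for the class with the strictly largest score in a row, so each row has at most one positive entry.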

For a binary classification problem, you can specify `Scores` as a two-column matrix or a column vector. Using a two-column matrix is a simpler option because the `predict` function of a classification object returns classification scores as a matrix, which you can pass to `rocmetrics`. If you pass scores in a two-column matrix, `rocmetrics` adjusts scores in the same way that it adjusts scores for multiclass classification, and it computes performance metrics for both classes. You can use the metric values for one of the two classes to evaluate the binary classification problem. The metric values for a class returned by `rocmetrics` when you pass a two-column matrix are equivalent to the metric values returned by `rocmetrics` when you specify classification scores for the class as a column vector.

## Alternative Functionality

You can use the `plot` function to create the average ROC curve. The function returns a `ROCCurve` object containing the `XData`, `YData`, `Thresholds`, and `AUC` properties, which correspond to the output arguments `FPR`, `TPR`, `Thresholds`, and `AUC` of the `average` function, respectively. For an example, see Plot Average ROC Curve for Multiclass Classifier.
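A minimal sketch of this alternative, reusing the Fisher iris setup from the example on this page, might look like the following. The variable names `curveObj`, `avgFPR`, `avgTPR`, `avgThresholds`, and `avgAUC` are illustrative, and the code assumes that requesting a single output from `plot` returns the `ROCCurve` object for the average curve.

```matlab
% Rebuild the rocmetrics object from the example above.
load fisheriris
rng("default") % For reproducibility
Mdl = fitctree(meas,species,Crossval="on");
[~,Scores] = kfoldPredict(Mdl);
rocObj = rocmetrics(species,Scores,Mdl.ClassNames);

% Plot only the macro-average ROC curve and keep the returned object.
curveObj = plot(rocObj,AverageROCType="macro",ClassNames=[]);

% Read the average metrics from the properties of the ROCCurve object.
avgFPR = curveObj.XData;
avgTPR = curveObj.YData;
avgThresholds = curveObj.Thresholds;
avgAUC = curveObj.AUC;
fprintf("Macro-average AUC = %.4f\n",avgAUC)
```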


## Version History

**Introduced in R2022a**
