## Prediction Using Discriminant Analysis Models

`predict` uses three quantities to classify observations: posterior probability, prior probability, and cost.

`predict` classifies so as to minimize the expected classification cost:

$$\widehat{y}=\underset{y=1,\mathrm{...},K}{\mathrm{arg}\mathrm{min}}{\displaystyle \sum _{k=1}^{K}\widehat{P}\left(k|x\right)C\left(y|k\right)},$$

where

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(k|x\right)$$ is the posterior probability of class *k* for observation *x*.
- $$C\left(y|k\right)$$ is the cost of classifying an observation as *y* when its true class is *k*.
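As a minimal sketch of the rule above, the following base-MATLAB snippet evaluates the expected cost for a single observation. The posterior values and the cost matrix are hypothetical numbers chosen for illustration, and the matrix `C` is assumed to be laid out with rows as true classes and columns as predicted classes.

```matlab
% Hypothetical posterior probabilities P(k|x) for K = 3 classes (1-by-K).
posterior = [0.5 0.3 0.2];

% Hypothetical cost matrix: C(k,y) is the cost of predicting class y
% when the true class is k. Missing a true class 3 is made expensive.
C = [ 0  1  1;
      1  0  1;
     10 10  0];

% Expected cost of each candidate label y: sum over k of P(k|x)*C(k,y).
expectedCost = posterior * C;

% The predicted class is the one with the smallest expected cost.
[~, yhat] = min(expectedCost);

% With a 0-1 cost matrix, the same rule reduces to choosing the class
% with the largest posterior probability.
C01 = ones(3) - eye(3);
[~, yhat01] = min(posterior * C01);
```

With these numbers, `yhat` is 3 even though class 3 has the smallest posterior, because misclassifying a true class 3 observation is costly; under the 0-1 cost, `yhat01` is 1.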

The space of `X` values divides into regions where a classification `Y` is a particular value. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.

### Posterior Probability

The posterior probability that a point *x* belongs to class *k* is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-*d* mean $${\mu}_{k}$$ and *d*-by-*d* covariance $${\Sigma}_{k}$$ at a 1-by-*d* point *x* is

$$P\left(x|k\right)=\frac{1}{{\left({\left(2\pi \right)}^{d}\left|{\Sigma}_{k}\right|\right)}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}\left(x-{\mu}_{k}\right){\Sigma}_{k}^{-1}{\left(x-{\mu}_{k}\right)}^{T}\right),$$

where $$\left|{\Sigma}_{k}\right|$$ is the determinant of $${\Sigma}_{k}$$, and $${\Sigma}_{k}^{-1}$$ is the inverse matrix.

Let *P*(*k*) represent the prior probability of class *k*. Then the posterior probability that an observation *x* is of class *k* is

$$\widehat{P}\left(k|x\right)=\frac{P\left(x|k\right)P\left(k\right)}{P\left(x\right)},$$

where *P*(*x*) is a normalization constant, namely, the sum over *k* of *P*(*x*|*k*)*P*(*k*).
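The two formulas above can be sketched in base MATLAB as follows. The means, covariances, priors, and query point are hypothetical values, not taken from any fitted model. Giving each class its own covariance corresponds to the quadratic case; a linear discriminant would reuse one covariance for all classes.

```matlab
% Hypothetical two-class problem in d = 2 dimensions.
mu    = [0 0; 2 2];                       % K-by-d, row k is the mean of class k
Sigma = cat(3, eye(2), [2 0.5; 0.5 1]);   % d-by-d-by-K class covariances
prior = [0.5 0.5];                        % P(k)
x     = [1 1];                            % 1-by-d query point

K = size(mu, 1);
d = size(mu, 2);
joint = zeros(1, K);                      % holds P(x|k)*P(k)
for k = 1:K
    delta = x - mu(k, :);
    S = Sigma(:, :, k);
    % Multivariate normal density P(x|k); delta/S equals delta*inv(S).
    density = exp(-0.5 * (delta / S) * delta') / sqrt((2*pi)^d * det(S));
    joint(k) = density * prior(k);
end

% Divide by P(x) = sum over k of P(x|k)*P(k) to obtain the posterior.
posterior = joint / sum(joint);
```

Because of the normalization, the entries of `posterior` are nonnegative and sum to 1.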

### Prior Probability

The prior probability is one of three choices:

- `'uniform'`: The prior probability of class `k` is 1 over the total number of classes.
- `'empirical'`: The prior probability of class `k` is the number of training samples of class `k` divided by the total number of training samples.
- A numeric vector: The prior probability of class `k` is the `k`th element of the `Prior` vector. See `fitcdiscr`.

After creating a classifier `obj`, you can set the prior using dot notation:

    obj.Prior = v;

where `v` is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
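To make the three prior choices concrete, here is a small base-MATLAB sketch that computes each one by hand. The label vector and the frequency vector `v` are hypothetical, and the explicit rescaling of `v` is done here only so that the values can be read as probabilities.

```matlab
% Hypothetical training labels for K = 3 classes.
labels = [1 1 1 1 2 2 3 3 3 3]';

[classes, ~, idx] = unique(labels);
K = numel(classes);

% 'uniform': every class gets probability 1/K.
priorUniform = ones(1, K) / K;

% 'empirical': fraction of training samples in each class.
counts = accumarray(idx, 1)';
priorEmpirical = counts / sum(counts);

% Numeric vector: relative class frequencies, rescaled to sum to 1.
v = [2 1 1];
priorCustom = v / sum(v);
```

For these labels, the empirical prior is `[0.4 0.2 0.4]`, and the custom vector rescales to `[0.5 0.25 0.25]`.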

### Cost

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

#### True Misclassification Cost per Class

`Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the `Cost` name-value pair in `fitcdiscr`.

After you create a classifier `obj`, you can set a custom cost using dot notation:

    obj.Cost = B;

`B` is a square matrix of size `K`-by-`K` when there are `K` classes. You do not need to retrain the classifier when you set a new cost.
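Both ways of supplying a cost matrix might look like the sketch below. It assumes Statistics and Machine Learning Toolbox and its `fisheriris` sample data set are available; the penalty value 5 is a hypothetical choice, and the rows and columns of `B` are assumed to follow the order of the classes in the trained model.

```matlab
% Requires Statistics and Machine Learning Toolbox (fisheriris sample data).
load fisheriris                 % meas: 150-by-4 predictors, species: labels

K = numel(unique(species));     % number of classes (3 for this data set)
B = ones(K) - eye(K);           % start from the default 0-1 cost
B(3,2) = 5;                     % hypothetical: true class 3 predicted as class 2 costs 5

% Option 1: pass the cost matrix when creating the classifier.
obj1 = fitcdiscr(meas, species, 'Cost', B);

% Option 2: create the classifier with the default cost, then assign
% the custom cost with dot notation (no retraining).
obj2 = fitcdiscr(meas, species);
obj2.Cost = B;
```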

#### Expected Misclassification Cost per Observation

Suppose you have `Nobs` observations that you want to classify with a trained discriminant analysis classifier `obj`. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

    [label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size `Nobs`-by-`K`. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$$\sum _{i=1}^{K}\widehat{P}\left(i|\mathit{Xnew}(n)\right)C\left(k|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|\mathit{Xnew}(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the cost of classifying an observation as *k* when its true class is *i*.
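The sum above can be evaluated for all observations and classes at once as a matrix product. The following base-MATLAB sketch uses a hypothetical `score` matrix standing in for the posterior probabilities and a hypothetical cost matrix; it illustrates the formula and is not a reproduction of the internals of `predict`.

```matlab
% Hypothetical posteriors: Nobs = 4 observations (rows), K = 3 classes.
score = [0.7 0.2 0.1;
         0.1 0.8 0.1;
         0.3 0.3 0.4;
         0.4 0.5 0.1];

% Hypothetical cost matrix, Cost(i,k): true class i, predicted class k.
Cost = [0 1 2;
        1 0 1;
        4 1 0];

% cost(n,k) = sum over i of score(n,i)*Cost(i,k), for every n and k.
cost = score * Cost;

% For each observation, the label with the smallest expected cost.
[minCost, label] = min(cost, [], 2);
```

Here `cost` is 4-by-3 and `label` is `[1; 2; 2; 2]`. The third observation is assigned class 2 although class 3 has the largest posterior, because the expected cost of choosing class 2 (0.7) is lower than that of class 3 (0.9).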