testckfold
Compare accuracies of two classification models by repeated cross-validation
Syntax

h = testckfold(C1,C2,X1,X2)
h = testckfold(C1,C2,X1,X2,Y)
h = testckfold(___,Name,Value)
[h,p,e1,e2] = testckfold(___)

Description
testckfold statistically assesses the accuracies of two classification models by repeatedly cross-validating the two models, determining the differences in the classification loss, and then formulating the test statistic by combining the classification loss differences. This type of test is particularly appropriate when sample size is limited.

You can assess whether the accuracies of the classification models are different, or whether one classification model performs better than another. Available tests include a 5-by-2 paired t test, a 5-by-2 paired F test, and a 10-by-10 repeated cross-validation t test. For more details, see Repeated Cross-Validation Tests. To speed up computations, testckfold supports parallel computing (requires a Parallel Computing Toolbox™ license).
h = testckfold(C1,C2,X1,X2) returns the test decision that results from conducting a 5-by-2 paired F cross-validation test. The null hypothesis is that the classification models C1 and C2 have equal accuracy in predicting the true class labels using the predictor and response data in the tables X1 and X2. h = 1 indicates rejection of the null hypothesis at the 5% significance level.
testckfold conducts the cross-validation test by applying C1 and C2 to all predictor variables in X1 and X2, respectively. The true class labels in X1 and X2 must be the same. The response variable names in X1, X2, C1.ResponseName, and C2.ResponseName must be the same.
For examples of ways to compare models, see Tips.
h = testckfold(___,Name,Value) uses any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify the type of alternative hypothesis, the type of test, or the use of parallel computing.

[h,p,e1,e2] = testckfold(___) also returns the p-value for the hypothesis test (p) and the respective classification losses for each cross-validation run and fold (e1 and e2).
Examples
Compare Classification Tree Predictor-Selection Algorithms
At each node, fitctree chooses the best predictor to split on using an exhaustive search by default. Alternatively, you can choose to split on the predictor that shows the most evidence of dependence with the response by conducting curvature tests. This example statistically compares classification trees grown by an exhaustive search for the best splits with trees grown by conducting curvature tests with interaction.
Load the census1994
data set.
load census1994.mat
rng(1) % For reproducibility
Grow a default classification tree using the training set, adultdata
, which is a table. The response-variable name is 'salary'
.
C1 = fitctree(adultdata,'salary')
C1 = 
  ClassificationTree
           PredictorNames: {'age'  'workClass'  'fnlwgt'  'education'  'education_num'  'marital_status'  'occupation'  'relationship'  'race'  'sex'  'capital_gain'  'capital_loss'  'hours_per_week'  'native_country'}
             ResponseName: 'salary'
    CategoricalPredictors: [2 4 6 7 8 9 10 14]
               ClassNames: [<=50K    >50K]
           ScoreTransform: 'none'
          NumObservations: 32561
C1 is a full ClassificationTree model. Its ResponseName property is 'salary'. C1 uses an exhaustive search to find the best predictor to split on, based on maximal splitting gain.
Grow another classification tree using the same data set, but specify to find the best predictor to split using the curvature test with interaction.
C2 = fitctree(adultdata,'salary','PredictorSelection','interaction-curvature')
C2 = 
  ClassificationTree
           PredictorNames: {'age'  'workClass'  'fnlwgt'  'education'  'education_num'  'marital_status'  'occupation'  'relationship'  'race'  'sex'  'capital_gain'  'capital_loss'  'hours_per_week'  'native_country'}
             ResponseName: 'salary'
    CategoricalPredictors: [2 4 6 7 8 9 10 14]
               ClassNames: [<=50K    >50K]
           ScoreTransform: 'none'
          NumObservations: 32561
C2 also is a full ClassificationTree model with ResponseName equal to 'salary'.
Conduct a 5-by-2 paired F test to compare the accuracies of the two models using the training set. Because the response-variable names in the data sets and the ResponseName properties are all equal, and the response data in both sets are equal, you can omit supplying the response data.
h = testckfold(C1,C2,adultdata,adultdata)
h = logical
0
h = 0 indicates failure to reject the null hypothesis that C1 and C2 have the same accuracies at the 5% significance level.
Compare Accuracies of Two Different Classification Models
Conduct a statistical test comparing the misclassification rates of the two models using a 5-by-2 paired F test.
Load Fisher's iris data set.
load fisheriris;
Create a naive Bayes template and a classification tree template using default options.
C1 = templateNaiveBayes;
C2 = templateTree;
C1 and C2 are template objects corresponding to the naive Bayes and classification tree algorithms, respectively.
Test whether the two models have equal predictive accuracies. Use the same predictor data for each model. testckfold conducts a 5-by-2, two-sided, paired F test by default.
rng(1); % For reproducibility
h = testckfold(C1,C2,meas,meas,species)
h = logical
0
h = 0 indicates failure to reject the null hypothesis that the two models have equal predictive accuracies.
Compare Classification Accuracies of Simple and Complex Models
Conduct a statistical test to assess whether a simpler model has better accuracy than a more complex model using a 10-by-10 repeated cross-validation t test.
Load Fisher's iris data set. Create a cost matrix that penalizes misclassifying a setosa iris twice as much as misclassifying a virginica iris as a versicolor.
load fisheriris;
tabulate(species)
       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%
Cost = [0 2 2;2 0 1;2 1 0];
ClassNames = {'setosa' 'versicolor' 'virginica'}; % Specifies the order of the rows and columns in Cost
The empirical distribution of the classes is uniform, and the classification cost is slightly imbalanced.
Create two ECOC templates: one that uses linear SVM binary learners and one that uses SVM binary learners equipped with the RBF kernel.
tSVMLinear = templateSVM('Standardize',true); % Linear SVM by default
tSVMRBF = templateSVM('KernelFunction','RBF','Standardize',true);
C1 = templateECOC('Learners',tSVMLinear);
C2 = templateECOC('Learners',tSVMRBF);
C1 and C2 are ECOC template objects. C1 is prepared for training linear SVM binary learners, and C2 is prepared for training SVM binary learners with an RBF kernel.
Test the null hypothesis that the simpler model (C1) is at most as accurate as the more complex model (C2) in terms of classification costs. Conduct the 10-by-10 repeated cross-validation test. Request to return p-values and misclassification costs.
rng(1); % For reproducibility
[h,p,e1,e2] = testckfold(C1,C2,meas,meas,species,...
    'Alternative','greater','Test','10x10t','Cost',Cost,...
    'ClassNames',ClassNames)
h = logical
0
p = 0.1077
e1 = 10×10
0 0 0 0.0667 0 0.0667 0.1333 0 0.1333 0
0.0667 0.0667 0 0 0 0 0.0667 0 0.0667 0.0667
0 0 0 0 0 0.0667 0.0667 0.0667 0.0667 0.0667
0.0667 0.0667 0 0.0667 0 0.0667 0 0 0.0667 0
0.0667 0.0667 0.0667 0 0.0667 0.0667 0 0 0 0
0 0 0.1333 0 0 0.0667 0 0 0.0667 0.0667
0.0667 0.0667 0 0 0.0667 0 0 0.0667 0 0.0667
0.0667 0 0.0667 0.0667 0 0.1333 0 0.0667 0 0
0 0.0667 0.1333 0.0667 0.0667 0 0 0 0 0
0 0.0667 0.0667 0.0667 0.0667 0 0 0.0667 0 0
e2 = 10×10
0 0 0 0.1333 0 0.0667 0.1333 0 0.2667 0
0.0667 0.0667 0 0.1333 0 0 0 0.1333 0.1333 0.0667
0.1333 0.1333 0 0 0 0.0667 0 0.0667 0.0667 0.0667
0 0.1333 0 0.0667 0.1333 0.1333 0 0 0.0667 0
0.0667 0.0667 0.0667 0 0.0667 0.1333 0.1333 0 0 0.0667
0.0667 0 0.0667 0.0667 0 0.0667 0.1333 0 0.0667 0.0667
0.2000 0.0667 0 0 0.0667 0 0 0.1333 0 0.0667
0.2000 0 0 0.1333 0 0.1333 0 0.0667 0 0
0 0.0667 0.0667 0.0667 0.1333 0 0.2000 0 0 0
0.0667 0.0667 0 0.0667 0.1333 0 0 0.0667 0.1333 0.0667
The p-value is slightly greater than 0.10, which indicates retaining the null hypothesis that the simpler model is at most as accurate as the more complex model. This result holds for any significance level (Alpha) that is at most 0.10.
e1 and e2 are 10-by-10 matrices containing misclassification costs. Row r corresponds to run r of the repeated cross-validation, and column k corresponds to test-set fold k within a particular cross-validation run. For example, element (2,4) of e2 is 0.1333. This value means that in cross-validation run 2, when the test set is fold 4, the estimated test-set misclassification cost is 0.1333.
Select Features Using Statistical Accuracy Comparison
Reduce classification model complexity by selecting a subset of predictor variables (features) from a larger set. Then, statistically compare the accuracy between the two models.
Load the ionosphere
data set.
load ionosphere
Train an ensemble of 100 boosted classification trees using AdaBoostM1 and the entire set of predictors. Inspect the importance measure for each predictor.
t = templateTree('MaxNumSplits',1); % Weak-learner template tree object
C = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);
predImp = predictorImportance(C);

bar(predImp)
h = gca;
h.XTick = 1:2:h.XLim(2);
title('Predictor Importances')
xlabel('Predictor')
ylabel('Importance measure')
Identify the top five predictors in terms of their importance.
[~,idxSort] = sort(predImp,'descend');
idx5 = idxSort(1:5);
Test whether the two models have equal predictive accuracies. Specify the reduced data set and then the full predictor data. Use parallel computing to speed up computations.
s = RandStream('mlfg6331_64');
Options = statset('UseParallel',true,'Streams',s,'UseSubstreams',true);

[h,p,e1,e2] = testckfold(C,C,X(:,idx5),X,Y,'Options',Options)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
h = logical
0
p = 0.4161
e1 = 5×2
0.0686 0.0795
0.0800 0.0625
0.0914 0.0568
0.0400 0.0739
0.0914 0.0966
e2 = 5×2
0.0914 0.0625
0.1257 0.0682
0.0971 0.0625
0.0800 0.0909
0.0914 0.1193
testckfold treats trained classification models as templates, and so it ignores all fitted parameters in C. That is, testckfold cross-validates C using only the specified options and the predictor data to estimate the out-of-fold classification losses.
h = 0 indicates failure to reject the null hypothesis that the two models have equal predictive accuracies. This result favors the simpler ensemble.
Input Arguments
C1
— Classification model template or trained classification model
classification model template object | trained classification model object
Classification model template or trained classification model, specified as any classification model template object or trained classification model object described in these tables.
Template Type | Returned By |
---|---|
Classification tree | templateTree |
Discriminant analysis | templateDiscriminant |
Ensemble (boosting, bagging, and random subspace) | templateEnsemble |
Error-correcting output codes (ECOC), multiclass classification model | templateECOC |
Generalized additive model (GAM) | templateGAM |
Gaussian kernel classification with support vector machine (SVM) or logistic regression learners | templateKernel |
kNN | templateKNN |
Linear classification with SVM or logistic regression learners | templateLinear |
Naive Bayes | templateNaiveBayes |
SVM | templateSVM |
Trained Model Type | Model Object | Returned By |
---|---|---|
Classification tree | ClassificationTree | fitctree |
Discriminant analysis | ClassificationDiscriminant | fitcdiscr |
Ensemble of bagged classification models | ClassificationBaggedEnsemble | fitcensemble |
Ensemble of classification models | ClassificationEnsemble | fitcensemble |
ECOC model | ClassificationECOC | fitcecoc |
Generalized additive model (GAM) | ClassificationGAM | fitcgam |
kNN | ClassificationKNN | fitcknn |
Naive Bayes | ClassificationNaiveBayes | fitcnb |
Neural network | ClassificationNeuralNetwork (with observations in rows) | fitcnet |
SVM | ClassificationSVM | fitcsvm |
For efficiency, supply a classification model template object instead of a trained classification model object.
C2
— Classification model template or trained classification model
classification model template object | trained classification model object
Classification model template or trained classification model, specified as any classification model template object or trained classification model object described in these tables.
Template Type | Returned By |
---|---|
Classification tree | templateTree |
Discriminant analysis | templateDiscriminant |
Ensemble (boosting, bagging, and random subspace) | templateEnsemble |
Error-correcting output codes (ECOC), multiclass classification model | templateECOC |
Generalized additive model (GAM) | templateGAM |
Gaussian kernel classification with support vector machine (SVM) or logistic regression learners | templateKernel |
kNN | templateKNN |
Linear classification with SVM or logistic regression learners | templateLinear |
Naive Bayes | templateNaiveBayes |
SVM | templateSVM |
Trained Model Type | Model Object | Returned By |
---|---|---|
Classification tree | ClassificationTree | fitctree |
Discriminant analysis | ClassificationDiscriminant | fitcdiscr |
Ensemble of bagged classification models | ClassificationBaggedEnsemble | fitcensemble |
Ensemble of classification models | ClassificationEnsemble | fitcensemble |
ECOC model | ClassificationECOC | fitcecoc |
Generalized additive model (GAM) | ClassificationGAM | fitcgam |
kNN | ClassificationKNN | fitcknn |
Naive Bayes | ClassificationNaiveBayes | fitcnb |
Neural network | ClassificationNeuralNetwork (with observations in rows) | fitcnet |
SVM | ClassificationSVM | fitcsvm |
For efficiency, supply a classification model template object instead of a trained classification model object.
X1
— Data used to apply to first full classification model or template
numeric matrix | table
Data used to apply to the first full classification model or template, C1, specified as a numeric matrix or table.

Each row of X1 corresponds to one observation, and each column corresponds to one variable. testckfold does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

X1 and X2 must be of the same data type, and X1, X2, and Y must have the same number of observations.

If you specify Y as an array, then testckfold treats all columns of X1 as separate predictor variables.
Data Types: double | single | table
X2
— Data used to apply to second full classification model or template
numeric matrix | table
Data used to apply to the second full classification model or template, C2, specified as a numeric matrix or table.

Each row of X2 corresponds to one observation, and each column corresponds to one variable. testckfold does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

X1 and X2 must be of the same data type, and X1, X2, and Y must have the same number of observations.

If you specify Y as an array, then testckfold treats all columns of X2 as separate predictor variables.
Data Types: double | single | table
Y
— True class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors | character vector | string scalar
True class labels, specified as a categorical, character, or string array, a logical or numeric vector, a cell array of character vectors, or a character vector or string scalar.
- For a character vector or string scalar, X1 and X2 must be tables, their response variables must have the same name and values, and Y must be the common variable name. For example, if X1.Labels and X2.Labels are the response variables, then Y is 'Labels' and X1.Labels and X2.Labels must be equivalent.
- For all other supported data types, Y is an array of true class labels. If Y is a character array, then each element must correspond to one row of the array.
- X1, X2, and Y must have the same number of observations (rows).

If both of these statements are true, then you can omit supplying Y:

- X1 and X2 are tables containing the same response variable (same name and values).
- C1 and C2 are full classification models whose ResponseName properties name that response variable in X1 and X2.

Consequently, testckfold uses the common response variable in the tables. For example, if the response variables in the tables are X1.Labels and X2.Labels, and the values of C1.ResponseName and C2.ResponseName are 'Labels', then you do not have to supply Y.
Data Types: categorical | char | string | logical | single | double | cell
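For instance, the table workflow just described can be sketched as follows; the tables T1 and T2 and the variable name Labels are hypothetical:

% T1 and T2 are hypothetical tables that both contain a response
% variable named Labels with identical values. Passing the common
% variable name as Y tells testckfold where the true labels are.
h = testckfold(C1,C2,T1,T2,'Labels');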
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: 'Alternative','greater','Test','10x10t','Options',statset('UseParallel',true) specifies to test whether the first set of predicted class labels is more accurate than the second set, to conduct the 10-by-10 t test, and to use parallel computing for cross-validation.
Alpha
— Hypothesis test significance level
0.05 (default) | scalar value in the interval (0,1)
Hypothesis test significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the interval (0,1).
Example: 'Alpha',0.1
Data Types: single | double
Alternative
— Alternative hypothesis to assess
'unequal' (default) | 'greater' | 'less'
Alternative hypothesis to assess, specified as the comma-separated pair consisting of 'Alternative' and one of the values listed in the table.
Value | Alternative Hypothesis Description | Supported Tests |
---|---|---|
'unequal' (default) | For predicting Y, the set of predictions resulting from C1 applied to X1 and C2 applied to X2 have unequal accuracies. | '5x2F', '5x2t', and '10x10t' |
'greater' | For predicting Y, the set of predictions resulting from C1 applied to X1 is more accurate than C2 applied to X2. | '5x2t' and '10x10t' |
'less' | For predicting Y, the set of predictions resulting from C1 applied to X1 is less accurate than C2 applied to X2. | '5x2t' and '10x10t' |

For details on supported tests, see Test.
Example: 'Alternative','greater'
X1CategoricalPredictors
— Flag identifying categorical predictors
[] (default) | logical vector | numeric vector | 'all'
Flag identifying categorical predictors in the first test-set predictor data (X1), specified as the comma-separated pair consisting of 'X1CategoricalPredictors' and one of the following:

- A numeric vector with indices from 1 through p, where p is the number of columns of X1.
- A logical vector of length p, where a true entry means that the corresponding column of X1 is a categorical variable.
- 'all', meaning all predictors are categorical.

The default is [], which indicates that the data contains no categorical predictors.

For a kNN classification model, valid options are [] and 'all'.

You must specify X1CategoricalPredictors if X1 is a matrix and includes categorical predictors. testckfold does not use the CategoricalPredictors property of C1 when C1 is a trained classification model. If C1 is a trained model with categorical predictors, specify 'X1CategoricalPredictors',C1.CategoricalPredictors.
Example: 'X1CategoricalPredictors','all'
Data Types: single | double | logical | char | string
X2CategoricalPredictors
— Flag identifying categorical predictors
[] (default) | logical vector | numeric vector | 'all'
Flag identifying categorical predictors in the second test-set predictor data (X2), specified as the comma-separated pair consisting of 'X2CategoricalPredictors' and one of the following:

- A numeric vector with indices from 1 through p, where p is the number of columns of X2.
- A logical vector of length p, where a true entry means that the corresponding column of X2 is a categorical variable.
- 'all', meaning all predictors are categorical.

The default is [], which indicates that the data contains no categorical predictors.

For a kNN classification model, valid options are [] and 'all'.

You must specify X2CategoricalPredictors if X2 is a matrix and includes categorical predictors. testckfold does not use the CategoricalPredictors property of C2 when C2 is a trained classification model. If C2 is a trained model with categorical predictors, specify 'X2CategoricalPredictors',C2.CategoricalPredictors.
Example: 'X2CategoricalPredictors','all'
Data Types: single | double | logical | char | string
ClassNames
— Class names
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class names, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. You must set ClassNames using the data type of Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

- Specify the order of any input argument dimension that corresponds to class order. For example, use ClassNames to specify the order of the dimensions of Cost.
- Select a subset of classes for testing. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train and test models using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default is the set of all distinct class names in Y.
Example: 'ClassNames',{'b','g'}
Data Types: single | double | logical | char | string | cell | categorical
Cost
— Classification cost
square matrix | structure array
Classification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure array.

- If you specify the square matrix Cost, then Cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.
- If you specify the structure S, then S must have two fields:
  - S.ClassNames, which contains the class names as a variable of the same data type as Y. You can use this field to specify the order of the classes.
  - S.ClassificationCosts, which contains the cost matrix, with rows and columns ordered as in S.ClassNames.

For cost-sensitive testing, use testcholdout.

It is a best practice to supply the same cost matrix used to train the classification models.

The default is Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j.
Example: 'Cost',[0 1 2 ; 1 0 2; 2 2 0]
Data Types: double | single | struct
LossFun
— Loss function
'classiferror' (default) | 'binodeviance' | 'exponential' | 'hinge' | function handle
Loss function, specified as the comma-separated pair consisting of 'LossFun' and 'classiferror', 'binodeviance', 'exponential', 'hinge', or a function handle.

The following table lists the available loss functions.

Value | Loss Function |
---|---|
'binodeviance' | Binomial deviance |
'classiferror' | Classification error |
'exponential' | Exponential loss |
'hinge' | Hinge loss |

You can also specify your own function using function handle notation. Suppose that n = size(X,1) is the sample size and there are K unique classes. Your function must have the signature lossvalue = lossfun(C,S,W,Cost), where:

- The output argument lossvalue is a scalar.
- lossfun is the name of your function.
- C is an n-by-K logical matrix with rows indicating which class the corresponding observation belongs to. The column order corresponds to the class order in the ClassNames name-value pair argument. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
- S is an n-by-K numeric matrix of classification scores. The column order corresponds to the class order in the ClassNames name-value pair argument.
- W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.
- Cost is a K-by-K numeric matrix of classification costs. For example, Cost = ones(K) - eye(K) specifies a cost of 0 for correct classification and a cost of 1 for misclassification.

Specify your function using 'LossFun',@lossfun.
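As an illustration, here is a minimal sketch of a custom loss function with this signature. The function name myLoss and its weighted-cost logic are hypothetical examples, not part of testckfold:

function lossvalue = myLoss(C,S,W,Cost)
% Hypothetical custom loss: weighted average misclassification cost of
% the maximum-score class predictions. C, S, W, and Cost follow the
% contract described above.
    [n,K] = size(S);
    [~,idx] = max(S,[],2);                  % predicted class index per observation
    pred = false(n,K);
    pred(sub2ind([n K],(1:n)',idx)) = true; % n-by-K predicted-class indicator
    perObsCost = sum((C*Cost).*pred,2);     % cost of each prediction given the true class
    lossvalue = sum(W.*perObsCost)/sum(W);  % weighted average cost
end

Pass the function to testckfold as 'LossFun',@myLoss.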
Options
— Options for computing in parallel and setting random streams
structure
Options for computing in parallel and setting random streams, specified as a
structure. Create the Options
structure using statset
. This table lists the option fields and their
values.
Field Name | Value | Default |
---|---|---|
UseParallel | Set this value to true to run computations in parallel. | false |
UseSubstreams | Set this value to true to run computations in a reproducible manner. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. | false |
Streams | Specify this value as a RandStream object or cell array of such objects. Use a single object except when the UseParallel value is true and the UseSubstreams value is false. In that case, use a cell array that has the same size as the parallel pool. | If you do not specify Streams, then testckfold uses the default stream or streams. |
Note
You need Parallel Computing Toolbox to run computations in parallel.
Example: Options=statset(UseParallel=true,UseSubstreams=true,Streams=RandStream("mlfg6331_64"))
Data Types: struct
Prior
— Prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and 'empirical', 'uniform', a numeric vector, or a structure.

This table summarizes the available options for setting prior probabilities.

Value | Description |
---|---|
'empirical' | The class prior probabilities are the class relative frequencies in Y. |
'uniform' | All class prior probabilities are equal to 1/K, where K is the number of classes. |
numeric vector | Each element is a class prior probability. Specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1. |
structure | A structure S with two fields: S.ClassNames, which contains the class names as a variable of the same data type as Y, and S.ClassProbs, which contains a vector of corresponding prior probabilities. |
Example: 'Prior',struct('ClassNames',{{'setosa','versicolor'}},'ClassProbs',[1,2])
Data Types: char | string | single | double | struct
Test
— Test to conduct
'5x2F' (default) | '5x2t' | '10x10t'
Test to conduct, specified as the comma-separated pair consisting of 'Test' and one of the following: '5x2F', '5x2t', or '10x10t'.
Value | Description | Supported Alternative Hypothesis |
---|---|---|
'5x2F' (default) | 5-by-2 paired F test. Appropriate for two-sided testing only. | 'unequal' |
'5x2t' | 5-by-2 paired t test | 'unequal' , 'less' , 'greater' |
'10x10t' | 10-by-10 repeated cross-validation t test | 'unequal' , 'less' , 'greater' |
For details on the available tests, see Repeated Cross-Validation Tests. For details on supported alternative hypotheses, see Alternative.
Example: 'Test','10x10t'
Verbose
— Verbosity level
0 (default) | 1 | 2
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. Verbose controls the amount of diagnostic information that the software displays in the Command Window during training of each cross-validation fold.
This table summarizes the available verbosity level options.
Value | Description |
---|---|
0 | The software does not display diagnostic information. |
1 | The software displays diagnostic messages every time it implements a new cross-validation run. |
2 | The software displays diagnostic messages every time it implements a new cross-validation run, and every time it trains on a particular fold. |
Example: 'Verbose',1
Data Types: double | single
Weights
— Observation weights
ones(size(X1,1),1) (default) | numeric vector
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector.

The size of Weights must equal the number of rows of X1. The software weights the observations in each row of X1 with the corresponding weight in Weights.

The software normalizes Weights to sum up to the value of the prior probability in the respective class.
Data Types: double | single
Notes:
- testckfold treats trained classification models as templates. Therefore, it ignores all fitted parameters in the model. That is, testckfold cross-validates using only the options specified in the model and the predictor data.
- The repeated cross-validation tests depend on the assumption that the test statistics are asymptotically normal under the null hypothesis. Highly imbalanced cost matrices (for example, Cost = [0 100;1 0]) and highly discrete response distributions (that is, most of the observations are in a small number of classes) might violate the asymptotic normality assumption. For cost-sensitive testing, use testcholdout.
- NaNs, <undefined> values, empty character vectors (''), empty strings (""), and <missing> values indicate missing data values.
Output Arguments
h
— Hypothesis test result
1 | 0
Hypothesis test result, returned as a logical value.

h = 1 indicates the rejection of the null hypothesis at the Alpha significance level.

h = 0 indicates failure to reject the null hypothesis at the Alpha significance level.
Data Types: logical
p
— p-value
scalar in the interval [0,1]
p-value of the test, returned as a scalar in the interval [0,1]. p is the probability that a random test statistic is at least as extreme as the observed test statistic, given that the null hypothesis is true.

testckfold estimates p using the distribution of the test statistic, which varies with the type of test. For details on test statistics, see Repeated Cross-Validation Tests.
e1
— Classification losses
numeric matrix
Classification losses, returned as a numeric matrix. The rows of e1 correspond to the cross-validation run and the columns correspond to the test fold.

testckfold applies the first test-set predictor data (X1) to the first classification model (C1) to estimate the first set of class labels.

e1 summarizes the accuracy of the first set of class labels predicting the true class labels (Y) for each cross-validation run and fold. The meaning of the elements of e1 depends on the type of classification loss.
e2
— Classification losses
numeric matrix
Classification losses, returned as a numeric matrix. The rows of e2 correspond to the cross-validation run and the columns correspond to the test fold.

testckfold applies the second test-set predictor data (X2) to the second classification model (C2) to estimate the second set of class labels.

e2 summarizes the accuracy of the second set of class labels predicting the true class labels (Y) for each cross-validation run and fold. The meaning of the elements of e2 depends on the type of classification loss.
More About
Repeated Cross-Validation Tests
Repeated cross-validation tests form the test statistic for comparing the accuracies of two classification models by combining the classification loss differences resulting from repeatedly cross-validating the data. Repeated cross-validation tests are useful when sample size is limited.
To conduct an R-by-K test:
Randomly divide (stratified by class) the predictor data sets and true class labels into K sets, R times. Each division is called a run and each set within a run is called a fold. Each run contains the complete, but divided, data sets.
For runs r = 1 through R, repeat these steps for k = 1 through K:
Reserve fold k as a test set, and train the two classification models using their respective predictor data sets on the remaining K – 1 folds.
Predict class labels using the trained models and their respective fold k predictor data sets.
Estimate the classification loss by comparing the two sets of estimated labels to the true labels. Denote $\hat{e}_{crk}$ as the classification loss when the test set is fold k in run r of classification model c.

Compute the difference between the classification losses of the two models:

$$\hat{d}_{rk} = \hat{e}_{1rk} - \hat{e}_{2rk}.$$

At the end of a run, there are K classification losses per classification model.

Combine the results of step 2 for r = 1 through R:

- Estimate the within-fold averages of the differences:
$$\bar{d}_r = \frac{1}{K}\sum_{k=1}^{K}\hat{d}_{rk}$$
- Estimate the overall average of the differences:
$$\bar{d} = \frac{1}{RK}\sum_{r=1}^{R}\sum_{k=1}^{K}\hat{d}_{rk}$$
- Estimate the within-fold variances of the differences:
$$s_r^2 = \sum_{k=1}^{K}\left(\hat{d}_{rk} - \bar{d}_r\right)^2$$
- Estimate the average of the within-fold variances:
$$\bar{s}^2 = \frac{1}{R}\sum_{r=1}^{R}s_r^2$$
- Estimate the overall sample variance of the differences:
$$S^2 = \frac{1}{RK-1}\sum_{r=1}^{R}\sum_{k=1}^{K}\left(\hat{d}_{rk} - \bar{d}\right)^2$$
Compute the test statistic. All supported tests described here assume that, under H0, the estimated differences are independent and approximately normally distributed, with mean 0 and a finite, common standard deviation. Because the cross-validation runs reuse the same data, these tests violate the independence assumption, and so the test-statistic distributions are approximate.

For K = 2, the test is a paired test. The two supported tests are a paired t test and a paired F test.

The test statistic for the paired t test is

$$t^* = \frac{\hat{d}_{11}}{\sqrt{\bar{s}^2}},$$

which has a t-distribution with R degrees of freedom under the null hypothesis. To reduce the effects of correlation between the estimated differences, the quantity $\hat{d}_{11}$ occupies the numerator rather than $\bar{d}$. 5-by-2 paired t tests can be slightly conservative [4].

The test statistic for the paired F test is

$$F^* = \frac{\sum_{r=1}^{R}\sum_{k=1}^{K}\hat{d}_{rk}^{\,2}}{K\sum_{r=1}^{R}s_r^2},$$

which has an F distribution with RK and R degrees of freedom. A 5-by-2 paired F test has comparable power to the 5-by-2 t test, but is more conservative [1].

For R > 2, the test is a repeated cross-validation test. The test statistic is

$$t^* = \frac{\bar{d}}{\sqrt{S^2/(RK)}},$$

which has a t distribution with ν degrees of freedom. If the differences were truly independent, then ν = RK – 1. Because the differences are correlated, the degrees of freedom parameter must be optimized instead. For a 10-by-10 repeated cross-validation t test, the optimal degrees of freedom is between 8 and 11 ([2] and [3]). testckfold uses ν = 10.

The advantage of repeated cross-validation tests over paired tests is that the results are more repeatable [3]. The disadvantage is that they require more computational resources.
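To make these formulas concrete, the following sketch (an illustration, not the internal implementation) assembles the 5-by-2 paired F statistic and its p-value from the loss matrices e1 and e2 that testckfold returns when 'Test' is '5x2F':

% e1 and e2 are the R-by-K loss matrices from
% [h,p,e1,e2] = testckfold(C1,C2,X1,X2,Y), with R = 5 runs and K = 2 folds.
d = e1 - e2;                        % loss differences dhat_rk
[R,K] = size(d);
dbar_r = mean(d,2);                 % within-run averages
s2 = sum((d - dbar_r).^2,2);        % within-run sums of squared deviations
Fstat = sum(d(:).^2)/(K*sum(s2));   % F statistic with (R*K, R) degrees of freedom
pval = 1 - fcdf(Fstat,R*K,R);       % upper-tail p-value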
Classification Loss
Classification losses indicate the accuracy of a classification model or set of predicted labels. In general, for a fixed cost matrix, classification accuracy decreases as classification loss increases.
testckfold returns the classification losses (see e1 and e2) under the alternative hypothesis (that is, the unrestricted classification losses). In the definitions that follow:
- The classification losses focus on the first classification model. The classification losses for the second model are similar.
- $n_{\text{test}}$ is the test-set sample size.
- I(x) is the indicator function: if x is a true statement, then I(x) = 1; otherwise, I(x) = 0.
- $\hat{y}_j$ is the predicted class assignment of classification model 1 for observation j.
- $y_j$ is the true class label of observation j.
Binomial deviance has the form

$$e_1 = \frac{1}{n_{\text{test}}}\sum_{j=1}^{n_{\text{test}}}\log\left(1 + e^{-2y_j s_j}\right)$$

where:

- $y_j$ = 1 for the positive class and –1 for the negative class.
- $s_j$ is the classification score for observation j.

The binomial deviance has connections to the maximization of the binomial likelihood function. For details on binomial deviance, see [5].

Exponential loss is similar to binomial deviance and has the form

$$e_1 = \frac{1}{n_{\text{test}}}\sum_{j=1}^{n_{\text{test}}} e^{-y_j s_j}.$$

$y_j$ and $s_j$ take the same forms here as in the binomial deviance formula.

Hinge loss has the form

$$e_1 = \frac{1}{n_{\text{test}}}\sum_{j=1}^{n_{\text{test}}}\max\{0,\,1 - y_j s_j\},$$

where $y_j$ and $s_j$ take the same forms as in the binomial deviance formula. Hinge loss linearly penalizes misclassified observations and is related to the SVM objective function used for optimization. For more details on hinge loss, see [5].

Misclassification rate, or classification error, is a scalar in the interval [0,1] representing the proportion of misclassified observations. That is, the misclassification rate for the first classification model is

$$e_1 = \frac{1}{n_{\text{test}}}\sum_{j=1}^{n_{\text{test}}} I\left(\hat{y}_j \ne y_j\right).$$
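As a quick numeric illustration of this formula, consider four hypothetical test-set observations, one of which is misclassified:

ytrue = categorical(["a";"b";"a";"c"]);   % hypothetical true labels
yhat  = categorical(["a";"b";"c";"c"]);   % hypothetical predicted labels
e = mean(yhat ~= ytrue)                   % misclassification rate: 0.25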
Tips
Examples of ways to compare models include:
- Compare the accuracies of a simple classification model and a more complex model by passing the same set of predictor data.
- Compare the accuracies of two different models using two different sets of predictors.
- Perform various types of feature selection. For example, you can compare the accuracy of a model trained using a set of predictors to the accuracy of one trained on a subset or different set of predictors. You can arbitrarily choose the set of predictors, or use a feature selection technique like PCA or sequential feature selection (see pca and sequentialfs).
One way to perform cost-insensitive feature selection is:
1. Create a classification model template that characterizes the first classification model (C1).
2. Create a classification model template that characterizes the second classification model (C2).
3. Specify two predictor data sets. For example, specify X1 as the full predictor set and X2 as a reduced set.
4. Enter testckfold(C1,C2,X1,X2,Y,'Alternative','less'). If testckfold returns 1, then there is enough evidence to suggest that the classification model that uses fewer predictors performs better than the model that uses the full predictor set. (See the sketch at the end of these tips.)
Alternatively, you can assess whether there is a significant difference between the accuracies of the two models. To perform this assessment, remove the 'Alternative','less' specification in step 4. Then testckfold conducts a two-sided test, and h = 0 indicates that there is not enough evidence to suggest a difference in the accuracy of the two models.

- The tests are appropriate for the misclassification rate classification loss, but you can specify other loss functions (see LossFun). The key assumptions are that the estimated classification losses are independent and normally distributed with mean 0 and finite common variance under the two-sided null hypothesis. Classification losses other than the misclassification rate can violate this assumption.
- Highly discrete data, imbalanced classes, and highly imbalanced cost matrices can violate the normality assumption of classification loss differences.
Algorithms
If you specify to conduct the 10-by-10 repeated cross-validation t test using 'Test','10x10t', then testckfold uses 10 degrees of freedom for the t distribution to find the critical region and estimate the p-value. For more details, see [2] and [3].
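For instance, under the default two-sided alternative, the p-value computation implied by this rule can be sketched as follows, where tstat stands for the 10-by-10 test statistic that testckfold computes internally:

nu = 10;                        % fixed degrees of freedom described above
p = 2*tcdf(-abs(tstat),nu);     % two-sided p-value (illustration only)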
Alternatives
Use testcholdout:

- For test sets with larger sample sizes.
- To implement variants of the McNemar test to compare two classification model accuracies.
- For cost-sensitive testing using a chi-square or likelihood ratio test. The chi-square test uses quadprog (Optimization Toolbox), which requires an Optimization Toolbox™ license.
References
[1] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 11, No. 8, 1999, pp. 1885–1892.

[2] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.” International Conference on Machine Learning, 2003, pp. 51–58.
[3] Bouckaert, R., and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.” Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference, 2004, pp. 3–12.
[4] Dietterich, T. “Approximate statistical tests for comparing supervised classification learning algorithms.” Neural Computation, Vol. 10, No. 7, 1998, pp. 1895–1923.
[5] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Ed. New York: Springer, 2008.
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:

Options=statset(UseParallel=true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
- This function fully supports GPU arrays for ClassificationNeuralNetwork and CompactClassificationNeuralNetwork models.
- This function supports GPU arrays with limitations for the classification models described in this table.

Full or Compact Model Object | Limitations |
---|---|
ClassificationECOC or CompactClassificationECOC | Binary learners are subject to limitations depending on type: ensemble learners have the same limitations as ClassificationEnsemble, KNN learners as ClassificationKNN, SVM learners as ClassificationSVM, and tree learners as ClassificationTree. |
ClassificationEnsemble or CompactClassificationEnsemble | Weak learners are subject to limitations depending on type: KNN learners have the same limitations as ClassificationKNN, and tree learners have the same limitations as ClassificationTree. Discriminant learners are not supported. |
ClassificationKNN | Models trained using the Kd-tree nearest neighbor search method, function handle distance metrics, or tie inclusion are not supported. |
ClassificationSVM or CompactClassificationSVM | One-class classification is not supported. |
ClassificationTree or CompactClassificationTree | Surrogate splits are not supported. |

testckfold executes on a GPU in these cases only:

- Either or both of the input arguments X1 and X2 are GPU arrays.
- Either or both of the input tables X1 and X2 contain gpuArray predictor variables.
- Either or both of the input arguments C1 and C2 were fitted with GPU array input arguments.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2015a

R2024b: Specify GPU arrays for neural network models (requires Parallel Computing Toolbox)

testckfold fully supports GPU arrays for ClassificationNeuralNetwork and CompactClassificationNeuralNetwork models.
See Also
testcholdout | templateECOC | templateEnsemble | templateDiscriminant | templateTree | templateSVM | templateNaiveBayes | templateKNN