
Evaluate Deep Learning Experiments by Using Metric Functions

This example shows how to use metric functions to evaluate the results of an experiment. By default, when you run a built-in training experiment, Experiment Manager computes the loss, accuracy (for classification experiments), and root mean squared error (for regression experiments) for each trial in your experiment. To compute other measures, create your own metric function. For example, you can define metric functions to:

  • Test the prediction performance of a trained network.

  • Evaluate the training progress by computing the slope of the validation loss over the final epoch.

  • Display the size of the network used in an experiment that uses different network architectures for each trial.
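
For instance, the last item could be implemented as a short metric function like this sketch (hypothetical; it uses the trialInfo input structure described in Define Metric Functions below):

function metricOutput = NetworkDepth(trialInfo)
% Hypothetical metric: report the number of layers in the trained network,
% useful when each trial uses a different architecture.
metricOutput = numel(trialInfo.trainedNetwork.Layers);
end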

When each trial finishes training, Experiment Manager evaluates the metric functions and displays their values in the results table.

In this example, you train a network to classify images of handwritten digits. Two metric functions determine how well the trained network identifies the images of the numerals one and seven. For more information on using Experiment Manager to train a network for image classification, see Image Classification by Sweeping Hyperparameters.

Define Metric Functions

Add a metric function to a built-in training experiment.

1. In the Experiment pane, under Metrics, click Add.

2. In the Add metric dialog box, enter a name for the metric function and click OK. If you enter the name of a function that already exists in the project, Experiment Manager adds it to the experiment. Otherwise, Experiment Manager creates a function defined by a default template.

3. Select the name of the metric function and click Edit. The metric function opens in MATLAB® Editor.

The input to a metric function is a structure with three fields:

  • trainedNetwork is the SeriesNetwork object or DAGNetwork object returned by the trainNetwork function. For more information, see net.

  • trainingInfo is a structure containing the training information returned by the trainNetwork function. For more information, see info.

  • parameters is a structure with fields from the hyperparameter table.

The output of a custom metric function must be a scalar number, a logical value, or a string.
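
For example, a metric function that returns the final validation loss might look like this sketch (a minimal illustration, assuming the experiment specifies validation data so that trainingInfo contains a ValidationLoss field):

function metricOutput = FinalValidationLoss(trialInfo)
% Sketch: return the last recorded validation loss. ValidationLoss is NaN
% at iterations where no validation pass ran, so discard those entries first.
vloss = trialInfo.trainingInfo.ValidationLoss;
vloss = vloss(~isnan(vloss));
metricOutput = vloss(end);
end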

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment that you can inspect and run. To open the experiment, in the Experiment Browser pane, double-click the name of the experiment (ClassificationExperiment).

Built-in training experiments consist of a description, a table of hyperparameters, a setup function, and a collection of metric functions to evaluate the results of the experiment. For more information, see Configure Built-In Training Experiment.

The Description field contains a textual description of the experiment. For this example, the description is:

Classification of digits, evaluating results by using metric functions:
* OnesAsSevens returns the percentage of 1s misclassified as 7s.
* SevensAsOnes returns the percentage of 7s misclassified as 1s.

The Hyperparameters section specifies the strategy (Exhaustive Sweep) and hyperparameter values to use for the experiment. When you run the experiment, Experiment Manager trains the network using every combination of hyperparameter values specified in the hyperparameter table. This example uses the hyperparameters InitialLearnRate and Momentum.
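
The exact sweep values are stored in the hyperparameter table of the project. As a hypothetical illustration (not the values shipped with the example), a sweep of three learning rates against two momentum values produces the six trials described in Run Experiment:

% Hypothetical sweep values; an exhaustive sweep trains one network per
% combination, so numel(learnRates)*numel(momentums) = 6 trials.
learnRates = [0.0025 0.005 0.01];
momentums = [0.9 0.95];
[LR,M] = ndgrid(learnRates,momentums);
combos = [LR(:) M(:)]   % each row is one trial's (InitialLearnRate, Momentum)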

The Setup Function configures the training data, network architecture, and training options for the experiment. The input to the setup function is a structure with fields from the hyperparameter table. The setup function returns three outputs that you use to train a network for image classification problems. In this example, the setup function has three sections.

  • Load Training Data defines image datastores containing the training and validation data. This example loads images from the Digits data set. For more information on this data set, see Image Data Sets.

digitDatasetPath = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imdsTrain = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
numTrainingFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imdsTrain,numTrainingFiles);

  • Define Network Architecture defines the architecture for a convolutional neural network for deep learning classification. This example uses the default classification network provided by the setup function template.

inputSize = [28 28 1];
numClasses = 10;
layers = [
    imageInputLayer(inputSize)
    convolution2dLayer(5,20)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

  • Specify Training Options defines a trainingOptions object for the experiment. The example loads the values for the training options InitialLearnRate and Momentum from the hyperparameter table.

options = trainingOptions("sgdm", ...
    MaxEpochs=5, ...
    ValidationData=imdsValidation, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    Momentum=params.Momentum, ...
    Verbose=false);

To inspect the setup function, under Setup Function, click Edit. The setup function opens in MATLAB® Editor. In addition, the code for the setup function appears in Appendix 1 at the end of this example.

The Metrics section specifies optional functions that evaluate the results of the experiment. Experiment Manager evaluates these functions each time it finishes training the network. To inspect a metric function, select the name of the metric function and click Edit. The metric function opens in MATLAB Editor.

This example includes two metric functions.

  • OnesAsSevens returns the percentage of images of the numeral one that the trained network misclassifies as sevens.

  • SevensAsOnes returns the percentage of images of the numeral seven that the trained network misclassifies as ones.

Each of these functions uses the trained network to classify the entire Digits data set. Then, each function counts the images whose actual and predicted labels disagree in a specific way and returns that count as a percentage of all images with the given actual label. For example, the function OnesAsSevens counts the images with an actual label of '1' and a predicted label of '7', then divides by the total number of images labeled '1'. Similarly, the function SevensAsOnes counts the images with an actual label of '7' and a predicted label of '1'. The code for these metric functions appears in Appendix 2 and Appendix 3 at the end of this example.
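
These two percentages are single cells of the full confusion matrix. To inspect every misclassification pair at once after training, one option (not part of this experiment) is a confusion chart; a minimal sketch, assuming net and imds are defined as in the appendices:

% Sketch: visualize every actual/predicted label pair at once.
YActual = imds.Labels;
YPred = classify(net,imds);
confusionchart(YActual,YPred);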

Run Experiment

When you run the experiment, Experiment Manager trains the network defined by the setup function six times. Each trial uses a different combination of hyperparameter values. By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox™, you can run multiple trials at the same time. For best results, before you run your experiment, start a parallel pool with as many workers as GPUs. For more information, see Use Experiment Manager to Train Networks in Parallel and GPU Support by Release (Parallel Computing Toolbox).

  • To run one trial of the experiment at a time, on the Experiment Manager toolstrip, click Run.

  • To run multiple trials at the same time, click Use Parallel and then Run. If there is no current parallel pool, Experiment Manager starts one using the default cluster profile. Experiment Manager then executes multiple simultaneous trials, depending on the number of parallel workers available.
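
A minimal sketch of the pool-per-GPU recommendation above, assuming Parallel Computing Toolbox is installed and each worker can access a supported GPU:

% Start one parallel worker per available GPU before running the experiment.
numGPUs = gpuDeviceCount;
if numGPUs > 0 && isempty(gcp("nocreate"))
    parpool(numGPUs);
end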

A table of results displays the metric function values for each trial.

Evaluate Results

To find the best result for your experiment, sort the table of results. For example, find the trial with the lowest percentage of ones misclassified as sevens.

  1. Point to the OnesAsSevens column.

  2. Click the triangle icon.

  3. Select Sort in Ascending Order.

Similarly, find the trial with the lowest percentage of sevens misclassified as ones by opening the drop-down menu for the SevensAsOnes column and selecting Sort in Ascending Order.

If no single trial minimizes both metric functions simultaneously, consider giving preference to a trial that ranks well for each metric. For instance, in these results, trial 5 ranks as one of the top three trials for each metric function.

To record observations about the results of your experiment, add an annotation.

  1. In the results table, right-click the OnesAsSevens cell of the best trial.

  2. Select Add Annotation.

  3. In the Annotations pane, enter your observations in the text box.

  4. Repeat the previous steps for the SevensAsOnes cell.

For more information, see Sort, Filter, and Annotate Experiment Results.

Close Experiment

In the Experiment Browser pane, right-click the name of the project and select Close Project. Experiment Manager closes all of the experiments and results contained in the project.

Appendix 1: Setup Function

This function configures the training data, network architecture, and training options for the experiment.

Input

  • params is a structure with fields from the Experiment Manager hyperparameter table.

Output

  • imdsTrain is an image datastore for the training data.

  • layers is a layer array that defines the neural network architecture.

  • options is a trainingOptions object.

function [imdsTrain,layers,options] = ClassificationExperiment_setup1(params)

digitDatasetPath = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imdsTrain = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

numTrainingFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imdsTrain,numTrainingFiles);

inputSize = [28 28 1];
numClasses = 10;
layers = [
    imageInputLayer(inputSize)
    convolution2dLayer(5,20)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions("sgdm", ...
    MaxEpochs=5, ... 
    ValidationData=imdsValidation, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    Momentum=params.Momentum, ...
    Verbose=false);

end

Appendix 2: Find Ones Misclassified as Sevens

This function returns the percentage of ones that the trained network misclassifies as sevens.

function metricOutput = OnesAsSevens(trialInfo)

actualValue = '1';
predValue = '7';

net = trialInfo.trainedNetwork;

digitDatasetPath = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

YActual = imds.Labels;
YPred = classify(net,imds);

K = sum(YActual == actualValue & YPred == predValue);
N = sum(YActual == actualValue);

metricOutput = 100*K/N;

end

Appendix 3: Find Sevens Misclassified as Ones

This function returns the percentage of sevens that the trained network misclassifies as ones.

function metricOutput = SevensAsOnes(trialInfo)

actualValue = '7';
predValue = '1';

net = trialInfo.trainedNetwork;

digitDatasetPath = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

YActual = imds.Labels;
YPred = classify(net,imds);

K = sum(YActual == actualValue & YPred == predValue);
N = sum(YActual == actualValue);

metricOutput = 100*K/N;

end
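
Appendix 2 and Appendix 3 differ only in the two label values. If you prefer to avoid the duplication, one option (a refactoring sketch, not part of the shipped example) is a shared helper that both metric functions call:

function metricOutput = misclassifiedPercentage(trialInfo,actualValue,predValue)
% Hypothetical helper: percentage of images with label actualValue that
% the trained network classifies as predValue.
net = trialInfo.trainedNetwork;

digitDatasetPath = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(digitDatasetPath, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

YActual = imds.Labels;
YPred = classify(net,imds);

K = sum(YActual == actualValue & YPred == predValue);
N = sum(YActual == actualValue);
metricOutput = 100*K/N;
end

Each metric function then reduces to a single call, for example metricOutput = misclassifiedPercentage(trialInfo,'1','7');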
