plot

Plot bar graph of slice metric

Since R2026a

Syntax

plot(sliceResults)

plot(sliceResults,Metric=metric)

b = plot(___)

Description

plot(sliceResults) creates a bar graph of the accuracy (for classification models) or mean squared error (for regression models) for the data slices in sliceResults. For most metrics, plot displays bars for the data slices and their complements. The complement of a data slice consists of all observations that are not in the data slice.


plot(sliceResults,Metric=metric) specifies the slice metric to display in the bar graph.

b = plot(___) returns an array of Bar objects using any of the input argument combinations in previous syntaxes. Use b to query or modify Bar Properties after displaying the bar graph.
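The following sketch shows the `b = plot(___)` syntax. It recreates the PetalLength slice results from the first example on this page, plots accuracy with the `Metric` name-value argument, and then restyles one set of bars. It assumes that `b(2)` holds the complement bars, based on the legend order (Slice, Complement) shown for this plot. Check `DisplayName` before relying on that order.

```matlab
% Recreate sliceResults from the first example on this page
fisheriris = readtable("fisheriris.csv");
fisheriris.Species = categorical(fisheriris.Species);
versicolorData = fisheriris(fisheriris.Species=="versicolor",:);
virginicaData = fisheriris(fisheriris.Species=="virginica",:);
Mdl = fitctree([versicolorData;virginicaData],"Species");
sliceResults = sliceMetrics(Mdl,"PetalLength");

% Plot accuracy explicitly (the default for classification) and keep the bars
b = plot(sliceResults,Metric="accuracy");

% Query a property, then restyle the second Bar object
% (assumed to hold the complement bars, matching the legend order)
disp(b(2).DisplayName)
b(2).FaceColor = [0.7 0.7 0.7];   % gray complement bars
b(2).EdgeColor = "none";
```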

Examples


Evaluate Binary Classifier on Slices of Numeric Predictor


Train a binary classifier on numeric data. Use sliceMetrics to slice the training data according to one of the predictors. Evaluate the accuracy of the model predictions on the data slices.

Load the sample file fisheriris.csv, which contains iris data including sepal length, sepal width, petal length, petal width, and species type. Read the file into a table, and then convert the Species variable into a categorical variable. Display the first eight observations in the table.

fisheriris = readtable("fisheriris.csv");
fisheriris.Species = categorical(fisheriris.Species);
head(fisheriris)

    SepalLength    SepalWidth    PetalLength    PetalWidth    Species
    ___________    __________    ___________    __________    _______

        5.1           3.5            1.4           0.2        setosa 
        4.9             3            1.4           0.2        setosa 
        4.7           3.2            1.3           0.2        setosa 
        4.6           3.1            1.5           0.2        setosa 
          5           3.6            1.4           0.2        setosa 
        5.4           3.9            1.7           0.4        setosa 
        4.6           3.4            1.4           0.3        setosa 
          5           3.4            1.5           0.2        setosa

Separate the data for two of the iris species: versicolor and virginica.

versicolorData = fisheriris(fisheriris.Species=="versicolor",:);
virginicaData = fisheriris(fisheriris.Species=="virginica",:);
trainingData = [versicolorData;virginicaData];

Train a binary tree classifier on the versicolor and virginica data.

Mdl = fitctree(trainingData,"Species")

Mdl = 
  ClassificationTree
           PredictorNames: {'SepalLength'  'SepalWidth'  'PetalLength'  'PetalWidth'}
             ResponseName: 'Species'
    CategoricalPredictors: []
               ClassNames: [versicolor    virginica]
           ScoreTransform: 'none'
          NumObservations: 100



Mdl is a ClassificationTree model object trained on 100 observations.

Compute metrics on the training data slices determined by petal length. Because PetalLength is a numeric predictor, sliceMetrics creates data slices by binning the petal length values of observations in Mdl.X. Data slices always partition the data; that is, each observation belongs to exactly one data slice.

sliceResults = sliceMetrics(Mdl,"PetalLength")

sliceResults = 
sliceMetrics evaluated on PetalLength slices:

    PetalLength    NumObservations    Accuracy    OddsRatio     PValue     EffectSize
    ___________    _______________    ________    _________    ________    __________

    [3, 3.65)             6                 1           0      0.083246    -0.031915 
    [3.65, 4.3)          17                 1           0      0.083241    -0.036145 
    [4.3, 4.95)          31           0.96774      1.1167       0.93193    0.0032726 
    [4.95, 5.6)          21           0.90476      8.2105       0.23003      0.08258 
    [5.6, 6.25)          19                 1           0       0.08324    -0.037037 
    [6.25, 6.9]           6                 1           0      0.083246    -0.031915 



In this example, sliceMetrics creates six data slices. The accuracy for every data slice is above 90%.
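The PetalLength slice edges in the display run from 3 to 6.9 in steps of 0.65, which is consistent with six equal-width bins between the smallest and largest training petal lengths. The sketch below builds bins of that kind with base MATLAB functions. Equal-width binning is an assumption inferred from the displayed edges, not documented sliceMetrics behavior, and floating-point edges could assign boundary values differently. The sketch uses the fisheriris.mat sample data (variables meas and species) instead of fisheriris.csv.

```matlab
load fisheriris                      % meas (150-by-4 double), species (150-by-1 cell)
isTwoSpecies = ~strcmp(species,"setosa");
petalLength = meas(isTwoSpecies,3);  % column 3 is petal length

% Six equal-width bins from the smallest to the largest petal length
% (assumption: this mirrors how sliceMetrics bins a numeric predictor)
edges = linspace(min(petalLength),max(petalLength),7);
counts = histcounts(petalLength,edges);  % last bin includes its right edge

% Every observation lands in exactly one bin, so the bins partition the data
disp(edges)
disp(counts)
fprintf("Total observations in all bins: %d\n",sum(counts))
```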

Visualize the accuracy for each slice and its complement in Mdl.X.

plot(sliceResults)

Figure: bar graph titled "Slice and Complement Accuracy" (x-axis: PetalLength slices; y-axis: Accuracy), with bars for Slice and Complement.

In general, the accuracy for each slice is similar to the accuracy for its slice complement. However, the observations with petal lengths in the range [4.95,5.6) have a lower accuracy (about 90.5%) than all other observations (about 98.7%). The p-value for this slice (0.23003) indicates that the difference is not statistically significant at the 5% level.
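The OddsRatio and EffectSize columns can be reproduced from the slice counts. This sketch recovers the number of misclassified observations in each slice from the displayed NumObservations and Accuracy values, and then applies the odds-ratio definition given for the "OddsRatio" metric under Input Arguments. Treating EffectSize as the slice error rate minus the complement error rate is an inference: it matches the displayed values, but this page does not state it.

```matlab
% Slice sizes and accuracies read from the sliceResults display
numObs   = [6 17 31 21 19 6];
accuracy = [1 1 0.96774 0.90476 1 1];

% Misclassified and correctly classified counts in each slice
numSliceIncorrect = round(numObs.*(1 - accuracy));   % undoes display rounding
numSliceCorrect   = numObs - numSliceIncorrect;

% Complement counts: all observations outside each slice
numCompIncorrect = sum(numSliceIncorrect) - numSliceIncorrect;
numCompCorrect   = sum(numSliceCorrect) - numSliceCorrect;

% Odds ratio as defined for the "OddsRatio" metric
oddsRatio = (numSliceIncorrect./numSliceCorrect)./(numCompIncorrect./numCompCorrect);

% Slice error rate minus complement error rate (inferred EffectSize definition)
effectSize = numSliceIncorrect./numObs - ...
    numCompIncorrect./(numCompIncorrect + numCompCorrect);

disp([oddsRatio; effectSize])
```

Slices without misclassifications have an odds ratio of 0, which is why four of the six slices show an OddsRatio value of 0.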

Improve Performance on Data Slices Using Synthetic Data


Train a regression model on a mix of numeric and categorical data. Use sliceMetrics to slice the test data according to two predictors. Compute the mean squared error for the data slices. To improve the general model performance across the data slices, generate synthetic observations and use them to retrain the model.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Bin the Model_Year data to form a categorical variable, and combine the variable with a subset of the other measurements into a table. Remove observations with missing values from the table. Then, display the first eight observations in the table.

load carbig
ModelDecade = discretize(Model_Year,[70 80 89], ...
    "categorical",["70s","80s"]);
ModelDecade = categorical(ModelDecade,Ordinal=false);
cars = table(Acceleration,Displacement,Horsepower, ...
    ModelDecade,Weight,MPG);
cars = rmmissing(cars);
head(cars)

    Acceleration    Displacement    Horsepower    ModelDecade    Weight    MPG
    ____________    ____________    __________    ___________    ______    ___

          12            307            130            70s         3504     18 
        11.5            350            165            70s         3693     15 
          11            318            150            70s         3436     18 
          12            304            150            70s         3433     16 
        10.5            302            140            70s         3449     17 
          10            429            198            70s         4341     15 
           9            454            220            70s         4354     14 
         8.5            440            215            70s         4312     14

Partition the data into training data and test data. Reserve approximately 50% of the observations for computing slice metrics, and use the rest of the observations for model training.

rng(0,"twister") % For reproducibility of partition
cv = cvpartition(length(cars.MPG),Holdout=0.5);
trainingCars = cars(training(cv),:);
testCars = cars(test(cv),:);

Train a regression tree model using the training data. Then, compute metrics on the test data slices determined by the decade of manufacture and the weight of the car. Partition the numeric Weight values into three bins. Because ModelDecade is a categorical variable with two categories, and each category is paired with each of the three Weight bins, sliceMetrics creates six data slices.

Mdl = fitrtree(trainingCars,"MPG");
sliceResults = sliceMetrics(Mdl,testCars,["ModelDecade","Weight"], ...
    NumBins=3)

sliceResults = 
sliceMetrics evaluated on ModelDecade and Weight slices:

    ModelDecade         Weight         NumObservations    Error     TStatistic      PValue      EffectSize
    ___________    ________________    _______________    ______    __________    __________    __________

        70s        [1649, 2812.7)            64           18.392     0.23812         0.81224      1.3097  
        70s        [2812.7, 3976.3)          53           6.6804     -4.3456      2.3745e-05     -14.843  
        70s        [3976.3, 5140]            36           5.6089     -4.7484      4.0884e-06     -14.578  
        80s        [1649, 2812.7)            32           30.874      2.2891         0.02727      15.972  
        80s        [2812.7, 3976.3)          11           64.626      2.4672        0.032784      49.918  
        80s        [3976.3, 5140]             0              NaN         NaN             NaN         NaN  



For the cars made in the 80s with a weight in the range [2812.7,3976.3) (that is, the observations in slice 5), the mean squared error (MSE) is 64.626, much higher than the MSE for the cars in the other data slices. Slice 6 contains no test observations, so its metrics are NaN.
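To gauge how much larger the slice 5 error is, compare it with the MSE of its complement. Because the slices partition the test data, the complement MSE is the size-weighted average of the other slice MSEs. The sketch below computes it from the rounded displayed values. The printed difference matches the displayed EffectSize for slice 5 to display precision, which suggests (but this page does not state) that EffectSize is the slice MSE minus the complement MSE.

```matlab
% Slice sizes and MSE values read from the sliceResults display
numObs = [64 53 36 32 11 0];
mse    = [18.392 6.6804 5.6089 30.874 64.626 NaN];

k = 5;                                   % slice of interest: 80s, [2812.7, 3976.3)
inComplement = (1:6) ~= k & numObs > 0;  % all other nonempty slices

% The complement MSE pools the squared errors of every observation outside slice k
compMSE = sum(numObs(inComplement).*mse(inComplement))/sum(numObs(inComplement));
effect = mse(k) - compMSE;
fprintf("Complement MSE: %.3f, slice minus complement: %.3f\n",compMSE,effect)
```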

Generate 500 synthetic observations using the synthesizeTabularData function. By default, the function uses a binning technique to learn the distribution of the variables in trainingCars before synthesizing the data.

rng(10,"twister") % For reproducibility of data generation
syntheticCars = synthesizeTabularData(trainingCars,500);

Combine the training observations with the synthetic observations. Use the combined data to retrain the regression tree model.

newTrainingCars = [trainingCars;syntheticCars];
newMdl = fitrtree(newTrainingCars,"MPG");

Compute metrics on the same test data slices using the retrained model.

newSliceResults = sliceMetrics(newMdl,testCars,["ModelDecade","Weight"], ...
    NumBins=3)

newSliceResults = 
sliceMetrics evaluated on ModelDecade and Weight slices:

    ModelDecade         Weight         NumObservations    Error     TStatistic      PValue      EffectSize
    ___________    ________________    _______________    ______    __________    __________    __________

        70s        [1649, 2812.7)            64           20.986      1.5012         0.13736      8.5476  
        70s        [2812.7, 3976.3)          53           9.9331     -2.0166        0.045209     -7.2595  
        70s        [3976.3, 5140]            36           5.4484     -4.3062      2.6575e-05     -11.982  
        80s        [1649, 2812.7)            32           20.422      1.0664         0.29206       6.206  
        80s        [2812.7, 3976.3)          11           24.162      1.0643         0.30927       9.463  
        80s        [3976.3, 5140]             0              NaN         NaN             NaN         NaN  



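Visually compare the MSE values for the data slices. Plot the results for both models side by side with the same y-axis limits, and label the slices 1 through 6 in the order of the slice tables.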

tiledlayout(1,2)

nexttile
plot(sliceResults)
ylim([0 70])
xticklabels(1:6)
legend(Location="northwest")

nexttile
plot(newSliceResults)
ylim([0 70])
xticklabels(1:6)
legend(Location="northwest")

Figure: two bar graphs, each titled "Slice and Complement Error" (x-axis: ModelDecade and Weight slices; y-axis: Error), with bars for Slice and Complement. The left graph shows the results for Mdl, and the right graph shows the results for newMdl.

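The MSE for slice 5 is much lower for the model trained with the training and synthetic observations (newMdl, 24.162) than for the model trained with only the training data (Mdl, 64.626). For newMdl, the p-value for slice 5 (0.30927) no longer indicates a significant difference between the slice and its complement at the 5% level.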
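Because the slices partition the test data, weighting each slice MSE by its number of observations gives the overall test MSE of each model. The sketch below does this with the rounded displayed values, so its results are only approximate, and it also reports which slice improves the most.

```matlab
% Slice sizes and MSE values read from the two sliceMetrics displays
numObs = [64 53 36 32 11 0];
oldMSE = [18.392 6.6804 5.6089 30.874 64.626 NaN];   % Mdl
newMSE = [20.986 9.9331 5.4484 20.422 24.162 NaN];   % newMdl

% Skip the empty slice, then weight each slice MSE by its size
keep = numObs > 0;
overallOld = sum(numObs(keep).*oldMSE(keep))/sum(numObs(keep));
overallNew = sum(numObs(keep).*newMSE(keep))/sum(numObs(keep));

% Change in MSE for each slice (positive values mean newMdl is better)
improvement = oldMSE - newMSE;
[~,mostImproved] = max(improvement);   % max ignores the NaN for slice 6
fprintf("Overall test MSE: %.2f (Mdl) vs %.2f (newMdl)\n",overallOld,overallNew)
fprintf("Largest per-slice improvement: slice %d\n",mostImproved)
```

The improvement vector also shows that the MSE rises slightly for the first two 70s slices, so the gain in slice 5 is not uniform across all slices.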

Input Arguments


`sliceResults` — Slice metrics results
`sliceMetrics` object

Slice metrics results, specified as a sliceMetrics object.

`metric` — Metric to plot
character vector | string scalar

Metric to plot, specified as a character vector or string scalar containing one metric name. The following tables describe the supported metrics. The default is "accuracy" for classification models and "error" for regression models.

Classification Metrics

Value	Description
`"TruePositives"` or `"tp"`	Number of true positives (TP)
`"FalseNegatives"` or `"fn"`	Number of false negatives (FN)
`"FalsePositives"` or `"fp"`	Number of false positives (FP)
`"TrueNegatives"` or `"tn"`	Number of true negatives (TN)
`"SumOfTrueAndFalsePositives"` or `"tp+fp"`	Sum of TP and FP
`"RateOfPositivePredictions"` or `"rpp"`	Rate of positive predictions (RPP), `(TP+FP)/(TP+FN+FP+TN)`
`"RateOfNegativePredictions"` or `"rnp"`	Rate of negative predictions (RNP), `(TN+FN)/(TP+FN+FP+TN)`
`"FalseNegativeRate"`, `"fnr"`, or `"miss"`	False negative rate (FNR), or miss rate, `FN/(TP+FN)`
`"TrueNegativeRate"`, `"tnr"`, or `"spec"`	True negative rate (TNR), or specificity, `TN/(TN+FP)`
`"PositivePredictiveValue"`, `"ppv"`, `"prec"`, or `"precision"`	Positive predictive value (PPV), or precision, `TP/(TP+FP)`
`"NegativePredictiveValue"` or `"npv"`	Negative predictive value (NPV), `TN/(TN+FN)`
`"Accuracy"`, `"accu"`, or `"accuracy"`	Accuracy, `(TP+TN)/(TP+FN+FP+TN)`
`"F1Score"` or `"f1score"`	F1 score, `2TP/(2TP+FP+FN)`
`"OddsRatio"` or `"oddsratio"`	Odds ratio, `(numSliceIncorrect/numSliceCorrect)/(numCompIncorrect/numCompCorrect)`, where `numSliceIncorrect` and `numSliceCorrect` are the numbers of misclassified and correctly classified observations in the data slice, and `numCompIncorrect` and `numCompCorrect` are the numbers of misclassified and correctly classified observations in the complement of the data slice.
`"PValue"` or `"pvalue"`	p-value for the test of the null hypothesis that there is no association between slice membership and error rate (odds ratio = 1), against the alternative hypothesis that there is an association (odds ratio ≠ 1). The software uses Fisher's exact test for small counts (see `fishertest`) and a chi-squared test otherwise.
`"EffectSize"` or `"effect"`	Mean-difference effect size for the test of the null hypothesis that there is no association between slice membership and error rate (odds ratio = 1), against the alternative hypothesis that there is an association (odds ratio ≠ 1) (see `meanEffectSize`)
`"TStatistic"` or `"tstat"`	t-statistic for Welch's t-test of the slice error rate against the slice complement error rate (see `ttest2`)
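The count-based formulas in the table are simple functions of the four confusion-matrix counts. The following sketch evaluates them for one slice using hypothetical counts (TP = 40, FN = 10, FP = 5, TN = 45) that are not taken from either example. Which class sliceMetrics treats as positive is not specified here.

```matlab
% Hypothetical confusion-matrix counts for one data slice
TP = 40; FN = 10; FP = 5; TN = 45;
n = TP + FN + FP + TN;

tpfp = TP + FP;                 % sum of true and false positives
rpp  = (TP + FP)/n;             % rate of positive predictions
rnp  = (TN + FN)/n;             % rate of negative predictions
fnr  = FN/(TP + FN);            % false negative (miss) rate
tnr  = TN/(TN + FP);            % true negative rate (specificity)
ppv  = TP/(TP + FP);            % positive predictive value (precision)
npv  = TN/(TN + FN);            % negative predictive value
accu = (TP + TN)/n;             % accuracy
f1   = 2*TP/(2*TP + FP + FN);   % F1 score

metricTable = table(tpfp,rpp,rnp,fnr,tnr,ppv,npv,accu,f1)
```

Because every observation is predicted as either positive or negative, RPP and RNP always sum to 1.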

Regression Metrics

Value	Description
`"Error"` or `"error"`	Mean squared error (MSE)
`"TStatistic"` or `"tstat"`	t-statistic for Welch's t-test of the slice error against the slice complement error (see `ttest2`)
`"PValue"` or `"pvalue"`	p-value for Welch's t-test of the slice error against the slice complement error (see `ttest2`)
`"EffectSize"` or `"effect"`	Mean-difference effect size for the slice error against the slice complement error (see `meanEffectSize`)

For most metrics, plot displays bars for the data slices and their complements. For metrics that directly compare slices to their complements, such as "oddsratio", "tstat", "pvalue", and "effect", the plot displays bars for the data slices only.

Data Types: char | string


Version History

Introduced in R2026a