
Classify Streaming Webcam Video Using SlowFast Video Classifier

This example shows how to classify a streaming video from a webcam using a pretrained SlowFast Video Classifier. To learn more about how to train a video classifier network for your dataset, see Gesture Recognition using Videos and Deep Learning.

Download Pretrained Video Classifier

Download the pretrained SlowFast video classifier.

downloadFolder = fullfile(tempdir,"gesture");
zipFile = "";
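if ~isfile(fullfile(downloadFolder,zipFile))
    disp("Downloading the pretrained network...");
    downloadURL = "" + zipFile;
    zipFile = fullfile(downloadFolder,zipFile);
    % Create the folder if needed, then download and extract the network.
    if ~isfolder(downloadFolder), mkdir(downloadFolder); end
    websave(zipFile,downloadURL);
    unzip(zipFile,downloadFolder);
end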

Load the pretrained SlowFast video classifier.

pretrainedDataFile = fullfile(downloadFolder,"slowFastPretrained_fourClasses.mat");
pretrained = load(pretrainedDataFile);
slowFastClassifier =;

Display the class label names of the pretrained video classifier. The SlowFast video classifier recognizes gestures such as "clapping" and "wavingHello" performed in front of the webcam.

classes = slowFastClassifier.Classes
classes = 4×1 categorical

Set Up the Webcam and the Video Player

This example uses a webcam object to capture streaming video and a video player to display the streaming video along with the predicted class.

Create a webcam object using the webcam function.

cam = webcam;

Create a video player using the vision.VideoPlayer System object. Position the video player so that you can clearly see the streaming video while the classification runs.

player = vision.VideoPlayer;

Classify the Webcam Streaming Video

Specify how frequently the classifier should be applied to incoming video frames.

classifyInterval = 10;

A value of 10 balances runtime performance against classification performance. Increase this value to improve runtime performance at the cost of missing gestures from the live video stream.

Obtain the sequence length of the SlowFast video classifier. Classify only after capturing at least sequenceLength frames from the webcam.

sequenceLength = slowFastClassifier.InputSize(4);
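The classifier operates on a fixed-length sequence of sequenceLength frames, which is why classification waits until that many frames have been captured. How updateSequence stores frames internally is not described in this example, so the sketch below only illustrates the general idea with a hypothetical first-in, first-out buffer that keeps the most recent sequenceLength frames. The buffer, the tiny dummy frames, and the sequenceLength value of 4 are all assumptions for illustration.

```matlab
% Hypothetical FIFO frame buffer, for illustration only (not the internals of
% updateSequence). Keeps the most recent sequenceLength frames along dim 4.
sequenceLength = 4;      % hypothetical small value
frameSize = [2 2 3];     % tiny dummy RGB frames
buffer = zeros([frameSize sequenceLength]);

for k = 1:6
    frame = k*ones(frameSize);                   % dummy frame, every pixel = k
    buffer = cat(4,buffer(:,:,:,2:end),frame);   % drop oldest, append newest
end

% First pixel of each buffered frame, oldest to newest
order = squeeze(buffer(1,1,1,:))';
disp(order)
```

After six frames, the buffer holds frames 3 through 6. Before the fourth frame arrives, part of the buffer is still the zero-initialized placeholder, which is one reason to wait for sequenceLength frames before classifying.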

Specify the maximum number of frames to capture in the loop using the maxNumFrames variable. Wave one hand for the classifier to recognize the "wavingHello" label, and clap with both hands for it to recognize the "clapping" label.

maxNumFrames = 280;
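To see how classifyInterval, sequenceLength, and maxNumFrames interact, the following sketch computes which frame numbers satisfy the loop's classification condition. It needs no webcam or toolbox. It assumes a hypothetical sequenceLength of 32; in practice, the value comes from slowFastClassifier.InputSize(4).

```matlab
% Which frames trigger classification? Uses the same condition as the loop:
% mod(numFrames,classifyInterval) == 0 && numFrames >= sequenceLength.
classifyInterval = 10;
sequenceLength = 32;    % hypothetical; the real value is slowFastClassifier.InputSize(4)
maxNumFrames = 280;

frameIdx = 1:maxNumFrames;
isClassified = mod(frameIdx,classifyInterval) == 0 & frameIdx >= sequenceLength;
classifiedFrames = frameIdx(isClassified);

fprintf("First classification at frame %d\n",classifiedFrames(1));
fprintf("Number of classifications: %d\n",numel(classifiedFrames));
```

With these values, the first classification happens at frame 40 and there are 25 classifications in total. Setting classifyInterval to 20 reduces this to 13 classifications, which illustrates the runtime versus missed-gesture trade-off of a larger interval.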

Capture webcam snapshots in a loop. In each iteration, update the classifier's streaming video sequence using the updateSequence method, and classify the sequence using the classifySequence method.

numFrames = 0;
text = "";

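while numFrames < maxNumFrames
    % Capture the next frame from the webcam.
    frame = snapshot(cam);
    numFrames = numFrames + 1;
    % Append the frame to the classifier's streaming video sequence.
    slowFastClassifier = updateSequence(slowFastClassifier,frame);
    % Classify every classifyInterval frames, once enough frames are captured.
    if mod(numFrames, classifyInterval) == 0 && numFrames >= sequenceLength
        [label,scores] = classifySequence(slowFastClassifier);
        if ~isempty(label)
            % Show the predicted label and its highest score.
            text = string(label) + "; " + num2str(max(scores), "%0.2f");
        end
    end
    % Overlay the most recent prediction on the current frame.
    frame = insertText(frame,[30,30],text,'FontSize',18);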