
Visual Tracking of Occluded and Unresolved Objects

This example shows how to resolve challenging tracking scenarios when objects are occluded or when they are in close proximity to each other. The example revisits the Motion-Based Multiple Object Tracking example available in the Computer Vision Toolbox™. The problem of motion-based object tracking can be divided into two parts:

  1. Detecting moving objects in each frame

  2. Tracking the objects detected in each video frame over time

The example uses multi-object trackers available in the Sensor Fusion and Tracking Toolbox™ to elaborate on the tracking part, which includes the following stages:

  1. Associating the detections corresponding to the same object over time

  2. Managing the emergence and disappearance of objects in the scene

  3. Filtering the noisy measurements made by the detector

Understand the Challenges in Video-Based Tracking

This section presents two major challenges of tracking moving objects in a video frame: Detecting the objects in the presence of occlusion and providing resolved detections when the objects are close to each other.

Video and Detector

Define a video reader and video player. This example is based on the atrium video, in which individuals are walking in an atrium with some plants that can potentially occlude the people.

filename = "atrium.mp4";
vidReader = VideoReader(filename);
vidPlayer = vision.DeployableVideoPlayer;

One way to detect moving objects when the camera is static is to analyze changes in the video frame, called the foreground, relative to the static scene, considered the background. The following code section creates the detector objects that separate the foreground from the background and connect areas of foreground into blobs. A blob detector is a simple, yet effective, detector because it does not require any prior knowledge about the moving objects.

minBlobArea = 400; % Minimum blob size, in pixels, to be considered as a detection
detectorObjects = setupDetectorObjects(minBlobArea);

Run the video and observe the detections, shown as magenta boxes, that the blob detector creates.

interestingFrameInds = [150,160,170,330,350,370,Inf];
interestingFrames = cell(1,numel(interestingFrameInds)-1);
ind = 0;
frameCount = 0;
numFrames = vidReader.NumFrames;
bboxes = cell(1,numFrames);
centroids = cell(1,numFrames);
while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1; % Increment frame count
    
    % Detect blobs in the video frame
    [centroids{frameCount}, bboxes{frameCount}] = detectBlobs(detectorObjects, frame);

    % Annotate frame with blobs
    frame = insertShape(frame,"rectangle",bboxes{frameCount}, ...
        Color="magenta",LineWidth=4);

    % Add frame count in the top left corner
    frame = insertText(frame,[0,0],"Frame: "+int2str(frameCount), ...
        BoxColor="black",TextColor="yellow",BoxOpacity=1);

    % Display Video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        interestingFrames{ind} = frame;
    end
end

Occlusion and Missed Detections

The first challenge in vision-based tracking is occlusion. Occlusion happens when a moving object passes behind another object, whether moving or static. In the series of pictures below, follow the detection of the person on the left as he is about to go behind the plant (frame 150), when he is completely occluded by the plant (frame 160), and when he emerges on the other side of the plant (frame 170).

imshow(interestingFrames{1});

imshow(interestingFrames{2});

imshow(interestingFrames{3});

Unresolved Detections

A second common challenge in tracking is when the detector is unable to resolve two or more objects that are near each other. In this video, two individuals approach each other and then continue on their way. As long as they are far from each other, the blob detector resolves two distinct blobs (frame 330). However, when the two individuals are too close to each other, the blob detector merges the two blobs into a single unresolved blob (frame 350). Only after the two people separate can the blob detector resolve them again and provide two separate detections (frame 370).

imshow(interestingFrames{4});

imshow(interestingFrames{5});

imshow(interestingFrames{6});

Use Multi-Object Trackers to Overcome Challenges

Multi-object trackers provide solutions that overcome the challenges described in the previous section.

Occlusion: To keep track of objects that are temporarily occluded, a multi-object tracker uses a track management algorithm. A track management algorithm is responsible for three things:

  1. Start a new track when a new object appears in the frame, which is called track initialization.

  2. Reduce the number of false tracks, which may be caused by false detections from the detector, using a confirmation logic. For example, it may count how many detections have been associated with the track before it is considered as real or confirmed.

  3. Keep tracks that are temporarily occluded a while longer using a deletion logic. For example, the tracker may count how many frames the track was not associated with any detection before it gets deleted, as illustrated by the sketch after this list.
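
The following minimal sketch illustrates the idea behind an M-out-of-N deletion logic using plain MATLAB counters (this is not a toolbox API, and the values are illustrative): a track is deleted after it misses detections in M out of the last N frames.

M = 23; N = 23;                            % delete after M misses in the last N frames
recentHistory = true(1,N);                 % true means the track was detected in that frame
for frame = 1:30
    detectedThisFrame = false;             % assume the object stays occluded
    recentHistory = [recentHistory(2:end), detectedThisFrame];
    if sum(~recentHistory) >= M
        disp("Track deleted after frame " + frame)
        break
    end
end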

Unresolved detections: The way the tracker handles unresolved detections depends on the association algorithm that it uses. If the tracker makes crisp association decisions, like a global nearest neighbor tracker does, it can associate the unresolved detection with only one track, and the other track is considered undetected. If the tracker uses an association algorithm that is probabilistic or allows for multiple hypotheses, both tracks may be maintained for a while longer.
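
To see why a crisp assignment can serve only one of the tracks, consider the following minimal sketch, which assumes the assignmunkres function from the toolbox and uses illustrative cost values: two tracks compete for a single merged detection, and the Munkres (Hungarian) algorithm assigns the detection to only one of them.

costMatrix = [1.2; 1.5];        % rows: two tracks, single column: the merged detection
costOfNonAssignment = 10;       % cost of leaving a track or detection unassigned
[assignments, unassignedTracks, unassignedDetections] = ...
    assignmunkres(costMatrix, costOfNonAssignment);
disp(assignments)               % one pair only: track 1 gets the detection
disp(unassignedTracks)          % track 2 is considered undetected in this frame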

Convert Blob Detections to objectDetection Objects

All the trackers in the Sensor Fusion and Tracking Toolbox™ require an input in the objectDetection (Sensor Fusion and Tracking Toolbox) format. This section shows how to convert the blob detections provided by the blob detector into this format. Each blob detection consists of a centroid, which the tracker tracks, and a bounding box, which the tracker uses to draw the track. In objectDetection terms, the centroid is the Measurement, and the bounding box, which is used only for visualization, is stored in ObjectAttributes. The objectDetection also requires Time, which in this case is the frame count. Since the Measurement is reported in pixels and the Time is reported in frames, the tracker tracks the centroid position in pixels and velocity in pixels per frame.

detectionHistory = cell(1,numFrames);
for frameCount = 1:numFrames
    thisFrameCentroids = centroids{frameCount};
    thisFrameBboxes = bboxes{frameCount};
    numMeasurementsInFrame = size(thisFrameCentroids,1);
    detectionsInFrame = cell(numMeasurementsInFrame,1);
    for detCount = 1:numMeasurementsInFrame
        detectionsInFrame{detCount} = objectDetection(...
            frameCount, ... % Use frame count as time
            thisFrameCentroids(detCount,:), ... % Use centroid as measurement in pixels
            MeasurementNoise = diag([100 100]), ... % Centroid measurement noise in pixels
            ObjectAttributes = struct(BoundingBox = thisFrameBboxes(detCount,:)) ... % Attach bounding box information
            );
    end
    detectionHistory{frameCount} = detectionsInFrame;
end

Define Multi-Object Tracker

To use a multi-object tracker, first define the object. The following code section defines a global nearest neighbor (GNN) tracker, trackerGNN (Sensor Fusion and Tracking Toolbox). The term GNN relates to how the tracker associates detections with tracks, in this case using the best association as found by the Hungarian algorithm. The benefit of GNN is its simplicity, but, as the next section shows, different association algorithms can lead to better tracking.

Generally, trackerGNN can handle any number of sensors and any number of tracks. In this video, there are only a few people and a single sensor. Therefore, define the tracker for one sensor and a maximum of 10 tracks.

tracker = trackerGNN(MaxNumSensors=1,MaxNumTracks=10);

Next, define how to track the people in the video. The video has a high frame rate of 30 frames per second. Within the short period of time between frames, the motion of the people is approximately constant velocity. Therefore, the simplest approach is to track the centroid of the bounding box with a constant velocity linear Kalman filter. The function initcvkf (Sensor Fusion and Tracking Toolbox) initializes a constant velocity Kalman filter from a detection.

tracker.FilterInitializationFcn = @initcvkf;
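
To inspect the filter that initcvkf constructs, you can call it directly on a sample detection, as in this minimal sketch (the measurement values below are illustrative):

sampleDetection = objectDetection(1, [120 240], MeasurementNoise = diag([100 100]));
kf = initcvkf(sampleDetection);
disp(kf.State')                % state is [x; vx; y; vy] with zero initial velocity
disp(kf.StateTransitionModel)  % constant velocity state transition matrix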

Finally, a multi-object tracker needs to handle the occlusion and appearance/disappearance of people from the frame. The ConfirmationThreshold and DeletionThreshold properties control how quickly a track is confirmed after appearance and how quickly it is deleted after disappearance or in cases of occlusion. As seen in the previous section, there are very few false detections in the video. Therefore, ConfirmationThreshold can be as low as 2-out-of-2 or even 1-out-of-1. Setting DeletionThreshold requires more tuning based on the frame rate and length of occlusion events. 23-out-of-23 means that a track is deleted if it is not associated with any detection for 23 consecutive frames.

tracker.ConfirmationThreshold = [2 2]; % Quick to confirm
tracker.DeletionThreshold = [23 23];   % Slow to delete

Run Multi-Object Tracker

The following code block runs the tracker using the detections gathered earlier. The tracker outputs, called tracks, are displayed using a yellow bounding box annotated over the video frame. When a track is not assigned to any detections in the current frame, it is marked as predicted in the annotation.

vidReader.CurrentTime = 0; % Reset the video reader
ind = 0;
frameCount = 0;
numFrames = vidReader.NumFrames;
if isempty(vidPlayer.Location)
    vidPlayer = vision.DeployableVideoPlayer;
end

while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1; % Increment frame count
    
    % Update the tracker
    if isLocked(tracker) || ~isempty(detectionHistory{frameCount})
        tracks = tracker(detectionHistory{frameCount}, frameCount);
    else
        tracks = objectTrack.empty;
    end

    % Add track information to the frame
    frame = insertTracksToFrame(frame, tracks);

    % Add frame count in the top left corner
    frame = insertText(frame,[0,0],"Frame: "+int2str(frameCount), ...
        BoxColor="black",TextColor="yellow",BoxOpacity=1);

    % Display Video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        interestingFrames{ind} = frame;
    end
end

Observe the Results

This section reviews the same occlusion and unresolved detection situations shown in the first section. Observe how the tracker keeps predicting the positions of the individuals even when they are not detected due to occlusion or when the detection is unresolved. Keeping the same track ID, indicated by the integer number above the bounding box, shows that the tracker maintains each individual as the same object. This is important for frame-to-frame continuity as well as for counting the total number of people in the scene.

figure;imshow(interestingFrames{1});

figure;imshow(interestingFrames{2});

figure;imshow(interestingFrames{3});

figure;imshow(interestingFrames{4});

figure;imshow(interestingFrames{5});

figure;imshow(interestingFrames{6});

Explore Other Trackers and Track Management Settings

As mentioned above, GNN is just one type of association algorithm. Other association algorithms include joint probabilistic data association (JPDA) and multiple hypothesis tracking (MHT). These algorithms are better at handling ambiguity in the association of detections with tracks, such as the ambiguity created by an unresolved detection. The Sensor Fusion and Tracking Toolbox provides trackers based on JPDA and MHT: trackerJPDA (Sensor Fusion and Tracking Toolbox) and trackerTOMHT (Sensor Fusion and Tracking Toolbox). All three trackers follow the same conventions for inputs and outputs as trackerGNN. Therefore, you can easily switch between them and compare how well they work.
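
As a sketch of the MHT option, you can construct a track-oriented MHT tracker in the same way and pass it to the runTracker helper function defined at the end of this example. Because confirmation and deletion in trackerTOMHT are score-based, the thresholds are left at their default values in this sketch.

mhtTracker = trackerTOMHT(MaxNumSensors=1, MaxNumTracks=10, ...
    FilterInitializationFcn=@initcvkf);
% frames = runTracker(vidReader, mhtTracker, detectionHistory, interestingFrameInds);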

In this section, you can use the provided controls to set the confirmation and deletion thresholds. Then click on "Run Section" on the toolstrip to run the tracker with the new settings.

By default, the example shows how the JPDA tracker can have a lower DeletionThreshold setting because it probabilistically associates the unresolved detection with both tracks and thus both of them are considered assigned to some degree. Lowering the DeletionThreshold value allows for faster deletion when an object goes out of frame and the track should be deleted.

tracker = trackerJPDA(MaxNumSensors=1,MaxNumTracks=10,FilterInitializationFcn=@initcvkf);
tracker.ConfirmationThreshold = sort([2, 2]); % How fast to confirm a track
tracker.DeletionThreshold = sort([11, 11]); % How long to keep a track
frames = runTracker(vidReader,tracker,detectionHistory,interestingFrameInds);
figure;imshow(frames{1});

figure;imshow(frames{2});

figure;imshow(frames{3});

figure;imshow(frames{4});

figure;imshow(frames{5});

figure;imshow(frames{6});

Use a Different Filter

While a constant velocity Kalman filter is sufficient in this case, lower frame rates or more maneuvering objects sometimes require more sophisticated motion models and filters. This section shows how to use a different filter type, in this case a particle filter, trackingPF (Sensor Fusion and Tracking Toolbox). A particle filter maintains the uncertainty about the track state as a collection of particles, which are predicted and corrected using nonlinear functions and resampled by the filter. The particles are visualized as small circles so that you can observe how the uncertainty grows when the track is not assigned to a detection and must be predicted.

release(tracker);
tracker.FilterInitializationFcn = @initcv2dpf;
frames = runTracker(vidReader, tracker, detectionHistory, interestingFrameInds);
figure;imshow(frames{1});

figure;imshow(frames{2});

figure;imshow(frames{3})

Summary

This example shows how to use multi-object trackers to track people in a video. The trackers use different association algorithms and allow you to maintain consistent tracking of individuals in the video. You can tune various parameters, for example the confirmation and deletion thresholds, of each tracker to improve tracking results.

The example also shows how you can visualize the tracks to determine which tracker to use and how to tune it. You can also use track metrics, for example trackCLEARMetrics (Sensor Fusion and Tracking Toolbox), as shown in the Implement Simple Online and Realtime Tracking (Sensor Fusion and Tracking Toolbox) example; these metrics require ground truth.

This example does not show how to tune the trackers. Tracker tuning is explained in the Tuning a Multi-Object Tracker (Sensor Fusion and Tracking Toolbox) example.

Supporting Functions

Create Detector Objects

This function creates a foreground detector and a blob analysis object. These two objects are used to detect moving objects in the frame.

The foreground detector segments moving objects from the background. It outputs a binary mask, where the pixel value of 1 corresponds to the foreground and the value of 0 corresponds to the background.

Connected groups of foreground pixels are likely to correspond to moving objects. The blob analysis System object finds such groups (called blobs or connected components) and computes their characteristics, such as their areas, centroids, and the bounding boxes.

function detectorObjects = setupDetectorObjects(minBlobArea)
% Create System objects for foreground detection and blob analysis

detectorObjects.detector = vision.ForegroundDetector(NumGaussians = 3, ...
    NumTrainingFrames = 40, MinimumBackgroundRatio = 0.7);

detectorObjects.blobAnalyzer = vision.BlobAnalysis(BoundingBoxOutputPort = true, ...
    AreaOutputPort = true, CentroidOutputPort = true, MinimumBlobArea = minBlobArea);
end

Detect Blobs

Use the two detector objects to detect blobs in the frame.

function [centroids, bboxes] = detectBlobs(detectorObjects, frame)
% Detect moving objects and return their centroids and bounding boxes.

% Detect foreground.
mask = detectorObjects.detector.step(frame);

% Apply morphological operations to remove noise and fill in holes.
mask = imopen(mask, strel("rectangle", [6, 6]));
mask = imclose(mask, strel("rectangle", [50, 50]));
mask = imfill(mask, "holes");

% Perform blob analysis to find connected components.
[~, centroids, bboxes] = detectorObjects.blobAnalyzer.step(mask);
end

Insert Tracks Information

This function adds bounding box annotations to represent the tracks in the frame.

function frame = insertTracksToFrame(frame, tracks)
numTracks = numel(tracks);
boxes = zeros(numTracks, 4);
ids = zeros(numTracks, 1, "int32");
predictedTrackInds = zeros(numTracks, 1);
for tr = 1:numTracks
    % Get bounding boxes.
    boxes(tr, :) = tracks(tr).ObjectAttributes.BoundingBox;
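    % The constant velocity state is [x; vx; y; vy], so State(1:2:3) is the
    % estimated [x y] centroid. Shift it to the top-left corner of the box.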
    boxes(tr, 1:2) = (tracks(tr).State(1:2:3))'-boxes(tr,3:4)/2;

    % Get IDs.
    ids(tr) = tracks(tr).TrackID;

    if tracks(tr).IsCoasted
        predictedTrackInds(tr) = tr;
    end
end

predictedTrackInds = predictedTrackInds(predictedTrackInds > 0);

% Create labels for objects that display the predicted rather
% than the actual location.
labels = cellstr(int2str(ids));

isPredicted = cell(size(labels));
isPredicted(predictedTrackInds) = {' predicted'};
labels = strcat(labels, isPredicted);

% Draw the objects on the frame.
frame = insertObjectAnnotation(frame, "rectangle", boxes, labels, ...
    TextBoxOpacity = 0.5);
end

Run the Tracker

This function reads the video frame, runs the tracker with the detections at each frame, and captures interesting frames.

function frames = runTracker(vidReader, tracker, detectionHistory, interestingFrameInds)
vidReader.CurrentTime = 0; % Reset the video reader
ind = 0;
frameCount = 0;
frames = cell(1,numel(interestingFrameInds)-1);
vidPlayer = vision.DeployableVideoPlayer;
isPF = isParticleFilterUsed(tracker,detectionHistory);
while hasFrame(vidReader)
    % Read a video frame and detect objects in it.
    frame = readFrame(vidReader); % Read frame
    frameCount = frameCount + 1; % Increment frame count
    
    % Update the tracker
    if isLocked(tracker) || ~isempty(detectionHistory{frameCount})
        tracks = tracker(detectionHistory{frameCount}, frameCount);
    else
        tracks = objectTrack.empty;
    end

    % Add track information to the frame
    frame = insertTracksToFrame(frame, tracks);

    % Add particles to display
    if isPF
        for trackInd = 1:numel(tracks)
            % Get particles
            particles = getTrackFilterProperties(tracker, tracks(trackInd).TrackID, "Particles");
            positions = particles{1};
            positions = positions([1,3],:)';
            % Add particles on frame
            frame = insertMarker(frame, positions, "circle", Color = "yellow", Size = 1);
        end
    end

    % Add frame count in the top left corner
    frame = insertText(frame, [0,0], "Frame: " + frameCount, ...
        BoxColor = "black", TextColor = "yellow", BoxOpacity = 1);

    % Display Video
    step(vidPlayer,frame);

    % Grab interesting frames
    if frameCount == interestingFrameInds(ind+1)
        ind = ind + 1;
        frames{ind} = frame;
    end
end
end

isParticleFilterUsed

This function returns true if the tracker uses a particle filter.

function isPF = isParticleFilterUsed(tracker, detectionHistory)
isemptyCell = cellfun(@(d) isempty(d), detectionHistory);
ind = find(~isemptyCell, 1, "first");
filter = tracker.FilterInitializationFcn(detectionHistory{ind}{1});
isPF = isa(filter, "trackingPF");
end

cvmeas2d

This function returns the two-dimensional measurement of the filter state.

function meas = cvmeas2d(state, varargin)
% Measurement model for 2d constant velocity
meas3d = cvmeas(state,varargin{:});
meas = meas3d(1:2,:);
end

initcv2dpf

This function initializes a 2-D constant velocity particle filter based on an unassigned detection.

function pf = initcv2dpf(detection)
%INITCV2DPF Filter initialization function for a 2D constant velocity particle filter
% PF = INITCV2DPF(DETECTION) initializes PF, a trackingPF filter, using
% DETECTION, an objectDetection object. PF uses a 2D constant velocity
% measurement model.
%
% The function follows similar steps as initcvpf, but uses the knowledge
% that the measurement is the position in rectangular coordinates.

classToUse = class(detection.Measurement);

% Create Process Noise matrix
scaleAccel = ones(1, classToUse);
Q = eye(2, classToUse) * scaleAccel;

% Store measurement properties
n = numel(detection.Measurement);
if isscalar(detection.MeasurementNoise)
    measurementNoise = detection.MeasurementNoise * eye(n,n,classToUse);
else
    measurementNoise = cast(detection.MeasurementNoise,classToUse);
end

% Number of particles
numParticles = 1000;

%% Initialize the particle filter in Rectangular frame using state and state covariance
posMeas = detection.Measurement(:);
velMeas = zeros(n,1,classToUse);
posCov = cast(detection.MeasurementNoise,classToUse);
velCov = eye(n,n,classToUse);

H1d = cast([1 0], classToUse);
Hpos = blkdiag(H1d, H1d);                       % position = Hpos * state
Hvel = [zeros(2,1,classToUse),Hpos(:,1:end-1)]; % velocity = Hvel * state
state = Hpos' * posMeas(:) + Hvel' * velMeas(:);
stateCov = Hpos' * posCov * Hpos + Hvel' * velCov * Hvel;
% Construct the particle filter with 2D constant velocity motion and measurement models.
pf = trackingPF(@constvel,@cvmeas2d,state, NumParticles = numParticles, ...
    StateCovariance = stateCov, ProcessNoise = Q, ...
    MeasurementNoise = measurementNoise, HasAdditiveProcessNoise = false);
setMeasurementSizes(pf,n,n);
end