Object Detection

Label ground truth and detect objects using pretrained AI models like YOLO and Grounding DINO, and create custom detectors using transfer learning

Computer Vision Toolbox™ provides a comprehensive set of tools and functions to build, train, evaluate, and deploy object detection models using both deep learning and traditional computer vision techniques. You can start by creating labeled ground truth using the Image Labeler and Video Labeler apps, which support interactive and AI-assisted annotation of bounding boxes around objects in images and video frames.
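After you export a labeling session, the result is a groundTruth object that you can convert into datastores for training. A minimal sketch, assuming a saved app session in a hypothetical file gTruth.mat:

```matlab
% Sketch: turn a groundTruth object exported from the Image Labeler app
% into training data. The file name "gTruth.mat" is a placeholder for
% your own saved labeling session.
load("gTruth.mat","gTruth");
[imds,blds] = objectDetectorTrainingData(gTruth);  % image and box label datastores
trainingData = combine(imds,blds);                 % pair each image with its boxes
```

The combined datastore can then be passed directly to the training functions listed below.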

Once you have labeled data, you can choose from a wide range of pretrained deep learning object detectors, including YOLO v2, YOLO v3, YOLO v4, YOLOX, RTMDet, SSD, and Grounding DINO. The toolbox also contains specialized detectors, such as peopleDetector and faceDetector, for people and face detection tasks. You can use these models directly for inference or as a starting point for transfer learning, enabling you to customize them for specific data sets and applications. For more information, see Get Started with Object Detection Using Deep Learning. For classical object detection methods, the toolbox supports the aggregate channel features (ACF) and cascade (Viola-Jones) object detectors.
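Running a pretrained detector on a single image typically takes only a few lines. A sketch, assuming a YOLO v4 model pretrained on COCO and an example image on the MATLAB path:

```matlab
% Sketch: inference with a pretrained YOLO v4 detector. The model name
% "csp-darknet53-coco" and the image file are illustrative; pretrained
% weights download on first use.
detector = yolov4ObjectDetector("csp-darknet53-coco");
I = imread("visionteam.jpg");                      % assumed example image
[bboxes,scores,labels] = detect(detector,I);       % boxes are [x y width height]
annotated = insertObjectAnnotation(I,"rectangle",bboxes, ...
    string(labels) + ": " + string(scores));       % label each box with class and score
imshow(annotated)
```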

The toolbox provides functions for training object detectors using transfer learning. It also provides functionality to manage and preprocess training data, along with data augmentation tools that improve model robustness by simulating real-world variations. For more information, see Get Started with Image Preprocessing and Augmentation for Deep Learning.
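A transfer-learning run can be sketched as follows. The class list, anchor boxes, and training options here are placeholders, and trainingData is an assumed datastore that returns images paired with box labels:

```matlab
% Sketch: transfer learning from a pretrained YOLO v4 backbone to a
% custom class. Classes and anchors are illustrative placeholders.
classes = "vehicle";
anchors = {[30 30; 60 60];[90 90; 120 120]};       % one cell per detection head
baseDetector = yolov4ObjectDetector("tiny-yolov4-coco",classes,anchors);
opts = trainingOptions("adam", ...
    MaxEpochs=20,MiniBatchSize=8,InitialLearnRate=1e-3);
detector = trainYOLOv4ObjectDetector(trainingData,baseDetector,opts);
```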

After you generate detections using pretrained or custom models, you can use the Object Detector Analyzer app to compare the detection results against ground truth data. The app enables you to evaluate key performance metrics, such as the confusion matrix, precision, recall, F1 score, and mean average precision (mAP), across a range of intersection over union (IoU) thresholds. Alternatively, you can use the evaluateObjectDetection function to compute detection performance metrics programmatically. For more information, see Evaluate Object Detector Performance and Get Started with Object Detector Analyzer App.
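The programmatic route can be sketched like this, assuming detectionResults is a table of detections and groundTruthData is a datastore of ground truth boxes from your own workflow:

```matlab
% Sketch: score detector output against ground truth at an IoU
% threshold of 0.5. Both inputs are assumed to come from your own run.
metrics = evaluateObjectDetection(detectionResults,groundTruthData,0.5);
metrics.DatasetMetrics      % data set-level precision, recall, and mAP
metrics.ClassMetrics        % per-class breakdown of the same metrics
```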

Figure: three images showing labeled boats, a neural network diagram, and keypoints from a person detector overlaid on the people it has detected.

Apps

Image Labeler - Label images for computer vision applications
Video Labeler - Label video for computer vision applications
Object Detector Analyzer - Interactively visualize and evaluate object detection results against ground truth (Since R2026a)

Functions

Deep Learning Detectors

groundingDinoObjectDetector - Detect and localize objects using Grounding DINO object detector (Since R2026a)
rtmdetObjectDetector - Detect objects using RTMDet object detector (Since R2024b)
ssdObjectDetector - Detect objects using SSD deep learning detector
yolov2ObjectDetector - Detect objects using YOLO v2 object detector
yolov3ObjectDetector - Detect objects using YOLO v3 object detector
yolov4ObjectDetector - Detect objects using YOLO v4 object detector (Since R2022a)
yoloxObjectDetector - Detect objects using YOLOX object detector (Since R2023b)
peopleDetector - Detect people using pretrained deep learning object detector (Since R2024b)
faceDetector - Detect faces using pretrained RetinaFace face detector (Since R2025a)
detectTextCRAFT - Detect text in images by using CRAFT deep learning model (Since R2022a)
imfindcirclesYOLO - Find circles using YOLOX object detector (Since R2026a)

Feature-based Detectors

acfObjectDetector - Detect objects using aggregate channel features
peopleDetectorACF - Detect people using aggregate channel features
vision.CascadeObjectDetector - Detect objects using the Viola-Jones algorithm
vision.ForegroundDetector - Foreground detection using Gaussian mixture models
vision.BlobAnalysis - Properties of connected regions

Select Detected Objects

selectStrongestBbox - Select strongest bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
selectStrongestBboxMulticlass - Select strongest multiclass bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
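NMS on raw detections can be sketched as follows; the box coordinates and scores below are made up for illustration:

```matlab
% Sketch: prune overlapping detections with nonmaximal suppression.
% Boxes are [x y width height]; the first two boxes overlap heavily,
% so only the stronger of the pair survives.
bboxes = [10 10 50 50; 12 12 50 50; 200 80 40 40];
scores = [0.9; 0.6; 0.8];
[picked,pickedScores] = selectStrongestBbox(bboxes,scores, ...
    OverlapThreshold=0.5);   % suppress boxes overlapping a stronger one
```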

Load Training Data

boxLabelDatastore - Datastore for bounding box label data
groundTruth - Ground truth label data
imageDatastore - Datastore for image data
objectDetectorTrainingData - Create training data for an object detector
combine - Combine data from multiple datastores
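These datastores compose into one training input. A minimal sketch, with a placeholder folder and a single made-up box label:

```matlab
% Sketch: pair an image datastore with a box label datastore. The
% folder name and the label table contents are illustrative.
imds = imageDatastore("trainingImages/");
tbl  = table({[30 30 80 60]},VariableNames="vehicle");  % one row of [x y w h] boxes
blds = boxLabelDatastore(tbl);
cds  = combine(imds,blds);     % each read returns {image, boxes, labels}
```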

Train Deep Learning Based Object Detectors

trainSSDObjectDetector - Train SSD deep learning object detector
trainYOLOv2ObjectDetector - Train YOLO v2 object detector
trainYOLOv3ObjectDetector - Train YOLO v3 object detector (Since R2024a)
trainYOLOv4ObjectDetector - Train YOLO v4 object detector (Since R2022a)
trainYOLOXObjectDetector - Train YOLOX object detector (Since R2023b)

Train Feature-Based Object Detectors

trainACFObjectDetector - Train ACF object detector
trainCascadeObjectDetector - Train cascade object detector model

Augment and Preprocess Training Data for Deep Learning

balanceBoxLabels - Balance bounding box labels for object detection
bboxcrop - Crop bounding boxes
bboxerase - Remove bounding boxes
bboxresize - Resize bounding boxes
bboxwarp - Apply geometric transformation to bounding boxes
bbox2points - Convert rectangle to corner points list
blockLocationsWithROI - Select image block locations that contain bounding box ROIs (Since R2025a)
imwarp - Apply geometric transformation to image
imcrop - Crop image
imresize - Resize image
randomAffine2d - Create randomized 2-D affine transformation
centerCropWindow2d - Create rectangular center cropping window
randomWindow2d - Randomly select rectangular region in image
integralImage - Calculate 2-D integral image
transform - Transform datastore
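A common augmentation pattern is to apply the same random geometric transformation to an image and its boxes. A sketch, with an illustrative image file and box:

```matlab
% Sketch: warp an image and its bounding boxes with one random affine
% transformation. The image file and box coordinates are placeholders.
I = imread("peppers.png");
boxes = [100 100 120 80];                          % [x y w h]
tform = randomAffine2d(Scale=[0.9 1.1],XReflection=true,Rotation=[-10 10]);
rout  = affineOutputView(size(I),tform,BoundsStyle="centerOutput");
augI  = imwarp(I,tform,OutputView=rout);           % warp the pixels
[augBoxes,valid] = bboxwarp(boxes,tform,rout, ...  % warp the boxes to match
    OverlapThreshold=0.25);                        % drop boxes pushed off-image
```

Wrapping this logic in a function passed to transform applies it on the fly to an entire training datastore.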

R-CNN (Regions With Convolutional Neural Networks)

roiAlignLayer - Non-quantized ROI pooling layer for Mask R-CNN
roiMaxPooling2dLayer - Neural network layer used to output fixed-size feature maps for rectangular ROIs
roialign - Non-quantized ROI pooling of dlarray data (Since R2021b)

YOLO v2 (You Only Look Once version 2)

yolov2TransformLayer - Create transform layer for YOLO v2 object detection network
spaceToDepthLayer - Space to depth layer

Focal Loss

focalCrossEntropy - Compute focal cross-entropy loss

SSD (Single Shot Detector)

ssdMergeLayer - Create SSD merge layer for object detection

Anchor Boxes

estimateAnchorBoxes - Estimate anchor boxes for deep learning object detectors

Evaluate Detectors

evaluateObjectDetection - Evaluate object detection data set against ground truth (Since R2023b)
objectDetectionMetrics - Object detection quality metrics (Since R2023b)
mAPObjectDetectionMetric - Mean average precision (mAP) metric for object detection (Since R2024a)
bboxOverlapRatio - Compute bounding box overlap ratio
bboxPrecisionRecall - Compute bounding box precision and recall against ground truth
drise - Explain object detection network predictions using D-RISE (Since R2024a)

Annotate and Display Detections

cuboid2img - Project cuboids from 3-D world coordinates to 2-D image coordinates (Since R2022b)
insertObjectAnnotation - Annotate truecolor or grayscale image or video
insertObjectMask - Insert masks in image or video stream
insertShape - Insert shapes in image or video
insertText - Insert text in image or video
showShape - Display shapes on image, video, or point cloud
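As one example from the functions above, anchor box estimation clusters the box sizes in your labeled data. A sketch, assuming trainingData is a datastore that returns box labels (for example, a boxLabelDatastore):

```matlab
% Sketch: estimate anchor boxes from labeled training data by k-means
% clustering of box dimensions. "trainingData" is an assumed datastore.
numAnchors = 6;
[anchors,meanIoU] = estimateAnchorBoxes(trainingData,numAnchors);
```

The mean IoU output indicates how well the estimated anchors fit the data; trying several values of numAnchors and comparing mean IoU is a common way to choose the anchor count.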

Blocks

Deep Learning Object Detector - Detect objects using trained deep learning object detector (Since R2021b)

Topics

Create Ground Truth and Training Data for Object Detection

Detect Objects Using Pretrained Detectors

Evaluate Object Detection Results

Featured Examples