Get Started with Segment Anything Model for Image Segmentation
Perform image segmentation using the Image Processing Toolbox™ Model for Segment Anything Model support package. Using the support package, you can perform image segmentation on a data set by using the pretrained Segment Anything Model (SAM), or perform downstream tasks such as instance segmentation by passing the output of an object detector network as an input to the SAM. To learn more about the model and the training data, see the SA-1B Dataset page on the Meta website.
The SAM is a zero-shot image segmentation model that uses deep neural networks to accurately segment objects within images without requiring additional training. The SAM enables you to actively guide and refine the segmentation by providing feedback through visual prompts, such as points, boxes, and mask logits.
The SAM architecture consists of an image encoder, a visual prompt encoder, and a mask decoder. This design enables you to reuse the same image embeddings with different visual prompts. For a given image embedding, the prompt encoder and mask decoder use the visual prompt to predict a mask. Because the SAM enables you to predict multiple masks for a single prompt, you can use it to segment ambiguous entities, such as both a person and the shirt they wear.
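For illustration, this minimal sketch shows how reusing embeddings works in practice, using the support package functions described later on this page (segmentAnythingModel, extractEmbeddings, and segmentObjectsFromEmbeddings). The point coordinates are arbitrary values chosen for illustration.

```matlab
% Compute the image embeddings once with the image encoder.
I = imread("peppers.png");
model = segmentAnythingModel;
embeddings = extractEmbeddings(model,I);

% Reuse the same embeddings to segment two different objects by
% supplying different foreground point prompts (coordinates are
% illustrative, not taken from a worked example).
mask1 = segmentObjectsFromEmbeddings(model,embeddings,size(I), ...
    ForegroundPoints=[453 283]);
mask2 = segmentObjectsFromEmbeddings(model,embeddings,size(I), ...
    ForegroundPoints=[200 120]);
```

Because the costly image encoding runs only once, prompting the model repeatedly on the same image is fast.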
Install Support Package
You can install the Image Processing Toolbox Model for Segment Anything Model from the Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The support package also requires Deep Learning Toolbox™. Processing image data on a GPU requires a supported GPU device and Parallel Computing Toolbox™.
Apply Pretrained Segment Anything Model
Use this process to segment a test image using a pretrained SAM.
Load an image to segment into the workspace, and return the image size. The SAM supports only RGB images.

    I = imread("peppers.png");
    imageSize = size(I);
For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale the range of pixel values using the rescale function:

    I = 255.*rescale(I);
Create a segmentAnythingModel object to configure a pretrained SAM.

    model = segmentAnythingModel;
Extract the feature embeddings of your image by using the extractEmbeddings object function.

    embeddings = extractEmbeddings(model,I);
Specify the visual prompts as foreground and background point coordinates.

    pointPrompt = [453 283; 496 288; 504 300];
    backgroundPoints = [308 272; 348 176];
Segment the object defined by the foreground and background points in the image by using the segmentObjectsFromEmbeddings object function.

    masks = segmentObjectsFromEmbeddings(model,embeddings,imageSize, ...
        ForegroundPoints=pointPrompt,BackgroundPoints=backgroundPoints);
Overlay the detected object mask on the input image, and display the image with its object mask.

    imMask = insertObjectMask(I,masks);
    imshow(imMask)
You can use this approach to segment multiple objects in an image, one at a time, using an interactive user interface. For a detailed example, see Interactively Segment Image Using Segment Anything Model.
Refine Segmentation Results
To refine segmentation results, use the segmentObjectsFromEmbeddings function on the same image, but provide the mask logits of the object mask from the previous segmentation as an additional visual prompt by specifying the MaskLogits name-value argument. The mask logits returned in the maskLogits output argument of the segmentObjectsFromEmbeddings function are non-thresholded mask logits, not binary masks. If you specify the SAM to return multiple masks by using the ReturnMultiMask name-value argument, the model returns the mask logits corresponding to only the mask with the highest confidence score. This mask logits refinement process enables you to iteratively tune your image segmentation.
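A minimal sketch of one refinement iteration, reusing the model, embeddings, and point prompts from the previous section. This assumes the mask logits are available as the third output of segmentObjectsFromEmbeddings; check the function reference page for the exact output order.

```matlab
% First pass: segment with point prompts and also capture the
% non-thresholded mask logits (assumed here to be the third output).
[masks,scores,maskLogits] = segmentObjectsFromEmbeddings(model, ...
    embeddings,imageSize,ForegroundPoints=pointPrompt, ...
    BackgroundPoints=backgroundPoints);

% Second pass: reuse the same prompts, and feed the previous logits
% back to the model through the MaskLogits name-value argument to
% refine the predicted mask.
refinedMasks = segmentObjectsFromEmbeddings(model,embeddings, ...
    imageSize,ForegroundPoints=pointPrompt, ...
    BackgroundPoints=backgroundPoints,MaskLogits=maskLogits);
```

You can repeat the second call, passing each iteration's logits into the next, until the mask stops improving.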
This image shows segmentation masks predicted using a SAM at two stages, before and after refinement. For both stages, the same visual prompts have been specified as foreground and background points. The image in the second stage has been refined using the mask logits returned in the first stage of the model as an additional prompt.
Perform Downstream Tasks Using SAM
The pretrained model has a zero-shot response to any prompt at inference time, enabling you to solve downstream tasks by feeding suitable prompts to the model. You can apply this approach to perform edge detection, segment everything (object proposal generation), and segment detected objects (instance segmentation).
For example, you can employ a SAM in conjunction with an object detector to perform instance segmentation. To do this, use the bounding box output of an object detector, such as yoloxObjectDetector (Computer Vision Toolbox), to specify the visual prompt name-value arguments of the segmentObjectsFromEmbeddings object function.
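The detector-to-SAM workflow might look like the following sketch. It assumes Computer Vision Toolbox and a pretrained YOLOX detector are available, and that the bounding box prompt is passed through a BoundingBox name-value argument; verify the exact argument name and mask output format on the segmentObjectsFromEmbeddings reference page.

```matlab
% Detect objects, then prompt the SAM with each detected bounding box.
I = imread("visionteam.jpg");           % example image, for illustration
detector = yoloxObjectDetector;          % pretrained YOLOX detector
[bboxes,~,labels] = detect(detector,I);

model = segmentAnythingModel;
embeddings = extractEmbeddings(model,I);

% Segment one instance per detection by prompting with its bounding box
% (BoundingBox argument name assumed; see the reference page).
numObjects = size(bboxes,1);
masks = false([size(I,1) size(I,2) numObjects]);
for k = 1:numObjects
    masks(:,:,k) = segmentObjectsFromEmbeddings(model,embeddings, ...
        size(I),BoundingBox=bboxes(k,:));
end

% Display all instance masks over the input image.
imshow(insertObjectMask(I,masks))
```

Because the embeddings are extracted once, the per-detection segmentation loop adds little overhead even for many detections.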
References
[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.
See Also
Apps
- Image Labeler (Computer Vision Toolbox) | Image Segmenter
Related Topics
- Interactively Segment Image Using Segment Anything Model
- Get Started with Image Preprocessing and Augmentation for Deep Learning (Computer Vision Toolbox)
- Getting Started with Image Segmenter
- Deep Learning in MATLAB (Deep Learning Toolbox)
- Data Sets for Deep Learning (Deep Learning Toolbox)