

roialign

Non-quantized ROI pooling of dlarray data

Since R2021b


The ROI align operation pools each rectangular region of interest (ROI) into a fixed-size grid of bins without quantizing the grid points to the nearest pixel. Instead, the function uses bilinear interpolation to infer the value at each grid point.
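To make the interpolation step concrete, here is a minimal base-MATLAB sketch that evaluates bilinear interpolation by hand at one fractional point (y,x) and compares the result with interp2. The matrix, the point, and the variable names are illustrative only; this is not how roialign is implemented internally.

```matlab
% Illustrative 4-by-4 single-channel feature map
X = magic(4);

% Fractional sampling point: y indexes rows, x indexes columns
y = 2.25;
x = 3.5;

% Top-left neighbor of the point and the fractional offsets within that cell
y0 = floor(y);  x0 = floor(x);
dy = y - y0;    dx = x - x0;

% Bilinear interpolation: weight each of the four neighbors by its proximity to (y,x)
v = (1-dy)*(1-dx)*X(y0,x0)   + (1-dy)*dx*X(y0,x0+1) + ...
        dy*(1-dx)*X(y0+1,x0) +     dy*dx*X(y0+1,x0+1);

% Reference value from the built-in interpolation function
vRef = interp2(X, x, y, 'linear');
disp([v vRef])   % both equal 9
```

The point is never rounded to a pixel index, which is the difference between ROI align and quantized ROI pooling.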

Given input data of size [H W C N], where C is the number of channels and N is the number of observations, the pooled deep learning data has size [h w C sum(M)], where [h w] is the output size specified by outputSize. M is a vector of length N, where M(i) is the number of ROIs associated with the i-th observation, so sum(M) equals the total number of bounding boxes.
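The size rule can be checked with plain arithmetic. This sketch uses hypothetical boxes (two on observation 1, one on observation 2) and computes M from the batch indices in the fifth row, then the expected size of the pooled output. It does not call roialign.

```matlab
% Hypothetical boxes: two ROIs on observation 1, one ROI on observation 2
boxes = [2 2 4 4 1;
         1 1 6 6 1;
         3 3 8 8 2]';          % 5-by-3, one column per ROI

N = 2;                          % number of observations in dlX
C = 3;                          % number of channels in dlX
outputSize = [3 3];             % [h w]

% M(i) counts the ROIs whose batchIdx (row 5) equals i
M = accumarray(boxes(5,:)', 1, [N 1])';

% Expected size of the pooled output: [h w C sum(M)]
expectedSize = [outputSize C sum(M)]   % [3 3 3 3]
```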


To perform ROI pooling within a layerGraph (Deep Learning Toolbox) object or Layer (Deep Learning Toolbox) array, use roiAlignLayer.

This function requires Deep Learning Toolbox™.


dlY = roialign(dlX,boxes,outputSize) performs a pooling operation along the spatial dimensions of the input dlX for each bounding box in boxes. The spatial dimensions of the output dlY have size outputSize.

dlY = roialign(dlX,boxes,outputSize,Name=Value) specifies additional name-value arguments.


Examples

Create a 4-D formatted dlarray object that simulates a batch of two RGB images.

X = dlarray(rand(10,10,3,2),"SSCB");

Specify the position and batch index of one bounding box.

startXY = [2 2];
endXY = [4 4];
batchIdx = 1;
rois = [startXY endXY batchIdx]';

Perform ROI pooling with an output size of 3-by-3.

Y = roialign(X,rois,[3 3])
Y = 
  3(S) x 3(S) x 3(C) x 1(B) single dlarray

(:,:,1) =

    0.7464    0.3069    0.1780
    0.9212    0.8491    0.4677
    0.7303    0.9057    0.3840

(:,:,2) =

    0.3024    0.6428    0.6594
    0.1542    0.0046    0.1228
    0.6295    0.5182    0.3304

(:,:,3) =

    0.4915    0.7590    0.5035
    0.4574    0.4302    0.5453
    0.2960    0.2666    0.5389

Input Arguments


dlX

Deep learning data to pool, specified as a 4-D formatted dlarray (Deep Learning Toolbox) object with the data format "SSCB".

boxes

Bounding boxes, specified as a numeric matrix with five rows and one column per bounding box. Each column specifies one bounding box in the form [x_start; y_start; x_end; y_end; batchIdx], where:

  • x_start and y_start specify the (x,y) coordinates of the upper-left corner of the rectangle.

  • x_end and y_end specify the (x,y) coordinates of the lower-right corner of the rectangle.

  • batchIdx specifies the index of the observation, along the "B" dimension of dlX, that contains the rectangle.

By default, boxes are in the same coordinate space and scale as the input deep learning data dlX. To specify boxes at a different scale, use the ROIScale name-value argument.
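As a sketch of the column layout described above, the following base-MATLAB snippet assembles a 5-by-K boxes matrix (K boxes) from hypothetical per-observation lists of corners and runs two simple sanity checks. The variable names roisObs1 and roisObs2 are illustrative, and the checks are not validation rules that roialign is documented to apply.

```matlab
% Hypothetical per-observation ROIs, one row per box: [x_start y_start x_end y_end]
roisObs1 = [2 2 4 4;
            1 1 6 6];
roisObs2 = [3 3 8 8];

% Append batchIdx as a fifth column, stack the rows, and transpose to 5-by-K
boxes = [roisObs1, ones(size(roisObs1,1),1);
         roisObs2, 2*ones(size(roisObs2,1),1)]';

% Sanity checks: five rows, and each end corner is not before its start corner
hasFiveRows = size(boxes,1) == 5;
cornersOrdered = all(boxes(3,:) >= boxes(1,:)) && all(boxes(4,:) >= boxes(2,:));
```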

outputSize

Pooled output size, specified as a two-element vector of positive integers [h w], where h is the height and w is the width of the pooled output.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: dlY = roialign(dlX,boxes,outputSize,ROIScale=2) scales the input ROIs by a factor of 2.

ROIScale

Ratio of the scale of the input feature map to the scale of the ROI coordinates. roialign scales the input ROIs by this factor to map them onto the input feature map.
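A common case is a box given in image coordinates but pooled from a downsampled feature map. This sketch computes the ratio for a hypothetical 224-by-224 input and 14-by-14 feature map and applies it to one box, assuming the scaling is a plain multiplication of the four corner coordinates (the batch index is left unchanged). The sizes and the box are made up for illustration.

```matlab
imageSize   = [224 224];   % hypothetical network input size [height width]
featureSize = [14 14];     % hypothetical feature map size (stride 16)

% Ratio of the feature-map scale to the ROI-coordinate scale
roiScale = featureSize(1) / imageSize(1);   % 0.0625

% One ROI in image coordinates, on observation 1
boxImg = [32; 48; 127; 159; 1];

% Scale the four corner coordinates; keep the batch index as-is (assumption)
boxFeat = boxImg;
boxFeat(1:4) = boxImg(1:4) * roiScale      % [2; 3; 7.9375; 9.9375; 1]
```

Under that assumption, passing boxImg with ROIScale=roiScale should be equivalent to passing boxFeat at the default scale.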

SamplingRatio

Number of samples in each pooled bin, specified as "auto" or a row vector of two positive integers. The two elements are the number of vertical and horizontal samples per bin, respectively.

If you do not specify SamplingRatio, the number of vertical samples defaults to ceil(roiHeight/outputHeight) and the number of horizontal samples defaults to ceil(roiWidth/outputWidth), where roiHeight and roiWidth are the ROI dimensions and [outputHeight outputWidth] is outputSize.

Data Types: double | char
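The default rule is simple arithmetic. This sketch evaluates it for a hypothetical 10-by-7 ROI pooled to a 3-by-3 output; the ROI dimensions are illustrative values, not taken from the example above.

```matlab
outputSize = [3 3];          % [outputHeight outputWidth]
roiHeight  = 10;             % hypothetical ROI height
roiWidth   = 7;              % hypothetical ROI width

% Default number of vertical and horizontal samples per bin
samplingRatio = [ceil(roiHeight/outputSize(1)) ceil(roiWidth/outputSize(2))]   % [4 3]

% Total interpolated samples averaged into each output bin
samplesPerBin = prod(samplingRatio)    % 12
```

Because the default grows with the ROI size, large ROIs are sampled more densely; specifying a fixed SamplingRatio such as [2 2] bounds the number of samples per bin.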

Output Arguments


dlY

Pooled deep learning data, returned as a 4-D formatted dlarray (Deep Learning Toolbox) object with the data format "SSCB".

More About


ROI Align

An ROI align operation returns a fixed-size feature map for every rectangular ROI within an input dlarray. The function first partitions each ROI into an h-by-w grid of bins, where [h w] is outputSize, without quantizing the grid points to the nearest pixel. Within each bin, the function samples a grid of points whose vertical and horizontal counts are given by SamplingRatio, and infers the value at each sampled point using bilinear interpolation. The average of the sampled values is returned as the output value of the bin.
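The steps above can be sketched in base MATLAB for one channel and one ROI. This is a simplified illustration under stated assumptions, not the roialign implementation: it treats box corners as continuous coordinates on a grid whose pixel centers sit at integer positions, takes the ROI height as y_end - y_start, places sample points at the centers of an evenly spaced sub-grid in each bin, and returns 0 for samples outside the feature map. roialign may use different conventions, so its outputs can differ.

```matlab
% Single-channel 5-by-5 feature map with X(r,c) = r + 5*(c-1)
X = reshape(1:25, 5, 5);

box = [1 1 5 5];        % [x_start y_start x_end y_end], continuous coordinates
outputSize = [2 2];     % [h w]

roiH = box(4) - box(2);             % ROI height (assumed convention)
roiW = box(3) - box(1);             % ROI width (assumed convention)
binH = roiH / outputSize(1);        % bin height, not rounded to whole pixels
binW = roiW / outputSize(2);        % bin width, not rounded to whole pixels

% Samples per bin, following the default rule described for SamplingRatio
sH = max(ceil(roiH / outputSize(1)), 1);
sW = max(ceil(roiW / outputSize(2)), 1);

Y = zeros(outputSize);
for i = 1:outputSize(1)
    for j = 1:outputSize(2)
        % Sample points at the centers of an sH-by-sW sub-grid of bin (i,j)
        ys = box(2) + (i-1)*binH + ((1:sH) - 0.5) * binH / sH;
        xs = box(1) + (j-1)*binW + ((1:sW) - 0.5) * binW / sW;
        [xq, yq] = meshgrid(xs, ys);

        % Bilinear interpolation at each sample point (0 outside the map)
        vals = interp2(X, xq, yq, 'linear', 0);

        % The average of the sampled values is the pooled value of the bin
        Y(i,j) = mean(vals(:));
    end
end
disp(Y)   % [7 17; 9 19]
```

Because X is linear in row and column, bilinear interpolation is exact here and each bin's average equals the value at the bin center, which makes the result easy to check by hand.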


Version History

Introduced in R2021b