dlconv

Deep learning convolution

Description

The convolution operation applies sliding filters to the input data. Use 1-D and 2-D filters with ungrouped or grouped convolutions and 3-D filters with ungrouped convolutions.

Use grouped convolution for channel-wise separable (also known as depth-wise separable) convolution. For each group, the operation convolves the input by moving filters along spatial dimensions of the input data, computing the dot product of the weights and the data and adding a bias. If the number of groups is equal to the number of channels, then this function performs channel-wise convolution. If the number of groups is equal to 1, this function performs ungrouped convolution.
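As a rough illustration of how the number of groups shapes the filter array, the sketch below builds weights for the three cases using the layout described for the weights argument. The sizes and variable names are arbitrary choices made for this sketch.

```matlab
% Sketch: filter array sizes for a 2-D convolution over a 6-channel input.
% All sizes and variable names are illustrative assumptions.
numChannels = 6;
filterSize = [3 3];

% Ungrouped: a single group spans all channels, here with 4 filters.
wUngrouped = rand([filterSize numChannels 4]);

% Grouped: 3 groups of 2 channels each, 4 filters per group.
numGroups = 3;
wGrouped = rand([filterSize numChannels/numGroups 4 numGroups]);

% Channel-wise: as many groups as channels, one channel and one filter per group.
wChannelWise = rand([filterSize 1 1 numChannels]);

disp(size(wUngrouped))     % 3 3 6 4
disp(size(wGrouped))       % 3 3 2 4 3
disp(size(wChannelWise))   % 3 3 1 1 6
```

In each case the third dimension multiplied by the number of groups equals the number of input channels.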

Note

This function applies the deep learning convolution operation to dlarray data. To apply convolution within a layerGraph object or Layer array, use a convolution layer instead, such as convolution2dLayer, convolution3dLayer, or groupedConvolution2dLayer.

dlY = dlconv(dlX,weights,bias) computes the deep learning convolution of the input dlX using sliding convolutional filters defined by weights, and adds a constant bias. The input dlX is a formatted dlarray with dimension labels. Convolution acts on dimensions that you specify as 'S' dimensions. The output dlY is a formatted dlarray with the same dimension labels as dlX.

dlY = dlconv(dlX,weights,bias,'DataFormat',FMT) also specifies dimension label format FMT when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX.

dlY = dlconv(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'Stride',3 sets the stride of the convolution operation to 3.

Examples

Convolve all channels of an image input using a single filter.

Import the image data and convert it to a dlarray.

X = imread('sherlock.jpg');
dlX = dlarray(single(X),'SSC');

Display the image.

imshow(X,'DisplayRange',[])

Initialize the convolutional filters. Specify an ungrouped convolution that applies a single filter to all three channels of the input data.

filterHeight = 10;
filterWidth = 10;
numChannelsPerGroup = 3;
numFiltersPerGroup = 1;
numGroups = 1;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution. Use a 'Stride' value of 2 and a 'DilationFactor' value of 2.

dlY = dlconv(dlX,weights,bias,'Stride',2,'DilationFactor',2);

Display the convolved image.

Y = extractdata(dlY);
imshow(Y,'DisplayRange',[])

Convolve the input data in three groups of two channels each. Apply four filters per group.

Create the input data as ten observations of size 100-by-100 with six channels.

height = 100;
width = 100;
channels = 6;
numObservations = 10;

X = rand(height,width,channels,numObservations);
dlX = dlarray(X,'SSCB');

Initialize the convolutional filters. Specify three groups of convolutions that each apply four convolution filters to two channels of the input data.

filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 2;
numFiltersPerGroup = 4;
numGroups = 3;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution.

dlY = dlconv(dlX,weights,bias);
size(dlY)
dims(dlY)

ans = 1×4
    93    93    12    10

ans = 'SSCB'

The 12 channels of the convolution output represent the three groups of convolutions with four filters per group.
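The reported size can be checked by hand. The sketch below assumes the default 'Stride' of 1 and 'Padding' of 0 and no dilation, as in the call above; the variable names are chosen for this sketch only.

```matlab
% Sketch: reproduce the 93-by-93-by-12-by-10 output size by hand.
% Assumes stride 1, padding 0, and no dilation.
inputSize = [100 100];
filterSize = [8 8];
numFiltersPerGroup = 4;
numGroups = 3;
numObservations = 10;

spatialSize = inputSize - filterSize + 1;            % [93 93]
numOutputChannels = numFiltersPerGroup*numGroups;    % 12
expectedSize = [spatialSize numOutputChannels numObservations]
```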

Separate the input data into channels and perform convolution on each channel separately.

Create the input data as a single observation with a size of 64-by-64 and ten channels. Create the data as an unformatted dlarray.

height = 64;
width = 64;
channels = 10;

X = rand(height,width,channels);
dlX = dlarray(X);

Initialize the convolutional filters. Specify a grouped convolution with one channel and one filter per group, and set the number of groups equal to the number of channels, so that each of the ten channels is convolved separately.

filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 1;
numFiltersPerGroup = 1;
numGroups = channels;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution. Specify the dimension labels of the input data using the 'DataFormat' option.

dlY = dlconv(dlX,weights,bias,'DataFormat','SSC');
size(dlY)

ans = 1×3
    57    57    10

Each channel is convolved separately, so there are ten channels in the output.

Input Arguments

dlX: Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of weights or bias must be a dlarray.

Convolution acts on dimensions that you specify as spatial dimensions using the 'S' dimension label. You can specify up to three dimensions in dlX as 'S' dimensions.

Data Types: single | double
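A minimal sketch of convolution over a single 'S' dimension follows. It requires Deep Learning Toolbox; the sizes and variable names are arbitrary assumptions, and the weights follow the filterSize-by-numChannelsPerGroup-by-numFiltersPerGroup layout with the group dimension omitted.

```matlab
% Sketch: 1-D convolution over one 'S' dimension (illustrative sizes).
signalLength = 50;
numChannels = 3;
numObservations = 8;
dlX = dlarray(rand(signalLength,numChannels,numObservations),'SCB');

filterLength = 5;
numFilters = 4;
% Ungrouped convolution, so the numGroups dimension is omitted.
weights = rand(filterLength,numChannels,numFilters);
bias = zeros(numFilters,1);

dlY = dlconv(dlX,weights,bias);
size(dlY)   % 46 4 8 with the default stride and padding (50 - 5 + 1 = 46)
dims(dlY)   % 'SCB'
```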

weights: Convolutional filters, specified as a dlarray with or without labels or a numeric array. The weights argument specifies the size and values of the filters, as well as the number of filters and the number of groups for grouped convolutions.

Specify weights as a filterSize-by-numChannelsPerGroup-by-numFiltersPerGroup-by-numGroups array.

  • filterSize — Size of the convolutional filters. filterSize can have up to three dimensions, depending on the number of spatial dimensions in the input data.

    Input Data 'S' Dimensions    filterSize
    1-D                          h, where h corresponds to the height of the filter
    2-D                          h-by-w, where h and w correspond to the height and width of the filter, respectively
    3-D                          h-by-w-by-d, where h, w, and d correspond to the height, width, and depth of the filter, respectively

  • numChannelsPerGroup — Number of channels to convolve within each group. numChannelsPerGroup must equal the number of channels in the input data divided by numGroups, the number of groups. For ungrouped convolutions, where numGroups = 1, numChannelsPerGroup must equal the number of channels in the input data.

  • numFiltersPerGroup — Number of filters to apply within each group.

  • numGroups — Number of groups (optional). When numGroups > 1, the function performs grouped convolutions. Grouped convolutions are not supported for input data with more than two 'S' dimensions. When numGroups = 1, the function performs ungrouped convolutions; in this case, this dimension is singleton and can be omitted.

If weights is a formatted dlarray, it can have multiple spatial dimensions labeled 'S', one channel dimension labeled 'C', and up to two other dimensions labeled 'U'. The number of 'S' dimensions must match the number of 'S' dimensions of the input data. The labeled dimensions correspond to the filter specifications as follows.

Filter Specification      Dimension Labels
filterSize                Up to three 'S' dimensions
numChannelsPerGroup       'C' dimension
numFiltersPerGroup        First 'U' dimension
numGroups (optional)      Second 'U' dimension

Data Types: single | double
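The label mapping for formatted weights can be sketched as follows. This is an illustrative example that requires Deep Learning Toolbox; the sizes are arbitrary assumptions, and the format 'SSCUU' follows the correspondence between filter specifications and dimension labels described above.

```matlab
% Sketch: grouped 2-D convolution with weights given as a formatted dlarray.
% Sizes are illustrative: 6 input channels split into 3 groups of 2,
% with 4 filters per group.
dlX = dlarray(rand(20,20,6,2),'SSCB');

rawWeights = rand(3,3,2,4,3);
% 'S','S' = filterSize, 'C' = numChannelsPerGroup,
% first 'U' = numFiltersPerGroup, second 'U' = numGroups.
weights = dlarray(rawWeights,'SSCUU');
bias = zeros(4*3,1);

dlY = dlconv(dlX,weights,bias);
size(dlY)   % 18 18 12 2 with the default stride and padding
```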

bias: Bias constant, specified as a dlarray vector or dlarray scalar with or without labels, a numeric vector, or a numeric scalar.

  • If bias is a scalar or has only singleton dimensions, the same bias is applied to each output.

  • If bias has a nonsingleton dimension, each element of bias is the bias applied to the corresponding convolutional filter specified by weights. The number of elements of bias must match the number of filters specified by weights.

  • If bias is a scalar numeric array with value 0, the bias term is disabled and no bias is added during the convolution operation.

If bias is a formatted dlarray, the nonsingleton dimension must be a channel dimension labeled 'C'.

Data Types: single | double
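The three forms of bias can be compared with a short sketch. It requires Deep Learning Toolbox, and the sizes and values are arbitrary assumptions chosen so that the per-filter offsets are easy to read.

```matlab
% Sketch: scalar 0, shared scalar, and per-filter bias (illustrative sizes).
dlX = dlarray(rand(16,16,3),'SSC');
weights = rand(3,3,3,5);                        % five ungrouped filters

dlYNoBias    = dlconv(dlX,weights,0);           % scalar 0 disables the bias term
dlYShared    = dlconv(dlX,weights,0.5);         % same offset for every filter
dlYPerFilter = dlconv(dlX,weights,(1:5)');      % one offset per filter

% Offset added to each output channel, measured at one spatial location.
% The values should be close to 1 through 5, up to floating-point rounding.
offsets = squeeze(extractdata(dlYPerFilter(1,1,:) - dlYNoBias(1,1,:)))
```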

'DataFormat': Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string that provides a label for each dimension of the data. Each character in FMT must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', or 'T' at most once.

You must specify 'DataFormat' when the input data dlX is an unformatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'DilationFactor',2 sets the dilation factor for each convolutional filter to 2.

'Stride': Step size for traversing the input data, specified as the comma-separated pair consisting of 'Stride' and a numeric scalar or numeric vector. If you specify 'Stride' as a scalar, the same value is used for all spatial dimensions. If you specify 'Stride' as a vector with one element per spatial dimension of the input data, each element is used for the corresponding spatial dimension.

The default value of 'Stride' is 1.

Example: 'Stride',3

Data Types: single | double
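To see how scalar and vector strides affect the result, the sketch below estimates the spatial output size. The formula is the usual convolution output-size expression, which this page does not state, so treat it as an assumption; padding and dilation are left at their defaults.

```matlab
% Sketch: spatial output size for scalar and vector strides.
% Assumed formula: floor((inputSize - filterSize)/stride) + 1,
% with padding 0 and no dilation.
inputSize = [100 100];
filterSize = [8 8];

strideScalar = 3;        % same step along both spatial dimensions
strideVector = [2 4];    % step of 2 along the first, 4 along the second

outScalar = floor((inputSize - filterSize)./strideScalar) + 1   % [31 31]
outVector = floor((inputSize - filterSize)./strideVector) + 1   % [47 24]
```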

'DilationFactor': Filter dilation factor, specified as the comma-separated pair consisting of 'DilationFactor' and one of the following.

  • Numeric scalar — The same dilation factor value is applied for all spatial dimensions.

  • Numeric vector — A different dilation factor value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the dilation factor applied to the ith spatial dimension.

Use the dilation factor to increase the receptive field of the filter (the area of the input that the filter can see) on the input data. Using a dilation factor corresponds to an effective filter size of filterSize + (filterSize-1)*(dilationFactor-1).

Example: 'DilationFactor',2

Data Types: single | double
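The effective filter size expression can be evaluated directly. The sketch below uses arbitrary example values for a scalar and a vector dilation factor.

```matlab
% Sketch: effective filter size for a dilated 10-by-10 filter.
filterSize = [10 10];

% Scalar dilation factor: applied to both spatial dimensions.
dilationFactor = 2;
effectiveScalar = filterSize + (filterSize - 1).*(dilationFactor - 1)   % [19 19]

% Vector dilation factor: one value per spatial dimension.
dilationVector = [1 3];
effectiveVector = filterSize + (filterSize - 1).*(dilationVector - 1)   % [10 28]
```

A dilation factor of 1 leaves the filter size unchanged.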

'Padding': Padding applied to edges of data, specified as the comma-separated pair consisting of 'Padding' and one of the following:

  • 'same' — Padding is set so that the output size is the same as the input size when the stride is 1. More generally, the output size of each spatial dimension is ceil(inputSize/stride), where inputSize is the size of the input along a spatial dimension.

  • Numeric scalar — The same padding value is applied to both ends of all spatial dimensions.

  • Numeric vector — A different padding value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the padding applied to the start and the end along the ith spatial dimension.

  • Numeric matrix — A different padding value is applied to the start and end of each spatial dimension. Use a matrix of size 2-by-d, where d is the number of spatial dimensions of the input data. The element (1,i) specifies the padding applied to the start of the ith spatial dimension. The element (2,i) specifies the padding applied to the end of the ith spatial dimension. For example, in 2-D, the format is [top, left; bottom, right].

The default value of 'Padding' is 0.

Example: 'Padding','same'

Data Types: single | double
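The padding options can be illustrated with a little size arithmetic. The 'same' case uses the ceil(inputSize/stride) expression given above. The matrix case assumes the usual relation that, at a stride of 1 with no dilation, the output size is the padded input size minus the filter size plus 1; the values are arbitrary.

```matlab
% Sketch: spatial output sizes under 'same' and matrix padding.
inputSize = [100 100];
filterSize = [8 8];

% 'same' padding: the output size depends only on the input size and stride.
stride = [3 3];
outSame = ceil(inputSize./stride)             % [34 34]

% Matrix padding in the form [top, left; bottom, right], stride of 1.
padding = [1 2; 3 4];
paddedSize = inputSize + sum(padding,1);      % [104 106]
outMatrix = paddedSize - filterSize + 1       % [97 99]
```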

Output Arguments

dlY: Convolved feature map, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, dlY is an unformatted dlarray with the same dimension order as the input data.

The size of the 'C' channel dimension of dlY depends on the size of the weights input. It is the product of the sizes of the numFiltersPerGroup and numGroups dimensions of the weights argument. If weights is a formatted dlarray, this product equals the product of the sizes of the 'U' dimensions.
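As a small sketch, the number of output channels can be read off an unformatted weights array with size. The variable names are assumptions for illustration. Because size returns 1 for trailing dimensions that are not present, the same expression also covers ungrouped weights with the numGroups dimension omitted.

```matlab
% Sketch: number of output channels implied by a 2-D weights array.
numSpatialDims = 2;

% Grouped: 3 groups with 4 filters each.
weights = rand(8,8,2,4,3);
numFiltersPerGroup = size(weights,numSpatialDims + 2);
numGroups = size(weights,numSpatialDims + 3);
numOutputChannels = numFiltersPerGroup*numGroups              % 12

% Ungrouped with the group dimension omitted: size(...,5) returns 1.
weightsUngrouped = rand(8,8,6,4);
numOutputChannelsUngrouped = size(weightsUngrouped,numSpatialDims + 2) * ...
    size(weightsUngrouped,numSpatialDims + 3)                 % 4
```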

More About

Deep Learning Convolution

The dlconv function applies sliding convolution filters to the spatial dimensions of the input data. The dlconv function supports convolution in one, two, or three spatial dimensions. For more information, see the definition of convolutional layer on the convolution2dLayer reference page.

Introduced in R2019b