
Deep Learning Data Formats

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

Data can have many different types of layouts:

  • Data can have different numbers of dimensions. For example, you can represent image and video data as 4-D and 5-D arrays, respectively.

  • Dimensions of data can represent different things. For example, image data has two spatial dimensions, one channel dimension, and one batch dimension.

  • Data can have dimensions in multiple permutations. For example, you can represent a batch of sequences as a 3-D array with dimensions corresponding to channels, time steps, and observations. These dimensions can be in any order.
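As a minimal sketch of the last point, the following code stores the same batch of sequences in two different permutations. The sizes and variable names are hypothetical and chosen only for illustration.

```matlab
% Hypothetical sizes, chosen only for illustration.
numChannels = 3;
numTimeSteps = 10;
numObservations = 4;

% Layout 1: channels-by-time steps-by-observations.
X1 = rand(numChannels,numTimeSteps,numObservations);

% Layout 2: the same data as channels-by-observations-by-time steps.
X2 = permute(X1,[1 3 2]);

% Both arrays hold the same values; only the dimension order differs.
disp(size(X1))
disp(size(X2))

% Time step 5 of observation 2 is the same vector in both layouts.
sameVector = isequal(X1(:,5,2),X2(:,2,5));
disp(sameVector)
```

Because nothing in the arrays themselves records which dimension is which, a function that receives X1 or X2 needs the layout information from somewhere else.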

To ensure that the software operates on the correct dimensions, you can provide data layout information in different ways:

| Option | Scenario | Usage |
| --- | --- | --- |
| Provide data with dimensions in a specific permutation | Network with an input layer and the data has the required layout. | Pass data directly to network or function. |
| Provide data with labeled dimensions | Network with an input layer and the data does not have the required layout. | Create a formatted dlarray object using the fmt argument. |
| | Deep learning model defined as a function that uses multiple deep learning operations. | Create a formatted dlarray object using the fmt argument. |
| | Custom layer that uses multiple deep learning operations. | Create layer that inherits from nnet.layer.Formattable. |
| Provide data with additional layout information | Deep learning functions that require layout information, and you want to preserve the layout of the data. | Specify layout information using the appropriate input argument. For example, the DataFormat argument of the lstm function. |
| | Model functions where dimensions change between functions. For example, when one function must treat the third dimension as time, and a second function must treat the third dimension as spatial. | Specify layout information using the appropriate input argument. For example, the DataFormat argument of the lstm function. |
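As a rough illustration of the third option, the following sketch passes unformatted data to the lstm function and describes its layout with the DataFormat argument. The sizes, the random parameter values, and the variable names are assumptions made for this example, not part of any particular model.

```matlab
% Hypothetical sizes, chosen only for illustration.
numFeatures = 10;
numObservations = 32;
numTimeSteps = 64;
numHiddenUnits = 5;

% Unformatted input: channels-by-observations-by-time steps.
X = dlarray(rand(numFeatures,numObservations,numTimeSteps));

% Initial states and learnable parameters (random placeholders).
H0 = zeros(numHiddenUnits,1);
C0 = zeros(numHiddenUnits,1);
weights = dlarray(rand(4*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(rand(4*numHiddenUnits,numHiddenUnits));
bias = dlarray(rand(4*numHiddenUnits,1));

% Describe the layout of X with DataFormat instead of formatting X.
Y = lstm(X,H0,C0,weights,recurrentWeights,bias,DataFormat="CBT");

% The output stays unformatted and keeps the dimension order of X.
disp(size(Y))
```

Because X is not formatted, the data is not permuted, which is the point of this option when you want to preserve the layout.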

To provide input data with labeled dimensions or additional layout information, you can use data formats.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

  • "S" — Spatial

  • "C" — Channel

  • "B" — Batch

  • "T" — Time

  • "U" — Unspecified

For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
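To make the character-to-dimension correspondence concrete, this small sketch walks through the format "CBT" for an array of hypothetical size and prints what each dimension represents. The names struct is a helper introduced only for this illustration; it is not part of the toolbox.

```matlab
% Hypothetical array: 12 channels, 32 observations, 100 time steps.
A = rand(12,32,100);
fmt = 'CBT';

% Illustrative lookup from format character to dimension type.
names = struct('S',"spatial",'C',"channel",'B',"batch", ...
    'T',"time",'U',"unspecified");

% The k-th character of the format describes the k-th dimension.
for k = 1:numel(fmt)
    fprintf("Dimension %d: %s, size %d\n",k,names.(fmt(k)),size(A,k));
end
```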

For dlnetwork objects with input layers, or when you use the trainnet function, if your data already has the layout required by the network, then the easiest option is usually to provide input data with the dimensions in the permutation that the network requires. In this case, you can input your data directly and not specify layout information. The required format depends on the type of input layer.

| Layer | Format |
| --- | --- |
| Feature input layer | "BC" |
| 2-D image input layer | "SSCB" |
| 3-D image input layer | "SSSCB" |
| Sequence input layer | "TCB" (vector sequences) |
| | "SCBT" (1-D image sequences) |
| | "SSCBT" (2-D image sequences) |
| | "SSSCBT" (3-D image sequences) |

When your data has a different layout, providing formatted data or data format information can be easier than reshaping and preprocessing your data. For example, if you have sequence data, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, then you can specify the string "CBT" instead of permuting and preprocessing the data to have the layout required by the software.

To create formatted input data, create a dlarray object and specify the format using the fmt argument. For example, for an array X that represents a batch of sequences, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, specify:

X = dlarray(X,"CBT");
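A slightly fuller sketch of this workflow follows. It builds a small, hypothetical network with a sequence input layer and passes the formatted data to the predict function. The layer sizes and variable names are assumptions made for illustration only.

```matlab
% Hypothetical sizes, chosen only for illustration.
numChannels = 12;
numObservations = 32;
numTimeSteps = 100;
numHiddenUnits = 8;

% Small example network with a sequence input layer.
layers = [
    sequenceInputLayer(numChannels)
    lstmLayer(numHiddenUnits)];
net = dlnetwork(layers);

% Data stored as channels-by-observations-by-time steps.
A = rand(numChannels,numObservations,numTimeSteps,"single");

% Label the dimensions so that the network can interpret them.
X = dlarray(A,"CBT");

% The network reads the layout from the labels of X.
Y = predict(net,X);
disp(dims(Y))
disp(size(Y))
```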

Note

When you create a formatted dlarray object, the software automatically permutes the dimensions such that the format has dimensions in this order:

  • "S"

  • "C"

  • "B"

  • "T"

  • "U"

For example, if you specify a format of "TCB" (time, channel, batch), then the software automatically permutes the dimensions so that it has format "CBT" (channel, batch, time).
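The following sketch shows this permutation for an array of hypothetical size. It compares the data stored in the formatted dlarray object with a manual permute of the original array.

```matlab
% Hypothetical array: 100 time steps, 12 channels, 32 observations.
A = rand(100,12,32);

% Label the dimensions in the order in which they are stored in A.
X = dlarray(A,"TCB");

% The software reorders the dimensions to channel, batch, time.
disp(dims(X))
disp(size(X))

% The stored data matches a manual permutation of A.
isSame = isequal(extractdata(X),permute(A,[2 3 1]));
disp(isSame)
```

If later code indexes into the formatted array, it must use the permuted dimension order, not the order of the original array.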

To provide additional layout information with unformatted data to deep learning operations, specify the formats using the appropriate input argument of the function. For example, to apply the dlconv operation to an unformatted dlarray object X that represents a batch of images, where the first two dimensions correspond to the spatial dimensions and the third and fourth dimensions correspond to the channel and batch dimensions, respectively, specify:

Y = dlconv(X,weights,bias,DataFormat="SSCB");
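For a self-contained version of this call, the following sketch creates placeholder inputs first. The image size, filter size, number of filters, and random values are assumptions made for illustration, and the convolution uses the default stride and padding.

```matlab
% Hypothetical batch: 16 RGB images of size 28-by-28.
X = dlarray(rand(28,28,3,16));

% Eight 5-by-5 filters over 3 input channels, with one bias per filter.
weights = dlarray(rand(5,5,3,8));
bias = dlarray(zeros(8,1));

% Describe the layout of X with DataFormat instead of formatting X.
Y = dlconv(X,weights,bias,DataFormat="SSCB");

% With no padding, each spatial dimension shrinks to 28 - 5 + 1 = 24.
disp(size(Y))
```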

To view the layout information of dlarray objects, use the dims function. To view the layout information of layer outputs, use the analyzeNetwork function.
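As a brief sketch, the following code queries the format of a formatted dlarray object and of the same data after the stripdims function removes the labels. The array size is hypothetical.

```matlab
% Hypothetical batch of images with labeled dimensions.
X = dlarray(rand(28,28,3,16),"SSCB");

% For a formatted dlarray object, dims returns the data format.
fmtX = dims(X);
disp(fmtX)

% Remove the labels. The data and its size are unchanged.
U = stripdims(X);

% For an unformatted dlarray object, dims returns an empty result.
fmtU = dims(U);
disp(isempty(fmtU))
```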
