Datastores for Deep Learning
Datastores in MATLAB® are a convenient way of working with and representing collections of data that are too large to fit in memory at one time. Because deep learning often requires large amounts of data, datastores are an important part of the deep learning workflow in MATLAB.
Select Datastore
For many applications, the easiest approach is to start with a built-in datastore. For more information about the available built-in datastores, see Select Datastore for File Format or Application. However, only some types of built-in datastores can be used directly as input for network training, validation, and inference. These datastores are:
Datastore | Description | Additional Toolbox Required |
---|---|---|
ImageDatastore | Datastore for image data | none |
AugmentedImageDatastore | Datastore for resizing and augmenting training images. This datastore is nondeterministic. | none |
PixelLabelDatastore (Computer Vision Toolbox) | Datastore for pixel label data | Computer Vision Toolbox™ |
boxLabelDatastore (Computer Vision Toolbox) | Datastore for bounding box label data | Computer Vision Toolbox |
RandomPatchExtractionDatastore (Image Processing Toolbox) | Datastore for extracting random patches from image-based data. This datastore is nondeterministic. | Image Processing Toolbox™ |
blockedImageDatastore (Image Processing Toolbox) | Datastore for blockwise reading and processing of image data, including large images that do not fit in memory | Image Processing Toolbox |
blockedPointCloudDatastore (Lidar Toolbox) | Datastore for blockwise reading and processing of point cloud data, including large point clouds that do not fit in memory | Lidar Toolbox™ |
DenoisingImageDatastore (Image Processing Toolbox) | Datastore to train an image denoising deep neural network. This datastore is nondeterministic. | Image Processing Toolbox |
audioDatastore (Audio Toolbox) | Datastore for audio data | Audio Toolbox™ |
signalDatastore (Signal Processing Toolbox) | Datastore for signal data | Signal Processing Toolbox™ |
Other built-in datastores can be used as input for deep learning, but the data read from these datastores must be preprocessed into a format required by a deep learning network. For more information on the required format of read data, see Input Datastore for Training, Validation, and Inference. For more information on how to preprocess data read from datastores, see Transform and Combine Datastores.
For some applications, there may not be a built-in datastore type that fits your data well. For these problems, you can create a custom datastore. For more information, see Develop Custom Datastore. All custom datastores are valid inputs to deep learning interfaces as long as the read function of the custom datastore returns data in the required form.
Input Datastore for Training, Validation, and Inference
Datastores are valid inputs in Deep Learning Toolbox™ for training, validation, and inference.
Training and Validation
You can use an image datastore or other types of datastore as a source of training data when training using the trainnet function. To use a datastore for validation, use the ValidationData name-value argument of the trainingOptions function.
Most built-in datastores output data in the layout that the network expects. If you are training your network using the trainnet function and your data is in a different layout from what the network expects, then indicate the layout of your data by using the InputDataFormats option of the trainingOptions function. It is usually easier to adjust the InputDataFormats option than to preprocess the input data.
To be a valid input for training or validation, the read function of a datastore must return data as either a cell array or a table (with the exception of ImageDatastore objects, which can output numeric arrays, and custom mini-batch datastores, which must output tables).
For networks with a single input, the table or cell array returned by the datastore must have two columns. The first column of data represents inputs to the network and the second column represents responses. Each row of data represents a separate observation. For ImageDatastore only, trainnet and trainingOptions support data returned as integer arrays and single-column cell arrays of integer arrays.
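For example, you can build a two-column datastore for a single-input network by combining a predictor datastore and a response datastore. This is a minimal sketch; the folder name and random responses are hypothetical stand-ins for real data.

```matlab
% Pair image predictors with numeric responses in a two-column cell array,
% the layout that trainnet expects for a single-input network.
% "digitsData" and the random responses are hypothetical.
imds = imageDatastore("digitsData","IncludeSubfolders",true);
responses = arrayDatastore(rand(numel(imds.Files),1));
ds = combine(imds,responses);

data = read(ds);   % each row of data is a 1-by-2 cell: {predictor, response}
```

Because arrayDatastore outputs cell arrays by default, the combined datastore returns the horizontally concatenated two-column layout directly.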
To use a datastore for networks with multiple input layers or multiple outputs, use the combine and transform functions to create a datastore that outputs a cell array with (numInputs + numOutputs) columns, where numInputs is the number of network inputs and numOutputs is the number of network outputs. In this case, the first numInputs columns specify the predictors for each input and the last numOutputs columns specify the responses. The order of inputs and outputs is given by the InputNames and OutputNames properties of the neural network, respectively.
The following table shows example outputs of calling the read function for a datastore ds.
Neural Network Architecture | Datastore Output | Example Output |
---|---|---|
Single input layer and single output | Table or cell array with two columns. The first and second columns specify the predictors and targets, respectively. Table elements must be scalars, row vectors, or 1-by-1 cell arrays containing a numeric array. Custom mini-batch datastores must output tables. | Table for neural network with one input and one output: data = read(ds) data = 4×2 table Predictors Response __________________ ________ {224×224×3 double} 2 {224×224×3 double} 7 {224×224×3 double} 9 {224×224×3 double} 9 |
| | Cell array for neural network with one input and one output: data = read(ds) data = 4×2 cell array {224×224×3 double} {[2]} {224×224×3 double} {[7]} {224×224×3 double} {[9]} {224×224×3 double} {[9]} |
Multiple input layers or multiple outputs | Cell array with (numInputs + numOutputs) columns, where numInputs is the number of network inputs and numOutputs is the number of network outputs. The first numInputs columns specify the predictors for each input and the last numOutputs columns specify the responses. The order of inputs and outputs is given by the InputNames and OutputNames properties of the neural network, respectively. | Cell array for neural network with two inputs and two outputs: data = read(ds) data = 4×4 cell array {224×224×3 double} {128×128×3 double} {[2]} {[-42]} {224×224×3 double} {128×128×3 double} {[2]} {[-15]} {224×224×3 double} {128×128×3 double} {[9]} {[-24]} {224×224×3 double} {128×128×3 double} {[9]} {[-44]} |
The format of the predictors depends on the type of data.
Data | Format of Predictors |
---|---|
2-D image | h-by-w-by-c numeric array, where h, w, and c are the height, width, and number of channels of the image, respectively. |
3-D image | h-by-w-by-d-by-c numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the image, respectively. |
Vector sequence | s-by-c matrix, where s is the sequence length and c is the number of features of the sequence. |
1-D image sequence | h-by-c-by-s array, where h and c correspond to the height and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length. |
2-D image sequence | h-by-w-by-c-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length. |
3-D image sequence | h-by-w-by-d-by-c-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length. |
Features | c-by-1 column vector, where c is the number of features. |
For predictors returned in tables, the elements must contain a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.
Most loss functions that you can use when you train a network using the
trainnet
function expect these data layouts for
targets:
Target | Target Layout |
---|---|
Categorical labels | Categorical scalar. |
Sequences of categorical labels | t-by-1 categorical vector, where t is the number of time steps. |
Binary labels (single label) | Numeric scalar |
Binary labels (multilabel) | 1-by-c vector, where c is the number of classes. |
Numeric scalars | Numeric scalar |
Numeric vectors | 1-by-R vector, where R is the number of responses. |
2-D images | h-by-w-by-c numeric array, where h, w, and c are the height, width, and number of channels of the images, respectively. |
3-D images | h-by-w-by-d-by-c numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the images, respectively. |
Numeric sequences of scalars | t-by-1 vector, where t is the number of time steps. |
Numeric sequences of vectors | t-by-c array, where t and c are the number of time steps and channels, respectively. |
Sequences of 1-D images | h-by-c-by-t array, where h, c, and t are the height, number of channels, and number of time steps of the sequences, respectively. |
Sequences of 2-D images | h-by-w-by-c-by-t array, where h, w, c, and t are the height, width, number of channels, and number of time steps of the sequences, respectively. |
Sequences of 3-D images | h-by-w-by-d-by-c-by-t array, where h, w, d, c, and t are the height, width, depth, number of channels, and number of time steps of the sequences, respectively. |
For more information, see Deep Learning Data Formats.
For responses returned in tables, the elements must be a categorical scalar, a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.
Prediction
For inference using the minibatchpredict function, a datastore is required to yield only the columns corresponding to the predictors. The inference functions use the first NumInputs columns and ignore the subsequent columns, where NumInputs is the number of network input layers.
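If your training datastore also contains response columns, you can drop them with transform before inference. This is a sketch; dsTrain and net are hypothetical variables, and the example assumes a single-input network whose predictors are in the first column.

```matlab
% Keep only the predictor column of a two-column training datastore
% before calling minibatchpredict. dsTrain and net are hypothetical.
dsPredictors = transform(dsTrain,@(data) data(:,1));
scores = minibatchpredict(net,dsPredictors);
```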
Specify Read Size and Mini-Batch Size
A datastore may return any number of rows (observations) for each call to read. Functions such as trainnet and minibatchpredict that accept datastores and support specifying a MiniBatchSize call read as many times as is necessary to form complete mini-batches of data. As these functions form mini-batches, they use internal queues in memory to store the read data. For example, if a datastore consistently returns 64 rows per call to read and MiniBatchSize is 128, then forming each mini-batch of data requires two calls to read.
For best runtime performance, configure datastores such that the number of observations returned by read is equal to the MiniBatchSize. For datastores that have a ReadSize property, set ReadSize to change the number of observations returned by the datastore for each call to read.
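For instance, an image datastore's ReadSize can be matched to the training mini-batch size so that each mini-batch needs exactly one call to read. The folder name here is hypothetical.

```matlab
% Match the datastore read size to the training mini-batch size so each
% mini-batch requires a single read call. "trainingImages" is hypothetical.
miniBatchSize = 128;
imds = imageDatastore("trainingImages");
imds.ReadSize = miniBatchSize;

options = trainingOptions("adam","MiniBatchSize",miniBatchSize);
```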
Transform and Combine Datastores
Deep learning frequently requires the data to be preprocessed and augmented before
data is in an appropriate form to input to a network. The transform
and combine
functions of datastore are useful in preparing data to be fed into a network.
As described in Input Datastore for Training, Validation, and Inference, to use a datastore for networks with multiple input layers or multiple outputs, use the combine and transform functions to create a datastore that outputs a cell array with (numInputs + numOutputs) columns, where the first numInputs columns specify the predictors for each input and the last numOutputs columns specify the responses.
Transform Datastores
A transformed datastore applies a particular data transformation to an underlying datastore when reading data. To create a transformed datastore, use the transform function and specify the underlying datastore and the transformation.
- For complex transformations involving several preprocessing operations, define the complete set of transformations in your own function. Then, specify a handle to your function as the @fcn argument of transform. For more information, see Create Functions in Files.
- For simple transformations that can be expressed in one line of code, you can specify a handle to an anonymous function as the @fcn argument of transform. For more information, see Anonymous Functions.
The function handle provided to transform must accept input data in the same format as returned by the read function of the underlying datastore.
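A simple one-line transformation, such as resizing images, fits an anonymous function. This sketch assumes a hypothetical image folder and a target size of 224-by-224.

```matlab
% Resize every image read from the underlying datastore.
% The anonymous function receives data in the same format that
% read(imds) returns: a single image array. "trainingImages" is hypothetical.
imds = imageDatastore("trainingImages");
tds = transform(imds,@(img) imresize(img,[224 224]));
```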
Example: Transform Image Datastore to Train Digit Classification Network
Combine Datastores
The combine function associates multiple datastores. Operating on the resulting CombinedDatastore, such as resetting the datastore, performs the same operation on all of the underlying datastores. Calling the read function of a combined datastore reads one batch of data from all of the N underlying datastores, which must return the same number of observations. Reading from a combined datastore returns the horizontally concatenated results in an N-column cell array that is suitable for training and validation. Shuffling a combined datastore results in an identical randomized ordering of files in the underlying datastores.
For example, if you are training an image-to-image regression network, then you can create the training data set by combining two image datastores. This sample code demonstrates combining two image datastores named imdsX and imdsY. The combined datastore imdsTrain returns data as a two-column cell array.

```matlab
imdsX = imageDatastore(___);
imdsY = imageDatastore(___);
imdsTrain = combine(imdsX,imdsY)
```

```
imdsTrain =

  CombinedDatastore with properties:

    UnderlyingDatastores: {1×2 cell}
```
If you have Image Processing Toolbox, then the randomPatchExtractionDatastore (Image Processing Toolbox) provides an alternative solution for associating image-based data in ImageDatastore, PixelLabelDatastore, and TransformedDatastore objects. A randomPatchExtractionDatastore has several advantages over associating data using the combine function. Specifically, a random patch extraction datastore:
- Provides an easy way to extract patches from both 2-D and 3-D data without requiring you to implement a custom cropping operation using transform and combine.
- Provides an easy way to generate multiple patches per image per mini-batch without requiring you to define a custom concatenation operation using transform.
- Supports efficient conversion between categorical and numeric data when applying image transforms to categorical data.
- Supports parallel training.
- Improves performance by caching images.
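A random patch extraction datastore pairing noisy and clean images might be set up as follows. This is a sketch requiring Image Processing Toolbox; the folder names and patch settings are hypothetical.

```matlab
% Extract random 64-by-64 patches from paired image datastores,
% e.g. for denoising or image-to-image regression.
% Requires Image Processing Toolbox; folder names are hypothetical.
imdsNoisy = imageDatastore("noisyImages");
imdsClean = imageDatastore("cleanImages");
patchds = randomPatchExtractionDatastore(imdsNoisy,imdsClean,[64 64], ...
    "PatchesPerImage",16);
```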
Use Datastore for Parallel Training and Background Preprocessing
Parallel Training
Specify parallel or multi-GPU training using the ExecutionEnvironment name-value argument of trainingOptions. Training in parallel, or using single or multiple GPUs, requires Parallel Computing Toolbox™.
Many built-in datastores already support parallel and multi-GPU training. Using the transform and combine functions with built-in datastores frequently maintains support for parallel and multi-GPU training.
If you need to create a custom datastore that supports parallel or multi-GPU training, your datastore should implement the matlab.io.datastore.Subsettable class.
To use a datastore for parallel or multi-GPU training, it must be subsettable or partitionable. To determine whether a datastore is subsettable or partitionable, use the isSubsettable and isPartitionable functions, respectively.
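You can check these capabilities directly before attempting parallel training. The folder name in this sketch is hypothetical.

```matlab
% Check whether a datastore supports parallel or multi-GPU training.
% "trainingImages" is a hypothetical folder.
imds = imageDatastore("trainingImages");
tfSubset = isSubsettable(imds);
tfPartition = isPartitionable(imds);
```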
When training in parallel, datastores do not support specifying the Shuffle name-value argument of trainingOptions as "never".
Preprocess Data in the Background or in Parallel
To speed up training, you can fetch and preprocess training data from a datastore in the background or in parallel during training.
As shown in the following diagram, fetching, preprocessing, and performing training computations in serial can result in downtime where your GPU (or other hardware) utilization is low. Using the background pool or parallel workers to fetch and preprocess the next batch of training data while your GPU is processing the current batch can increase hardware utilization, resulting in faster training. Use background or parallel preprocessing if your training data requires significant preprocessing, for example if you are manipulating large images.
To preprocess data in the background or in parallel, do one of the following:
- For built-in training, set the PreprocessingEnvironment option to "background" or "parallel" using the trainingOptions function.
- For custom training loops, set the PreprocessingEnvironment property of your minibatchqueue to "background" or "parallel".
Setting the PreprocessingEnvironment option to "parallel" is supported for local parallel pools only and requires Parallel Computing Toolbox.
To use the "background" or "parallel" options, the input datastore must be subsettable or partitionable. Custom datastores must implement the matlab.io.datastore.Subsettable class.
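For built-in training, enabling background preprocessing is a one-line change to the training options. This sketch assumes a release that supports the PreprocessingEnvironment option.

```matlab
% Fetch and preprocess the next mini-batch in the background while the
% GPU processes the current one. Use "parallel" instead of "background"
% for a local parallel pool (requires Parallel Computing Toolbox).
options = trainingOptions("adam", ...
    "PreprocessingEnvironment","background");
```

For a custom training loop, set the same PreprocessingEnvironment property on your minibatchqueue instead.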
See Also
transform | combine | read | trainnet | trainingOptions | dlnetwork
Related Examples
- Prepare Datastore for Image-to-Image Regression
- Classify Text Data Using Convolutional Neural Network