A mini-batch datastore is an implementation of a datastore with support for reading data in batches. You can use a mini-batch datastore as a source of training, validation, test, and prediction data sets for deep learning applications that use Deep Learning Toolbox™.
To preprocess sequence, time series, or text data, build your own mini-batch datastore using the framework described here. For an example showing how to use a custom mini-batch datastore, see Train Network Using Custom Mini-Batch Datastore for Sequence Data.
Build your custom datastore interface using the custom datastore classes and objects. Then, use the custom datastore to bring your data into MATLAB®.
Designing your custom mini-batch datastore involves inheriting from the matlab.io.Datastore and matlab.io.datastore.MiniBatchable classes, and implementing the required properties and methods. You can optionally add support for shuffling during training.
Processing Needs  Classes

Mini-batch datastore for training, validation, test, and prediction data sets in Deep Learning Toolbox  matlab.io.Datastore and matlab.io.datastore.MiniBatchable
Mini-batch datastore with support for shuffling during training  matlab.io.Datastore, matlab.io.datastore.MiniBatchable, and matlab.io.datastore.Shuffleable

Implement MiniBatchable Datastore

To implement a custom mini-batch datastore named MyDatastore, create a class definition file MyDatastore.m. The file must be on the MATLAB path and should contain code that inherits from the appropriate classes and defines the required methods. The code for creating a mini-batch datastore for training, validation, test, and prediction data sets in Deep Learning Toolbox must:
Inherit from the classes matlab.io.Datastore and matlab.io.datastore.MiniBatchable.
Define these methods: hasdata, read, reset, and progress.
Define these properties: MiniBatchSize and NumObservations.
In addition to these steps, you can define any other properties or methods that you need to process and analyze your data.
Note
If you are training a network and trainingOptions specifies 'Shuffle' as 'once' or 'every-epoch', then you must also inherit from the matlab.io.datastore.Shuffleable class. For more information, see Add Support for Shuffling.
The datastore read function must return data in a table. The table elements must be scalars, row vectors, or 1-by-1 cell arrays containing a numeric array.
For networks with a single input layer, the first and second columns specify the predictors and responses, respectively.
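As a minimal sketch of this layout, the following builds the kind of table that read could return for a mini-batch of two observations in a single-input classification network. The feature count (12), the sequence lengths, and the label names are invented for illustration and do not come from the example in this topic.

```matlab
% Two hypothetical sequences: 12 features each, different lengths.
% Each predictor is stored as a 1-by-1 cell containing a numeric array.
predictors = {rand(12,20); rand(12,35)};

% One categorical response per observation (label names are made up).
responses = categorical(["walking"; "running"]);

% First column holds the predictors, second column holds the responses.
data = table(predictors,responses);
```

Because the table is built from workspace variables, its variable names are predictors and responses, which matches the naming used by the preprocessData method in the example class later in this topic.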
Tip
To use a datastore for networks with multiple input layers, use the combine and transform functions to create a datastore that outputs a cell array with (numInputs + 1) columns, where numInputs is the number of network inputs. In this case, the first numInputs columns specify the predictors for each input and the last column specifies the responses. The order of inputs is given by the InputNames property of the layer graph layers.
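One way to obtain such a cell array is sketched below, assuming a network with two inputs (an image input and a feature input) and a MATLAB release that provides arrayDatastore. All data here is randomly generated and the sizes are hypothetical. A transform call is not needed in this sketch, but could be added if the combined output required reshaping.

```matlab
% Hypothetical data for 10 observations.
X1 = rand(28,28,1,10);            % images: h-by-w-by-c-by-N
X2 = rand(5,10);                  % features: c-by-N
T  = categorical(randi(3,10,1));  % one class label per observation

% Wrap each array in a datastore that reads one observation at a time.
dsX1 = arrayDatastore(X1,'IterationDimension',4);
dsX2 = arrayDatastore(X2,'IterationDimension',2);
dsT  = arrayDatastore(T);

% Combine so that each read returns (numInputs + 1) = 3 columns:
% predictors for input 1, predictors for input 2, then the response.
dsTrain = combine(dsX1,dsX2,dsT);
sample = read(dsTrain);
```

Here the column order is chosen by the order of the arguments to combine, so it should follow the order of the network inputs.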
The format of the predictors depends on the type of data.
Data  Format of Predictors

2-D image  h-by-w-by-c numeric array, where h, w, and c are the height, width, and number of channels of the image, respectively.
3-D image  h-by-w-by-d-by-c numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the image, respectively.
Vector sequence  c-by-s matrix, where c is the number of features of the sequence and s is the sequence length.
2-D image sequence  h-by-w-by-c-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length.
3-D image sequence  h-by-w-by-d-by-c-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length.
Features  c-by-1 column vector, where c is the number of features.
The table elements must contain a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.
The trainNetwork function does not support networks with multiple sequence input layers.
The format of the responses depends on the type of task.
Task  Format of Responses 

Classification  Categorical scalar 
Regression  Numeric scalar, numeric row vector, or 1-by-1 cell array containing a numeric array
Sequence-to-sequence classification  1-by-s sequence of categorical labels, where s is the sequence length of the corresponding predictor sequence.
Sequence-to-sequence regression  R-by-s matrix, where R is the number of responses and s is the sequence length of the corresponding predictor sequence.
The table elements must contain a categorical scalar, a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.
This example shows how to create a custom mini-batch datastore for processing sequence data. Save the code in a file called MySequenceDatastore.m.
    classdef MySequenceDatastore < matlab.io.Datastore & ...
                           matlab.io.datastore.MiniBatchable

        properties
            Datastore
            Labels
            NumClasses
            SequenceDimension
            MiniBatchSize
        end

        properties(SetAccess = protected)
            NumObservations
        end

        properties(Access = private)
            % This property is inherited from Datastore
            CurrentFileIndex
        end

        methods

            function ds = MySequenceDatastore(folder)
                % Construct a MySequenceDatastore object

                % Create a file datastore. The readSequence function is
                % defined following the class definition.
                fds = fileDatastore(folder, ...
                    'ReadFcn',@readSequence, ...
                    'IncludeSubfolders',true);
                ds.Datastore = fds;

                % Read labels from folder names
                numObservations = numel(fds.Files);
                for i = 1:numObservations
                    file = fds.Files{i};
                    filepath = fileparts(file);
                    [~,label] = fileparts(filepath);
                    labels{i,1} = label;
                end
                ds.Labels = categorical(labels);
                ds.NumClasses = numel(unique(labels));

                % Determine sequence dimension. When you define the LSTM
                % network architecture, you can use this property to
                % specify the input size of the sequenceInputLayer.
                X = preview(fds);
                ds.SequenceDimension = size(X,1);

                % Initialize datastore properties.
                ds.MiniBatchSize = 128;
                ds.NumObservations = numObservations;
                ds.CurrentFileIndex = 1;
            end

            function tf = hasdata(ds)
                % Return true if more data is available
                tf = ds.CurrentFileIndex + ds.MiniBatchSize - 1 ...
                    <= ds.NumObservations;
            end

            function [data,info] = read(ds)
                % Read one mini-batch of data
                miniBatchSize = ds.MiniBatchSize;
                info = struct;

                for i = 1:miniBatchSize
                    predictors{i,1} = read(ds.Datastore);
                    responses(i,1) = ds.Labels(ds.CurrentFileIndex);
                    ds.CurrentFileIndex = ds.CurrentFileIndex + 1;
                end

                data = preprocessData(ds,predictors,responses);
            end

            function data = preprocessData(ds,predictors,responses)
                % data = preprocessData(ds,predictors,responses) preprocesses
                % the data in predictors and responses and returns the table
                % data

                miniBatchSize = ds.MiniBatchSize;

                % Pad data to length of longest sequence.
                sequenceLengths = cellfun(@(X) size(X,2),predictors);
                maxSequenceLength = max(sequenceLengths);
                for i = 1:miniBatchSize
                    X = predictors{i};

                    % Pad sequence with zeros.
                    if size(X,2) < maxSequenceLength
                        X(:,maxSequenceLength) = 0;
                    end

                    predictors{i} = X;
                end

                % Return data as a table.
                data = table(predictors,responses);
            end

            function reset(ds)
                % Reset to the start of the data
                reset(ds.Datastore);
                ds.CurrentFileIndex = 1;
            end

        end

        methods (Hidden = true)

            function frac = progress(ds)
                % Determine percentage of data read from datastore
                frac = (ds.CurrentFileIndex - 1) / ds.NumObservations;
            end

        end

    end % end class definition

Define the helper function readSequence after the class definition. You must create this function to read sequence data from a MAT-file.

    function data = readSequence(filename)
    % data = readSequence(filename) reads the sequence X from the MAT-file
    % filename

    S = load(filename);
    data = S.X;
    end
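The zero-padding step in preprocessData relies on MATLAB growing an array with zeros when you assign to an index beyond its current size. The following stand-alone sketch isolates that step, using three made-up sequences of ones with 3 features and lengths 2, 5, and 4. These values are assumptions chosen only to make the effect easy to inspect.

```matlab
% Hypothetical mini-batch: 3 features, sequence lengths 2, 5, and 4.
predictors = {ones(3,2); ones(3,5); ones(3,4)};

% Find the longest sequence in the mini-batch.
sequenceLengths = cellfun(@(X) size(X,2),predictors);
maxSequenceLength = max(sequenceLengths);

for i = 1:numel(predictors)
    X = predictors{i};
    % Assigning to the last column extends X and fills the new
    % columns with zeros; existing values are left unchanged.
    if size(X,2) < maxSequenceLength
        X(:,maxSequenceLength) = 0;
    end
    predictors{i} = X;
end

% After padding, every sequence has the same length.
paddedLengths = cellfun(@(X) size(X,2),predictors);
```

Padding is applied on the right, to the longest length within each mini-batch only, so different mini-batches can end up with different sequence lengths.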
To add support for shuffling, first follow the instructions in Implement MiniBatchable Datastore and then update your implementation code in MySequenceDatastore.m to:
Inherit from an additional class matlab.io.datastore.Shuffleable.
Define the additional method shuffle.
This example code adds shuffling support to the MySequenceDatastore class. Vertical ellipses indicate where you should copy code from the MySequenceDatastore implementation.
    classdef MySequenceDatastore < matlab.io.Datastore & ...
                           matlab.io.datastore.MiniBatchable & ...
                           matlab.io.datastore.Shuffleable

        % previously defined properties
        .
        .
        .

        methods

            % previously defined methods
            .
            .
            .

            function dsNew = shuffle(ds)
                % dsNew = shuffle(ds) shuffles the files and the
                % corresponding labels in the datastore.

                % Create a copy of datastore
                dsNew = copy(ds);
                dsNew.Datastore = copy(ds.Datastore);
                fds = dsNew.Datastore;

                % Shuffle files and corresponding labels
                numObservations = dsNew.NumObservations;
                idx = randperm(numObservations);
                fds.Files = fds.Files(idx);
                dsNew.Labels = dsNew.Labels(idx);
            end

        end

    end
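The shuffle method keeps files and labels aligned by applying the same random permutation to both. The sketch below shows that idea on its own, using invented file names and labels in place of the datastore's Files and Labels properties.

```matlab
% Hypothetical file list and matching labels (names are made up so
% that each label equals the first letter of its file name).
files  = {'a.mat'; 'b.mat'; 'c.mat'; 'd.mat'};
labels = categorical({'a'; 'b'; 'c'; 'd'});

% Draw one random permutation and apply it to both arrays so that
% each file stays paired with its label.
idx = randperm(numel(files));
shuffledFiles  = files(idx);
shuffledLabels = labels(idx);
```

Indexing both arrays with a single idx vector is what preserves the pairing. Calling randperm separately for each array would break it.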
If you have followed all the instructions presented here, then the implementation of your custom mini-batch datastore is complete. Before using this datastore, qualify it using the guidelines presented in Testing Guidelines for Custom Datastores.