Select Datastore for File Format or Application

A datastore is a repository for collections of data that are too large to fit in memory. Each file format and application uses a different type of datastore, which contains properties pertinent to the type of data or application that it supports. MATLAB® provides datastores for standard file formats such as Excel® files and datastores for specific applications such as deep learning. If your data is in a proprietary format that the existing datastores do not cover, you can develop a customized datastore using the custom datastore framework.

Datastores for Standard File Formats

For a collection of data in a standard file format, use one of these options.

| Datastore | Description |
| --- | --- |
| TabularTextDatastore | Text files containing column-oriented data, including CSV files |
| SpreadsheetDatastore | Spreadsheet files with a supported Excel format such as .xlsx |
| ImageDatastore | Image files, including formats that are supported by imread such as JPEG and PNG |
| ParquetDatastore | Parquet files containing column-oriented data |
| FileDatastore | Files with nonstandard file format. Requires a custom file reading function. |
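
As a minimal sketch of how a datastore for a standard file format is typically used, the following example reads a CSV file in small chunks with a TabularTextDatastore. The temporary folder, the file name sample.csv, and the variable names Id and Value are made up for illustration; the example writes its own data so that it does not depend on any existing files.

```matlab
% Build a small CSV file in a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
T = table((1:6)', (10:10:60)', 'VariableNames', {'Id', 'Value'});
writetable(T, fullfile(folder, 'sample.csv'));

% Create the datastore and limit each read to two rows.
ds = tabularTextDatastore(folder);
ds.ReadSize = 2;

% Accumulate a running sum one chunk at a time, so the whole
% file never has to be in memory at once.
total = 0;
while hasdata(ds)
    chunk = read(ds);                 % table with at most ReadSize rows
    total = total + sum(chunk.Value);
end
disp(total)
```

The same hasdata and read loop applies to the other datastores in this table; only the creation function and the type of data returned by read change.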

Transform or combine existing datastores.

| Datastore | Description |
| --- | --- |
| CombinedDatastore | Datastore to combine data read from multiple underlying datastores |
| TransformedDatastore | Datastore to transform underlying datastore |
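
The following sketch shows one way to obtain a TransformedDatastore: calling transform on an existing datastore with a function handle. The sample file, the variable names, and the filter threshold are assumptions chosen for illustration.

```matlab
% Build a small CSV file in a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
T = table((1:6)', (10:10:60)', 'VariableNames', {'Id', 'Value'});
writetable(T, fullfile(folder, 'sample.csv'));

ds = tabularTextDatastore(folder);

% Wrap the datastore so that every read keeps only rows with Value > 30.
% The function is applied each time data is read, not up front.
keepLarge = @(t) t(t.Value > 30, :);
dsFiltered = transform(ds, keepLarge);

% Read everything through the transformation.
filtered = readall(dsFiltered);
disp(filtered)
```

The combine function wraps two or more existing datastores in a similar way and returns a CombinedDatastore.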

Datastores to integrate with MapReduce and tall arrays.

| Datastore | Description |
| --- | --- |
| KeyValueDatastore | Key-value pair data that are inputs to or outputs of mapreduce |
| TallDatastore | Datastore for checkpointing tall arrays |

Datastores for audio and database data require additional toolboxes.

| Datastore | Description | Toolbox Required |
| --- | --- | --- |
| AudioDatastore | Datastore for collection of audio files | Audio Toolbox™ |
| DatabaseDatastore | Datastore for collections of data in a relational database | Database Toolbox™ |

Datastores for Specific Applications

Based on your application, use one of these datastores.

| Application | Datastore | Description | Toolbox Required |
| --- | --- | --- | --- |
| Simulink Model Data | SimulationDatastore | Datastore for simulation input and output data that you use with a Simulink® model | Simulink |
| Simulation Ensemble and Predictive Maintenance Data | SimulationEnsembleDatastore | Datastore to manage simulation ensemble data | Predictive Maintenance Toolbox™ |
| Simulation Ensemble and Predictive Maintenance Data | FileEnsembleDatastore | Datastore to manage ensemble data in custom file format | Predictive Maintenance Toolbox |
| Measurement Data Format (MDF) Files | MDFDatastore | Datastore for collection of MDF files | Vehicle Network Toolbox™ |
| Measurement Data Format (MDF) Files | MDFDatastore | Datastore for collection of MDF files | Powertrain Blockset™ |
| Deep Learning (datastores for preprocessing image or sequence data) | PixelLabelDatastore | Datastore for pixel label data | Computer Vision Toolbox™ and Deep Learning Toolbox™ |
| Deep Learning | PixelLabelImageDatastore | Datastore for training semantic segmentation networks. Datastore is nondeterministic. | Computer Vision Toolbox and Deep Learning Toolbox |
| Deep Learning | boxLabelDatastore | Datastore for bounding box label data | Computer Vision Toolbox and Deep Learning Toolbox |
| Deep Learning | RandomPatchExtractionDatastore | Datastore for extracting random patches from images or pixel label images. Datastore is nondeterministic. | Image Processing Toolbox™ and Deep Learning Toolbox |
| Deep Learning | DenoisingImageDatastore | Datastore to train an image denoising deep neural network. Datastore is nondeterministic. | Image Processing Toolbox and Deep Learning Toolbox |
| Deep Learning | AugmentedImageDatastore | Datastore for resizing and augmenting training images. Datastore is nondeterministic. | Deep Learning Toolbox |

Custom File Formats

For a collection of data in a custom file format, if each individual file fits in memory, use FileDatastore along with your custom file reading function. Otherwise, develop your own fully customized datastore for custom or proprietary data using the matlab.io.Datastore class. See Develop Custom Datastore.
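
As a rough illustration of the first option, the sketch below pairs a FileDatastore with a custom reading function. The .dat extension, the semicolon-separated layout, and the helper name readDat are all hypothetical stand-ins for a real custom format; each file is assumed to fit in memory.

```matlab
% Write two files in a made-up format: one line of semicolon-separated numbers.
folder = tempname;
mkdir(folder);
contents = {'1;2;3', '4;5;6'};
for k = 1:numel(contents)
    fid = fopen(fullfile(folder, sprintf('part%d.dat', k)), 'w');
    fprintf(fid, '%s', contents{k});
    fclose(fid);
end

% Custom reading function (hypothetical): takes a file name and
% returns the file contents as a numeric row vector.
readDat = @(filename) str2double(strsplit(strtrim(fileread(filename)), ';'));

fds = fileDatastore(folder, 'ReadFcn', readDat, 'FileExtensions', '.dat');

% Each call to read passes one whole file to the reading function.
values = [];
while hasdata(fds)
    values = [values, read(fds)]; %#ok<AGROW>
end
disp(values)
```

Because the reading function receives one complete file per call, this approach suits collections where the number of files is large but each file is small.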

Nondeterministic Datastores

A nondeterministic datastore is a datastore that does not return exactly the same data from a call to the read function after a call to the reset function. Do not use nondeterministic datastores with tall arrays, mapreduce, or any other code that requires reading the data more than once.

Some applications require data that is randomly augmented or transformed. For example, AugmentedImageDatastore, which is used in deep learning applications, augments training image data with randomized preprocessing operations to help prevent the network from overfitting and memorizing the exact details of the training images. The output of this datastore is different every time you perform a read operation after a call to reset.
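
One way to check whether a datastore is deterministic is to read, reset, read again, and compare the results. The sketch below applies that check to a plain TabularTextDatastore and to a transformed datastore that adds random noise on every read. The noisy transformation is only a stand-in for random augmentation and is not how AugmentedImageDatastore is implemented; the sample file and variable names are also assumptions.

```matlab
% Build a small CSV file in a temporary folder (illustrative data only).
folder = tempname;
mkdir(folder);
T = table((1:6)', (10:10:60)', 'VariableNames', {'Id', 'Value'});
writetable(T, fullfile(folder, 'sample.csv'));

ds = tabularTextDatastore(folder);

% Plain datastore: the same data comes back after reset.
first = read(ds);
reset(ds);
second = read(ds);
isDeterministic = isequal(first, second);

% Stand-in for random augmentation: add fresh noise on every read.
addNoise = @(t) t.Value + randn(height(t), 1);
dsNoisy = transform(ds, addNoise);

reset(dsNoisy);
firstNoisy = read(dsNoisy);
reset(dsNoisy);
secondNoisy = read(dsNoisy);
isNoisyDeterministic = isequal(firstNoisy, secondNoisy);

fprintf('plain: %d, noisy: %d\n', isDeterministic, isNoisyDeterministic);
```

A datastore that fails this check returns different data on each pass, which is why it is unsuitable for code that reads the data more than once.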
