Preprocess Images for Deep Learning

Training a network and making predictions on new data require images that match the input size of the network. Depending on the format of your data, you can use imresize or augmentedImageDatastore to resize images to the required size.

You can apply affine geometric transformations to images to augment training, validation, test, and prediction data sets. Augmenting training images helps to prevent the network from overfitting and memorizing the exact details of the training images.

For more advanced preprocessing, you can start with a built-in datastore that performs specific image preprocessing operations suitable for common applications. You can also preprocess images according to your own pipeline by using the transform and combine functions. For more information, see Datastores for Deep Learning.

Resize Images

You can store image data as a numeric array, an ImageDatastore, or a table. An ImageDatastore enables you to import data from image collections that are too large to fit in memory, and it is designed to read batches of images for faster processing in machine learning and computer vision applications. You can use an augmented image datastore or a resized 4-D array for training, prediction, and classification. You can use a resized 3-D array for prediction and classification only.

The method to resize images depends on the image data type.

  • 3-D array representing a single color image, a single multispectral image, or a stack of grayscale images. Resizing function: imresize. To resize images in the 3-D array im3d:

    im = imresize(im3d,inputSize(1:2));

  • 4-D array representing a stack of images. Resizing function: imresize or augmentedImageDatastore. To resize images in the 4-D array im4d with imresize:

    im = imresize(im4d,inputSize(1:2));

    To resize images in the 4-D array im4d with an augmented image datastore:

    auimds = augmentedImageDatastore(inputSize,im4d);

  • ImageDatastore. Resizing function: augmentedImageDatastore. To resize images in the image datastore imds:

    auimds = augmentedImageDatastore(inputSize,imds);

    For a more complete example, see Train Deep Learning Network to Classify New Images.

  • Table. Resizing function: augmentedImageDatastore. To resize images in the table tbl:

    auimds = augmentedImageDatastore(inputSize,tbl);

imresize accepts only a two-element [numrows numcols] output size, so the imresize calls pass the height and width from inputSize. augmentedImageDatastore accepts either a two-element or a three-element output size.
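As a minimal sketch of these options, the following code resizes a synthetic 4-D array in two ways. The array size and the three-element inputSize value are assumptions for illustration; for an image classification network, inputSize would typically come from net.Layers(1).InputSize. Because imresize resizes only the first two dimensions and expects a two-element size, the sketch passes inputSize(1:2) to it, while augmentedImageDatastore receives the full size.

```matlab
% Hypothetical network input size: height, width, channels
inputSize = [28 28 3];

% Synthetic stack of 10 RGB images, each 64-by-48 pixels (4-D array)
im4d = rand(64,48,3,10,'single');

% imresize resizes only the first two dimensions of an N-D array,
% so pass just the height and width
im = imresize(im4d,inputSize(1:2));

% Alternatively, wrap the array in an augmented image datastore,
% which resizes each image when the data is read
auimds = augmentedImageDatastore(inputSize,im4d);
batch = read(auimds);          % table whose 'input' variable holds the images
firstImage = batch.input{1};   % one resized image
```

The array approach resizes everything up front and keeps the result in memory, whereas the datastore resizes each mini-batch on demand.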

By default, augmentedImageDatastore rescales images to the desired size. If instead you want to crop images from the center or from random positions in the image, you can use the 'OutputSizeMode' name-value pair argument. For example, this code shows how to crop images in image datastore imds from the center of each image:

auimds = augmentedImageDatastore(inputSize,imds,'OutputSizeMode','centercrop');
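To compare the 'OutputSizeMode' values, the following sketch reads one synthetic 200-by-300 grayscale image through an augmented image datastore in each mode. Every mode returns a 100-by-100 image here: 'resize' scales the whole image (changing its aspect ratio), 'centercrop' keeps the central window, and 'randcrop' keeps a window at a random position. The image contents and the target size are made-up values.

```matlab
% Hypothetical target size (height and width) and one synthetic image
inputSize = [100 100];
im4d = rand(200,300,1,1);   % one 200-by-300 grayscale image as a 4-D array

modes = {'resize','centercrop','randcrop'};
outSizes = zeros(numel(modes),2);
for k = 1:numel(modes)
    auimds = augmentedImageDatastore(inputSize,im4d,'OutputSizeMode',modes{k});
    batch = read(auimds);
    out = batch.input{1};
    outSizes(k,:) = [size(out,1) size(out,2)];
    fprintf('%-10s -> %d-by-%d\n',modes{k},outSizes(k,1),outSizes(k,2));
end
```

Cropping preserves the original pixel scale but discards the border, so it suits images where the object of interest is near the center.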

Augment Images for Training

In addition to resizing images, an augmentedImageDatastore enables you to preprocess images with a combination of rotation, reflection, shear, and translation transformations. The diagram shows how trainNetwork uses an augmented image datastore to transform training data for each epoch. For an example of the workflow, see Train Network with Augmented Images.

  1. Specify your training images.

  2. Configure image transformation options, such as the range of rotation angles and whether to apply reflection at random, by creating an imageDataAugmenter.

    Tip

    To preview the transformations applied to sample images, use the augment function.

  3. Create an augmentedImageDatastore. Specify the training images, the size of output images, and the imageDataAugmenter. The size of output images must be compatible with the size of the imageInputLayer of the network.

  4. Train the network, specifying the augmented image datastore as the data source for trainNetwork. For each iteration of training, the augmented image datastore applies a random combination of transformations to images in the mini-batch of training data.

    When you use an augmented image datastore as a source of training images, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set. The actual number of training images at each epoch does not change. The transformed images are not stored in memory.
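The four steps can be sketched end to end as follows. This is a minimal example under stated assumptions: the training data is a random 4-D array with two made-up classes, the augmentation ranges are arbitrary, and the small layer array exists only so that trainNetwork has something to train. In practice, substitute your own images, labels, and network.

```matlab
% Step 1: hypothetical training data, 40 grayscale 28-by-28 images
% in two made-up classes
rng(0)
XTrain = rand(28,28,1,40,'single');
YTrain = categorical(repmat(["a";"b"],20,1));

% Step 2: configure the random transformations
augmenter = imageDataAugmenter( ...
    'RandRotation',[-20 20], ...
    'RandXReflection',true, ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);

% Optional (Tip): preview the transformations on one sample image
previewImage = augment(augmenter,XTrain(:,:,:,1));

% Step 3: the output size must match the image input layer
inputSize = [28 28 1];
auimds = augmentedImageDatastore(inputSize,XTrain,YTrain, ...
    'DataAugmentation',augmenter);

% Step 4: train a deliberately tiny network for one epoch
layers = [
    imageInputLayer(inputSize)
    convolution2dLayer(3,8,'Padding','same')
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm', ...
    'MaxEpochs',1, ...
    'MiniBatchSize',8, ...
    'Verbose',false);
net = trainNetwork(auimds,layers,options);
```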

Datastores for Advanced Image Preprocessing

Some datastores perform specific image preprocessing operations when they read a batch of data. These application-specific datastores are listed below. You can use them as a source of training, validation, and test data sets for deep learning applications that use Deep Learning Toolbox™. All of these datastores return data in a format supported by trainNetwork.

  • augmentedImageDatastore: Apply random affine geometric transformations, including resizing, rotation, reflection, shear, and translation, for training deep neural networks. For an example, see Transfer Learning Using AlexNet.

  • pixelLabelImageDatastore: Apply identical affine geometric transformations to images and corresponding ground truth labels for training semantic segmentation networks (requires Computer Vision Toolbox™). For an example, see Semantic Segmentation Using Deep Learning.

  • randomPatchExtractionDatastore: Extract multiple pairs of random patches from images or pixel label images (requires Image Processing Toolbox™). You can optionally apply identical random affine geometric transformations to the pairs of patches. For an example, see Single Image Super-Resolution Using Deep Learning.

  • denoisingImageDatastore: Apply randomly generated Gaussian noise for training denoising networks (requires Image Processing Toolbox).

To perform more general and complex image preprocessing operations than the application-specific datastores offer, you can use the transform and combine functions. The transform function creates a new datastore that reads data from an existing datastore, called the underlying datastore, and transforms that data according to a transformation function that you define. The combine function concatenates the data read from multiple underlying datastores into the two-column table or two-column cell array format required by trainNetwork. The combine function keeps the underlying datastores in step, so each read returns corresponding observations from every datastore.

  • transform (resulting datastore: TransformedDatastore): Transform batches of read data from an underlying datastore according to your own preprocessing pipeline.

  • combine (resulting datastore: CombinedDatastore): Horizontally concatenate the data read from two or more underlying datastores.
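A minimal sketch of transform and combine, assuming an image-to-image task such as denoising: two transformed datastores read the same files, one adding Gaussian noise and one leaving the image clean, and combine pairs them so that each read returns a 1-by-2 cell array of {input, response}. The temporary folder, image sizes, and noise level are assumptions for illustration.

```matlab
% Write a few synthetic grayscale PNG files to a temporary folder so
% that the example has an ImageDatastore to read from
folder = fullfile(tempdir,'preprocessSketch');
if ~exist(folder,'dir')
    mkdir(folder);
end
for k = 1:8
    img = uint8(randi([0 255],40,40));
    imwrite(img,fullfile(folder,sprintf('img%02d.png',k)));
end
imds = imageDatastore(folder);

% Clean response: resize and convert to single in the range [0,1]
cleanFcn = @(I) im2single(imresize(I,[32 32]));
% Noisy input: the same preprocessing plus additive Gaussian noise
noisyFcn = @(I) im2single(imresize(I,[32 32])) + 0.05*randn(32,32,'single');

tdsClean = transform(imds,cleanFcn);
tdsNoisy = transform(imds,noisyFcn);

% combine pairs each noisy image with its clean counterpart
cds = combine(tdsNoisy,tdsClean);
data = read(cds);   % 1-by-2 cell array: {noisy, clean}
```

Because both transformed datastores wrap copies of the same underlying ImageDatastore, the first read from each returns the same file, which is what keeps the input and response paired.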

The custom transformation function must accept data in the format returned by the read function of the underlying datastore. For image data, the format depends on the ReadSize property of the underlying ImageDatastore.

  • When ReadSize is 1, the transformation function must accept an integer array. The size of the array is consistent with the type of images in the ImageDatastore. For example, a grayscale image has dimensions m-by-n, a truecolor image has dimensions m-by-n-by-3, and a multispectral image with c channels has dimensions m-by-n-by-c.

  • When ReadSize is greater than 1, the transformation function must accept a cell array of image data corresponding to each image in the batch.
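The following sketch shows how the ReadSize property changes what a transformation function receives. With the default ReadSize of 1, the function is applied to a single image array; with ReadSize set to 4, it receives a cell array of images and must return a cell array. The synthetic PNG files, the folder name, and the 32-by-32 target size are assumptions.

```matlab
% Synthetic RGB files standing in for a real image collection
folder = fullfile(tempdir,'readSizeSketch');
if ~exist(folder,'dir')
    mkdir(folder);
end
for k = 1:8
    imwrite(uint8(randi([0 255],50,60,3)),fullfile(folder,sprintf('img%02d.png',k)));
end

targetSize = [32 32];   % hypothetical network height and width

% ReadSize = 1 (default): the function receives one m-by-n-by-3 array
imds1 = imageDatastore(folder);
tds1 = transform(imds1,@(I) imresize(I,targetSize));
oneImage = read(tds1);

% ReadSize = 4: the function receives a cell array of 4 images
% and returns a cell array of the same shape
imds4 = imageDatastore(folder);
imds4.ReadSize = 4;
tds4 = transform(imds4,@(c) cellfun(@(I) imresize(I,targetSize),c, ...
    'UniformOutput',false));
batch = read(tds4);
```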

Your transformation function must return data that matches the input size of the network. The transform function does not support one-to-many observation mappings.

Tip

The transform function supports prefetching when the underlying ImageDatastore reads a batch of JPG or PNG image files. For these image types, do not use the ReadFcn name-value argument of ImageDatastore to apply image preprocessing, because this option is usually significantly slower. If you use a custom read function, then ImageDatastore does not prefetch.
