Work with Deep Learning Data in Azure
This example shows how to set up, write to, and read from Azure® Blob Storage.
Before you can train your deep neural network in the cloud, you need to upload your data to the cloud. This example shows how to set up a cloud storage resource, upload a data set of labeled images to the cloud, and read that data from the cloud into MATLAB®. The example uses the CIFAR-10 data set, which is a labeled image data set commonly used for benchmarking image classification networks.
Download Data Set to Local Machine
Specify a local directory in which to download the data set. The following code creates a folder in your current directory containing all the images in the data set.
directory = pwd;
[trainDirectory,testDirectory] = downloadCIFARToFolders(directory);
Downloading CIFAR-10 data set...done. Copying CIFAR-10 to folders...done.
Upload Local Data Set to Azure Blob Storage
To work with data in the cloud, you can upload it to Azure Blob Storage and then access the data from your local MATLAB session or from workers in your cluster. The following steps describe how to set up cloud storage and upload the CIFAR-10 data set from your local machine to an Azure Blob Container.
1. Log in to your Microsoft® Azure account. For information on creating an account, see Microsoft Azure.
2. For efficient file transfers to and from Azure Blob Storage, download and install the Azure Command Line Interface tool from How to install the Azure CLI.
3. Log in to Azure at your system's command prompt.
az login
4. Create a resource group, specifying a name for the resource group and the geographic location.
az group create --name <your resource group name> --location <your storage location>
A resource group is a container that holds resources for an Azure solution. To see a list of locations, use the command az account list-locations. Any of the locations in the returned "name" fields, for example eastus, can be passed as a location.
5. Create a storage account in your resource group, specifying a name for the storage account.
az storage account create --name <your storage account name> --resource-group <your resource group name>
6. Create a storage container in your storage account, specifying a name for the storage container.
az storage container create --name <your storage container name> --account-name <your storage account name>
7. Upload the CIFAR-10 data to the container, specifying the source directory. Use the --recursive flag to upload files within subdirectories of the source directory.
az storage fs directory upload --file-system <your storage container name> --account-name <your storage account name> --source "path/to/CIFAR10/on/the/local/machine" --recursive
Access Data Set in MATLAB
By default, MATLAB does not have permission to access data stored in your Azure Blob Storage. You can grant MATLAB access to the data by generating a shared access signature (SAS) token and providing it to MATLAB.
At your system's command prompt, generate an SAS token. You can vary the permissions that the token provides and the expiry date of the token using the --permissions and --expiry parameters. For example, this command generates an SAS token that grants read, write, and list permissions until the specified date.
az storage container generate-sas --account-name <your storage account name> --name <your storage container name> --permissions rwl --expiry YYYY-MM-DD
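The --expiry value must be a date in YYYY-MM-DD format. As a convenience, you can compute one at the command prompt. This is a sketch that assumes GNU date (standard on Linux); the 30-day validity window is an arbitrary choice, not a requirement of the CLI.

```shell
# Compute a date 30 days from today in the YYYY-MM-DD format that
# the --expiry flag expects (assumes GNU date; the 30-day window
# is an arbitrary example choice).
EXPIRY=$(date -d "+30 days" +%Y-%m-%d)
echo "$EXPIRY"
```

You can then pass $EXPIRY as the --expiry value when generating the SAS token.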
Copy the generated SAS token and, in MATLAB, set the environment variable MW_WASB_SAS_TOKEN using the generated token.
SASToken = "<your generated SAS Token>";
setenv("MW_WASB_SAS_TOKEN",SASToken);
Changes to environment variables do not persist between MATLAB sessions. To specify an environment variable permanently, set it in your user or system environment. When you offload to workers in a cluster, the client MATLAB session and the workers have different environment variables. For information on how to copy environment variables from the client to the workers so that the workers can access cloud storage, see Set Environment Variables on Workers (Parallel Computing Toolbox).
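For example, one way to set the variable permanently on Linux or macOS is to export it from your shell startup file. This is a hedged sketch: the startup file name depends on your shell, the token value is a placeholder, and on Windows you would instead set the variable in the system environment settings.

```shell
# Append an export line to the bash startup file so the SAS token is
# set in every new shell session (placeholder value shown; substitute
# your generated SAS token).
echo 'export MW_WASB_SAS_TOKEN="<your generated SAS token>"' >> ~/.bashrc
```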
You can read or write data from cloud storage using MATLAB functions and objects, such as file I/O functions and some datastore objects. When you specify the location of the data, you must specify the full path to the files or folders using a uniform resource locator (URL) of the form wasbs://<your storage container name>@<your storage account name>.blob.core.windows.net/<path to data>. For example, specify the URL of the training data that you uploaded.
URL = "wasbs://<your storage container name>@<your storage account name>.blob.core.windows.net/cifar10/train";
Create a datastore pointing to the URL of the training data and show the number of images in each category.
ds = datastore(URL, ...
    Type="image", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
countEachLabel(ds)
With the CIFAR-10 data set now stored in Azure Blob Storage, you can try any of the examples in Parallel and Cloud that show how to use the data set in different situations. Note that training a network is faster if you have locally hosted training data.