Work with Deep Learning Data in Azure Blob Storage
This example shows how to set up, write to, and read from Azure® Blob Storage.
Before you can train your deep neural network in the cloud, you need to upload your data to the cloud. This example shows how to set up a cloud storage resource, upload a data set of labeled images to the cloud, and read that data from the cloud into MATLAB®. The example uses the CIFAR-10 data set, which is a labeled image data set commonly used for benchmarking image classification networks.
Download Data Set to Local Machine
Specify a local directory in which to download the data set. The following code creates a folder in your current directory containing all the images in the data set.
directory = pwd;
[trainDirectory,testDirectory] = downloadCIFARToFolders(directory);
Downloading CIFAR-10 data set...done. Copying CIFAR-10 to folders...done.
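As an optional sanity check (not part of the original example), you can count the downloaded files by pointing an imageDatastore at the training folder. CIFAR-10 contains 50,000 training images in 10 categories.

```matlab
% Optional check: count the images that downloadCIFARToFolders wrote to disk.
localDs = imageDatastore(trainDirectory, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
numel(localDs.Files)               % expect 50,000 training images
numel(categories(localDs.Labels))  % expect 10 categories
```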
Upload Local Data Set to Azure Blob Storage
To work with data in the cloud, you can upload it to Azure Blob Storage and then access the data from your local MATLAB session or from workers in your cluster. The following steps describe how to set up cloud storage and upload the CIFAR-10 data set from your local machine to an Azure Blob Container.
1. Sign up for a Microsoft® Azure account. See Microsoft Azure.
2. For efficient file transfers to and from Azure Blob Storage, download and install the Azure Command Line Interface tool from How to install the Azure CLI.
3. Log in to Azure at a Windows® Command Prompt (CMD) or Linux® terminal.
az login
4. Create a resource group, specifying the name of the resource group and the geographic location.
az group create --location <your storage location> --name <your resource group name>
A resource group is a container that holds resources for an Azure solution. To see a list of locations, use the command az account list-locations.
5. Create a storage account in your resource group, specifying the name of the storage account.
az storage account create --name <your storage account name> --resource-group <your resource group name>
6. Create a storage container in your storage account, specifying the name of the storage container.
az storage container create --name <your storage container name> --account-name <your storage account name>
7. Upload the CIFAR-10 data to the container, specifying the source directory. Use the --recursive flag to upload files within subdirectories of the source directory.
az storage fs directory upload --file-system <your storage container name> --account-name <your storage account name> --source "path/to/CIFAR10/on/the/local/machine" --recursive
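If you prefer to run the upload from MATLAB rather than a separate terminal, you can call the same CLI command with the system function. The sketch below is an illustration only: it assumes the az CLI is installed and on your system path, that you are already logged in, and it uses the same placeholder resource names as the steps above.

```matlab
% Sketch: run the Azure CLI upload step from within MATLAB.
% Assumes the az CLI is installed, on the system path, and you are logged in.
container = "<your storage container name>";   % placeholder
account   = "<your storage account name>";     % placeholder
source    = trainDirectory;                    % folder created earlier in this example
cmd = sprintf("az storage fs directory upload --file-system %s " + ...
    "--account-name %s --source ""%s"" --recursive", container, account, source);
status = system(cmd);
if status ~= 0
    error("Upload failed; check the Azure CLI output for details.")
end
```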
Access Data Set in MATLAB
By default, MATLAB does not have permission to access data stored in your Azure Blob Storage. You can grant MATLAB access to the data by generating a shared access signature (SAS) token and providing it to MATLAB.
At a Windows® Command Prompt (CMD) or Linux® terminal, generate a SAS token. You can vary the permissions that the token provides and the expiry date of the token using the --permissions and --expiry options, respectively.
az storage container generate-sas --account-name <your storage account name> --name <your storage container name> --permissions rwl --expiry 2023-06-01
In MATLAB, set the environment variable MW_WASB_SAS_TOKEN to the generated SAS token.
SASToken = "<your generated SAS Token>";
setenv("MW_WASB_SAS_TOKEN",SASToken);
Changes to environment variables do not persist between MATLAB sessions. To set an environment variable permanently, set it in your user or system environment. When you offload to workers in a cluster, the client MATLAB session and the workers have different environment variables. For information on how to copy environment variables from the client to the workers so that the workers can access cloud storage, see Set Environment Variables on Workers (Parallel Computing Toolbox).
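For example, two ways to make the token visible to pool workers are sketched below. This assumes Parallel Computing Toolbox; the EnvironmentVariables option of parpool copies the named client environment variable to the workers when the pool starts, and parfevalOnAll sets it on the workers of a pool that is already running.

```matlab
% Sketch: make the SAS token visible to parallel pool workers.
% Option 1: copy the client value when a new pool starts.
pool = parpool("Processes", EnvironmentVariables="MW_WASB_SAS_TOKEN");

% Option 2: set it on the workers of an already-running pool.
token = getenv("MW_WASB_SAS_TOKEN");
wait(parfevalOnAll(@setenv, 0, "MW_WASB_SAS_TOKEN", token));
```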
You can read and write data in cloud storage using MATLAB functions and objects, such as file I/O functions and some datastore objects. When you specify the location of the data, you must specify the full path to the files or folders using a uniform resource locator (URL) of the form wasbs://<your storage container name>@<your storage account name>.blob.core.windows.net/<path to data>. For example, specify the URL of the folder containing the training data.
URL = "wasbs://<your storage container name>@<your storage account name>.blob.core.windows.net/cifar10/train";
Create a datastore pointing to the URL of the container and show the number of images in each category.
ds = datastore(URL, ...
    Type="image", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
countEachLabel(ds)
ans=10×2 table
      Label       Count
    __________    _____
    airplane      5000
    automobile    5000
    bird          5000
    cat           5000
    deer          5000
    dog           5000
    frog          5000
    horse         5000
    ship          5000
    truck         5000
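To confirm that MATLAB can read the image files themselves from the container, you can read and inspect a single image. This small check is not part of the original example.

```matlab
img = readimage(ds,1);   % read the first image directly from Blob Storage
size(img)                % CIFAR-10 images are 32-by-32-by-3
imshow(img)
```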
With the CIFAR-10 data set now stored in Azure Blob Storage, you can try any of the examples in Parallel and Cloud that show how to use the data set in different situations. Note that training a network is typically faster with locally hosted training data, because reading from remote storage adds overhead.