
Create and Use Distributed Arrays

If your data is currently in the memory of your local machine, you can use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool. Distributed arrays use the combined memory of multiple workers in a parallel pool to store the elements of an array. For alternative ways of partitioning data, see Distributing Arrays to Parallel Workers. You operate on the entire array as a single entity. The workers, however, operate only on their part of the array and automatically transfer data between themselves when necessary. You can use distributed arrays to scale up big data computations. Consider distributed arrays when you have access to a cluster, because they let you combine the memory of multiple machines in the cluster.

A distributed array is a single variable, split over multiple workers in your parallel pool. You can work with this variable as a single entity, without having to worry about its distributed nature. To explore the functionality available for distributed arrays in the Parallel Computing Toolbox™, see Run MATLAB Functions with Distributed Arrays.

When you create a distributed array, you cannot control the details of the distribution. On the other hand, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. In the following, you learn how to create both distributed and codistributed arrays.

Creating Distributed Arrays

You can create a distributed array in different ways:

  • Use the distributed function to distribute an existing array from the client workspace to the workers of a parallel pool.

  • Construct a distributed array directly on the workers. You do not need to first create the array in the client, which reduces client workspace memory requirements. The functions available include eye(___,"distributed") and rand(___,"distributed"). For a full list, see the Alternative Functionality section of the distributed object reference page.

  • Create a codistributed array inside an spmd statement, then access it as a distributed array outside the spmd statement. This lets you use distribution schemes other than the default. For details, see Single Program Multiple Data (spmd).
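
As a minimal sketch of the second approach, the following constructs distributed arrays directly on the workers. It assumes a parallel pool is open, or that your preferences allow MATLAB to start one automatically. The sizes and variable names are illustrative only.

```matlab
% Sketch: build distributed arrays directly on the workers.
% Sizes and variable names are illustrative assumptions.
I  = eye(4,"distributed");          % 4-by-4 identity matrix stored on the workers
R  = rand(1000,"distributed");      % 1000-by-1000 uniform random matrix
Zd = zeros(1000,1,"distributed");   % 1000-by-1 column of zeros

% Confirm the class without copying the data back to the client.
tf = isdistributed(R);
cls = class(I);
```

Because the arrays are never created in the client workspace, only the small handles I, R, and Zd occupy client memory.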

In this example, you create an array in the client workspace, then turn it into a distributed array.

Create a new pool if you do not have an existing one open.

parpool("Processes",4)

Create a 4-by-4 magic square on the client and distribute the matrix to the workers. View the results on the client and display information about the variables.

A = magic(4);
B = distributed(A);
B
whos
  Name      Size            Bytes  Class                   Attributes

  A         4x4               128  double                            
  B         4x4               128  distributed                       
  

You have created B as a distributed array, split over the workers in your parallel pool. This is shown in the figure below. The distributed array is ready for further computations.

Figure: Array split by column over four workers.
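
To illustrate operating on the whole array as a single entity, here is a short sketch. It recreates A and B so that it stands on its own and assumes the pool is still open. The names C, colSums, colSumsLocal, and same are illustrative, and gather is used to copy results back into the client workspace.

```matlab
% Sketch: use B like an ordinary array; the workers do the computation.
A = magic(4);
B = distributed(A);

C = 2*B + 1;             % element-wise arithmetic, result stays distributed
colSums = sum(B,1);      % column sums, result stays distributed

% Copy results back to the client workspace.
colSumsLocal = gather(colSums);        % each column of magic(4) sums to 34
same = isequal(gather(C),2*A + 1);     % compare with the client-side computation
```

Call gather only on results that fit in client memory. Intermediate values such as C can remain on the workers.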

Close the pool after you have finished using the distributed array.

delete(gcp)

Creating Codistributed Arrays

Unlike distributed arrays, codistributed arrays allow you to control all aspects of distribution, including dimensions and partitions. You can create a codistributed array in different ways:

  • Partitioning a Larger Array — Start with a large array that is replicated on all workers, and partition it so that the pieces are distributed across the workers. This is most useful when you have sufficient memory to store the initial replicated array.

  • Building from Smaller Arrays — Start with smaller replicated arrays stored on each worker, and combine them so that each array becomes a segment of a larger codistributed array. This method reduces memory requirements as it lets you build a codistributed array from smaller pieces.

  • Using MATLAB Constructor Functions — Use any of the MATLAB® constructor functions like rand or zeros with a codistributor object argument. These functions offer a quick means of constructing a codistributed array of any size in just one step.
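
The three approaches can be sketched as follows. This is an illustration under stated assumptions rather than a definitive recipe: the array sizes and the names D1, D2, D3, localPart, and codist are hypothetical, and the code assumes a parallel pool is open or can be started automatically.

```matlab
% Sketch of the three approaches; sizes and names are illustrative assumptions.
spmd
    % 1. Partition a larger array that is replicated on every worker.
    A  = magic(8);                 % the same 8-by-8 array on each worker
    D1 = codistributed(A);         % split using the default distribution scheme

    % 2. Build from smaller arrays: each worker contributes one 4-by-2 block
    %    of columns to a 4-by-(2*spmdSize) array.
    localPart = spmdIndex*ones(4,2);
    codist = codistributor1d(2,2*ones(1,spmdSize),[4 2*spmdSize]);
    D2 = codistributed.build(localPart,codist);

    % 3. Use a constructor function with a codistributor object argument.
    D3 = zeros(4,8,codistributor1d());
end

% Outside the spmd statement, D1, D2, and D3 are distributed arrays.
sizeD2 = size(D2);
```

In the second approach, the codistributor object describes the global size and the partition, so each local part must match the piece that the partition assigns to its worker.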

In this example, you create a codistributed array inside an spmd statement, using a nondefault distribution scheme.

Create a parallel pool with two workers.

parpool("Processes",2)

In an spmd statement, define a 1-D distribution along the third dimension, with 4 parts on worker 1 and 12 parts on worker 2. Then create a 3-by-3-by-16 array of zeros with that distribution and add the worker index, spmdIndex, to it. View the array on the client, where it appears as a distributed array, and display information about the variables.

spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);
    Z = Z + spmdIndex;
end
Z
whos
  Name        Size              Bytes  Class          Attributes

  Z           3x3x16             1152  distributed              
  codist      1x2                 265  Composite 
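
To check how the array is actually partitioned, you can inspect the local part on each worker. The following sketch recreates Z so that it stands on its own and assumes the two-worker pool from this example is still open. The names localZ, localSize, c, and part are illustrative.

```matlab
% Sketch: inspect how Z is partitioned. Assumes a pool with exactly two workers.
spmd
    codist = codistributor1d(3,[4,12]);
    Z = zeros(3,3,16,codist);

    localZ = getLocalPart(Z);      % the slice of Z stored on this worker
    localSize = size(localZ);      % size of that slice
    c = getCodistributor(Z);       % codistributor describing the distribution
    part = c.Partition;            % the partition passed to codistributor1d
end

% On the client, localSize and part are Composite objects indexed by worker.
sizeOnWorker1 = localSize{1};      % [3 3 4]
sizeOnWorker2 = localSize{2};      % [3 3 12]
partition = part{1};               % [4 12]
```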

Close the pool after you have finished using the codistributed array.

delete(gcp)

For more details on codistributed arrays, see Working with Codistributed Arrays.
