# Run `parfor`-Loops Without a Parallel Pool

*Since R2024a*

This example shows how to run `parfor`-loops on a large cluster without a parallel pool.

Running `parfor` computations directly on a cluster allows you to use hundreds of workers to perform your `parfor`-loop. When you use this approach, `parfor` can use all the available workers in the cluster, and release the workers as soon as the loop completes. This approach is also useful if your cluster does not support parallel pools. However, when you run `parfor` computations directly on a cluster, you do not have access to `DataQueue` or `Constant` objects, and the workers restart between iterations, which can lead to significant overhead.
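As a minimal sketch of the syntax difference, you pass a cluster object as the second argument to `parfor` to run the loop directly on the cluster, whereas a pool-based loop uses the open pool implicitly. The `"MyProfile"` profile name and trivial loop body below are placeholders, not part of this example.

```
% Sketch only: "MyProfile" is a placeholder cluster profile name.
c = parcluster("MyProfile");

% Run directly on the cluster: workers are drawn from the cluster
% and released as soon as the loop completes.
parfor (n = 1:100, c)
    r(n) = n^2;
end

% For comparison, a pool-based loop runs on an open parallel pool:
% parpool(c);
% parfor n = 1:100
%     r(n) = n^2;
% end
```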

This example recreates the updated version of the ARGESIM CP2 Monte Carlo benchmark study [1] by Jammer et al. [2]. For the CP2 Monte Carlo study, you simulate a spring-mass-damper system with different randomly sampled damping factors in parallel.

### Create Cluster Object

Create the cluster object and display the number of workers available in the cluster. `HPCProfile` is a profile for a MATLAB® Job Scheduler cluster.

```
cluster = parcluster("HPCProfile");
maxNumWorkers = cluster.NumWorkers;
fprintf("Number of workers available: %d",maxNumWorkers)
```

Number of workers available: 496

### Define Simulation Parameters

Set the simulation period, time interval, and initial states for the mass-spring system ODE.

```
period = [0 2]; % Use a period from 0 to 2 seconds
h = 0.001;      % Time step
t_interval = period(1):h:period(2);
y0 = [0 0.1];
```

Set the number of iterations.

```
nReps = 10000000;
```

Initialize the random number generator and create an array of damping coefficients sampled from a uniform distribution with the range [800,1200].

```
rng(0);
a = 800;
b = 1200;
d = (b-a).*rand(nReps,1)+a;
```

### Run ODE Solver in Parallel

Initialize the results variable for the reduction operation.

```
y_sum = zeros(numel(t_interval),1);
```

Execute the ODE solver in a `parfor`-loop to simulate the system with varying damping coefficients. To run the `parfor` computations directly on the cluster, pass the cluster object as the second input argument to `parfor`. Use a reduction variable to compute the sum of the motion at each time step.

```
parfor (n = 1:nReps, cluster)
    f = @(t,y) massSpringODE(t,y,d(n));
    [tOut,yOut] = ode45(f,t_interval,y0);
    y_sum = y_sum + yOut(:,1);
end
```

Compute the mean response of the system and plot the response against time.

```
meanY = y_sum./numel(d);
plot(t_interval,meanY)
title("ODE Solution of Mass-Spring System")
xlabel("Time")
ylabel("Motion")
grid on
```

### Compare Computational Speedup

Compare the computational speedup of running the `parfor`-loop directly on the cluster to that of running the `parfor`-loop on a parallel pool.

Use the `timeExecution` helper function attached to this example to measure the execution time of the `parfor`-loop workflow on the client, on a parallel pool with 496 workers, and directly on a cluster with 496 workers available.

```
[serialTime,hpcPoolTime,hpcClusterTime] = timeExecution("HPCProfile",maxNumWorkers);
elapsedTimes = [serialTime hpcPoolTime hpcClusterTime];
```

Calculate the computational speedup.

```
speedUp = elapsedTimes(1)./elapsedTimes;
fprintf("Speedup on cluster = %4.2f\nSpeedup on pool = %4.2f",speedUp(3),speedUp(2))
```

Speedup on cluster = 154.11
Speedup on pool = 171.23

Create a bar chart comparing the speedup of each execution. The chart shows that running the `parfor`-loop directly on the cluster achieves a speedup similar to that of running the `parfor`-loop on a parallel pool.

```
figure;
x = ["Client","Pool","Cluster"];
bar(x,speedUp);
ylabel("Computational Speedup")
xlabel("parfor Execution Environment")
grid on
```

The speedup values are similar because the example uses a MATLAB Job Scheduler cluster. When you run the `parfor`-loop directly on a MATLAB Job Scheduler cluster, `parfor` can sometimes reuse workers without restarting them between iterations, which reduces overhead. If you run the `parfor`-loop directly on a third-party scheduler cluster, `parfor` restarts workers between iterations, which can result in significant overhead and much lower speedup values.

### Helper Functions

The `massSpringODE` helper function defines the mass-spring system of ODEs that the solver uses.

You can rewrite the differential equation that describes the mass-spring system (eq1) as a system of first-order ODEs (eq2) that you can solve using the `ode45` solver.

$d\,\dot{x}(t)+k\,x(t)+m\,\ddot{x}(t)=0$ (eq1)

$$\begin{array}{l}\dot{y}_{1}={y}_{2}\\ \dot{y}_{2}=-\dfrac{d\,{y}_{2}+k\,{y}_{1}}{m}\end{array}$$ (eq2)

```
function dy = massSpringODE(t,y0,d)
k = 9000; % Spring stiffness (N/m)
m = 450;  % Mass (kg)
dy = zeros(2,1);
dy(1) = y0(2);
dy(2) = -(d*y0(2)+k*y0(1))/m;
end
```

### References

[1] Breitenecker, Felix, Gerhard Höfinger, Thorsten Pawletta, Sven Pawletta, and Rene Fink. "ARGESIM Benchmark on Parallel and Distributed Simulation." *Simulation News Europe SNE* 17, no. 1 (2007): 53-56.

[2] Jammer, David, Peter Junglas, and Sven Pawletta. “Solving ARGESIM Benchmark CP2 ’Parallel and Distributed Simulation’ with Open MPI/GSL and Matlab PCT - Monte Carlo and PDE Case Studies.” *SNE Simulation Notes Europe* 32, no. 4 (December 2022): 211–20. https://doi.org/10.11128/sne.32.bncp2.10625.

## See Also

Scale Up Parallel Code to Large Clusters