for-Loop Implementation in Code Generated by Using parfor
When you generate C/C++ code for a model that uses a MATLAB Function block, a
MATLAB System block, or a For Each Subsystem block, the code generator by default
produces code that implements for-loops in a single thread. For MATLAB Function
and MATLAB System blocks, you can optimize the loop implementation by using parfor.
The iterations of a parfor-loop can run in parallel on multiple cores on the target hardware.
Running the iterations in parallel might significantly improve the execution speed of the generated code. For more information, see How parfor-Loops Improve Execution Speed.
The code generator implements the for-loops in parallel by using parfor.
Embedded Coder™ software uses the Open Multiprocessing (OpenMP)
application interface to support shared-memory, multicore code generation. By default,
Embedded Coder uses as many threads as it finds available. If you
specify the number of threads to use, Embedded Coder uses at most
that number of threads, even if additional threads are available. For more information, see
the parfor reference page.
A parfor-loop might provide better execution speed than the equivalent
for-loop because several threads can compute concurrently
on the same loop.
Each execution of the body of a
parfor-loop is called an iteration.
The threads evaluate iterations in an arbitrary order and independently of each other.
Because each iteration is independent, the threads do not have to be synchronized. If
the number of threads is equal to the number of loop iterations, each thread performs
one iteration of the loop. If the number of iterations is greater than the number of
threads, some threads perform more than one loop iteration.
For example, when a loop of 100 iterations runs on 20 threads, each thread executes five iterations of the loop, and the threads run at the same time. If your loop takes a long time to run because of a large number of iterations or lengthy individual iterations, you can reduce the run time significantly by using multiple threads. In this example, you might not get a 20-fold improvement in speed because of parallelization overheads, such as thread creation and deletion.
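The arithmetic in this example can be sketched in C. The following is a minimal illustration that assumes an even, contiguous split of iterations across threads. The actual assignment is decided by the OpenMP runtime and might differ. The names IterRange and iter_range are hypothetical and do not appear in generated code.

```c
#include <assert.h>

/* Hypothetical type: half-open range [start, end) of loop iterations. */
typedef struct {
    int start;
    int end;
} IterRange;

/* Assumed even, contiguous split: returns the iterations that thread `tid`
   would run when `n_iter` iterations are divided among `n_threads`. */
IterRange iter_range(int n_iter, int n_threads, int tid)
{
    assert(n_iter >= 0 && n_threads > 0 && tid >= 0 && tid < n_threads);
    int base = n_iter / n_threads;  /* iterations that every thread gets */
    int extra = n_iter % n_threads; /* the first `extra` threads get one more */
    IterRange r;
    r.start = tid * base + (tid < extra ? tid : extra);
    r.end = r.start + base + (tid < extra ? 1 : 0);
    return r;
}
```

With 100 iterations and 20 threads, every thread receives a range of five iterations. With 10 iterations and 4 threads, the first two threads receive three iterations each and the remaining two receive two each, which matches the case where some threads perform more loop iterations than others.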
Use parfor when you have:
Many iterations of a simple calculation.
parfor divides the loop iterations into groups so that each thread executes one
group of iterations.
A loop iteration that takes a long time to execute.
parfor executes the iterations simultaneously on
different threads. Although this simultaneous execution does not reduce the
time spent on an individual iteration, it might significantly reduce overall
time spent on the loop.
Do not use parfor when:
An iteration of your loop depends on other iterations. Running the iterations in parallel can lead to erroneous results.
To help you avoid using
parfor when an iteration of your
loop depends on other iterations, Embedded Coder specifies a rigid classification of variables. For more
information, see Classification of Variables in parfor-Loops. If Embedded Coder detects loops that do not conform to the
parfor specifications, it does not generate code and
produces an error.
Reductions are an exception to the rule that loop iterations must be independent. A reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order. For more information, see Reduction Variables.
There are only a few iterations that perform some simple calculations.
For a small number of loop iterations, parallelization overheads might cancel out any speedup. Such overheads include the time taken for thread creation, data synchronization between threads, and thread deletion.
The For Each Subsystem contains an S-Function block. The generated
code will not contain parallel for-loops.
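The difference between a dependent loop and a reduction, both described above, can be illustrated in hand-written OpenMP C. This is a sketch under stated assumptions, not Embedded Coder output. The function names running_sum and sum_of_squares are hypothetical, and this section does not specify how the code generator itself implements reduction variables.

```c
#include <assert.h>

/* Loop-carried dependency: out[i] needs out[i - 1], so the iterations
   cannot run in an arbitrary order. This loop stays serial. */
void running_sum(const int *in, int *out, int n)
{
    int i;
    if (n > 0) {
        out[0] = in[0];
    }
    for (i = 1; i < n; i++) {
        out[i] = out[i - 1] + in[i];
    }
}

/* Reduction: `total` depends on all iterations together, but not on their
   order, so OpenMP can combine per-thread partial sums. */
long sum_of_squares(int n)
{
    assert(n >= 0);
    long total = 0;
    int i;
#pragma omp parallel for reduction(+ : total)
    for (i = 1; i <= n; i++) {
        total += (long)i * i;
    }
    return total;
}
```

If the compiler is invoked without OpenMP support, the pragma is ignored and sum_of_squares runs serially with the same result. Integer addition is used here so that the result does not depend on the order in which partial sums are combined.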
To run for-loops in parallel in the generated code, write the code within a
MATLAB Function or MATLAB System block by using parfor:
Create a Simulink™ model.
Add the MATLAB Function or the MATLAB System block to the model.
Add the code to the MATLAB Function or the MATLAB System block.
function y = access3a(u) %#codegen
% Copyright 2010 The MathWorks, Inc.
persistent pA;
if isempty(pA)
    pA = 0;
end
A = ones(20,50);
t = 0;
% Run the loop on at most 4 threads.
parfor (i = 1:10,4)
    A(i,1) = A(i,1) + 1;
end
y = A(1,4) + u + t + pA;
In the Optimization pane, select the
execution speed option from the Priority drop-down
list. The parameter Generate parallel for-loops is
automatically selected. This parameter enables the compiler to compute loops in parallel.
Connect the blocks.
Build the model and generate code.
In the generated code, the following pragma instructs the compiler to run the loop as an OpenMP parallel for-loop on multiple threads:
#pragma omp parallel for num_threads(4 > omp_get_max_threads() ? omp_get_max_threads() : 4)
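Here, 4 is the maximum number of threads requested by the parfor statement. The conditional expression limits the request to the value that omp_get_max_threads() returns.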
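The thread cap in the pragma can be reproduced in hand-written C. The following is a minimal sketch, not actual Embedded Coder output. The names capped_threads and increment_first_column are hypothetical, and the _OPENMP guard is an assumption added so that the code still builds and runs serially when OpenMP support is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define ROWS 20
#define COLS 50

/* Returns the smaller of `requested` and the maximum number of threads
   that the OpenMP runtime reports (1 when OpenMP is not enabled). */
int capped_threads(int requested)
{
    assert(requested > 0);
#ifdef _OPENMP
    int available = omp_get_max_threads();
#else
    int available = 1; /* assumption: serial fallback without OpenMP */
#endif
    return requested > available ? available : requested;
}

/* Adds 1 to the first column of the first 10 rows, mirroring the body of
   the parfor-loop in the MATLAB example (with zero-based indexing). */
void increment_first_column(double A[ROWS][COLS])
{
    int i;
#pragma omp parallel for num_threads(capped_threads(4))
    for (i = 0; i < 10; i++) {
        A[i][0] += 1.0;
    }
}
```

Each iteration writes to a different row, so the iterations are independent and need no synchronization. Only the first column of the first 10 rows changes. All other elements keep their initial values.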
Because the loop body can execute in parallel on multiple threads, it must conform to
certain restrictions. If the Embedded Coder software detects
loops that do not conform to
parfor specifications, it produces an
error. For more information, see parfor Restrictions.