This example shows three ways to accelerate the simulation of communications algorithms in MATLAB®. In particular, it showcases the effects that using System objects, MATLAB to C code generation, and parallel processing runs (using the MATLAB
parfor function) have on simulation speed.
The combined effect of using these methods may speed up a typical simulation time by an order of magnitude. The difference is tantamount to running the simulation overnight or within just a few hours.
The System objects this example features are accessible in the Communications Toolbox™ product. In order to run the MATLAB to C code generation section of this example, you must have a MATLAB Coder™ license. In order to run the parallel processing section of this example, you must have a Parallel Computing Toolbox™ license.
This example examines various implementations of the following transceiver system:
This system is composed of a transmitter, a channel model, and a receiver. The transmitter processes the input bit stream with a convolutional encoder, an interleaver, a modulator, and a MIMO space-time block encoder [ 1 ], [ 2 ]. The transmitted signal is then processed by a 2x2 MIMO channel and an additive white gaussian noise (AWGN) channel. The receiver processes its input signal with a 2x2 MIMO space-time block decoder, a demodulator, a deinterleaver, and a Viterbi decoder to recover the best estimate of the input bit stream at the receiver.
The workflow of this example is as follows:
Start with a function-based algorithm as your baseline
Use the MATLAB Profiler GUI to identify speed bottlenecks
Improve the simulation time using System objects
Accelerate the simulation with MATLAB to C code generation
Achieve even faster simulation using parallel processing runs
Start with a function that represents the first version or baseline implementation of this algorithm. The inputs to the commaccelerationbaseline function are the Eb/No value of the current frame (EbNo) and the number of bits the algorithm processes (MaxNumBits). Eb/No is the ratio of energy per bit to noise power spectral density. The function output is the bit error rate (BER) of the algorithm. Note that the baseline algorithm implementation uses functions and data objects from the Communications Toolbox, along with some user-defined functions, such as localmimoencoder, localmimodecoder and localmimochannel.
As a starting point, measure the time it takes to run this baseline algorithm in MATLAB. Use the MATLAB timing functions (tic and toc) to record the elapsed time for running this function, as it is called within a for-loop that iterates over Eb/No values from 0 to 7 dB.
MaxSNRdB=7;EbNo=1;MaxNumBits=2e5; N=1;str='Baseline'; commaccelerationbaseline(EbNo,1e4); berBaseline=zeros(size(0:MaxSNRdB)); fprintf(1,'Processing the baseline algorithm.\n'); tic; for EbNo=0:MaxSNRdB y=commaccelerationbaseline(EbNo,MaxNumBits); berBaseline(EbNo+1)=y; end a=toc;
The result shows the simulation time (in seconds) of the baseline algorithm. Use this measurement as a yardstick to compare with subsequent versions of the algorithm.
Identify the processing bottlenecks and problem areas of the baseline algorithm using the MATLAB Profiler. Obtain the profiler information by executing the following script:
profile on y=commaccelerationbaseline(EbNo,1e5); profile off
The Profiler report presents the execution time for each function call of the algorithm in descending order. The first few functions that the Profiler window depicts represent the speed bottleneck of the algorithm. In this case, two user-defined functions (localmimodecoder and localmimochannel) and the vitdec function (Viterbi decoder function of the Toolbox) are identified as the major speed bottlenecks.
The function commaccelerationsystemobjects implements the second version of the algorithm, which uses System objects from the Communications Toolbox. Among ten function calls found in the baseline version of this algorithm, nine calls are replaced by corresponding calls to available System objects. The resulting commaccelerationsystemobjects function is composed of two distinct parts:
Declaration, which creates the System objects
Execution, which calls the step method for each System object in order to perform specific operations.
Measure the simulation time for this version of the algorithm. Record the elapsed time for running this function in the same for-loop as before.
N=2; str='Using System objects'; commaccelerationsystemobjects(EbNo,1e4); berSystemobject=zeros(size(berBaseline)); fprintf(1,'Processing the System object version of the algorithm.\n'); tic; for EbNo=0:MaxSNRdB y=commaccelerationsystemobjects(EbNo,MaxNumBits); berSystemobject(EbNo+1)=y; end b=toc;
The result shows the simulation time of the System object version of the algorithm. Note that by replacing calls to the functions with calls to System objects, the algorithm runs faster. This behavior is expected because one of the advantages of using System objects is efficiency. Algorithms using System objects separate declaration from execution, which avoids the repeated input validation and verification routines found in function-based algorithms. The algorithm implementation that uses System objects performs parameter handling and initializations only once, outside the loop, and exploits efficient MEX implementations of the System objects, which improves overall simulation performance.
MATLAB Coder generates portable and readable C code from algorithms that are part of the MATLAB code generation subset. The function commaccelerationsystemobjects, the second version of the algorithm, uses System objects that support code generation with MATLAB Coder. Therefore, this algorithm can be compiled and linked into MATLAB as a MEX (MATLAB executable) function. Use the MATLAB Coder codegen command to compile the System objects version of the algorithm into a MEX function (called commaccelerationsystemobjects_mex).
Note: You must have a MATLAB Coder license to run this section of the example.
Measure the simulation time for the MEX version of the algorithm. Record the elapsed time for running this function in the same for-loop as before.
N=3; str='MATLAB to C code generation'; commaccelerationsystemobjects_mex(EbNo,1e4); berCodegen=zeros(size(berBaseline)); fprintf(1,'Processing the MEX function of the second version of the algorithm.\n'); tic; for EbNo=0:MaxSNRdB y=commaccelerationsystemobjects_mex(EbNo,MaxNumBits); berCodegen(EbNo+1)=y; end c=toc;
The result shows the simulation time of the MEX version of the algorithm. Note that by compiling the System object algorithm into a MEX function, the MEX version of the algorithm runs faster than both the second and the baseline versions of the algorithm. This behavior is expected because one of the advantages of using MATLAB to C code generation is simulation acceleration. Although the algorithm that uses System objects is highly optimized, code generation can accelerate simulation by locking-down the sizes and data-types of variables inside the function. This process makes the execution more efficient because it removes the overhead of the interpreted language that checks for size and data-type in every line of the code.
Utilize multiple cores to increase simulation acceleration by running tasks in parallel. Use parallel processing runs (
parfor loops) in MATLAB to perform the work on the number of available workers. Parallel Computing Toolbox enables you to run different iterations of the simulation in parallel. Use the
parpool function to reserve a number of MATLAB workers for executing a subsequent
parfor-loop. In this example, two workers run locally on a MATLAB client machine.
Note: You must have a Parallel Computing Toolbox license to run this section of the example.
if isempty(gcp('nocreate')), parpool; end
Measure the simulation time for the MEX version of the algorithm that executes within a
parfor-loop rather than a for-loop used in the previous cases.
N=4; str='Parallel simulation runs with parfor'; commaccelerationsystemobjects_mex(EbNo,1e4); berPct=zeros(size(berBaseline)); fprintf(1,'Processing the MEX function of the second version of the algorithm within a parfor-loop.\n'); tic; parfor EbNo=0:MaxSNRdB y=commaccelerationsystemobjects_mex(EbNo,MaxNumBits); berPct(EbNo+1)=y; end d=toc;
The result shows the simulation time of the MEX version of the second algorithm executing within a
parfor-loop. Note that by running this version within a
parfor-loop we get the fastest simulation performance. The basic concept of a
parfor-loop is the same as the standard MATLAB for-loop. The difference is that
parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations. Because several MATLAB workers can be computing concurrently on the same loop, a
parfor-loop provides significantly better performance than its analogous for-loop.
The combined effects of
MATLAB to C code generation and
Parallel processing runs
can significantly speed up simulations of your communications algorithms.
System objects help improve simulation speed by performing parameter handling and initializations only once outside the execution loop, by exploiting efficient MEX implementations of the System objects, and by avoiding repeated input validation and verification routines found in function-based algorithms.
MATLAB to C code generation accelerates simulation by locking-down data-types and sizes of every variable and by reducing the overhead of the interpreted language that checks for the size and data-type of variables in every line of the code.
Parallel processing runs can substantially accelerate simulation by computing different iterations of your algorithm concurrently on a multitude of available MATLAB workers.
In this example we use the function
parpool to reserve a number of MATLAB workers that run locally on our MATLAB client machine. By modifying the parallel configurations, you can accelerate the simulation even further by running the algorithm on a larger cluster of workers that are not on your MATLAB client machine. For a description of how to manage and use parallel configurations, see Programming with User Configurations page of Parallel Computing Toolbox User's Guide.
In this example, the baseline and System object versions of the algorithm are not numerically identical. The user-defined function localmimodecoder used in the baseline algorithm implements a slightly different version of the MIMO space-time block decoding operation than the one used in the comm.OSTBCCombiner System object of the second version of the algorithm. Nevertheless, when you run each of the first three versions of the algorithm (the baseline version, the System object version, and the MEX function of the second version) with a sufficiently large number of input bits, you will get very similar BER curves indicating similar numerical performance for different versions of the algorithm. Here you find the BER curves obtained by running each of the three versions of the algorithm with the number of input bits set to ten millions (i.e. MaxNumBits=1e7).
The following functions are used in this example.
S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE® Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, Oct. 1998.
V. Tarokh, H. Jafarkhami, and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456-1467, Jul. 1999.