Multicore Execution of Interpolated FIR Filter using Dataflow domain
This example shows how to speed up execution of an Interpolated FIR Filter using the dataflow domain in Simulink.
Introduction
Many real-time audio and digital signal processing applications require filtering of a signal streaming at a high sampling rate. The computational power required grows in proportion to the input sample rate and the filter order. One way to optimize the filtering process is to break it into multiple stages that process the signal at different rates. This example demonstrates how to use multicore processing in the context of an Interpolated FIR Filter to improve the performance of the model.
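As a rough illustration of the cost involved, a direct-form FIR filter of order N performs N+1 multiplications per output sample, so its multiplication rate is about (N+1) times the sample rate. The filter order and sample rates in this sketch are hypothetical values chosen for illustration and are not taken from the example model.

```matlab
% Hypothetical values for illustration; not taken from the example model.
filterOrder = 500;                 % assumed FIR filter order
Fs = [44.1e3 96e3 192e3];          % assumed input sample rates in Hz

% A direct-form FIR filter of order N needs N+1 multiplications per sample.
multsPerSample = filterOrder + 1;
multsPerSecond = multsPerSample .* Fs;

for k = 1:numel(Fs)
    fprintf('Fs = %6.1f kHz -> %.1f million multiplications per second\n', ...
        Fs(k)/1e3, multsPerSecond(k)/1e6);
end
```

Doubling either the sample rate or the number of taps doubles the multiplication rate, which is why lowering the rate before the most demanding filtering stage pays off.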
Interpolated FIR Filter
An Interpolated FIR Filter provides an efficient alternative to a high-order FIR filter by using an FIR Decimator and an FIR Interpolator to change the rate at which the input is filtered. The input first passes through an FIR Decimator, which lowers the sampling rate, and the result is then filtered by a set of FIR Filters. Before the output is emitted, an FIR Interpolator converts the sampling rate of the filtered signal back to its original value.
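The decimate, filter, interpolate chain described above can be sketched in base MATLAB. This is a minimal illustration, not the implementation used by the model, which relies on DSP System Toolbox blocks. The rate-change factor, the filter orders, the windowed-sinc design, and the moving-average low-rate filter are all assumptions made for this sketch.

```matlab
% Base-MATLAB sketch of decimate -> filter -> interpolate.
% All parameter values are hypothetical, not those of the example model.
M = 4;                          % assumed rate-change factor
N = 64;                         % assumed order of the rate-change filters
n = (0:N)';

% Hamming-windowed sinc lowpass with cutoff pi/M (anti-alias / anti-image).
arg = (n - N/2) / M;
hRate = ones(N+1, 1);           % sin(pi*x)/(pi*x) equals 1 at x = 0
nz = arg ~= 0;
hRate(nz) = sin(pi*arg(nz)) ./ (pi*arg(nz));
hRate = hRate .* (0.54 - 0.46*cos(2*pi*n/N));
hRate = hRate / sum(hRate);     % normalize to unity gain at DC

hCore = ones(5, 1) / 5;         % placeholder low-rate filter (moving average)

t = (0:1023)';
x = sin(2*pi*0.01*t);           % slowly varying test tone

% 1) FIR decimation: lowpass filter, then keep every M-th sample.
xFiltered = filter(hRate, 1, x);
xLow = xFiltered(1:M:end);

% 2) Filter at the reduced rate.
yLow = filter(hCore, 1, xLow);

% 3) FIR interpolation: insert M-1 zeros between samples, then lowpass.
yUp = zeros(numel(yLow)*M, 1);
yUp(1:M:end) = yLow;
y = M * filter(hRate, 1, yUp);  % gain of M compensates for the inserted zeros

fprintf('Input: %d samples, low-rate stage: %d samples, output: %d samples\n', ...
    numel(x), numel(xLow), numel(y));
```

With these assumed values the middle stage processes 256 samples for every 1024 input samples, so the filters in that stage run at a quarter of the input rate.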
modelName = 'MulticoreInterpolatedFIRExample';
open_system(modelName);
Simulate the model and measure the execution time. The sim command returns a simulation output object whose metadata includes the execution time of the model.
% Enable compiler optimizations and disable Ctrl+C checks to reduce overhead.
set_param(modelName, 'SimCompilerOptimization', 'on');
set_param(modelName, 'SimCtrlC', 'off');
% Simulate and read the elapsed wall-clock execution time from the metadata.
tSim = sim(modelName, 'StopTime', '1e4');
tSimSingleThread = tSim.getSimulationMetadata.TimingInfo.ExecutionElapsedWallTime;
fprintf('Simulation execution time for single threaded model = %.2fs\n', tSimSingleThread);
Simulation execution time for single threaded model = 18.48s
Specify Dataflow Execution Domain
In Simulink, you specify dataflow as the execution domain for a subsystem by setting the Domain parameter to Dataflow using the Property Inspector. To access the Property Inspector, in the Simulink Toolstrip, either on the Modeling tab, in the Design gallery, or on the Simulation tab, in the Prepare gallery, select Property Inspector. In the Property Inspector, select Set domain specification and then choose Dataflow for the Domain setting. You can also use the Dataflow Subsystem block from the Dataflow library of DSP System Toolbox to get a subsystem that is preconfigured with the dataflow execution domain.
% Set the subsystem's execution domain to Dataflow with no pipeline latency.
set_param([modelName,'/Dataflow Subsystem'],'SetDomainSpec','on');
set_param([modelName,'/Dataflow Subsystem'],'DomainSpecType','Dataflow');
set_param([modelName,'/Dataflow Subsystem'],'Latency','0');
set_param([modelName,'/Dataflow Subsystem'],'AutoFrameSizeCalculation','off');
% Update the diagram to apply the settings.
set_param(modelName, 'SimulationCommand', 'Update');
Multicore Simulation of Dataflow Domain
Dataflow domains automatically partition your model into multiple threads for better performance. Once you set the Domain parameter to Dataflow, you can use the Multicore tab to analyze your model and improve its performance. The Multicore tab is available in the toolstrip when there is a dataflow domain in the model. To learn more about the Multicore tab, see Perform Multicore Analysis for Dataflow.
For this example, the Multicore tab mode is set to Simulation Profiling for simulation performance analysis.
For the best simulation performance, optimize the model settings first. To accept the proposed model settings, on the Multicore tab, click Optimize. Alternatively, you can use the drop-down menu below the Optimize button to change the settings individually. In this example, the model settings are already optimal.
On the Multicore tab, click the Run Analysis button to start the analysis of the dataflow domain for simulation performance. Once the analysis is finished, the Analysis Report and Suggestions window shows how many threads the dataflow subsystem uses during simulation.
sim(modelName);
After analyzing the model, the Analysis Report and Suggestions window shows one thread because the data dependency between the blocks in the model prevents the blocks from being executed concurrently. By pipelining the data-dependent blocks, the dataflow subsystem can increase concurrency for higher data throughput. The Analysis Report and Suggestions window shows the recommended number of pipeline delays as Suggested for Increasing Concurrency. The suggested latency value is computed to give the best performance.
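The effect of pipelining on throughput can be sketched with simple arithmetic. The three stages and their per-frame costs below are hypothetical and are not measurements of the example model, and the sketch ignores thread synchronization overhead.

```matlab
% Hypothetical per-frame processing cost of three data-dependent stages,
% in milliseconds. These are illustrative values, not measurements.
stageCostMs = [0.4 0.9 0.5];    % e.g. decimator, low-rate filters, interpolator

% Without pipeline delays each frame passes through the stages in sequence,
% so one thread spends the sum of the stage costs on every frame.
sequentialMsPerFrame = sum(stageCostMs);

% With a pipeline delay between stages, each stage can work on a different
% frame at the same time, so the slowest stage sets the frame period.
pipelinedMsPerFrame = max(stageCostMs);

idealSpeedup = sequentialMsPerFrame / pipelinedMsPerFrame;
fprintf('Sequential: %.1f ms/frame, pipelined: %.1f ms/frame, ideal speedup: %.1fx\n', ...
    sequentialMsPerFrame, pipelinedMsPerFrame, idealSpeedup);
```

Under these assumptions the ideal speedup is bounded by the slowest stage rather than by the number of threads. The price of the added concurrency is that the output is delayed by the inserted pipeline delays, which is what the Latency parameter accounts for.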
The following diagram shows the Analysis Report and Suggestions window where the suggested latency is 3 for the dataflow subsystem.
Click the Accept button to use the recommended latency for the dataflow subsystem. You can also enter this value directly in the Property Inspector for the Latency parameter. Simulink shows the Latency parameter value using tags at the output ports of the dataflow subsystem.
The Analysis Report and Suggestions window now shows the number of threads as 3, meaning that the blocks inside the dataflow subsystem simulate in parallel using 3 threads. Highlight threads colors the blocks based on their thread allocation, as shown in the Thread Highlighting Legend. Show pipeline delays uses tags to show where pipelining delays were inserted within the dataflow subsystem. Note that the number of threads that can be used in the dataflow domain depends on the machine configuration as well as the filter specifications defined in ifir_init.
% Apply the suggested latency and update the diagram.
set_param([modelName,'/Dataflow Subsystem'],'Latency','3');
set_param(modelName, 'SimulationCommand', 'Update');
Dataflow Simulation Performance
Simulate the model and measure the model execution time. The sim command returns a simulation output object whose metadata includes the execution time of the model. The speedup is the execution time of the original single-threaded model divided by the execution time of the model using multiple threads. This number is computed and shown below.
These numbers and analysis were obtained on a Windows desktop computer with an Intel® Xeon® CPU W-2133 @ 3.6 GHz processor with 6 cores and 12 threads.
sim(modelName);
tSim = sim(modelName, 'StopTime', '1e4');
tSimMultiThread = tSim.getSimulationMetadata.TimingInfo.ExecutionElapsedWallTime;
fprintf('Simulation execution time for multithreaded model = %.2fs\n', tSimMultiThread);
fprintf('Actual speedup with dataflow: %.1fx\n', tSimSingleThread/tSimMultiThread);
Simulation execution time for multithreaded model = 10.37s
Actual speedup with dataflow: 1.8x
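As a worked check of the reported figures, the following sketch recomputes the speedup from the two execution times printed in this example and relates it to the three threads reported by the analysis. The parallel efficiency metric is an addition for illustration and is not part of the original analysis.

```matlab
% Values copied from the timing results reported in this example.
tSingle = 18.48;        % seconds, single-threaded simulation
tMulti  = 10.37;        % seconds, dataflow simulation
numThreads = 3;         % thread count reported by the analysis

speedup = tSingle / tMulti;
efficiency = speedup / numThreads;   % fraction of ideal linear scaling
fprintf('Speedup: %.2fx, parallel efficiency: %.0f%%\n', speedup, 100*efficiency);
```

A speedup below the thread count is expected, because the threads are unlikely to carry equal workloads and multithreaded execution adds synchronization overhead.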