Multicore Simulation of Video Processing System

This example shows how to simulate a video processing system that counts objects in an input video. It uses the dataflow domain in Simulink® to automatically partition the data-driven portions of the system into multiple threads, which improves simulation performance by executing the model on your desktop's multiple cores.

Introduction

The dataflow execution domain allows you to make use of multiple cores in the design of computationally intensive systems. This example shows how setting dataflow as the execution domain of a subsystem improves the simulation performance of the model. To learn more about dataflow and how to run Simulink models using multiple threads, see Multicore Execution using Dataflow Domain (DSP System Toolbox).

Object Counting in Video

This example shows how to use basic morphological operators to extract information from a video stream. In this case, the model counts the number of staples in each video frame. The model uses the Top-hat block to remove uneven illumination and then the Autothreshold block to convert the resulting frame into a binary image. The Blob Analysis block then counts the staples and computes the centroid of each one. The Draw Markers and Insert Text blocks mark the staples and write the number of staples found on the video frame.
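The counting step can be illustrated outside Simulink with a minimal Python sketch. It is not the implementation of the Autothreshold or Blob Analysis blocks: a fixed threshold (an assumption) stands in for automatic thresholding, the top-hat step is omitted, and the function names and the tiny test frame are hypothetical.

```python
from collections import deque


def threshold(frame, level):
    """Convert a grayscale frame (list of rows) into a binary image.

    A fixed level is used here as a stand-in for automatic thresholding.
    """
    return [[1 if px > level else 0 for px in row] for row in frame]


def find_blobs(binary):
    """Return one (row, col) centroid per 4-connected foreground region."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The centroid is the mean pixel coordinate of the blob.
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cy, cx))
    return centroids


# Hypothetical 4x6 frame with two bright objects on a dark background.
frame = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 180, 10],
    [10, 10, 10, 10, 180, 10],
]
blobs = find_blobs(threshold(frame, 100))
print(len(blobs), blobs)
```

The number of centroids returned corresponds to the object count that the model writes on each frame.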

Setting up the Dataflow Subsystem

This example uses the dataflow domain in Simulink to make use of multiple cores on your desktop to improve simulation performance. The Domain parameter of the Dataflow Subsystem in this model is set to Dataflow. You can view this setting by selecting the subsystem and then selecting View>Property Inspector. Dataflow domains automatically partition your model and simulate the system using multiple threads for better simulation performance. Once you set the Domain parameter to Dataflow, you can use the Dataflow Simulation Assistant to analyze your model for better performance. To open the Dataflow Simulation Assistant, click the Dataflow assistant button below the Automatic frame size calculation parameter in the Property Inspector.

Analyzing Concurrency in Dataflow Subsystem

In the Dataflow Simulation Assistant, click the Analyze button to start the analysis of the dataflow domain for simulation performance. Once the analysis is finished, the Dataflow Simulation Assistant shows how many threads the dataflow subsystem will use during simulation.

After analyzing the model, the assistant shows one thread because the data dependency between the blocks in the model prevents them from being executed concurrently. By pipelining the data-dependent blocks, the Dataflow Subsystem can increase concurrency for higher data throughput. The Dataflow Simulation Assistant shows the recommended number of pipeline delays as Suggested Latency. The suggested latency value is computed to give the best performance.
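The effect of pipeline delays can be sketched in a few lines of Python. This is only a conceptual illustration, not how Simulink partitions a model: the three arithmetic stages are hypothetical stand-ins for data-dependent blocks, and one delay register is assumed between each pair of consecutive stages.

```python
def run_unpipelined(frames, stages):
    """Each frame passes through every stage in order before the next frame starts."""
    outputs = []
    for value in frames:
        for stage in stages:
            value = stage(value)
        outputs.append(value)
    return outputs


def run_pipelined(frames, stages):
    """Insert one delay register between consecutive stages.

    Within a step, every stage reads only values stored in the previous step,
    so the stages have no dependency on each other and could run on separate
    threads. The cost is latency: the output is delayed by len(stages) - 1 frames.
    """
    registers = [None] * (len(stages) - 1)  # None marks "no data yet"
    outputs = []
    for frame in frames:
        inputs = [frame] + registers
        results = [
            stage(x) if x is not None else None
            for stage, x in zip(stages, inputs)
        ]
        registers = results[:-1]  # values handed to the next stage in the next step
        outputs.append(results[-1])
    return outputs


# Hypothetical stand-ins for three data-dependent processing blocks.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
frames = [1, 2, 3, 4, 5]

direct = run_unpipelined(frames, stages)
delayed = run_pipelined(frames, stages)
latency = len(stages) - 1
print(direct)
print(delayed)
```

In this sketch the pipelined outputs match the unpipelined ones shifted by two frames, which mirrors the trade-off the assistant reports: more concurrency in exchange for added output latency.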

The following diagram shows the Dataflow Simulation Assistant where the Dataflow Subsystem currently specifies a latency value of zero, and the suggested latency for the system is two.

Click the Accept button next to Suggested Latency in the Dataflow Simulation Assistant to use the recommended latency for the Dataflow Subsystem. You can also enter this value directly in the Latency parameter in the Property Inspector. Simulink shows the latency parameter value using tags at the output ports of the dataflow subsystem.

The Dataflow Simulation Assistant now shows the number of threads as 2, meaning that the blocks inside the dataflow subsystem simulate in parallel using 2 threads.

Multicore Simulation Performance

We measure the performance improvement of using the dataflow domain by comparing the execution time of the model with and without dataflow. Execution time is measured using the sim command, which returns the simulation execution time of the model. While measuring the execution time, the Video Viewer block is commented out so that the measurement primarily reflects the time taken by the Dataflow Subsystem. These numbers and analysis were published on a Windows desktop computer with an Intel® Xeon® CPU E5-1650 v3 @ 3.5 GHz processor (6 cores, 12 threads).

Simulation execution time for multithreaded model = 6.84s
Simulation execution time for single-threaded model = 10.75s
Actual speedup with dataflow: 1.6x
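The reported speedup follows directly from the two timings above. The short Python sketch below reproduces that arithmetic. The per-thread efficiency is an additional derived figure, not one reported in the example, and assumes the two threads shown by the Dataflow Simulation Assistant.

```python
def speedup(single_threaded_s, multithreaded_s):
    """Ratio of single-threaded to multithreaded execution time."""
    return single_threaded_s / multithreaded_s


single_threaded_s = 10.75  # seconds, from the measurement above
multithreaded_s = 6.84     # seconds, from the measurement above
threads = 2                # thread count reported by the assistant

s = speedup(single_threaded_s, multithreaded_s)
efficiency = s / threads   # fraction of ideal linear scaling (derived, not reported)

print(f"speedup: {s:.2f}x")
print(f"efficiency per thread: {efficiency:.0%}")
```

The unrounded ratio is about 1.57, which the example reports as 1.6x.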

Summary

This example shows how multithreading with the dataflow domain can improve the simulation performance of a video processing model by using multiple cores on the desktop.