What Is Distributed Pipelining?
Distributed pipelining, or register retiming, is a speed optimization that moves existing delays in a design to reduce the critical path while preserving functional behavior. This optimization moves the delays within a subsystem while preserving the hierarchy.
The HDL Coder™ software uses an adaptation of the Leiserson-Saxe retiming algorithm.
For example, in this model, there is a delay of 2 at the output.
The diagram shows the generated model after distributed pipelining redistributes the delay to reduce the critical path.
Benefits and Costs of Distributed Pipelining
Distributed pipelining can reduce your design’s critical path, enabling you to use a higher clock rate and increase throughput.
However, distributed pipelining requires your design to contain a number of delays. If you need to insert additional delays in your design to enable distributed pipelining, this increases the area and the initial latency of your design.
How Distributed Pipelining Works
HDL Coder applies distributed pipelining with other optimization options to move
the existing delays in your design to reduce the critical path. The existing delays
in your design can either be design delays or pipeline delays. Design delays are
delays that you manually insert in your design by using Delay blocks, or other
blocks that have state, including Queue, HDL FIFO, or Buffer blocks. Pipeline delays
are delays that are generated by optimization settings, such as input or output
pipelining options, or block implementation settings, such as the
ShiftAdd implementation for a Divide block and
CORDIC approximation for a Trigonometric
Function block. See InputPipeline and OutputPipeline.
When you generate code, before distributed pipelining runs, the input and output pipelines you specify are inserted at the input and outputs ports in your design. HDL Coder uses these pipelines to determine latency requirements and an outline for ideal pipelining for your design.
HDL Coder begins distributed pipelining by calculating the propagation delay for the components in your design by assigning each component an equal weight, except for wire components, such as Selector blocks and Bit Concat blocks, that are assigned zero-propagation delay. Distributed pipelining then redistributes the design delays and pipeline delays based on the weight assigned to each component in your design. Distributed pipelining takes into consideration all of the pipeline stages and distributes them as evenly as possible in order to obtain the shortest critical path. The inserted pipelines might not appear in the place you originally requested them because distributed pipelining can move these to improve the critical path.
If pipelines are required at specific point after a block, you can set the HDL Block Property ConstrainedOutputPipeline on the block at the specified point to the number of pipelines that you want to keep as output pipelines after that block. Constrained output pipelining does not insert extra pipelines, but instead provides information to distributed pipelining to first prioritize pipelines moved to or left at that specific point in your design. This option is used only to prevent distributed pipelining from moving specified pipelines or to require distributed pipelining to move pipelines to that specified point. For more information, see Constrained Output Pipelining.
If you want to disable distributed pipelining from running on your subsystem and you have set constrained output pipelining on blocks in your design, disable distributed pipelining for your subsystem and reset the ConstrainedOutputPipeline property on the blocks to zero.
If you do not want the design delays you have in your design to be moved by distributed pipelining, enable Preserve design delays. See Preserve design delays.
Requirements for Distributed Pipelining
Distributed pipelining requires your design to contain delays or registers that can be redistributed. You can use input pipelining or output pipelining to insert more registers. You can insert these delays manually or insert them by setting the HDL block properties InputPipeline or OutputPipeline for the subsystem or block that you plan to set distributed pipelining. For more information, see InputPipeline and OutputPipeline.
If your design does not meet your timing requirements at first, try adding more delays or registers to improve your results.
Specify Distributed Pipelining
You can set distributed pipelining on a model or the top-level DUT subsystem. For finer control, you can set distributed pipelining on subsystems, Stateflow® charts, and MATLAB Function blocks in the top-level DUT subsystem. See Use Distributed Pipelining Optimization in Models with MATLAB Function Blocks.
To enable distributed pipelining for the model from the UI:
In the Apps tab, select HDL Coder. The HDL Code tab appears.
Click Settings. In the HDL Code Generation > Optimization pane, in the Pipelining tab, select Distributed pipelining and click OK.
To enable distributed pipelining for a model, on the command line, enter:
hdlset_param('modelname', 'DistributedPipelining', 'on')
If you have a top-level device under test (DUT) that contains a subsystem
hierarchy, such as lower-level subsystems, and you want distributed pipelining to
run throughout the DUT and the lower-level subsystems, enable the model
configuration parameter Distributed pipelining and leave
the HDL block property DistributedPipelining as
inherit for the DUT and all lower-level
To prevent distributed pipelining from running in a specific lower-level subsystem
in the DUT, set the HDL block property DistributedPipelining to
off for that subsystem.
If you insert pipeline registers, output data can be in an invalid state initially. To avoid test bench errors resulting from initial invalid samples, disable output checking for those samples. For more information, see Ignore output data checking (number of samples).
Limitations of Distributed Pipelining
The distributed pipelining optimization has these limitations:
Your pipelining results might not be optimal in hardware because the operator latencies in your target hardware might differ from the estimated operator latencies used by the distributed pipelining algorithm.
HDL Coder generates pipeline registers at the outputs instead of distributing the registers to reduce critical path in these situations:
Stateflow chart containing a state, local variable, or a matrix with statically unresolvable index.
HDL Coder distributes pipeline registers around these blocks instead of within them:
Sine HDL Optimized and Cosine HDL Optimized
Reciprocal Sqrt (
Trigonometric Function (
Single Port RAM
Dual Port RAM
Simple Dual Port RAM
Blocks with floating-point IP (i.e. native floating-point)
If you enable distributed pipelining for a subsystem that contains these blocks, HDL Coder generates a warning message during code generation in the HDL Code Generation Check Report. This message identifies blocks that act as barriers for distributed pipelining in the subsystem. HDL Coder distributes pipeline registers around nested subsystems.
M-PSK Demodulator Baseband
M-PSK Modulator Baseband
QPSK Demodulator Baseband
QPSK Modulator Baseband
BPSK Demodulator Baseband
BPSK Modulator Baseband
PN Sequence Generator
Distributed Pipelining Report
To see the distributed pipelining information in the report, before you generate code for each subsystem or model reference, enable the optimization report. In the HDL Code tab, select Report Options, and then select Generate optimization report.
When you generate the optimization report, in the Distributed Pipelining section, you see the effect of the distributed pipelining optimization for the subsystems that have distributed pipelining enabled in the generated model and in the Distributed Pipelining Summary. If distributed pipelining is unsuccessful, the report shows diagnostic messages and blocks that caused distributed pipelining to fail.
If you set the HDL block property ConstrainedOutputPipeline for a block in your design, the Constrained Output Pipeline Summary displays the requested constrained output pipelines and if those requests are met during code generation.
If distributed pipelining is successful, the Detailed Report in the Distributed Pipelining section displays comparative listings of registers before and after you apply the distributed pipelining transform.
Leiserson, C.E, and James B. Saxe. “Retiming Synchronous Circuitry.” Algorithmica. Vol. 6, Number 1, 1991, pp. 5-35.
- Distributed Pipelining: Speed Optimization
- Use Distributed Pipelining Optimization in Models with MATLAB Function Blocks
- Iteratively Maximize Clock Frequency by Using Speed Optimizations