Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor
Compute the value of X in the equation A'AX = B for complex-valued matrices with an infinite number of rows using Q-less QR decomposition
Since R2020b
Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Linear System Solvers
Description
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block solves the system of linear equations, A'AX = B, using Q-less QR decomposition, where A and B are complex-valued matrices. A is an infinitely tall matrix representing streaming data.
When the regularization parameter is nonzero, the Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block initializes the first upper-triangular factor R to λI_{n} before factoring in the rows of A, where λ is the regularization parameter and I_{n} = eye(n).
Ports
Input
A(i,:) — Rows of matrix A
vector
Rows of matrix A, specified as a vector. A is an infinitely tall matrix of streaming data. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
Complex Number Support: Yes
B — Matrix B
matrix | vector
Matrix B, specified as a vector or a matrix. B is an n-by-p matrix where n ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.
Data Types: single | double | fixed point
validInA — Whether A input is valid
Boolean scalar
Whether the A(i,:) input is valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) input port is valid. When this value is 1 (true) and the readyA value is 1 (true), the block captures the values at the A(i,:) input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
validInB — Whether input B is valid
Boolean scalar
Whether input B is valid, specified as a Boolean scalar. This control signal indicates when the data from the B input port is valid. When this value is 1 (true) and the readyB value is 1 (true), the block captures the values at the B input port. When this value is 0 (false), the block ignores the input samples.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
restart — Whether to clear internal states
Boolean scalar
Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the validInA and validInB values are 1 (true), the block begins a new subframe.
Data Types: Boolean
Output
X — Matrix X
matrix | vector
Matrix X, returned as a matrix or vector.
Data Types: single | double | fixed point
validOut — Whether output data is valid
Boolean scalar
Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at the output port X is valid. When this value is 1 (true), the block has successfully computed a row of X. When this value is 0 (false), the output data is not valid.
Data Types: Boolean
readyA — Whether block is ready for input A
Boolean scalar
Whether the block is ready for input A, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and the validInA value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.
Data Types: Boolean
readyB — Whether block is ready for input B
Boolean scalar
Whether the block is ready for input B, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and the validInB value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.
After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.
Data Types: Boolean
Parameters
Number of columns in matrix A and rows in matrix B — Number of columns in matrix A and rows in matrix B
4 (default) | positive integer-valued scalar
Number of columns in matrix A and rows in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: n
Type: character vector
Values: positive integer-valued scalar
Default: 4
Number of columns in matrix B — Number of columns in matrix B
1 (default) | positive integer-valued scalar
Number of columns in matrix B, specified as a positive integer-valued scalar.
Programmatic Use
Block Parameter: p
Type: character vector
Values: positive integer-valued scalar
Default: 1
Forgetting factor — Forgetting factor applied after each row of the matrix is factored
0.99 (default) | real positive scalar
Forgetting factor applied after each row of the matrix is factored, specified as a real positive scalar. The output is updated as each row of A is input indefinitely.
Programmatic Use
Block Parameter: forgettingFactor
Type: character vector
Values: real positive scalar
Default: 0.99
Regularization parameter — Regularization parameter
0 (default) | real nonnegative scalar
Regularization parameter, specified as a nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of the estimate often results in a smaller mean squared error when compared to least-squares estimates.
Programmatic Use
Block Parameter: regularizationParameter
Type: character vector
Values: real nonnegative scalar
Default: 0
Output datatype — Data type of output matrix X
fixdt(1,18,14) (default) | double | single | fixdt(1,16,0) | <data type expression>
Data type of the output matrix X, specified as fixdt(1,18,14), double, single, fixdt(1,16,0), or as a user-specified data type expression. The type can be specified directly, or expressed as a data type object such as Simulink.NumericType.
Programmatic Use
Block Parameter: OutputType
Type: character vector
Values: 'fixdt(1,18,14)' | 'double' | 'single' | 'fixdt(1,16,0)' | '<data type expression>'
Default: 'fixdt(1,18,14)'
Tips
Use fixed.forgettingFactor to compute the forgetting factor, α, for an infinite number of rows with the equivalent gain of a matrix with m rows.
Use fixed.forgettingFactorInverse to compute the number of rows, m, of a matrix with equivalent gain corresponding to forgetting factor α.
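The relationship between α and m can be illustrated with a short Python sketch. It assumes that "equivalent gain" means matching the steady-state energy of exponentially weighted rows, ∫₀^∞ α^{2x} dx = -1/(2 ln α), to the energy m of m equally weighted rows. That assumption and the function names below are illustrative only; they are not taken from this page, so check the fixed.forgettingFactor reference for the exact definition.

```python
import math

def forgetting_factor(m):
    """Assumed model: alpha such that -1 / (2 * ln(alpha)) equals m."""
    return math.exp(-1.0 / (2.0 * m))

def forgetting_factor_inverse(alpha):
    """Assumed model: number of rows m with the same gain as alpha."""
    return -1.0 / (2.0 * math.log(alpha))

alpha = forgetting_factor(50)            # close to the block default of 0.99
m = forgetting_factor_inverse(alpha)     # round trip back to 50 rows
m_default = forgetting_factor_inverse(0.99)
```

Under this assumed model, the default forgetting factor of 0.99 corresponds to a window of roughly 50 rows.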
Algorithms
Q-less QR Decomposition with Forgetting Factor
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block implements the following recursion to compute the upper-triangular factor R of continuously streaming 1-by-n row vectors A(k,:) using forgetting factor α. The recursion behaves as if matrix A were infinitely tall. A forgetting factor in the range 0 < α < 1 prevents the factor R from growing without bound.
$$\begin{array}{c}{R}_{0}=\mathrm{zeros}(n,n)\\ \left[\sim ,{R}_{1}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{0}\\ A\left(1,:\right)\end{array}\right],0\right)\\ {R}_{1}=\alpha {R}_{1}\\ \left[\sim ,{R}_{2}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{1}\\ A\left(2,:\right)\end{array}\right],0\right)\\ {R}_{2}=\alpha {R}_{2}\\ \vdots \\ \left[\sim ,{R}_{k}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{k-1}\\ A\left(k,:\right)\end{array}\right],0\right)\\ {R}_{k}=\alpha {R}_{k}\\ \vdots \end{array}$$
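To see why the forgetting factor keeps R bounded, it can help to restate the recursion in terms of the Gram matrix. The following identities are a derivation sketch, not text from the block reference. They use only the fact that a QR factorization preserves the Gram matrix of the stacked input, and they take R_{k-1} to be the already scaled factor from the previous step with R_0 = zeros(n,n).

$$\begin{array}{c}{R}_{k}^{\text{'}}{R}_{k}={\alpha}^{2}\left({R}_{k-1}^{\text{'}}{R}_{k-1}+A{\left(k,:\right)}^{\text{'}}A\left(k,:\right)\right)\\ {R}_{k}^{\text{'}}{R}_{k}=\sum_{j=1}^{k}{\alpha}^{2\left(k-j+1\right)}A{\left(j,:\right)}^{\text{'}}A\left(j,:\right)\end{array}$$

If every row satisfies ‖A(j,:)‖ ≤ c, the weights form a geometric series, so the trace of R_k'R_k is at most c²α²/(1 − α²) for every k.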
Q-less QR Decomposition with Forgetting Factor and Tikhonov Regularization
The output X_{k} after processing the k^{th} input A(k,:) is computed using the following iteration.
$$\begin{array}{c}{R}_{0}=\lambda {I}_{n}\\ \left[\sim ,{R}_{1}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{0}\\ A\left(1,:\right)\end{array}\right],0\right)\\ {R}_{1}=\alpha {R}_{1}\\ {X}_{1}={R}_{1}\backslash \left({R}_{1}^{\text{'}}\backslash B\right)\\ \left[\sim ,{R}_{2}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{1}\\ A\left(2,:\right)\end{array}\right],0\right)\\ {R}_{2}=\alpha {R}_{2}\\ {X}_{2}={R}_{2}\backslash \left({R}_{2}^{\text{'}}\backslash B\right)\\ \vdots \\ \left[\sim ,{R}_{k}\right]=\mathrm{qr}\left(\left[\begin{array}{c}{R}_{k-1}\\ A\left(k,:\right)\end{array}\right],0\right)\\ {R}_{k}=\alpha {R}_{k}\\ {X}_{k}={R}_{k}\backslash \left({R}_{k}^{\text{'}}\backslash B\right)\\ \vdots \end{array}$$
This is mathematically equivalent to computing A'_{k}A_{k}X = B, where A_{k} is defined as follows, though the block never actually creates A_{k}.
$${A}_{k}=\left[\begin{array}{c}{\alpha}^{k}\lambda {I}_{n}\\ \left[\begin{array}{cccc}{\alpha}^{k}& & & \\ & {\alpha}^{k-1}& & \\ & & \ddots & \\ & & & \alpha \end{array}\right]A\left(1:k,:\right)\end{array}\right]$$
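The equivalence between the recursion and the explicit matrix A_k can be checked numerically on a small case. The Python sketch below is an illustration only: it realizes each qr step with complex Givens rotations, which is one possible way to perform a Q-less update and is not stated here to be the block's internal architecture. The function names and example values are hypothetical.

```python
import math

def qless_qr_update(R, row, alpha):
    """Fold one row into upper-triangular R with Givens rotations, then scale by alpha."""
    n = len(R)
    R = [[complex(v) for v in r] for r in R]
    a = [complex(v) for v in row]
    for i in range(n):
        x, y = R[i][i], a[i]
        r = math.sqrt(abs(x) ** 2 + abs(y) ** 2)
        if r == 0.0:
            continue  # nothing to eliminate in this column
        c, s = x / r, y / r
        for j in range(i, n):
            rij, aj = R[i][j], a[j]
            R[i][j] = c.conjugate() * rij + s.conjugate() * aj
            a[j] = -s * rij + c * aj  # entry i of the row is rotated to zero
    return [[alpha * v for v in r] for r in R]

def gram(M):
    """Return M' * M, where M' is the conjugate transpose of M."""
    n = len(M[0])
    return [[sum(r[i].conjugate() * r[j] for r in M) for j in range(n)]
            for i in range(n)]

# Hypothetical example values.
n, lam, alpha = 2, 0.1, 0.9
rows = [[1 + 1j, 2 + 0j], [0.5j, -1 + 0j], [3 + 0j, 1 - 2j]]
k = len(rows)

# Recursion: R_0 = lambda * I_n, then one update per streamed row.
R = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
for row in rows:
    R = qless_qr_update(R, row, alpha)

# Explicit A_k, built here only to check the result.
A_k = [[alpha ** k * lam * (1.0 if i == j else 0.0) for j in range(n)]
       for i in range(n)]
for idx, row in enumerate(rows):      # idx 0 is row 1 of A
    weight = alpha ** (k - idx)       # alpha^k, alpha^(k-1), ..., alpha
    A_k.append([weight * v for v in row])

G_R, G_A = gram(R), gram(A_k)
max_err = max(abs(G_R[i][j] - G_A[i][j]) for i in range(n) for j in range(n))
```

Because R_k'R_k and A_k'A_k agree up to rounding error, solving with R_k gives the same X as solving A'_{k}A_{k}X = B, while storing only the n-by-n factor.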
Forward and Backward Substitution
When an upper-triangular factor is ready, the block computes forward and backward substitution with the current input B to produce the output X.
$$X={R}_{k}\backslash \left({R}_{k}^{\text{'}}\backslash B\right)$$
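As a minimal sketch of this step, the following Python code solves R'Y = B by forward substitution (R' is lower triangular) and then RX = Y by backward substitution. The function name solve_rtr and the 2-by-2 example are hypothetical and chosen so that the result is easy to verify by hand.

```python
def solve_rtr(R, B):
    """Solve (R' * R) * X = B for upper-triangular R, with B given as a list of rows."""
    n, p = len(R), len(B[0])
    # Forward substitution with the lower-triangular conjugate transpose R'.
    Y = [[0j] * p for _ in range(n)]
    for i in range(n):
        for c in range(p):
            acc = B[i][c] - sum(R[j][i].conjugate() * Y[j][c] for j in range(i))
            Y[i][c] = acc / R[i][i].conjugate()
    # Backward substitution with the upper-triangular R.
    X = [[0j] * p for _ in range(n)]
    for i in reversed(range(n)):
        for c in range(p):
            acc = Y[i][c] - sum(R[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = acc / R[i][i]
    return X

R = [[2 + 0j, 1j],
     [0j, 1 + 0j]]
B = [[2 + 0j],
     [0j]]
X = solve_rtr(R, B)   # X is [[1], [1j]] for this example
```

Both solves cost O(n²) operations per column of B and never form A'A explicitly.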
Choosing the Implementation Method
Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.
Implementation | Throughput | Latency | Area
Systolic | C | O(n) | O(mn^{2})
Partial-Systolic | C | O(m) | O(n^{2})
Partial-Systolic with Forgetting Factor | C | O(n) | O(n^{2})
Burst | O(n) | O(mn) | O(n)
Where C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.
For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.
AMBA AXI Handshake Process
This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
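The transfer rule can be illustrated with a cycle-by-cycle Python sketch. This is a simplified model of the valid/ready rule only, not of the block's timing. The function name and signal traces are hypothetical, and the source is assumed to hold its data steady until the transfer completes.

```python
def simulate_handshake(valid, ready, data):
    """Return the items transferred: a transfer occurs only when valid and ready are both high."""
    return [d for v, r, d in zip(valid, ready, data) if v and r]

# One entry per clock cycle.
valid = [1, 1, 0, 1, 1]
ready = [0, 1, 1, 1, 0]
data = ["row1", "row1", None, "row2", "row3"]

transferred = simulate_handshake(valid, ready, data)   # ["row1", "row2"]
```

In this trace, row1 is offered for two cycles but transfers only in the second one, and row3 is not transferred because ready is low.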
Block Timing
The Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks accept matrix A row by row and matrix B as a single vector. After accepting the first valid pair of A and B matrices, the block outputs the X matrices row by row continuously.
For example, assume that the input A matrix is 3-by-3. Additionally, assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the timing diagram for this example, A1r1 is the first row of the first A matrix, A1r2 is the second row of the first A matrix, and so on. The diagram marks these intervals:
validIn to ready: From a successful A row input to the block being ready to accept the next row.
validOut to validOut: Because the Forward Backward Substitution block runs continuously, it generates output at a constant rate. This is the delay between two adjacent valid outputs.
Last row validIn to validOut: From the last m^{th} row input to the block starting to output the solution.
This block is always ready to accept B matrices, so readyB is always asserted.
The following table provides details of the timing for the Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks.
Block | Operation | validIn to ready (cycles) | validOut to validOut (cycles) | n^{th} Row validIn to validOut (cycles)
Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | wl + 7 | 4*n^{2} + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^{2} + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 6)*n + 2
Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor | Asynchronous | wl + 9 | 4*n^{2} + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) | 4*n^{2} + 25*n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 7.5)*2*n + 2
In the table, m represents the number of rows in matrix A, and n is the number of columns in matrix A. wl represents the word length of A.
If the data type of A is fixed point, then wl is the word length.
If the data type of A is double, then wl is 53.
If the data type of A is single, then wl is 24.
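As a worked example, the following Python sketch evaluates the timing formulas from the row for the complex block. The helper names are hypothetical, and nextpow2 is reimplemented for positive integers as the smallest p with 2^p ≥ x.

```python
def nextpow2(x):
    """Smallest integer p such that 2**p >= x, for a positive integer x."""
    return (x - 1).bit_length()

def complex_block_timing(n, wl):
    """Cycle counts for the complex block, transcribed from the timing table."""
    valid_in_to_ready = wl + 9
    valid_out_to_valid_out = (4 * n**2 + 25 * n + 5
                              + 2 * n * wl + 2 * n * nextpow2(wl))
    row_to_valid_out = valid_out_to_valid_out + (wl + 7.5) * 2 * n + 2
    return valid_in_to_ready, valid_out_to_valid_out, row_to_valid_out

# n = 16 columns with a 16-bit fixed-point word length.
timing = complex_block_timing(n=16, wl=16)   # (25, 2069, 2823.0)
```

For n = 16 and wl = 16, the block is ready for a new row 25 cycles after accepting one, and it produces valid outputs 2069 cycles apart.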
Hardware Resource Utilization
This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
In R2022b: The following tables show the post place-and-route resource utilization results and timing summary, respectively.
This example data was generated by synthesizing the block on a Xilinx® Zynq® UltraScale+™ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).
The following parameters were used for synthesis.
Block parameters:
n = 16
p = 1
Matrix A dimension: inf-by-16
Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 250 MHz
Resource | Usage | Available | Utilization (%)
CLB LUTs | 334280 | 425280 | 78.60
CLB Registers | 261319 | 850560 | 30.72
DSPs | 12 | 4272 | 0.28
Block RAM Tile | 0 | 1080 | 0.00
URAM | 0 | 80 | 0.00
Timing metric | Value
Requirement | 4 ns
Data Path Delay | 3.892 ns
Slack | 0.088 ns
Clock Frequency | 255.62 MHz
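The figures in the two tables can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that utilization is usage divided by availability, and that the reported clock frequency is the reciprocal of the requirement minus the slack. The second relation is an assumption that happens to reproduce the reported value; it is not stated in the source.

```python
def utilization_percent(used, available):
    """Utilization as a percentage, rounded to two decimals."""
    return round(100.0 * used / available, 2)

def achieved_frequency_mhz(requirement_ns, slack_ns):
    """Assumed relation: f = 1 / (requirement - slack), converted to MHz."""
    return 1000.0 / (requirement_ns - slack_ns)

lut_util = utilization_percent(334280, 425280)   # CLB LUTs
reg_util = utilization_percent(261319, 850560)   # CLB Registers
dsp_util = utilization_percent(12, 4272)         # DSPs
f_mhz = achieved_frequency_mhz(4.0, 0.088)       # about 255.62 MHz
```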
References
[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/AMBAAXI3andAXI4ProtocolSpecification/SingleInterfaceRequirements/Basicreadandwritetransactions/Handshakeprocess
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Slope-bias representation is not supported for fixed-point data types.
HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
General
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is
Supports fixed-point data types only.
Version History
Introduced in R2020b
R2023a: Smart unrolling for improved resource utilization
This block depends on a partial-systolic QR decomposition block. Since R2023a, when you update the diagram, the loop that composes the partial-systolic pipeline in the QR decomposition block is unrolled. This updated internal architecture removes dead operations in simulation and generated code, thus requiring fewer hardware resources. This block simulates with clock and bit-true fidelity with respect to library versions of these blocks in previous releases.
R2022a: Support for Tikhonov regularization parameter
The Complex Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block now supports the Tikhonov Regularization parameter.
R2021a: Reduced HDL resource utilization
This block now has an improved algorithm to reduce resource utilization on hardware-constrained target platforms.