Main Content

Implement Hardware-Efficient QR Decomposition Using CORDIC in a Systolic Array

This example shows how to compute the QR decomposition of matrices using hardware-efficient MATLAB® code in Simulink®.

To solve a system of equations or compute a least-squares solution to the matrix equation AX = B using the QR decomposition, compute R and Q'B, where QR = A, and RX = Q'B. R is an upper triangular matrix and Q is an orthogonal matrix. If you just want Q and R, then set B to be the identity matrix.

In this example, R is computed from matrix A by applying Givens transformations using the CORDIC algorithm. C = Q'B is computed from matrix B by applying the same Givens transformations. The algorithm uses only iterative shifts and additions to perform these computations.

For more information on the algorithm used in this example, see Perform QR Factorization Using CORDIC

Overview

The Simulink model used in this example is:

fxpdemo_real_4x4_systolic_array_QR_model

To enter your own input matrices, A and B, open the block parameters of the corresponding enabled subsystem blocks to the left. After simulation, the model returns the computed output matrices, C = Q'B and R to the workspace. You can specify the number of CORDIC iterations in the block parameters of the transpose(Q)*B, R 4x4 Real CORDIC Systolic Array subsystem. If the inputs are fixed point, then the number of CORDIC iterations must be less than the word length. The accuracy of the computation improves one bit for each iteration, up to the word length of the inputs.

This model will work with fixed-point, double, and single data types.

To see how the algorithm performs the factorization, look under the mask of the transpose(Q)*B, R 4x4 Real CORDIC Systolic Array subsystem. The annotations indicate which rows of the matrix are being operated on to zero-out the sub-diagonal elements. The systolic array is set up for a 4-by-4 matrix A, but can be extended to any size by following the same pattern. This implementation works only with real input matrices.

To see the MATLAB code in the MATLAB Function block that performs the Givens transformations using CORDIC, continue to look under the block masks.

In this example, the number of rows and columns of A must be 4. Matrix B must have 4 rows and any number of columns.

Use QR to solve matrix equation Ax = B

The first step in solving the matrix equation AX = B is to compute RX = Q'B, where R is upper-triangular, Q is orthogonal and Q*R = A.

The following inputs are double-precision floating-point types, so set the number of iterations to be 52, which is the number of bits in the mantissa of double.

format
NumberOfCORDICIterations = 52;
A = 2*rand(4,4)-1;
B = 2*rand(4,4)-1;

Simulate the model to compute R and C = Q'B.

sim fxpdemo_real_4x4_systolic_array_QR_model
R
C
R =

    1.5149   -0.0519    1.7292   -0.3224
         0    0.9593   -0.0259   -0.0879
         0         0    0.2565    1.0888
         0         0         0   -0.6429


C =

    0.5942   -0.2382    0.0676   -0.9370
   -0.8887    0.6146   -0.5758    0.3051
    0.1725    0.7339    0.5409    0.5374
    0.8540    1.1078   -0.2183   -0.5620

Verify that back-substituting with R and C = Q'B gives the same results as MATLAB.

X = R\C
X_should_be = A\B
X =

   -7.1245  -12.1131   -0.6637    1.4236
   -0.8779    0.7572   -0.5511    0.3545
    6.3113   10.1759    0.6673   -1.6155
   -1.3283   -1.7231    0.3396    0.8741


X_should_be =

   -7.1245  -12.1131   -0.6637    1.4236
   -0.8779    0.7572   -0.5511    0.3545
    6.3113   10.1759    0.6673   -1.6155
   -1.3283   -1.7231    0.3396    0.8741

The norm of the difference between built-in MATLAB and the CORDIC QR solution should be small.

norm(X - X_should_be)
ans =

   3.2578e-14

Compute Q and R

To compute Q and R, set B equal to the identity matrix. When B equals the identity matrix, then Q = C'.

NumberOfCORDICIterations = 52;
A = 2*rand(4,4)-1;
B = eye(size(A,1),'like',A);
sim fxpdemo_real_4x4_systolic_array_QR_model

Q = C';

The theoretical QR decomposition is QR==A, so the difference between the computed QR and A should be small.

norm(Q*R - A)
ans =

   2.2861e-15

QR is not unique

The QR decomposition is only unique up to the signs of the rows of R and the columns of Q. You can make a unique QR decomposition by making the diagonal elements of R all positive.

D = diag(sign(diag(R)));
Qunique = Q*D
Runique = D*R
Qunique =

   -0.3086    0.1224   -0.1033   -0.9376
   -0.6277   -0.7636   -0.0952    0.1174
   -0.5573    0.3930    0.7146    0.1559
    0.4474   -0.4975    0.6852   -0.2877


Runique =

    1.4459   -0.8090    0.1547    0.3977
         0    1.1441    0.0809   -0.2494
         0         0    0.8193    0.1894
         0         0         0    0.4836

Then you can compare the computed QR from the model to the builtin MATLAB qr function.

[Q0,R0] = qr(A);
D0 = diag(sign(diag(R0)));
Q0 = Q0*D0
R0 = D0*R0
Q0 =

   -0.3086    0.1224   -0.1033   -0.9376
   -0.6277   -0.7636   -0.0952    0.1174
   -0.5573    0.3930    0.7146    0.1559
    0.4474   -0.4975    0.6852   -0.2877


R0 =

    1.4459   -0.8090    0.1547    0.3977
         0    1.1441    0.0809   -0.2494
         0         0    0.8193    0.1894
         0         0         0    0.4836

Use Fixed-Point for Hardware-Efficient Implementation

Use fixed-point input data types to produce efficient HDL code for ASIC and FPGA devices.

For more information on how to choose fixed-point data types that will not overflow, refer to example Perform QR Factorization Using CORDIC.

You can run many test inputs through the model by making A and B 3-dimensional arrays.

n_test_inputs=100;

The following section defines random inputs for matrices A and B that are scaled between -1 and 1, so set the fixed-point word length to 18 bits and fraction length to 14 bits to allow for growth in the QR factorization and intermediate computations in the CORDIC algorithm.

word_length = 18;
fraction_length = 14;

The best-precision number of CORDIC iterations is the word length minus one. If the number of CORDIC iterations is set to smaller than word_length - 1, then the latency and clock ticks to next ready signal will be shorter, but it will be less accurate. The number of CORDIC iterations should not be set any larger because the generated code does not support shifts greater than the word length of a fixed-point type.

NumberOfCORDICIterations = word_length - 1
NumberOfCORDICIterations =

    17

The random test inputs are concatenated so that at time k, the inputs are A(:,:,k) and B(:,:,k). Each element of A and B is a uniform random variable between -1 and +1.

A = 2*rand(4,4,n_test_inputs)-1;

Choose B to be the identity matrix so Q=C'.

B = eye(4);
B = repmat(B,1,1,n_test_inputs);

Cast A to fixed point, and cast B like A.

A = fi(A,1,word_length,fraction_length);
B = cast(B,'like',A);

Simulate the model

sim fxpdemo_real_4x4_systolic_array_QR_model

Calculate and plot the errors

norm_error = zeros(1,size(R,3));
for k = 1:size(R,3)
    Q_times_R_minus_A = double(C(:,:,k))'*double(R(:,:,k)) - double(A(:,:,k));
    norm_error(k) = norm(Q_times_R_minus_A);
end

The errors should be on the order of 10^-3.

clf
plot(norm_error,'o-')
grid on
title('norm(QR - A)')

%#ok<*NASGU,*NOPTS>