# Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

3 views (last 30 days)
Royi Avital on 20 Apr 2020
Commented: Royi Avital on 24 Apr 2020
Given a matrix X and its vector form I am after the most efficient way to build the matrices L and U which extracts the lower and upper triangle from X.
So in MATLAB code it would be something like that:
clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [ 0, 1, 0, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 1, 0, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 1, 0, 0, 0 ];
% Upper Triangle are indices 4, 7, 8
mU = [ 0, 0, 0, 1, 0, 0, 0, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 1, 0, 0 ; ...
0, 0, 0, 0, 0, 0, 0, 1, 0 ];
assert(isequal(mL * vX, mX(logical(tril(mX, -1)))));
assert(isequal(mU * vX, mX(logical(triu(mX, 1)))));
I am after sparse represenation of mU and mL in the most efficient way.
My current implementation is given by:
function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
case(INCLUDE_DIAGONAL)
numElements = 0.5 * numRows * (numRows + 1);
diagIdx = 0;
case(EXCLUDE_DIAGONAL)
numElements = 0.5 * (numRows - 1) * numRows;
diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
elmntIdx = 0;
for jj = 1:numRows
for ii = (jj + diagIdx):numRows
elmntIdx = elmntIdx + 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
elmntIdx = numElements + 1;
for jj = numRows:-1:1
for ii = (jj - diagIdx):-1:1
elmntIdx = elmntIdx - 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end
Is there a more efficient way to generate vJ without extensive allocation of memory (In order to allow generating really large matrices)?
Thank You.

Royi Avital on 23 Apr 2020
I am not sure what you mean. But the mL and mU are exactly that. Each of their rows extracts a single element of vectroized matrix. I guess by T_i(:) you meant for the triangle matrix where single index of the trinagle is 1 and the rest is zero. This is exactly mU and mL from above.
Matt J on 23 Apr 2020
But that would mean your constraints are of the form mL*X(:)<=b. But since each row of mL contains only a single non-zero element, this means the constraint is equivalent to a simple bound X(j)<=b. In Matlab, you would never have to construct a matrix to represent such a constraint. You would use the vector input arguments lb and ub to specify those. I assume Gurobi has something similar.
Royi Avital on 23 Apr 2020
@Matt, I know that. Whenever I can use other features of the solver I user. I have cases I need those extractors in Matrix Form. I appericiate the dialogue. I think other who will read it will gain something. I still hope someone will bring a different point of view to the pattern of vJ. Though I guess @James' solution as practically as good as it gets (Also appericate if there is something to make it even faster).

James Tursa on 22 Apr 2020
Edited: James Tursa on 22 Apr 2020
Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.
/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)
*
* S = double sparse matrix
* numRows = integer > 0
* triangleFlag = 1 , extract lower triangle
* 2 , extract upper triangle
* diagFlag = 1 , include diagonal
* 2 , exclude diagonal
* where
*
* M = an numRows X numRows matrix of non-zero terms
* assert(isequal(S * M(:), mX(logical(tril(M, -1))))); % for lower
* assert(isequal(S * M(:), mX(logical(triu(M, 1))))); % for upper
*
* Programmer: James Tursa
* Date: 2020-April-22
*/
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize numRows, triangleFlag, diagFlag, numElements;
mwIndex *Ir, *Jc;
mwIndex i, j, k, m;
double *pr;
if( nrhs != 3 || !mxIsNumeric(prhs) || !mxIsNumeric(prhs) || !mxIsNumeric(prhs) ||
mxGetNumberOfElements(prhs) != 1 || mxGetNumberOfElements(prhs) != 1 ||
mxGetNumberOfElements(prhs) != 1 ) {
mexErrMsgTxt("Need three numeric scalar inputs");
}
if( nlhs > 1 ) {
mexErrMsgTxt("Too many outputs");
}
numRows = mxGetScalar(prhs);
triangleFlag = mxGetScalar(prhs);
diagFlag = mxGetScalar(prhs);
if( numRows < 1 ) {
mexErrMsgTxt("Invalid numRows, should be > 0");
}
if( triangleFlag != 1 && triangleFlag != 2 ) {
mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");
}
if( diagFlag != 1 && diagFlag != 2 ) {
mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");
}
if( diagFlag == 1 ) {
numElements = numRows * (numRows + 1) / 2; /* include diagonal */
} else {
numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */
}
plhs = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);
pr = (double *) mxGetData(plhs);
Ir = mxGetIr(plhs);
Jc = mxGetJc(plhs);
Jc = 0;
diagFlag--;
k = 0;
m = 1;
if( triangleFlag == 1 ) { /* Lower */
for( j=0; j<numRows; j++ ) {
for( i=0; i<numRows; i++ ) {
if( i >= j+diagFlag ) {
*pr++ = 1.0;
*Ir++ = k++;
Jc[m] = Jc[m-1] + 1;
} else {
Jc[m] = Jc[m-1];
}
m++;
}
}
} else { /* Upper */
for( j=0; j<numRows; j++ ) {
for( i=0; i<numRows; i++ ) {
if( i+diagFlag <= j ) {
*pr++ = 1.0;
*Ir++ = k++;
Jc[m] = Jc[m-1] + 1;
} else {
Jc[m] = Jc[m-1];
}
m++;
}
}
}
}
You mex the routine as follows (you need a supported C compiler installed):
mex GenerateTriangleExtractorMatrixMex.c
And some test code:
% GenerateTriangleExtractorMatrix_test.m
n = 300;
disp('m-code timing')
tic
GenerateTriangleExtractorMatrix(10000,1,1);
toc
disp('mex code timing')
tic
GenerateTriangleExtractorMatrixMex(10000,1,1);
toc
for k=1:n
numRows = ceil(rand*5000+100);
numCols = numRows;
triangleFlag = (rand<0.5) + 1;
diagFlag = (rand<0.5) + 1;
Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);
Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);
if( ~isequal(Mm,Mx) )
error('Not equal');
end
end
disp('Random tests passed')
With a sample run:
>> GenerateTriangleExtractorMatrix_test
m-code timing
Elapsed time is 9.964882 seconds.
mex code timing
Elapsed time is 1.901741 seconds.
Random tests passed

Show 1 older comment
James Tursa on 22 Apr 2020
"... I was not aware how easy it is to work with sparse Matrices in Mex with MATLAB ..."
Well, I would not say it was easy ... I just happen to have a lot of experience with them so I know how to write correct C code for it. What is easy to do is mess it up and create invalid sparse matrices.
Regarding non-MATLAB code, you could write generic C code for the above sparse format without it being tied to an mxArray. Or it wouldn't be too hard to morph it into a different sparse format. Just depends on what you need (or more precisely, what the optimizer code will accept).
Royi Avital on 22 Apr 2020
@James, I meant I want to use all the abstractions of the MATLAB C API. I just want to use it in my own C files. Not for MEX but for general computing. Yet I guess MATLAB blocks that kind of use.
Royi Avital on 23 Apr 2020
By the way, I tried optimizing the code:
if( triangleFlag == 1 ) { // Lower Triangle
for( jj = 1; jj < numRows + 1; jj++ ) {
for( ii = 1; ii < jj + diagFlag; ii++ ) {
ll++;
Jc[ll] = Jc[ll - 1];
}
for( ii = jj + diagFlag; ii < numRows + 1; ii++ ) {
ll++;
Jc[ll] = Jc[ll - 1] + 1;
vV[kk] = 1.0;
Ir[kk] = kk;
kk++;
}
}
} else { // Upper Triangle
for( jj = 1; jj < numRows + 1; jj++ ) {
for( ii = 1; ii < jj + 1 - diagFlag; ii++ ) {
ll++;
Jc[ll] = Jc[ll - 1] + 1;
vV[kk] = 1.0;
Ir[kk] = kk;
kk++;
}
for( ii = jj + 1 - diagFlag; ii < numRows + 1; ii++ ) {
ll++;
Jc[ll] = Jc[ll - 1];
}
}
}
But for some reason even removing the branching inside the loop didn't improve results.
Really Nice! If nothing comes up I will mark this as an answer. Thank You!

Matt J on 23 Apr 2020
Edited: Matt J on 23 Apr 2020
Another approach to consider is to use my MatrixObj class
to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix,
N=5000;
tic;
mL0=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.678702 seconds.
tic;
B=tril(true(N),-1);
Bd=double(B(:));
mL=MatrixObj;
mL.Params.B=B;
mL.Params.Bd=Bd;
mL.Ops.mtimes=@(obj,z) z(obj.Params.B);
mL.Trans.mtimes=@mtimesT;
toc;
%Elapsed time is 0.086228 seconds.
function out=mtimesT(obj,z)
out=obj.Params.Bd;
out(obj.Params.B)=z;
end
In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',
>> X=rand(N^2,1); isequal(mL0.'*(mL0*X),mL.'*(mL*X))
ans =
logical
1
but with considerably less memory consumption:
>> whos mL mL0
Name Size Kilobytes Class Attributes
mL 1x1 219739 MatrixObj
mL0 12497500x25000000 390586 double sparse

#### 1 Comment

Royi Avital on 24 Apr 2020
This is really nice. Thank you for the effort!

Royi Avital on 21 Apr 2020
My current solution:
function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
case(INCLUDE_DIAGONAL)
numElements = 0.5 * numRows * (numRows + 1);
diagIdx = 0;
case(EXCLUDE_DIAGONAL)
numElements = 0.5 * (numRows - 1) * numRows;
diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
elmntIdx = 0;
for jj = 1:numRows
for ii = (jj + diagIdx):numRows
elmntIdx = elmntIdx + 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
elmntIdx = numElements + 1;
for jj = numRows:-1:1
for ii = (jj - diagIdx):-1:1
elmntIdx = elmntIdx - 1;
vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
end
end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end
I like the memory allocation is kept to a minimum.
I wonder if there is a more efficient way to generate vJ. It is trivial to remove the inner loop and just count the number of elements yet in MATLAB it will mean each iteration will allocate memory (As we don't have iterators).

Tommy on 21 Apr 2020
The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:  I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.
Royi Avital on 21 Apr 2020
@Tommy, Really liked your analysis. Yes, when dealing with sparse matrices the whole point it making sure allocation is kept to minimum. I agree if one wants both, it is better to do the trick you mentioned.
Let's see if someone can think on a different pattern to populate vJ which is more efficient.