How can I make this code run faster with a sparse matrix
24 views (last 30 days)
Esegboria Osarhemen on 2 Mar 2019
Assuming I have the following code
For my real 'x' data, I have 30164675 values, alot of which a equal, so I should have more zero entries in the matrix except from those on the diagonal. But when I run this code, it takes too long. Am I using the sparse matrix correctly or is there any way to use vectorization and a sparse matrix to make this code faster?
John D'Errico on 2 Mar 2019
Edited: John D'Errico on 2 Mar 2019
How can I say this? You are using sparse in the very worst way possible. Well, you are.
Not only that, but your code will dump HUGE amounts of stuff into the command window at every iteration. So you very, very much need to learn when to terminate your lines of code with a semi-colon.
A simple rule about sparse is you never build such a matrix one element at a time. That forces MATLAB to stuff one new element into the matrix each time through the loop. It is not so inefficient if the matrix is a full one, since MATLAB does not need to shuffle elements around in memory at ach iteration. But a sparse matrix is stored differently, not storing the zeros. So single element insertion into a sparse matrix is terribly slow. NEVER do it that way.
One caveat is that if you used spalloc instead of sparse, and inserted the elements into the array in the proper way, you could reduce the time somewhat. But that would requires you to have a very good knowledge of sparse storage, and it would still be inefficient.
Instead, you need to create a list of the non-zero elements of the sparse matrix in advance., THEN call sparse ONCE with that entire list. (Read the help for sparse. Then read it again. I recall being confused the first time I read it.)
As far as how to find all of the elements in your list that are duplicates, I'd probably start with unique or ismember. The simplest way of course to create the matrix you have is this:
X = randi(10000,[1,20000]); % a random vector
Xs = sparse(X);
A = Xs' == Xs;
Name Size Bytes Class Attributes
A 20000x20000 880062 logical sparse
So I started with a set of 20000 integers, no larger than 10000. That tells me there will be 2 duplicates on average for each. As you can see, it creates a sparse logical matrix of the desired size.
The above code will work as long as you have release R2016b or later. Earlier releases would do it like this:
A = bsxfun(@eq,Xs',Xs);
The trick is to make the vector X sparse, so then the result will also be sparse. This works even though the vector is not truly a sparse one at all, but the final result needs to be so.
Could I have built that matrix using unique or ismember? Probably. But the direct test Xs'==Xs is the solution I would use. And I would NEVER, EVER solve the problem as you tried.