# Processing timer for Sparse matrix is too long?

1 view (last 30 days)
Nghia Nguyen Thanh on 22 Aug 2017
Edited: Cedric Wannaz on 19 Sep 2017
Dear All,
I am a new user with Matlab, and I meet a problem with Sparse matrix.
Time for processing with Sparse matrix is very long.
Here is my code:
lab=randi([1 7],1,100000);
M = sparse(100000,100000);
for i = 1:100000
for j = (i+1):100000
if lab(i,1) == lab(j,1)
M(j,i) = 1;
else
M(j,i) = 0;
end
end
end
Do I have any method for reduce timing process? Thanks and Best regard!
Tim Berk on 19 Sep 2017
Dear Nghia,
Looping through large arrays is very slow, especially in what is called 'nested loops' as you are using. Your nested loop requires 100000*100000/2 calculations (the 1/2 because of the (i+1) as starting value for j).
Lets assume that evaluating
if lab(1,i) == lab(1,j)
M(i,j) = 1;
else M(j,i) = 0;
end
Takes 10 micro seconds (which I think is quite a low estimate), than your script would take 1E5*1E5*0.5*1E-5 = 50000 seconds (14 hours).
Hope this clarifies why it is slow.
Often this can be solved by vectorization as explained here: https://uk.mathworks.com/help/matlab/matlab_prog/vectorization.html
Hope this helps.
Cheers,
Tim

Cedric Wannaz on 19 Sep 2017
Edited: Cedric Wannaz on 19 Sep 2017
You won't be able to do it, vectorizing or not.
If random integers are really in 1:7, the probability, when you pick two elements, that one is equal to the other is 1/7. On average, with your lab vector of n=1e6 elements and assuming that you are interested in the lower triangular part of M, this means n^2/2/7 non-zero elements, and hence
>> (16 * (1e6)^2/2/7 + 8 * 1e6 + 8) / 1e12
ans =
1.1429
more than 1.14 TB of RAM to store (just a single version of it).
Are random integers really in 1:7 or is the pool of integers larger, e.g. 1:n ? In the second case, you will gain a lot by vectorizing the approach. Often in these cases that involve sparse matrices, the best approach consists in building indices of non-zero elements ( I and J ), and performing a unique call to SPARSE at the end:
M = sparse( I, J, 1, n, n ) ; % Values are all 1 in your case.

Walter Roberson on 19 Sep 2017
Edited: Walter Roberson on 19 Sep 2017
Doing M(j,i) = 0 is a waste of time on a sparse array unless M(j,i) might already have a non-zero value. sparse() initializes to 0.
M = sparse(100000,100000);
is equivalent to
M = sparse([], [], [], 100000, 100000, 0)
which generates a sparse array that is expected to have 0 occupied cells in it. Every time a cell is written, the sparse array needs to be recreated to allocate room for the new entry. It is a lot more efficient to use
M = spalloc(100000, 100000, N)
where N is an estimate of the number of locations that will eventually be occupied with non-zero values. In your situation you can estimate N as 100000^2/7 which is roughly 1428571429. One element in seven occupancy is not very sparse.
Your inner loop could be vectorized, which could improve performance a lot.