How to perform row operations on a large MATRIX or CELL efficiently?
4 views (last 30 days)
Show older comments
Ashish Keshkamat
on 6 Oct 2017
Answered: Ashish Keshkamat
on 14 Nov 2017
I have a large CELL of the order 18000x4 (Or a MATRIX of the similar order) Each CELL element is a 'name' and therefore a string (In case of Matrix it is a 'number') The operation I need to do is explained in the following example (Please find relevant files attached). Example:
CELL INPUT
'1' '3' '4' []
'1' '3' '5' []
'1' '3' '6' []
'1' '4' '4' []
'1' '4' '5' []
'1' '4' '6' []
'2' '3' '4' []
'2' '3' '5' []
'2' '3' '6' []
'2' '4' '4' []
'2' '4' '5' []
'2' '4' '6' []
'5' '4' [] []
'5' '7' [] []
'6' '8' '4' []
'6' '8' '7' []
'9' '10' [] []
CELL OUTPUT
'1' '4' [] []
'1' '3' '5' []
'1' '3' '6' []
'2' '4' [] []
'2' '3' '5' []
'2' '3' '6' []
'4' '5' [] []
'5' '7' [] []
'4' '6' '8' []
'6' '7' '8' []
'10' '9' [] []
There are 2 main operations done to get the output:
1) Delete the duplicates in the rows. (1 4 4) ---> (1 4) and (2 4 4) ---> (2 4) in the above example.
2) After step (1), compare every row and delete the supersets of the row if any. (1 3 4) (1 4 5) (1 4 6) deleted because they are supersets of (1 4). Similarly, (2 3 4) (2 4 5) (2 4 6) are deleted because they are supersets of (2 4)
I have written a MATLAB function called 'minimize' which does this operation. But, when I need to do this for a large CELL of order 8656x4, MATLAB keeps running and there is no solution after 48hours. I need to get this done for even larger orders of the CELL and within a time frame of 1-2 hours.
Could you please provide any method/suggestion to efficiently solve this?
2 Comments
Carl
on 10 Oct 2017
Edited: Carl
on 10 Oct 2017
Hi Ashish. I see from your .mat files that you have analogous matrices of type 'cell' and 'double'. Just to clarify, is it only slow when performing those operations on the 'cell' data? Or do you see the same performance when operating on the 'double' type matrices as well?
Initial suggestions for improving performance would be to take a vectorized, rather than loop-based, approach to copying data. For example, change:
for k = 1:size(temp_name,2)
minimal_row(i,k) = temp_name(k);
end
to
minimal_row(i,:) = temp_name;
Or,
for j = 1:size(number_matrix,2)
if number_matrix(i,j) == 0
break;
else
temp_name(j) = number_matrix(i,j) ;
end
end
to
temp_name = number_matrix(i,:);
temp_name = temp_name(temp_name ~= 0)
This type of syntax can be optimized more efficiently.
Accepted Answer
More Answers (1)
Sharath Chandran
on 11 Oct 2017
Edited: Sharath Chandran
on 3 Nov 2017
Hi Ashish,
I reckon, this has more to do with your Algorithm.
Let us consider a matrix of dimension 'm x n' ('m' rows & 'n' columns). The time complexity of the first operation that you perform is O(m * n). This roughly equals to '34624' number of computations which is reasonable number.
On the other hand the time complexity of the second operation that you are performing is O(m^2 * n), which roughly equals to at least '299705344' number of computations, after plugging in the value 'm=8656' and 'n=4'. This is clearly the bottleneck of your current algorithm and hence it needs to be optimized.
Alternate approach:
1. After performing step 1, store each row in a Hash-Map - O(m) (let us ignore the time-complexity of storing data in Hash-Map for the time being).
2. For every row, generate subsets and check if any subset is present in the Hash-Map. This should not be an overhead as max. no. of subsets for a 4-element row is 2^4-2 = 14. Here we are subtracting '2' as we are not considering the empty set and the set itself - O(2^n).
3. If any subset is already present in the Hash-Map, reject this set (row) since one of its subset already exists.
4. Total time complexity - O(m * 2^n). This roughly equals to at least '138496' number of computations which is a drastic improvement.
I hope this was helpful.
Please comment below in case you have any queries.
_
Regards,
MathWorks Technical Support Team
Note: This is not the exact time complexity analysis as we have not taken into account the complexity of storing data in the Hash-map.
See Also
Categories
Find more on Deploy to C++ Applications Using mwArray API (C++03) in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!