how to make this faster?

11 views (last 30 days)
Yulai Zhang
Yulai Zhang on 13 Aug 2022
Commented: Yulai Zhang on 15 Aug 2022
I have A and B two large arrays of the same size, say, 2000 by 2000 by 5000 in size.
The operation below is very slow and uses lot of memory:
A(B==1)=2;
Is there a better way of doing this?
Thanks!
  5 Comments
Jan
Jan on 13 Aug 2022
@Yulai Zhang: Please mention, what the type of A and B is. Is B logical or double? Can you work with INT8 instead?
Even on a machine with a lot of RAM (1 TB seems sufficient for your problem), writing 150 GB of data needs some time.
dpb
dpb on 13 Aug 2022
Could sparse arrays possibly help here?
Or possibly processing smaller chunks at a time could actually help throughput rather than trying to address the whole thing in one big bite?

Sign in to comment.

Answers (2)

Walter Roberson
Walter Roberson on 13 Aug 2022
The programming model of
A(B==1)=2;
is as if it first compares each element of B to 1, creating a temporary logical array, and then calls the subsassgin function to assign 2 to the locations in A referred to by the logical array.
That is, the programming model calls for B==1 to be completely completed before the assignment starts. It does not internally optimize the code to a loop along the lines of
for K = 1 : numel(A)
if B(K) == 1; A(K) = 2; end
end
except in some internal programming language (generated machine code for example)
Would it be possible in theory to detect that the indexing expression is "simple" and optimize to this kind of loop? Perhaps. But it would have to be fairly constrained. For example,
A = zeros(3,5,'sym');
syms B
A(A <= 0) = -1
A = 
A(end) = B
A = 
try
A(A <= 0) = -2
catch ME
fprintf('assignment failed\n')
lasterror
end
assignment failed
ans = struct with fields:
message: 'Error using evalin↵Unrecognized function or variable 'x'.' identifier: 'MATLAB:UndefinedFunction' stack: [5×1 struct]
A
A = 
The error message is wacky (should be an error about being unable to decide whether the comparison is true.
The important point here is to notice that A did not get changed by the failed assignment: that in the general case you have to fully evaluate the indexing expression first in case it fails.
In the simple case of a comparison of a double array to a constant, there cannot be a failure (provided the double array does not exceed the size of the original matrix... but there can be a failure if the double array is larger...)
So optimization would sometimes be possible, but mostly not.
What you could do in order to reduce the memory load is to process in chunks, such as a group of 10 three dimensional panes at a time. However if you do that then in order to get the assignment right you would have to calculate the locations of all of the indices using find() and sub2ind() -- because there is no way to ask for something like
A(logical2DArray, [], 5)
meaning that you wanted the logical2DArray to expand to indicate the first two dimensions.
It would certainly be possible to reduce memory pressure using these techniques, but it would probably not be faster if you do have available memory.
  2 Comments
Walter Roberson
Walter Roberson on 13 Aug 2022
The straightforward one-at-a-time loop would use the least memory and if you used linear indexing the indexing calculations would not be too bad, but it would be slow.
Hmmm... perhaps this might not be too bad.
PerGroup = 1e7; %1e7 should give 2000 iterations
for K = 1 : PerGroup : numel(A)-PerGroup+1
start = (K-1)*PerGroup;
idx = find(B(start+1:K*PerGroup) == 1);
A(start + idx) = 1;
end
Yulai Zhang
Yulai Zhang on 15 Aug 2022
Thanks for your time and ideas.
Yes, I think for this particular case the one-at-a-time loop might be a good option since it does not need much memory.

Sign in to comment.


Jan
Jan on 14 Aug 2022
Some ideas:
A = rand(2000, 2000, 5000);
B = randi([0,3], 2000, 2000, 5000);
tic;
A(B==1)=2;
toc
tic;
for k = 1:numel(A)
if B(k)==1
A(k) = 2;
end
end
toc
tic;
parfor k = 1:numel(A)
if B(k)==1
A(k) = 2;
end
end
toc
tic
A = reshape(A, 4e6, 5000);
B = reshape(B, 4e6, 5000);
parfor k = 1:500
a = A(:, k);
a(B(:, k)==1) = 2;
A(:, k) = a;
end
toc

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!