# how to make this faster?

Yulai Zhang on 13 Aug 2022
I have two large arrays A and B of the same size, say, 2000 by 2000 by 5000.
The operation below is very slow and uses a lot of memory:
A(B==1)=2;
Is there a better way of doing this?
Thanks!
dpb on 13 Aug 2022
Could sparse arrays possibly help here?
Or possibly processing smaller chunks at a time could actually help throughput rather than trying to address the whole thing in one big bite?
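A chunked pass along the third dimension might look like the sketch below (the chunk size `step` is my own choice, not something from the thread; tune it to available memory):

```
step = 10;                        % panes per chunk
for k = 1:step:size(A,3)
    idx = k:min(k+step-1, size(A,3));
    Ablk = A(:,:,idx);            % copy out a slab
    Ablk(B(:,:,idx) == 1) = 2;    % the temporary logical array is only slab-sized
    A(:,:,idx) = Ablk;            % write the slab back
end
```

Each iteration only materializes a slab-sized temporary instead of a full 2000-by-2000-by-5000 logical array.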

Walter Roberson on 13 Aug 2022
The programming model of
A(B==1)=2;
is as if it first compares each element of B to 1, creating a temporary logical array, and then calls the subsasgn function to assign 2 to the locations in A selected by that logical array.
That is, the programming model requires B==1 to be fully evaluated before the assignment starts. MATLAB does not internally optimize the code into a loop along the lines of
for K = 1 : numel(A)
if B(K) == 1; A(K) = 2; end
end
except perhaps at some lower level (in generated machine code, for example).
Would it be possible in theory to detect that the indexing expression is "simple" and optimize to this kind of loop? Perhaps. But it would have to be fairly constrained. For example,
A = zeros(3,5,'sym');
syms B
A(A <= 0) = -1    % succeeds: every element of A is sym 0, so all are assigned -1
A(end) = B        % succeeds: the last element becomes the symbolic variable B
try
    A(A <= 0) = -2    % fails: with B in A, the comparison cannot be decided
catch ME
    fprintf('assignment failed\n')
    lasterror
end
A
Running this prints "assignment failed", and lasterror reports:
message: 'Error using evalin. Unrecognized function or variable 'x'.'
identifier: 'MATLAB:UndefinedFunction'
The error message is wacky (it should be an error saying that MATLAB cannot decide whether the comparison is true).
The important point here is to notice that A did not get changed by the failed assignment: that in the general case you have to fully evaluate the indexing expression first in case it fails.
In the simple case of comparing a double array to a constant there cannot be a failure, provided the logical index produced by the comparison is not larger than the array being assigned into; if it is larger, the assignment fails.
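A minimal illustration of that failure mode (the variable names here are my own):

```
A = zeros(2,2);
mask = true(2,3);            % logical index larger than A
try
    A(mask) = 2;             % errors: the index exceeds the bounds of A
catch ME
    fprintf('assignment failed: %s\n', ME.message)
end
```

Because the index can fail, MATLAB must evaluate it fully before touching A.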
So optimization would sometimes be possible, but mostly not.
What you could do in order to reduce the memory load is to process in chunks, such as a group of ten 2-D panes at a time. However, if you do that, then in order to get the assignment right you would have to calculate the locations of all of the indices using find() and sub2ind(), because there is no way to ask for something like
A(logical2DArray, [], 5)
meaning that you wanted the logical2DArray to expand to indicate the first two dimensions.
It would certainly be possible to reduce memory pressure using these techniques, but it would probably not be faster if you do have available memory.
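A sketch of that chunked approach, one pane at a time, using find() and sub2ind() to translate per-pane hits into linear indices of the full array:

```
for k = 1:size(B,3)
    [r, c] = find(B(:,:,k) == 1);                  % hits within pane k
    idx = sub2ind(size(A), r, c, k*ones(size(r))); % linear indices into A
    A(idx) = 2;
end
```

Only one pane-sized logical array and its index lists exist at any time, at the cost of the find()/sub2ind() bookkeeping on every pane.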
Yulai Zhang on 15 Aug 2022
Thanks for your time and ideas.
Yes, I think for this particular case the one-at-a-time loop might be a good option since it does not need much memory.

Jan on 14 Aug 2022
Some ideas:
A = rand(2000, 2000, 5000);
B = randi([0,3], 2000, 2000, 5000);

% 1) Direct logical indexing
tic;
A(B==1) = 2;
toc

% 2) Element-by-element loop: slow, but needs almost no extra memory
tic;
for k = 1:numel(A)
    if B(k) == 1
        A(k) = 2;
    end
end
toc

% 3) The same loop in parallel
tic;
parfor k = 1:numel(A)
    if B(k) == 1
        A(k) = 2;
    end
end
toc

% 4) Reshape to 2-D and process one column (= one pane) per iteration
tic
A = reshape(A, 4e6, 5000);
B = reshape(B, 4e6, 5000);
parfor k = 1:5000
    a = A(:, k);
    a(B(:, k) == 1) = 2;
    A(:, k) = a;
end
toc
Yulai Zhang on 15 Aug 2022
Let me try...
Thanks a lot!