how to make this faster?

Question

Yulai Zhang on 13 Aug 2022

https://nl.mathworks.com/matlabcentral/answers/1778825-how-to-make-this-faster

Commented: Yulai Zhang on 15 Aug 2022

I have two large arrays, A and B, of the same size, say 2000 by 2000 by 5000.

The operation below is very slow and uses a lot of memory:

A(B==1)=2;

Is there a better way of doing this?

Thanks!

Jan on 13 Aug 2022

@Yulai Zhang: Please mention what the types of A and B are. Is B logical or double? Can you work with INT8 instead?

Even on a machine with a lot of RAM (1 TB seems sufficient for your problem), writing 150 GB of data needs some time.
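For scale, here is the arithmetic behind that estimate as a small sketch. It assumes A is double, which the question does not state; the variable names are illustrative only. A double array of this size comes to about 160 GB (roughly 149 GiB), while an int8 array or the temporary logical mask from B==1 would need one byte per element.

```matlab
n = 2000 * 2000 * 5000;          % number of elements: 2e10
bytesDouble  = n * 8;            % double: 8 bytes per element (assumed type)
bytesInt8    = n * 1;            % int8: 1 byte per element
bytesLogical = n * 1;            % logical, e.g. the temporary B==1: 1 byte per element

fprintf('double : %.0f GB (%.0f GiB)\n', bytesDouble/1e9, bytesDouble/2^30);
fprintf('int8   : %.0f GB\n', bytesInt8/1e9);
fprintf('logical: %.0f GB\n', bytesLogical/1e9);
```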

dpb on 13 Aug 2022

Could sparse arrays possibly help here?

Or possibly processing smaller chunks at a time could actually help throughput rather than trying to address the whole thing in one big bite?

Answer 1

Walter Roberson on 13 Aug 2022

https://nl.mathworks.com/matlabcentral/answers/1778825-how-to-make-this-faster#answer_1026060

The programming model of

A(B==1)=2;

is as if it first compares each element of B to 1, creating a temporary logical array, and then calls the subsasgn function to assign 2 to the locations in A referred to by the logical array.

That is, the programming model calls for B==1 to be fully evaluated before the assignment starts. It does not internally optimize the code to a loop along the lines of

for K = 1 : numel(A)

if B(K) == 1; A(K) = 2; end

end

except in some internal programming language (generated machine code, for example).
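To make the contrast concrete, here is a minimal sketch on small stand-in arrays (the values are made up for illustration). It writes the vectorized form out as its two conceptual steps and checks that the element-by-element loop gives the same result. It says nothing about how MATLAB implements either form internally.

```matlab
% Small stand-ins for the large arrays (values are illustrative only).
A = magic(4);
B = [1 0 1 0; 0 1 0 0; 2 1 3 1; 0 0 0 1];

% Vectorized form, written out as the two conceptual steps.
A1 = A;
mask = (B == 1);     % step 1: full temporary logical array, same size as B
A1(mask) = 2;        % step 2: assignment through the logical index

% Element-by-element loop using linear indexing; no temporary mask.
A2 = A;
for K = 1:numel(A2)
    if B(K) == 1
        A2(K) = 2;
    end
end

same = isequal(A1, A2);   % true when both forms produce the same array
```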

Would it be possible in theory to detect that the indexing expression is "simple" and optimize to this kind of loop? Perhaps. But it would have to be fairly constrained. For example,

A = zeros(3,5,'sym');

syms B

A(A <= 0) = -1

A =

A(end) = B

A =

try

A(A <= 0) = -2

catch ME

fprintf('assignment failed\n')

lasterror

end

assignment failed

ans = struct with fields:

       message: 'Error using evalin↵Unrecognized function or variable 'x'.'
    identifier: 'MATLAB:UndefinedFunction'
         stack: [5×1 struct]

A

A =

The error message is wacky (it should be an error about being unable to decide whether the comparison is true).

The important point here is to notice that A did not get changed by the failed assignment: that in the general case you have to fully evaluate the indexing expression first in case it fails.

In the simple case of comparing a double array to a constant, the comparison itself cannot fail, and the assignment cannot fail either, provided the compared array does not exceed the size of the array being assigned into. If the compared array is larger, there can be a failure.

So optimization would sometimes be possible, but mostly not.

What you could do in order to reduce the memory load is to process in chunks, such as a group of 10 two-dimensional planes (pages along the third dimension) at a time. However, if you do that, then to get the assignment right you would have to calculate the locations of all of the indices using find() and sub2ind(), because there is no way to ask for something like

A(logical2DArray, [], 5)

meaning that you wanted the logical2DArray to expand to indicate the first two dimensions.

It would certainly be possible to reduce memory pressure using these techniques, but it would probably not be faster if you do have available memory.
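One way to sketch the chunking idea is shown below. Note that it is a different route from the find() and sub2ind() approach described above: it copies one slab of pages into a temporary, applies the logical mask to that slab, and writes the slab back. The array sizes are small stand-ins, and `pagesPerChunk` is a hypothetical tuning parameter. This only illustrates the reduced size of the temporary mask; whether it is faster on the real data would have to be measured.

```matlab
% Small stand-ins; the real arrays are 2000 x 2000 x 5000.
A = rand(4, 5, 23);
B = randi([0, 3], 4, 5, 23);
expected = A;
expected(B == 1) = 2;           % reference result from the one-shot form

pagesPerChunk = 10;             % hypothetical tuning parameter
nPages = size(A, 3);
for p0 = 1:pagesPerChunk:nPages
    p1 = min(p0 + pagesPerChunk - 1, nPages);   % last chunk may be short
    slab = A(:, :, p0:p1);                      % temporary copy of one slab
    slab(B(:, :, p0:p1) == 1) = 2;              % the mask is only slab-sized
    A(:, :, p0:p1) = slab;                      % write the slab back
end

ok = isequal(A, expected);      % true when the chunked result matches
```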

Walter Roberson on 13 Aug 2022

The straightforward one-at-a-time loop would use the least memory and if you used linear indexing the indexing calculations would not be too bad, but it would be slow.

Hmmm... perhaps this might not be too bad.

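PerGroup = 1e7;     % 2e10 elements / 1e7 per group gives 2000 iterations
N = numel(A);
for start = 0 : PerGroup : N-1
    stop = min(start + PerGroup, N);        % last group may be shorter
    idx = find(B(start+1:stop) == 1);       % positions within this group
    A(start + idx) = 2;                     % shift back to linear indices in A
end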

Yulai Zhang on 15 Aug 2022

Thanks for your time and ideas.

Yes, I think for this particular case the one-at-a-time loop might be a good option since it does not need much memory.

Answer 2

Jan on 14 Aug 2022

https://nl.mathworks.com/matlabcentral/answers/1778825-how-to-make-this-faster#answer_1026295

Some ideas:

A = rand(2000, 2000, 5000);
B = randi([0,3], 2000, 2000, 5000);
tic;
A(B==1)=2;
toc
tic;
for k = 1:numel(A)
   if B(k)==1
      A(k) = 2;
   end
end
toc
tic;
parfor k = 1:numel(A)
   if B(k)==1
      A(k) = 2;
   end
end
toc
tic
A = reshape(A, 4e6, 5000);
B = reshape(B, 4e6, 5000);
parfor k = 1:size(A, 2)   % all 5000 columns of the reshaped array
    a = A(:, k);
    a(B(:, k)==1) = 2;
    A(:, k) = a;
end
toc

Yulai Zhang on 15 Aug 2022

let me try...

Thanks a lot!
