
Sudden drop in speed for large matrices. RAM full -> Prevention?

Dear all,
I am doing image analyses with a toolbox called "PIVlab". At work, we recently bought a new camera with 25 MP resolution (previously only 2 MP). That is very nice, but it also means that several matrices become substantially larger, and memory management has suddenly become very important for keeping the processing speed high. I identified the operation that seems to slow the code down with very large matrices, and wrote a minimal code example to reproduce the behaviour:
clear all %#ok<CLALL>
close all
clc
% Usually, in PIV, we have input image A and input image B which are
% captured with a short pause in between. These images are then cut into
% small pieces of e.g. 64x64 pixels. Each of these "interrogation areas"
% in image A is then cross-correlated with the same part from image B.
% With the cross-correlation code in PIVlab, it is possible to do this for
% a 3D matrix (saves a lot of time, but apparently runs into RAM problems
% when matrices are large).
counter=0;
stack_sizes=1000:10000:200000 ;%these numbers are fine to demonstrate the effect on a 16 GB RAM laptop
for size_of_the_stack=stack_sizes
%% generate some quick + dirty "particle" image pairs, arranged in a 3D matrix
A=rand(64,64,size_of_the_stack);
A(A<0.98)=0;
A = imgaussfilt(A,0.9);
B = circshift(A,round(rand([1,1])*10)); %displace the second image
%% Quick fix: converting to single already saves 50% of memory, apparently without negative effects.
A=single(A);
B=single(B);
%% do the cross-correlation with FFT
result_conv = zeros(size(A),'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
% The following line of code eats up all the RAM
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
counter=counter+1
calc_time(counter)=toc;
time_per_subimage(counter)=calc_time(counter)/size_of_the_stack;
end
%% plot results
bar(stack_sizes,time_per_subimage*1000)
grid on;
xlabel('stack size')
ylabel('time per correlation in ms')
These are my results. The speed suddenly decreases.
A quick look at the task manager already shows that the speed drops at the moment when the cross-correlation takes up all the RAM and virtual memory has to be used (indicated by the sudden increase in SSD activity).
It is obvious that this line of code is the cause of the drop in speed with large matrices:
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2);
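A rough way to see why this one line is so expensive is to count the arrays it needs. The following is only a back-of-the-envelope sketch under assumptions: single-precision data, three complex temporaries alive at the peak (conj(fft2(A)), fft2(B) and their product), and no in-place buffer reuse by MATLAB. The variable names are hypothetical and the actual peak may differ.

```matlab
% Rough estimate of the memory needed by the vectorized cross-correlation.
% Assumption: single precision, no in-place buffer reuse by MATLAB.
size_of_the_stack = 100000;              % example stack size
n_elements = 64*64*size_of_the_stack;    % elements per 3D matrix

bytes_real    = 4;   % one single-precision real value
bytes_complex = 8;   % one single-precision complex value

% Inputs that stay in memory: A and B (real)
bytes_inputs = 2 * n_elements * bytes_real;
% Temporaries that may coexist at the peak (assumed, not measured):
% conj(fft2(A)), fft2(B) and their complex product
bytes_temporaries = 3 * n_elements * bytes_complex;
% Output: result_conv (real)
bytes_output = n_elements * bytes_real;

estimated_peak_GB = (bytes_inputs + bytes_temporaries + bytes_output) / 2^30;
fprintf('Estimated peak memory: %.1f GB\n', estimated_peak_GB);
```

Comparing such an estimate with the installed RAM before running the correlation could be one way to anticipate the slowdown, although the factor of three for the temporaries is a guess that would need to be checked against the task manager.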
How can I prevent the RAM from filling up and the speed from dropping? Is there a way to anticipate problems, and then maybe split the arrays into smaller chunks, saving them temporarily to the hard disk...?
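The chunking idea can be sketched as follows. This is a minimal example under assumptions, not PIVlab code: chunk_size is a hypothetical tuning parameter that would have to be chosen for the available RAM, and the chunks are kept in memory rather than written to disk.

```matlab
% Sketch: cross-correlate a large stack in chunks to bound the memory use.
% chunk_size is a hypothetical tuning parameter, not a PIVlab setting.
A = rand(64,64,5000,'single');   % small example stack
B = circshift(A,3);              % displaced copy of A
chunk_size = 1000;               % number of interrogation areas per chunk

n_areas = size(A,3);
result_conv = zeros(size(A),'single');   % pre-allocate the full output once
for i_first = 1:chunk_size:n_areas
    i_last = min(i_first+chunk_size-1, n_areas);
    idx = i_first:i_last;
    % Same expression as above, applied to one chunk only, so the complex
    % temporaries are at most chunk_size slices large
    result_conv(:,:,idx) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,idx))).*fft2(B(:,:,idx)))), 1), 2);
end
```

Each chunk is still processed in vectorized form, so most of the speed advantage should remain while the temporaries stay small. Which chunk size works best is something to measure on the target machine.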
The problem is slightly more complicated in the real application, because there is a lot of other processing happening, and everything is run in parallel with the Parallel Computing Toolbox... But finding a solution for the above minimal example might already put me on the right track... Thanks for your input!
  13 Comments
William Thielicke on 15 Sep 2021
Ok, so this seems to work. Now I have the best of both worlds: high speed for smaller matrices due to vectorization, and still ok performance for really large matrices with a for-loop:
clear all %#ok<CLALL>
close all
clc
% Usually, in PIV, we have input image A and input image B which are
% captured with a short pause in between. These images are then cut into
% small pieces of e.g. 64x64 pixels. Each of these "interrogation areas"
% in image A is then cross-correlated with the same part from image B.
% With the cross-correlation code in PIVlab, it is possible to do this for
% a 3D matrix (saves a lot of time, but apparently runs into RAM problems
% when matrices are large).
counter=0;
stack_sizes=1000:20000:400000 ;%these numbers are fine to demonstrate the effect on a 16 GB RAM laptop
for size_of_the_stack=stack_sizes
%% generate some quick + dirty "particle" image pairs, arranged in a 3D matrix
A=rand(64,64,size_of_the_stack,'single');
B=A;
%% do the cross-correlation with FFT
result_conv = zeros(size(A),'single'); %same starting conditions for all repetitions of the loop %#ok<PREALL> %
tic
% The following line of code eats up all the RAM
if numel(A)>577536000 %large matrices become slow due to limited RAM, serial processing is faster
disp('serial')
for i=1:size(A,3)
result_conv(:,:,i) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2); %cross-correlation code in PIVlab
end
else %smaller matrices benefit from vectorization
disp('vectorized')
result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2); %cross-correlation code in PIVlab
end
counter=counter+1
calc_time(counter)=toc;
time_per_subimage(counter)=calc_time(counter)/size_of_the_stack;
end
%% plot results
bar(stack_sizes,time_per_subimage*1000)
grid on;
xlabel('stack size')
ylabel('time per correlation in ms')
Bjorn Gustavsson on 15 Sep 2021
You might shave off some time for the smaller cases if you move the pre-allocation inside the "large-stack" branch of the if-condition (you might even remove it if you run the loop from the last image to the first in the stack). When you assign the output of the fftshift(...) correlation calculation in the small-stack case, it doesn't use the pre-allocated array (if I've understood things properly...).
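A minimal sketch of that suggestion, assuming the same threshold as in the code above and a small example stack (size_threshold is a hypothetical variable name):

```matlab
% Sketch of the suggestion above: pre-allocate only where it is used.
A = rand(64,64,200,'single');   % small example stack
B = A;
size_threshold = 577536000;     % element-count threshold from the code above

if numel(A) > size_threshold
    % Serial branch: running the loop backwards makes the first assignment
    % create the full-size array, so no separate zeros() call is needed
    for i = size(A,3):-1:1
        result_conv(:,:,i) = fftshift(fftshift(real(ifft2(conj(fft2(A(:,:,i))).*fft2(B(:,:,i)))), 1), 2);
    end
else
    % Vectorized branch: the assignment creates a new array anyway, so a
    % prior zeros() call would only cost time and memory
    result_conv = fftshift(fftshift(real(ifft2(conj(fft2(A)).*fft2(B))), 1), 2);
end
```

If this runs inside the outer loop over stack sizes, result_conv from the previous iteration would probably need to be cleared first, because the backwards loop only allocates the full array when the variable does not exist yet.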


Answers (0)


Release

R2019b
