Unexpected speed decrease of 2D Fourier Transform on GPU when iFFTed
I am applying FFT2 to a stack of images, cropping part of the result, and then applying iFFT2 to the cropped part:
For example on GPU: FFT2(1920*1240*30 (single)) -> crop to 320*207*30 (single) -> iFFT2(320*207*30 (single))
1920/6=320
1240/6≈207 (206.67, rounded)
Here you may observe the time of execution, normalized to the number of single data processed, for each function:
Note that the yellow line (FFT2+crop1/6+iFFT2) is more than an order of magnitude slower than the purple line, which has 36 times more data to process with iFFT2!
Any idea on what is happening here?
Here is the script I have used:
clear
n=10;
cx=1920;
cy=1240;
FPT=2:5:50;
fpt=size(FPT,2);
b=zeros(1,fpt);
for kk=1:8
    for ii=1:fpt
        ii
        I=gpuArray(single(rand(cy,cx,FPT(1,ii))));
        Ia=gpuArray(single(rand(round(cy/6),round(cx/6),FPT(1,ii))+1i.*rand(round(cy/6),round(cx/6),FPT(1,ii))));
        mask=zeros(cy,cx,FPT(1,ii));
        mask(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),round(cx/2)-round(cx/12):round(cx/2)+round(cx/12))...
            =(ones(size(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),2),size(round(cx/2)-round(cx/12):round(cx/2)+round(cx/12),2)));
        mask=gpuArray(single(mask));
        tic
        for jj=1:n
            switch kk
                case 1
                    tic
                    B=fft2(I);
                case 2
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                case 3
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                    D=ifft2(C);
                case 4
                    tic
                    B=fft2(I);
                    C=ifft2(B);
                case 5
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                case 6
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                    E1=imresize(abs(D),1/6);
                    E2=imresize(angle(D),1/6);
                case 7
                    tic
                    C=fft2(I);
                    B=ifft2(Ia);
                case 8
                    tic
                    B=ifft2(Ia);
            end
        end
        b(1,ii)=toc/n; % b is the time of execution normalized to
        % the amount of data and the number of times a case has been evaluated
    end
    hold on
    plot(b)
    clear A B C D I E1 E2
end
b is the variable plotted in the graph above.
My graphics card is a GeForce RTX 2080 Ti.
Any help would be appreciated.
Thanks,
Tual
Accepted Answer
Joss Knight
on 8 Jun 2019
I modified your code by inserting wait(gpuDevice) before each tic and toc, and got a much more sensible graph.
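GPU functions in MATLAB generally run asynchronously, so without synchronization tic/toc can return before the queued work has finished. A minimal sketch of the synchronized timing, assuming the image sizes from the question and a stack depth of 30 (the variable names gpu, r, c and tPerCall are illustrative and not from the original script):

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
cx = 1920; cy = 1240; n = 10;
gpu = gpuDevice;                                % handle to the current GPU
I = gpuArray(rand(cy, cx, 30, 'single'));       % image stack on the GPU
r = round(cy/12);                               % half-height of the crop
c = round(cx/12);                               % half-width of the crop

wait(gpu);                                      % finish pending GPU work before starting the timer
tic
for jj = 1:n
    B = fft2(I);
    C = B((cy/2 - r):(cy/2 + r), (cx/2 - c):(cx/2 + c), :);
    D = ifft2(C);
end
wait(gpu);                                      % block until all queued GPU work is done
tPerCall = toc / n;                             % average time of one fft2 + crop + ifft2

% Alternative: gputimeit synchronizes and repeats the call itself.
tFft = gputimeit(@() fft2(I));
```

The timer is started once, outside the loop: a tic inside the loop restarts the timer on every iteration, so toc would only cover the last pass.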
More Answers (1)
Bruno Luong
on 3 Jun 2019
If you want a fast FFT, make your data length a power of 2, or a product of small integers.
166 is bad since its prime factorization is 2 * 83.
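As a quick check, factor lists the prime factors of a candidate length. The sketch below prints them for 166 and for the sizes that appear in the question's script (the crop indices there select 207 rows and 321 columns), then searches upward for the next length whose prime factors are all small. The limit of 7 is an assumption chosen for illustration, not a documented threshold, and the variable names are hypothetical:

```matlab
% Prime factors of the lengths discussed in this thread.
lens = [166 207 320 321 1240 1920];
for len = lens
    fprintf('%5d -> %s\n', len, mat2str(factor(len)));
end

% Next length >= 207 whose prime factors are all <= maxPrime (assumed limit).
maxPrime = 7;
fastLen = 207;
while max(factor(fastLen)) > maxPrime
    fastLen = fastLen + 1;
end
fprintf('Next length after 207 with small factors: %d\n', fastLen);
```

Cropping or zero-padding to such a length changes the output sampling, so whether this is acceptable depends on what the cropped spectrum is used for.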