Why do some calculations like the FFT produce different results when performed on a GPU?

I am using the Parallel Computing Toolbox and my computer's GPU to speed up calculations, but the results are not always identical to those computed on the CPU. For example, fft produces slightly different results. Why is that?
Walter Roberson on 21 Oct 2022
I am having trouble finding explicit statements, but single-precision and double-precision throughput figures are given at this link, and the double-precision throughput is 1/32 of the single-precision: https://www.electroniclinic.com/nvidia-geforce-rtx-3090-ti-complete-review-with-benchmarks/


Accepted Answer

Mahmoud Hammoud on 9 Feb 2011
Higher-level algorithms like the FFT ultimately boil down to basic arithmetic operations that can yield (acceptably) different results when performed in different environments due to the very nature of floating-point arithmetic.
At the lowest level, the "environment" includes the processor (e.g., an Intel vs. an AMD chip), to which, all other things held constant, such differences may be attributed. Then comes the compiler, which directly affects how the computations are translated to machine code: even if the calculations appear in the same order in the C code, that is not necessarily the case at the instruction level when two different compilers are used. At more or less the same level are the math libraries, which are highly optimized for the processor type and are themselves compiled.
Moreover, these constituents of the computing environment are not themselves the root cause of the (potential) discrepancies; rather, they allow the fundamental issue to manifest, namely the limited precision available (most real numbers cannot be represented exactly in 64 bits). This gives rise to round-off "errors" in the representation, which find their way into the end results whenever a different environment changes the order of operations. As such, it should come as no surprise when two different algorithms produce slightly different results, even in the same environment, since in most cases neither result is the "real" one.
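A minimal sketch of this effect, using nothing GPU-specific: regrouping the same three operands changes where rounding happens, and neither grouping yields the "real" 0.6.

% Floating-point addition is not associative: regrouping the same
% operands changes the intermediate rounding and hence the result.
a = 0.1 + (0.2 + 0.3);
b = (0.1 + 0.2) + 0.3;
fprintf('a   = %.17g\n', a);
fprintf('b   = %.17g\n', b);
fprintf('a-b = %g\n', a - b);   % nonzero: the two groupings disagree
% Neither a nor b equals the mathematical 0.6, because 0.1, 0.2 and
% 0.3 have no exact binary64 representation in the first place.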
The FFT on the GPU vs. on the CPU is in a sense an extreme case, because both the algorithm AND the environment change: the FFT on the GPU uses NVIDIA's cuFFT library, as Edric pointed out, whereas the CPU/traditional desktop MATLAB implementation uses the FFTW library. Besides being different implementations of the FFT algorithm (i.e., the steps used to compute the DFT are essentially different), one is written in CUDA, which in turn builds on different lower-level (basic) math libraries, while the other uses the libraries of the host platform. The fact that different C compilers are used should also be added to the equation. Given all these interacting factors, the differences in the results are still very small. As Edric noted, the differences for lower-level trigonometric and elementary functions are hardly noticeable.
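A sketch of how one might quantify the discrepancy, assuming a supported NVIDIA GPU and the Parallel Computing Toolbox are available; the exact numbers will vary by hardware and MATLAB version:

x = rand(1e6, 1);                 % random real test signal
yCpu = fft(x);                    % CPU path (FFTW)
yGpu = gather(fft(gpuArray(x)));  % GPU path (cuFFT), gathered back to host
% The results differ, but typically only by a few eps relative to the
% magnitude of the output.
relErr = max(abs(yCpu - yGpu)) / max(abs(yCpu));
fprintf('max relative difference: %g (~%g * eps)\n', relErr, relErr/eps);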
Stephen Lange on 29 Apr 2013
Great answer! Small amplifications.
1) Per Wikipedia, "Double precision floats deviate from the IEEE 754 standard: round-to-nearest-even is the only supported rounding mode for reciprocal, division, and square root. In single precision, denormals and signalling NaNs are not supported; only two IEEE rounding modes are supported (chop and round-to-nearest even), and those are specified on a per-instruction basis rather than in a control word; and the precision of division/square root is slightly lower than single precision."
2) The x87 floating-point registers on a standard PC are 80 bits wide, so "64-bit" intermediate results on the CPU can be kept at extended precision and rounded quite differently.
3) Per some members of the IEEE floating-point standards committee (where I lurked back in 2010), modern parallel, asynchronous computations will not necessarily be repeatable even with the same executable on the same box. As Walter notes, asynchronous processing means the instruction execution order cannot be guaranteed from one run to the next (see the sketch below).
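A sketch of that run-to-run effect in MATLAB itself, assuming a parallel pool is open: a parfor reduction does not guarantee the order in which partial sums are combined, so the rounded result may differ between runs.

rng(0);
x = rand(1e6, 1, 'single');
s = single(0);
parfor k = 1:numel(x)
    s = s + x(k);      % reduction variable: accumulation order is unspecified
end
fprintf('%.9g\n', s);  % may differ slightly from one run to the next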


More Answers (2)

Edric Ellis on 8 Feb 2011
Walter is quite right that any change to the order of operations changes the result. In the case of FFT on the GPU, we use NVIDIA's cuFFT library to provide a high-performance FFT implementation. The highly threaded, communicating nature of the FFT on the GPU inevitably leads to discrepancies.
In general, we strive to make our GPU algorithms give the numerically consistent "MATLAB answer". For many of the elementwise non-communicating algorithms (sin, cos, plus, ...), we achieve that (within an "eps" or maybe two); but as the complexity of the algorithm increases, so does the discrepancy. (For example, the parallel version of "sum" on the GPU is a vastly different implementation compared to the obvious single-threaded approach).
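A sketch of that last point, assuming a supported GPU; which of the three results agree depends on hardware, MATLAB version, and data, but in single precision they commonly differ in the last digits:

rng(0);
x = rand(1e6, 1, 'single');

sLoop = single(0);
for k = 1:numel(x)               % naive left-to-right accumulation
    sLoop = sLoop + x(k);
end
sCpu = sum(x);                   % built-in sum (vectorized/blocked)
sGpu = gather(sum(gpuArray(x))); % GPU parallel (tree-style) reduction

fprintf('loop: %.9g   cpu: %.9g   gpu: %.9g\n', sLoop, sCpu, sGpu);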

Walter Roberson on 8 Feb 2011
Nearly any change in the exact order of operations used to perform a calculation can result in different outcomes due to precision or round-off limitations.
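A minimal illustration of how drastic a regrouping can be, at the edge of double precision:

a = (2^53 + 1) - 2^53;   % 2^53 + 1 rounds back to 2^53, so a is 0
b = 2^53 + (1 - 2^53);   % 1 - 2^53 is exactly representable, so b is 1
fprintf('a = %g, b = %g\n', a, b);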
If exact reproducibility of the calculation across different implementations is important, then you very likely should not be using the Parallel Computing Toolbox -- not unless you have studied numerical analysis for a few years.

