Why do I get different results from the same MATLAB version and the same code on two different computers?
Hi all,
I have written quite a long code to simulate a dam-break problem, in which a matrix system has to be solved to get the pressure contour for the water. I ran the code on my desktop and on my laptop to compare them, and ran into a problem. The results from the two computers agree well for the first few steps. After around 200 steps, the desktop hits a singularity problem when solving the matrix, while the laptop keeps running. Eventually the laptop also stops, at 900 steps, because of a singularity problem.
I compared the two results step by step and found that for the first few steps they are the same. After a certain time step, small differences appear: the results agree to about 14 significant digits and differ only in the last displayed digit, e.g. 1620.66808491866 vs. 1620.66808491869. The error then accumulates and grows bigger and bigger.
I am pretty sure the code I am running is the same and the MATLAB versions are the same, so how can it give me different answers? Can anyone answer my question? Thanks in advance.
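To see how a difference in the last digit can snowball over many time steps, here is a toy sketch in Python. It is not the dam-break solver: the chaotic logistic map x -> 4x(1 - x) is a stand-in chosen purely for illustration. Two runs start exactly one unit in the last place (ulp) apart and are compared as they step forward. Python floats are IEEE 754 doubles, the same format as MATLAB's default double; math.nextafter assumes Python 3.9 or newer.

```python
import math

def logistic_trajectory(x0: float, steps: int) -> list[float]:
    """Iterate x -> 4*x*(1 - x), a toy stand-in for a sensitive time-stepping scheme."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(4.0 * x * (1.0 - x))
    return xs

# Two starting values one ulp apart (math.nextafter needs Python 3.9+).
a = logistic_trajectory(0.3, 100)
b = logistic_trajectory(math.nextafter(0.3, 1.0), 100)

# The gap roughly doubles each step until it is as large as the values themselves.
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:3d}: |difference| = {abs(a[step] - b[step]):.3e}")
```

A real solver is not necessarily chaotic, but any scheme that amplifies small perturbations from step to step behaves the same way once the amplification has compounded long enough.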
Ryan Zhao
on 22 Oct 2015
Actually, I had a similar experience. In my case, I changed the MATLAB version (from 64-bit to 32-bit) so that the two PCs had the same compiler and the same MATLAB code, but this made no difference: my code still runs on only one of the computers. Could you tell me how you eventually solved this? Thank you.
Accepted Answer
Mahdi
on 26 May 2014
I am assuming that your problem is ill-conditioned, or at least very sensitive, so that each step depends strongly on the numbers from the previous steps. I believe the reason you're getting different results is that you're probably using the 32-bit version of MATLAB on one machine and the 64-bit version on the other.
If you install the same version on both computers (32-bit on both), I think you will get the same results. Put simply, the difference comes from the number of digits/computations each computer is able to store.
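A minimal sketch of what "ill-conditioned" means in practice, using a hand-written 2x2 solver in Python. The matrix is a hypothetical, deliberately nearly singular example, not the poster's system: nudging one right-hand-side entry by a single ulp moves the solution by many orders of magnitude more than that. math.nextafter assumes Python 3.9 or newer.

```python
import math

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] * [x1, x2] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2

# Nearly singular matrix: the two rows differ only by about 1e-10 in one entry.
a11, a12, a21, a22 = 1.0, 1.0, 1.0, 1.0 + 1e-10
b1, b2 = 2.0, 2.0 + 1e-10  # the intended solution is (1, 1)

x_ref = solve_2x2(a11, a12, a21, a22, b1, b2)
# Perturb b2 by exactly one unit in the last place.
x_pert = solve_2x2(a11, a12, a21, a22, b1, math.nextafter(b2, math.inf))

print("reference solution:", x_ref)
print("perturbed solution:", x_pert)
print("change in x2:", abs(x_pert[1] - x_ref[1]))
```

The one-ulp change in b2 is about 4.4e-16, while x2 moves by roughly 4e-6. A rounding difference of that size between two machines would be amplified in the same way.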
Mahdi
on 26 May 2014
I would also strongly suggest finding a different approach to your problem that is not so ill-conditioned.
More Answers (3)
Walter Roberson
on 22 Oct 2015
Round-off error can differ from run to run in some cases:
- A different number of cores when using explicit parallelization
- A different order in which results become available when using explicit parallelization
- With sufficiently large matrices and some patterns of operations, MATLAB may call into highly optimized libraries that use parallelization; with a different number of cores, or a different order in which the cores report in, the results can differ
- When the CPU manufacturers differ, in particular Intel vs AMD, the set of instructions available for high-performance operations can differ
- On MS Windows, Intel provides the MKL (Math Kernel Library) for high-performance mathematics; AMD provides a similar library that is not identical
- Within any given CPU manufacturer, instruction sets and instruction timings differ between model lines, affecting the order in which cores report results
- Within any given model line of a CPU manufacturer, different designs (e.g. i3 vs i5 vs i7) have different instruction timings, affecting the order in which cores report results
- Within any one design of a CPU (e.g. the i7), different releases have different microcode, some changes being speed improvements and some bug fixes
- Within any one design and release of a CPU (e.g. one release of the i7), different manufacturing techniques are used, such as copper vs aluminum interconnects or different process feature sizes; these can result in different instruction timings, affecting the order in which different cores report in
- For any one design and implementing technology, devices are sometimes built to shut down or enable portions of the chip depending on load and temperature, to lower power drain and manage thermal load; this can affect timings and therefore the order in which different cores report in
- Different GPU manufacturers can produce different round-off
- Different GPU models from one manufacturer have different memory sizes and numbers of processing units, so work is dispatched in different orders and gathered in different orders
- The system has to keep handling whatever else is going on, so interrupts and process switching affect the order in which results are returned
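The effect of the number of cores on a reduction can be modelled in a few lines of Python. This is a simplified sketch, not how MATLAB or MKL actually partition work: each hypothetical "worker" sums a contiguous chunk, and the partial sums are then added in worker order. Explicit += loops are used on purpose, because Python 3.12+ makes the built-in sum() of floats compensated, which would hide the effect.

```python
def chunked_sum(values, n_workers):
    """Model a parallel reduction: each worker sums one contiguous chunk,
    then the partial sums are combined in worker order."""
    chunk = -(-len(values) // n_workers)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk):
        partial = 0.0
        for v in values[start:start + chunk]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [0.1] * 10
for workers in (1, 2, 5, 10):
    print(workers, "worker(s):", repr(chunked_sum(values, workers)))
```

With one worker the ten additions give 0.9999999999999999, while with two workers each half sums to exactly 0.5 and the total is exactly 1.0: same data, same operations, different grouping.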
I talk about the order in which results are returned because it has to be remembered that floating-point operations are not associative or distributive: 0.1 + 0.2 - 0.3 does not give the same double-precision result as 0.1 - 0.3 + 0.2. Especially for matrix inversions, this can make a really big relative difference. (This is one of the reasons why matrix inversions should be avoided.) When you use multiple cores to do summations, the segment produced by any one core might be identical between runs, but the order in which the segments are added together can vary due to chance circumstances, leading to overall results that differ.
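The 0.1 + 0.2 - 0.3 example can be checked directly. The snippet below is in Python, whose float is the same IEEE 754 double as MATLAB's default double, so the same arithmetic should hold in MATLAB; the 1e16 lines are an extra illustration of the same point with larger magnitudes.

```python
# Floating-point addition is not associative: the same numbers added
# in a different order give different double-precision results.
left_to_right = 0.1 + 0.2 - 0.3   # evaluated as (0.1 + 0.2) - 0.3
reordered = 0.1 - 0.3 + 0.2       # evaluated as (0.1 - 0.3) + 0.2

print(repr(left_to_right))         # 5.551115123125783e-17
print(repr(reordered))             # 2.7755575615628914e-17
print(left_to_right == reordered)  # False

# With large magnitudes the small term can vanish completely.
big_first = (1e16 + 1.0) - 1e16    # 1.0 is lost when added to 1e16
big_cancel = (1e16 - 1e16) + 1.0   # cancel first, then 1.0 survives
print(big_first, big_cancel)       # 0.0 1.0
```

Neither of the first two results is the mathematically exact 0, and which wrong answer you get depends only on the order of evaluation.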
Alan Bindemann
on 16 Jan 2020
Edited: Alan Bindemann
on 16 Jan 2020
We have a Simulink model, coded as an S-Function subsystem, running on two machines, each with an Intel processor (one i5 and one Xeon). When the S-Function is fed identical inputs on each machine, we get different results. The differences are initially small and then grow to noticeable levels over time due to the accumulation of errors inside the S-Function.
It was our understanding that, once coded as an S-Function, the model would produce identical floating-point results on different Intel processors.
Can someone give more detail on why S-Functions (and models compiled to .dll files) might give different results? We thought that once everything was reduced to x86 instructions, we could expect repeatable results regardless of the processor.
Thanks!
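One concrete way "the same code" can round differently on different x86 CPUs is fused multiply-add (FMA): a compiler or math library may compute a*b + c as one fused instruction with a single rounding on CPUs that support it, and as a multiply followed by an add (two roundings) elsewhere. Whether this is what happens inside a particular S-Function is an assumption; it depends on compiler flags and on which library code paths get dispatched at run time. The Python sketch below emulates a correctly rounded FMA with exact rational arithmetic (fractions.Fraction), because math.fma only exists in Python 3.13 and later.

```python
from fractions import Fraction

def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a, b, c = 0.1, 10.0, -1.0
print("separate multiply + add:", repr(mul_then_add(a, b, c)))   # 0.0
print("fused multiply-add     :", repr(fused_mul_add(a, b, c)))  # 5.551115123125783e-17
```

Neither answer is a bug: 0.1*10 rounds to exactly 1.0 when rounded on its own, whereas the fused version keeps the tiny excess of the stored 0.1 over one tenth. Inside a long simulation, that kind of last-bit difference is enough to start the divergence described above.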
Andrew Roscoe
on 23 Oct 2023
Edited: Andrew Roscoe
on 25 Oct 2023
After a lot of digging I finally found this web page and document set:
and the PDF version:
It is worth a read.
In particular, you CAN force MATLAB to use AVX2 across all machines if they are all AVX2-capable, so that, for example, the Xeon and newer i7 CPUs are "restrained" from AVX512 to AVX2. The performance loss does not seem to be significant on the Xeon machine I tried, and it lets me get bitwise-identical MATLAB (and, more significantly, Simulink) simulations across multiple workstations with different i5/i7/Xeon processors that otherwise produce DIFFERENT results. I don't see any evidence (at least so far) that using AVX2 on the AVX512-capable machines produces incorrect results.
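A hedged illustration of why an AVX-512 code path can produce different bits than an AVX2 path: a 256-bit AVX2 register holds 4 doubles and a 512-bit AVX-512 register holds 8, so a vectorized sum typically keeps that many running partial sums ("lanes") and combines them at the end. The toy model below is plain Python, not MKL's actual kernels (whose internals are not documented at this level): element i goes to lane i mod width, and the lanes are then added in order.

```python
def lane_sum(values, width):
    """Toy model of a SIMD reduction with `width` accumulator lanes."""
    lanes = [0.0] * width
    for i, v in enumerate(values):
        lanes[i % width] += v  # element i accumulates into lane i mod width
    total = 0.0
    for lane in lanes:         # final horizontal reduction, in lane order
        total += lane
    return total

data = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]  # exact sum is 6
print("4 lanes (AVX2-like)   :", lane_sum(data, 4))
print("8 lanes (AVX-512-like):", lane_sum(data, 8))
```

With 4 lanes the two large values land in the same lane and cancel before the 1.0s meet them, giving 6.0; with 8 lanes the horizontal reduction adds 1.0s to 1e16, where they are lost, giving 3.0. Pinning every machine to the same instruction set removes this source of variation.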
To get it to work, start MATLAB via a batch file that sets the environment variable MKL_ENABLE_INSTRUCTIONS, or set that environment variable directly via Windows settings, BEFORE starting MATLAB.
In a batch file:
set MKL_ENABLE_INSTRUCTIONS=AVX2
"C:\Program Files\MATLAB\R2021b\bin\matlab.exe" -singleCompThread
(or similar).
Then run version('-blas') at the MATLAB command prompt and check that all Xeon-type processors now report "AVX2", not "AVX512".
I am finding that I also need to constrain all MATLAB sessions to the same number of threads across workstations, as well as to the same AVX2 setting, to guarantee numerical repeatability. In practice this is easiest to achieve by using a single thread for MATLAB/Simulink, either by starting MATLAB with the -singleCompThread option or by executing
maxNumCompThreads(1);
early in the MATLAB script that configures a simulation.
I also experimented with the other environment variable setting:
set MKL_CBWR=AVX2,STRICT
This did not seem to have any effect on MATLAB when I checked with version('-blas'); I don't know why it doesn't work as described in the documentation.
I also experimented with this environment variable setting:
set MKL_DEBUG_CPU_TYPE=5
This DID work, and seemed to be equivalent to
set MKL_ENABLE_INSTRUCTIONS=AVX2
but it is not as well documented as MKL_ENABLE_INSTRUCTIONS, so I chose the latter solution.
Peter Monk
on 19 Sep 2018
Edited: Peter Monk
on 19 Sep 2018
I also see a problem running on Macs with OS X 10.13.6 and MATLAB R2018a on i7- and i5-based machines. This code gives different results:
format long; A=hilb(7); x_exact=.1*(1:7)'; b=A*x_exact; x_comp=A\b; resid=norm(A*x_comp-b,1)
On my i7 machine resid is nonzero (as expected), whereas on the i5 machines resid = 0, which is misleading since x_exact and x_comp differ in about the 10th decimal place.
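The point that a tiny residual says little about accuracy for hilb(7) can be reproduced outside MATLAB. This is a sketch in plain Python with a textbook Gaussian elimination standing in for MATLAB's backslash operator (LAPACK uses a different, blocked algorithm), so the exact digits will not match either Mac; only the pattern of small residual versus much larger error is the point.

```python
def hilbert(n):
    """n-by-n Hilbert matrix H[i][j] = 1 / (i + j + 1), 0-based indices."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def matvec(A, x):
    """Matrix-vector product with a plain left-to-right accumulation."""
    out = []
    for row in A:
        acc = 0.0
        for a, xi in zip(row, x):
            acc += a * xi
        out.append(acc)
    return out

def solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for MATLAB's backslash)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        acc = M[k][n]
        for c in range(k + 1, n):
            acc -= M[k][c] * x[c]
        x[k] = acc / M[k][k]
    return x

n = 7
A = hilbert(n)
x_exact = [0.1 * k for k in range(1, n + 1)]  # same as .1*(1:7)'
b = matvec(A, x_exact)
x_comp = solve(A, b)

resid = sum(abs(r - bi) for r, bi in zip(matvec(A, x_comp), b))  # 1-norm of A*x_comp - b
err = max(abs(xc - xe) for xc, xe in zip(x_comp, x_exact))
print("residual 1-norm:", resid)
print("max |x_comp - x_exact|:", err)
```

Because hilb(7) has a condition number of roughly 5e8, a residual at the level of machine precision is still compatible with errors many orders of magnitude larger, so whether the residual happens to print as exactly 0 on a given CPU says little about accuracy.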
Ba Mo
on 4 Oct 2020
Edited: Ba Mo
on 4 Oct 2020
I'm facing the same problem running on two HPCs (clusters). The catch is that the better, more powerful HPC returns better results, but it's always crowded.
I don't get singularities or runtime errors, but the output of a stochastic optimization problem differs by 1%-2%. That 1%-2% becomes noticeable when your objective function is on the order of 10^8.