MATLAB Answers

Dual Xeon and Matlab R2015b - different behaviour on physical vs virtual machine

Asked by Alex R. on 23 Feb 2016
Latest activity Edited by Albert Franck on 8 Aug 2017
Dear all,
I've read some old questions and answers saying that Matlab can use only one CPU, even if the machine has multiple CPUs. I'm not sure if that has changed in R2015b (has it?) but I'm seeing some odd behaviour.
On a 2 x Xeon E5-2670 machine (16 threads per CPU, 32 threads total) running Ubuntu 14.04, I see Matlab R2015b using at most 16 threads for operations that should use all 32, e.g. embarrassingly parallel tasks such as x = x.^2. htop shows only 16 threads at 100%; the other 16 are idle. The 16 active threads span both CPUs, though: they are never just threads 0-15 or 16-31. So Matlab is in fact using both CPUs, but somehow at most 16 threads.
Furthermore!
I also have access to a 2 x Xeon E5620 (8 threads per CPU, 16 threads total) which runs VMware ESX 5.5 with a virtual machine that is configured to expose 2 CPUs, each with 8 threads. The VM also runs Ubuntu 14.04, which does indeed report two CPUs and 16 total threads (see below) ... and, to my surprise, Matlab here actually uses all 16 threads for the same x=x.^2 task. htop shows all 16 threads at 100%.
Does anyone have any clues?
P.S. I currently don't have access to the vSphere client to see exactly how the ESX host was configured and whether that has anything to do with it, but if Linux sees 2 physical CPUs and Matlab can only use 1 CPU, then the above is strange. Either Linux is aware it's running as a VM and pools the CPUs somehow, or ESX does something funny. I'm new to ESX.
$ grep ^physical /proc/cpuinfo | uniq
physical id : 0
physical id : 1
$ grep ^processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
processor : 8
processor : 9
processor : 10
processor : 11
processor : 12
processor : 13
processor : 14
processor : 15
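The two grep commands above can be extended to show how many of those logical processors are distinct physical cores. This is a minimal sketch, assuming an x86 Linux system whose /proc/cpuinfo lists `physical id` and `core id` fields; the function name `cpu_topology` is made up for illustration.

```shell
# cpu_topology [FILE]: summarize sockets, physical cores and logical processors.
# FILE defaults to /proc/cpuinfo; assumes the x86 Linux field names
# "processor", "physical id" and "core id".
cpu_topology() {
  awk -F: '
    # Trim whitespace around the field name and its value.
    { gsub(/^[ \t]+|[ \t]+$/, "", $1); gsub(/^[ \t]+|[ \t]+$/, "", $2) }
    $1 == "processor"   { logical++ }
    $1 == "physical id" { pkg = $2; sockets[pkg] = 1 }
    # A physical core is identified by its (socket, core id) pair.
    $1 == "core id"     { cores[pkg ":" $2] = 1 }
    END {
      ns = 0; for (s in sockets) ns++
      nc = 0; for (c in cores) nc++
      printf "sockets=%d cores=%d logical=%d\n", ns, nc, logical
    }
  ' "${1:-/proc/cpuinfo}"
}
```

Calling `cpu_topology` with no argument reads /proc/cpuinfo. If `logical` comes out at twice `cores`, hyper-threading is visible to the OS; if the two are equal, the OS (or the hypervisor) is presenting each logical processor as its own core.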

  2 Comments

I just wrote some MEX C++ code compiled against OpenMP where I request "max" number of threads from OpenMP and do some intensive stuff. Sure enough, it maxes out 32 threads when executed from Matlab on the Dual Xeon E5-2670 machine. On the same machine, Matlab's fft() or x=x.^2 etc however only max out 16 threads. As far as I know, Matlab's fft() is compiled against FFTW and OpenMP so it should in theory use all 32 threads.
Is there some hard limit in Matlab for max 16 threads? That would explain why I'm seeing max 16 threads used in both the above machines.
One other interesting finding. I did the following two tests.
  • Disabled Hyperthreading (HT) in BIOS, leaving 8 cores (8 threads) per CPU for the two CPUs, so 16 threads in total. Matlab then used all 16 threads quite happily for stuff like x=x.^2 or fft() etc.
  • I enabled Hyperthreading but disabled half the cores for each CPU, leaving 4 cores (8 threads) per CPU for the two CPUs, so also 16 threads in total. Matlab then used only 8 threads in all tests. My OpenMP stuff that requests max threads continues to max out the number of threads (16).
So it looks like the issue is HT on multi-processor systems. Interestingly, though, HT was enabled on the VMware ESX host and Matlab there uses all threads, but I'm not sure how hardware under ESX is exposed.
Any tips to make Matlab use all 32 threads on the Dual E5-2670 for its internal functions will be highly appreciated!

3 Answers

Answer by Philip Borghesani on 23 Feb 2016

MATLAB attempts to detect hyper-threading and may choose not to use it. We have found that, for the well-optimized math libraries, using hyper-threading frequently gives lower performance than not using it, so on hyper-threaded machines you may see that only half of your logical cores are used.
The number of physical processors has no effect on MATLAB; it sees no difference between multiple processors and multiple cores on one processor.
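Read literally, this means the expected thread count is the number of physical cores across all sockets, whether or not hyper-threading is on. Here is a quick sketch checking that reading against the three physical-machine configurations reported in the question's comments. The function name is hypothetical, and "one thread per physical core" is an assumption drawn from this answer, not a documented MATLAB rule.

```shell
# expected_threads SOCKETS CORES_PER_SOCKET
# Assumption: one computational thread per physical core; HT siblings ignored.
expected_threads() {
  echo $(( $1 * $2 ))
}

expected_threads 2 8   # dual E5-2670, HT on (32 logical): 16, matches the 16 observed
expected_threads 2 8   # HT off in BIOS (16 logical): 16, matches the 16 observed
expected_threads 2 4   # HT on, half the cores disabled (16 logical): 8, matches the 8 observed
```

The ESX guest is the exception: MATLAB used all 16 threads there, which would be consistent with the guest not exposing hyper-threading to MATLAB, though nothing in the thread confirms how ESX presented the CPUs.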

  2 Comments

Is there a way to ask Matlab to always use/ignore HT?
It's odd that on the VM machine with 16 threads it uses HT, but on the physical machine with 32 threads it refuses to use HT.
Most of my OpenMP code that uses all threads does run faster with HT disabled, but other code actually runs slower. It would be nice to be able to control it.
Take a look at maxNumCompThreads and search for information on it. It does not control all threading and may not work in the future but it is a start. Try searching this site and the web for other information.
What I should have asked before is: why do you care? Are you seeing poor performance with a specific configuration? We have seen some poorly configured VMs that showed bad performance with R2015b.
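As a starting point for experimenting with maxNumCompThreads, one option is to set it when launching MATLAB from a terminal. This is a sketch, assuming `matlab` is on the PATH and that the `-nodisplay`, `-nosplash` and `-r` startup options are available in the installed release. The helper name `matlab_with_threads` is made up, and it only prints the command rather than running it.

```shell
# matlab_with_threads N: print a MATLAB launch command that requests N
# computational threads and then displays the value MATLAB reports back.
# Nothing is executed here; run the printed line once it looks right.
matlab_with_threads() {
  printf 'matlab -nodisplay -nosplash -r "maxNumCompThreads(%d); disp(maxNumCompThreads); exit"\n' "$1"
}

matlab_with_threads 32
```

Whether the built-in math functions then actually use the extra threads on a hyper-threaded machine is exactly the open question here, so it is worth confirming with htop rather than trusting the reported value alone.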


Answer by Albert Franck on 6 Aug 2017

Hello!
Very interesting topic. I then decided to contribute to it.
In fact, I have quite the same issue, using the R2017a trial version.
I know Matlab well, and was curious to test the PCT (Parallel Computing Toolbox) on a dual-CPU computer.
At this point, I'm very disappointed.
To sum up:
- I have a dual-Xeon computer (2 x Xeon 2686 v4, 18 cores/36 threads each, so 36 physical/72 logical cores in total).
- Using PCT with 1 or 2 Xeons (by removing one) at max usage, performance is the same.
- Using one Xeon, I am able to run at 100% usage with 36 logical cores.
- Using two Xeons, I am not able to get above 58-60% CPU usage (8-10% of which is due to the 2nd CPU at idle, see the 2nd screenshot below) with 36 or more logical cores (I tried up to 50, for instance). I got the same performance between 36 and 50 cores. (I did not try above 50 because I lack the memory, and I know this is not the issue.)
- With the 2 Xeons and, for instance, 50 cores, it seems that at the beginning (when starting the parallel pool and shortly after launching the parfor) more than one CPU is really used. But then, about 30 s (on average) after the code starts running, only one processor is really used by Matlab (always CPU1), at 100%. It is as if Matlab decides to use only 1 CPU (I think this is a central point). This is, by the way, another issue, because the same CPU is always the one used and temperatures are high at around 90-100% usage (decreasing CPU lifetime... well, anyway). Moreover, in this case I am 100% sure that the 2nd CPU is not used. Too bad.
- I have also tried running multiple instances: same issue.
- I did not try using 2 VMs, for instance.
- I did not try MJS (and I cannot on the trial, because I need the DCS toolbox). I am pretty confident that this will not solve the issue, because I know well how MJS works.
- Of course I ran some benchmarks to compare the 1-CPU and 2-CPU cases. Again, the performance results are almost the same!
My guess, for the moment: Matlab is not able to drive the 2-CPU case at 100% usage (leaving aside virtualization, which I have not yet tested). Since there is no issue when using only one CPU, I think that hyper-threading is not the bad guy, because with 1 CPU one can easily overcome it (by setting maxNumCompThreads to the number of logical cores).
By the way, if I set up another computer using the 2nd Xeon, I know that these 2 computers will give me the power I want, but I don't want to spend more money, and this is exactly what dual-CPU motherboards are made for.
What's more, I know it is not a CPU issue, because with other software I am able to use both CPUs at 100% usage (in fact I usually don't go above 90-95%, because that usually proves inefficient... a well-known debate!). See for instance the Cinebench R15 test below. (God, please, I'd like to see the same CPU usage using Matlab PCT with 2 Xeons!)
@Matlab specialists: Is there a way to fix this issue?
You will find some screenshots below.
1. Using one CPU
2. Using two CPUs (same result in each case, with parallel pool = 36 to 50 workers)
3. Cinebench R15 (able to run 2 CPU at 100%...)


Answer by Albert Franck on 8 Aug 2017
Edited by Albert Franck on 8 Aug 2017

FYI, a Matlab technician solved the issue. You will find his answer below:
"I am writing in reference to your Technical Support Case #02692353 regarding 'PCT on a dual cpu computer'.
This might relate to a bug introduced in R2016a and will only be visible on Windows machines with a large number of cores.
When a Windows system has over 64 logical cores it manages these cores by splitting them into groups (numbered from 0) each group containing no more than 64 processors. Threads are assigned to a process group and then scheduled by the Operating System on the cores available within that process group.
The workaround is to set the environment variable in Windows:
Name of Variable: KMP_AFFINITY
Value to set to: respect,none
This must happen before MATLAB is started.
The Linux system does not trigger this bug."
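The 64-processor cap described in the quoted reply gives a quick way to tell which machines in this thread could be affected. This is a rough sketch of the arithmetic. The function name is made up, and it computes only the minimum number of groups implied by the 64-per-group cap, not how Windows actually distributes processors among them.

```shell
# min_processor_groups N: smallest number of Windows processor groups that can
# hold N logical processors, given at most 64 per group (ceiling division).
min_processor_groups() {
  echo $(( ($1 + 63) / 64 ))
}

min_processor_groups 36   # one Xeon 2686 v4 installed: 1 group
min_processor_groups 72   # both Xeons installed: 2 groups
min_processor_groups 32   # the dual E5-2670 from the question: 1 group
```

With 72 logical cores the machine needs more than one group, which fits the symptom of only part of the machine being busy. With one Xeon removed (36 logical cores) everything fits in a single group, which fits the 100% usage seen in that case.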
