Why is MATLAB slow on dual AMD CPUs?

DH Lai on 18 Apr 2022
Edited: Pei Chen on 31 Jan 2023
I have a computer with dual AMD CPUs (128 physical cores, 256 threads) running Windows Server 2022.
I ran bench in MATLAB R2022a:
>> version -blas
ans =
'Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210611 for Intel(R) 64 architecture applications (CNR branch auto)'
>> bench(10)
ans =
0.2706 0.1448 0.3600 1.6615 0.5450 1.0057
0.2939 0.1559 0.3607 1.7002 0.5481 0.9659
0.2997 0.1479 0.3587 1.7551 0.5394 0.9560
0.2996 0.1489 0.3726 1.7851 0.5363 0.9630
0.3188 0.1484 0.3694 1.6770 0.5399 0.9960
0.3114 0.1577 0.3593 1.6708 0.5470 0.9519
0.2875 0.1673 0.3632 1.6559 0.5336 0.9570
0.3041 0.1550 0.3730 1.7058 0.5635 0.9559
0.3103 0.1703 0.3540 1.6632 0.5370 0.9911
0.2685 0.1957 0.3530 1.7371 0.5506 0.9732
>> version -lapack
ans =
'Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210611 for Intel(R) 64 architecture applications (CNR branch auto) supporting Linear Algebra PACKage (LAPACK 3.9.0)'
When I switched to the BLAS and LAPACK implementations included in the AMD Optimizing CPU Libraries (AOCL) (https://www.mathworks.com/matlabcentral/answers/396296-what-if-anything-can-be-done-to-optimize-performance-for-modern-amd-cpu-s), the results were:
>> version -blas
ans =
'AOCL BLIS 3.1.0'
>> bench(10)
ans =
0.4119 0.1380 0.3516 1.0771 0.5866 1.0437
0.4100 0.1419 0.3562 1.0471 0.5732 0.9655
0.4165 0.1363 0.3540 1.0626 0.5528 0.9593
0.4205 0.1399 0.3589 1.0618 0.5550 0.9739
0.4124 0.1409 0.3563 1.0450 0.5522 0.9673
0.4075 0.1394 0.3574 1.0498 0.5511 0.9650
0.4195 0.1567 0.3518 1.0737 0.5531 0.9563
0.4235 0.1406 0.3552 1.0620 0.5460 0.9620
0.4093 0.1415 0.3641 1.0535 0.5417 0.9658
0.4101 0.1410 0.3547 1.0728 0.5443 0.9313
>> version -lapack
ans =
'AOCL libFLAME 3.1.1, supports LAPACK 3.10.0'
Neither result looks good, especially in the first four columns.
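To make the comparison between the two runs easier to read, here is a minimal sketch that takes the median time per test and the AOCL/MKL ratio. It uses only the first three rows of each bench(10) output above, and it assumes the columns are in the order given in the bench documentation (LU, FFT, ODE, Sparse, 2-D, 3-D). The variable names are illustrative.

```matlab
% Sketch: compare median bench times (seconds, lower is better) per test.
% Column order is assumed from the bench documentation.
tests = ["LU" "FFT" "ODE" "Sparse" "2-D" "3-D"];

% First three rows of each bench(10) run, copied from the post.
tMKL  = [0.2706 0.1448 0.3600 1.6615 0.5450 1.0057
         0.2939 0.1559 0.3607 1.7002 0.5481 0.9659
         0.2997 0.1479 0.3587 1.7551 0.5394 0.9560];
tAOCL = [0.4119 0.1380 0.3516 1.0771 0.5866 1.0437
         0.4100 0.1419 0.3562 1.0471 0.5732 0.9655
         0.4165 0.1363 0.3540 1.0626 0.5528 0.9593];

medMKL  = median(tMKL, 1);      % median over runs, one value per test
medAOCL = median(tAOCL, 1);
ratio   = medAOCL ./ medMKL;    % > 1 means the AOCL run was slower

summary = table(tests', medMKL', medAOCL', ratio', ...
    'VariableNames', ["Test" "MKL" "AOCL" "AOCLoverMKL"]);
disp(summary)
```

On these rows the AOCL run is slower on the first column and faster on the fourth, while the remaining columns differ only slightly.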
I also found that MATLAB cannot use all the physical cores when it runs operations that can be vectorized, such as *, .*, \ and so on. In fact, it uses only 64 threads on 32 physical cores (this is not good).
However, if I set maxNumCompThreads to 32, it uses 32 threads on 32 physical cores (this is normal).
Either way, it uses only one CPU (this is bad).
In short, I would like MATLAB to perform better on dual AMD CPUs. What else can I try?
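One way to see how the thread limit affects a multithreaded operation is to time the same matrix product at several maxNumCompThreads settings. The following is only a sketch: the matrix size and the list of thread counts are arbitrary choices for illustration, and the numbers it prints depend entirely on the machine.

```matlab
% Sketch: time a dense matrix multiply at several computational thread counts.
n = 2000;                                   % matrix size (arbitrary choice)
A = rand(n);
B = rand(n);

origThreads = maxNumCompThreads;            % remember the current setting

% Candidate thread counts (assumed values), capped at the current maximum.
threadCounts = [1 2 4 8 16 32 64];
threadCounts = threadCounts(threadCounts <= origThreads);

elapsed = zeros(size(threadCounts));
for k = 1:numel(threadCounts)
    maxNumCompThreads(threadCounts(k));     % limit built-in multithreading
    elapsed(k) = timeit(@() A*B);           % typical time of one product
end

maxNumCompThreads(origThreads);             % restore the original setting

gflops = 2*n^3 ./ elapsed / 1e9;            % approximate rate for A*B
disp(table(threadCounts', elapsed', gflops', ...
    'VariableNames', ["Threads" "Seconds" "GFLOPS"]))
```

If the time stops improving (or gets worse) beyond a certain thread count, that count is a reasonable candidate for maxNumCompThreads on that system.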

Answers (2)

Alvaro on 20 Jan 2023
I am not sure why you are getting this performance drop with the AMD libraries; it might be a good idea to reach out to support for this one.
Note, however, that built-in functions do not necessarily use all cores (more cores is not always better).
If you wish to control how your cores are used, there are functions in the Parallel Computing Toolbox, such as parpool and parfor, that are designed for this.
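As a rough illustration of that approach, the sketch below opens a pool with an explicit number of workers and spreads independent tasks over it with parfor. It assumes the Parallel Computing Toolbox is installed and that the default cluster profile allows at least the requested number of workers; numWorkers, nTasks and the per-task workload are hypothetical values.

```matlab
% Sketch: run independent tasks on an explicitly sized parallel pool.
% Requires Parallel Computing Toolbox.
numWorkers = 4;                      % hypothetical worker count
nTasks     = 16;                     % hypothetical number of independent tasks

pool = gcp('nocreate');              % reuse an existing pool if there is one
if isempty(pool)
    pool = parpool(numWorkers);      % otherwise start one on the default profile
end

results = zeros(1, nTasks);
parfor k = 1:nTasks
    M = rand(500);                   % each iteration works on its own data
    results(k) = max(abs(eig(M)));   % example workload: spectral radius
end

disp(results)
delete(pool)                         % shut the pool down when finished
```

Here the parallelism comes from running many independent iterations at once rather than from the multithreading inside a single built-in call.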

Pei Chen on 31 Jan 2023
Edited: Pei Chen on 31 Jan 2023
I have the same problem.
I think it is a cross-NUMA problem. The current version of MATLAB does not try to bind tasks to a fixed NUMA node,
so multithreaded code suffers very large delays. If you restrict MATLAB to a single thread, you get normal performance for that core. If you use parfor in process mode, EPYC performance is also good.
But with the internally multithreaded functions such as LAPACK, or with parfor in thread mode, performance is really bad. I hope MathWorks can fix this problem in a future version. It will become a big problem for two-socket or four-socket workstations and for AMD EPYC.
You can check this issue with the demo "Thread Vs process in matlab 2022b" on an EPYC system; you will find that thread mode is 5 times slower than process mode.
BTW, MATLAB also cannot support more than 64 workers on Windows, which is a limitation for EPYC.
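A simple way to check the thread-versus-process difference on your own system is to time the same parfor loop on both kinds of pool. This is only a sketch, not the demo mentioned above: it assumes the Parallel Computing Toolbox, a release that supports parpool("threads"), and a default cluster profile that starts process-based workers. The workload and nTasks are arbitrary choices.

```matlab
% Sketch: time one parfor loop on a thread pool and on a process pool.
% Requires Parallel Computing Toolbox.
nTasks = 32;                          % hypothetical number of iterations
out    = zeros(1, nTasks);

delete(gcp('nocreate'));              % start from a clean state

pool = parpool("threads");            % thread-based workers
tic
parfor k = 1:nTasks
    out(k) = sum(svd(rand(300)));     % example per-iteration workload
end
tThreads = toc;
delete(pool);

pool = parpool;                       % default profile, assumed process-based
tic
parfor k = 1:nTasks
    out(k) = sum(svd(rand(300)));
end
tProcesses = toc;
delete(pool);

fprintf('threads: %.2f s, processes: %.2f s, ratio: %.2f\n', ...
    tThreads, tProcesses, tThreads / tProcesses);
```

The first loop on a fresh pool includes some start-up overhead, so it is worth repeating each measurement a few times before drawing conclusions.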
