OpenMP options changed by MATLAB?

After I tried parfor (with default options) in MATLAB (R2021a), my parallel computations in RStudio, which use OpenMP, became much slower. My code is written in C++ and compiled through the Rcpp package. Now, whenever I use more than one thread in RStudio, the computations are much slower than they were before. Could MATLAB have changed some default settings in my compiler? If so, how can I reverse the changes? I am using Ubuntu 20.04 on an AMD processor.
In C++ I use a very simple loop with no interactions among threads (completely parallel):
omp_set_num_threads(nproc);
#pragma omp parallel for schedule(static)
for (int ii=0; ii<=nproc-1; ii++){
// command to send job to thread ii
}
I think it has something to do with MATLAB creating a pool of 12 workers, which is what MATLAB did when I used parfor. Now my parallel computing in RStudio behaves very differently (and slowly). For example, if I specify 32 threads in my code (50% of the 64 I have), the total CPU use (according to top) is only around 20% (roughly 12 out of 64). If I specify only 5 threads (about 8%), the total CPU use is also around 20%. However, if I specify 3 threads, the total CPU use is 15%, and with one thread it is 9%.
Before I used MATLAB, the CPU use was proportional to the number of threads I specified. Now it is not. How can I undo the parallel computing settings made by MATLAB?
I checked the values of many internal variables of openmp and the problem was not there:
int f=omp_get_num_threads(); // returns 1 when called outside a parallel region
int ddd=omp_get_dynamic();
Rcout << "get thread " << f << std::endl;
Rcout << "get dyn " << ddd << std::endl;
Rcout << "several " << omp_get_thread_limit() << " " << omp_get_max_threads() << " " << omp_get_nested() << " " << omp_get_proc_bind() << " " << omp_get_default_device() << " " << omp_get_max_task_priority() << " " << omp_get_max_active_levels() << std::endl;
// omp_get_place_num_procs() needs a place index; place 0 is used here only as an example
Rcout << "meeting " << omp_get_num_places() << " " << omp_get_place_num_procs(0) << " " << omp_get_place_num() << " " << omp_get_partition_num_places() << std::endl;
All these variables had the same values as in another computer where things were working correctly.

 Accepted Answer

Roberto on 31 Jul 2021
Edited: Roberto on 15 Dec 2021
I solved the problem by reinstalling Ubuntu, following the instructions in:
Now my C++ code in RStudio is as fast as before: 4 times faster when I use 5 threads, and 20 times faster when I use 32 threads.
Although this worked well, I guess it is not necessary to reinstall Ubuntu to undo the parallel computing settings made by MATLAB. If you know any other way, please let me know.
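For anyone who wants to measure speedups like the ones quoted above, here is a minimal timing sketch. It is not the original code: busy_work, time_jobs and the Timing struct are hypothetical names, and the workload is an arbitrary stand-in for "one independent job per loop index". It falls back to a serial loop when compiled without OpenMP.

```cpp
#include <chrono>
#include <cmath>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Result of one timed run: wall-clock seconds plus a checksum of the results.
struct Timing {
    double seconds;
    double checksum;
};

// Hypothetical workload: one independent job per loop index.
double busy_work(int job) {
    double acc = 0.0;
    for (int k = 1; k <= 1000000; ++k) acc += std::sin(job + k * 1e-6);
    return acc;
}

// Runs `njobs` independent jobs with `nthreads` requested threads.
Timing time_jobs(int nthreads, int njobs) {
    std::vector<double> results(njobs, 0.0);
#ifdef _OPENMP
    omp_set_num_threads(nthreads);
#else
    (void)nthreads;  // serial build: the request is ignored
#endif
    auto start = std::chrono::steady_clock::now();
#ifdef _OPENMP
    #pragma omp parallel for schedule(static)
#endif
    for (int ii = 0; ii < njobs; ++ii) {
        results[ii] = busy_work(ii);
    }
    auto stop = std::chrono::steady_clock::now();

    double checksum = 0.0;
    for (double r : results) checksum += r;  // keeps the work observable
    return Timing{std::chrono::duration<double>(stop - start).count(), checksum};
}
```

Dividing time_jobs(1, n).seconds by time_jobs(5, n).seconds for the same n gives a rough speedup figure for 5 threads. If that ratio stays near 1 (or drops below it), the threads are not running independently.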

More Answers (2)

Walter Roberson on 28 Jul 2021
Roberto, did you deliberately configure your cluster profile (which might be the profile named 'default') to provide more than one thread per worker? The default is one thread per worker. https://www.mathworks.com/help/matlab/ref/maxnumcompthreads.html
Also double-check OMP_NUM_THREADS in case it is getting set to 1.
I see some restrictions on using OpenMP application interface together with parfor() when using MATLAB Coder, but I am not currently clear as to whether that has implications for cases where code is not being generated. https://www.mathworks.com/help/coder/ref/parfor.html
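A minimal sketch of how the OMP_NUM_THREADS check suggested above could be done from inside the compiled C++ code, so that it reports what the running process actually inherited. It uses only the standard std::getenv; the helper names env_or_unset and report_env are hypothetical.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or the literal text "<unset>" when the variable is not defined.
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("<unset>");
}

// Prints the variable as seen by the current process.
void report_env(const char* name) {
    std::cout << name << " = " << env_or_unset(name) << std::endl;
}
```

Calling report_env("OMP_NUM_THREADS") from the compiled function shows the value visible to the R session itself, which can differ from the value in a separate terminal. In Rcpp code, Rcout could be used in place of std::cout.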

6 Comments

Roberto on 28 Jul 2021
Edited: Walter Roberson on 28 Jul 2021
I just used parfor with no options.
I also tried the examples for parallel computing in this page:
In my C++ code in RStudio I checked OMP_NUM_THREADS; it has the value I set, so I do not think it is the problem. When I run in parallel the code slows down a lot, which did not happen before.
Walter, maybe the problem is OMP_NUM_THREADS, as you mentioned.
Although I use omp_set_num_threads(2), the command omp_get_num_threads() returns the value of 1. How can I change this?
int nproc=2;
omp_set_num_threads(nproc);
int f=omp_get_num_threads();
Rcout << "get thread " << f << std::endl;
To confirm, are you setting the OMP_NUM_THREADS environment variable?
I had hoped to find something by looking at the environment variables in a parfor session, by using
system('printenv')
I could see some variables being set, but I did not see anything that might constrain the number of threads.
The only command I use is:
omp_set_num_threads(nproc);
#pragma omp parallel for
for (int ii=0; ii<=nproc-1; ii++){
}
Actually, if I use omp_get_num_threads() inside the loop, it gives me the correct number of threads, so it is not limiting them to one, I think.
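This matches how OpenMP defines the call: omp_get_num_threads() reports the size of the team that is currently running, so in the sequential part of the program it returns 1 regardless of omp_set_num_threads(), while omp_get_max_threads() reports the upper bound for the next parallel region. A minimal sketch that shows the difference (the struct and function names are hypothetical, and the code falls back to 1/1 when compiled without OpenMP):

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Thread counts observed outside and inside a parallel region.
struct ThreadCounts {
    int outside;
    int inside;
};

// Hypothetical helper: requests `requested` threads, then records what
// omp_get_num_threads() reports in the sequential part and inside the team.
ThreadCounts probe_thread_counts(int requested) {
    ThreadCounts counts{1, 1};  // values used when compiled without OpenMP
#ifdef _OPENMP
    omp_set_num_threads(requested);
    counts.outside = omp_get_num_threads();  // sequential part: team of one
    #pragma omp parallel
    {
        #pragma omp single
        counts.inside = omp_get_num_threads();  // size of the running team
    }
#else
    (void)requested;  // serial build: the request is ignored
#endif
    return counts;
}
```

When called from the sequential part of a program, probe_thread_counts(2).outside is 1, and .inside is the team size the runtime actually granted, which may be smaller than the number requested.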
Bruno Luong on 28 Jul 2021
Edited: Bruno Luong on 28 Jul 2021
The number of threads used can be quite complicated, see
Maybe some of those variables interact when the code is run with or without a MATLAB parfor loop.
Whew! That is surprisingly complicated!

Roberto on 22 Dec 2021
Although reinstalling Ubuntu 20.04 solves the problem, it is not necessary. Another solution is to uninstall OpenBLAS.
The problem is that MATLAB and/or MATLAB/Dynare (I am not sure which one) installs OpenBLAS, which has a bug on Ubuntu 20.04 that has been reported elsewhere: https://github.com/xianyi/OpenBLAS/issues/2642
To uninstall it, the following worked for me:
sudo apt-get remove libopenblas0
sudo apt-get remove libopenblas0-pthread
The code slowed down when operating on matrices. OpenBLAS was causing the threads to conflict with each other. Before removing OpenBLAS I tried setting OPENBLAS_NUM_THREADS=1, but that did not solve the performance issue. After removing OpenBLAS, my C++ computations in RStudio are as fast as they used to be (much, much faster than with OpenBLAS).

Release

R2021a

Asked:

on 28 Jul 2021

Answered:

on 22 Dec 2021
