Why is a MATLAB function faster than the corresponding MEX?

Problem Statement:
I created a MEX function that is equivalent to the following MATLAB function:
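function d = Mex_for_fun(x,y)
% Inputs: x and y are matrices with n (generic) rows and 2 columns.
% Output: d(i) is 1 when row i of x is <= row i of y in both columns
% and strictly smaller in at least one column, otherwise 0.
n = size(x,1); % number of rows (length(x) would return 2 when n < 2)
d = zeros(n,1);
for i = 1:n
    if x(i,1)<=y(i,1) && x(i,2)<=y(i,2) && (x(i,1)<y(i,1) || x(i,2)<y(i,2))
        d(i) = 1;
    else
        d(i) = 0;
    end
end
end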
The MEX function was generated with the MATLAB Coder app from the Apps tab.
To compare the run times of the two versions, I used the following script:
%Test
Np=1000;
fitness=rand([Np,2])*1000;
all_perm=nchoosek(1:Np,2);
all_perm=[all_perm;[all_perm(:,2) all_perm(:,1)]];
x=(fitness(all_perm(:,1),:));
y=(fitness(all_perm(:,2),:));
%Matlab Function
tic
d_matlab_fun=Mex_for_fun(x,y);
toc
%Mex function
tic
d_mex=Mex_for_fun_mex(x,y);
toc
The timings on my PC are:
0.029305 seconds for the Mex function.
0.011935 seconds for the Matlab function.
Question:
Why is this the case? Shouldn't the Mex function be faster since it is compiled in C?
Thank you for the attention.

8 Comments

The JIT, or execution engine, also compiles MATLAB code, so in some cases MEX is no longer superior.
People need to get the idea that for-loop == slow out of their heads. This has not been true for the last few MATLAB versions.
Out of curiosity, what happens to the timings if you remove the if statement, converting to
for i=1:length(x)
d(i) = x(i,1)<=y(i,1) && x(i,2)<=y(i,2) && (x(i,1)<y(i,1) || x(i,2)<y(i,2));
end
Without the if statement the results are the following:
0.015404 seconds for the MEX function.
0.010788 seconds for the Matlab function.
So there is an overall improvement; however, these times change slightly on each run of the test script.
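Because single tic/toc readings of around 10 ms change from run to run, a steadier comparison may come from timeit, which calls the function several times and reports a typical time. The following is only a sketch: the data size and the anonymous function `dominates` are assumptions made so that the block runs on its own, and the two commented lines show where the functions from the question would go.

```matlab
% Stand-in data (assumption: n-by-2 doubles, as in the test script)
rng(0);
n = 1e5;
x = rand(n,2)*1000;
y = rand(n,2)*1000;

% Hypothetical stand-in for the function being timed; it computes the
% same test as the loop in the question using all/any.
dominates = @(x,y) all(x<=y,2) & any(x<y,2);

% timeit runs the handle several times and returns a typical time in seconds
t_vec = timeit(@() dominates(x,y));
fprintf('vectorized: %.6f s per call\n', t_vec);

% With the files from the question on the path, the same pattern applies:
% t_fun = timeit(@() Mex_for_fun(x,y));
% t_mex = timeit(@() Mex_for_fun_mex(x,y));
```

Timing both versions this way does not change which one is faster, but it reduces the chance of drawing a conclusion from one noisy measurement.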
Thank you for the explanation. However, at this point I am curious to know when using a MEX is advantageous. Just as an example, in the test script I gave here I used:
all_perm=nchoosek(1:Np,2)
However, in my own test script I used a MEX function called VChooseK (https://it.mathworks.com/matlabcentral/fileexchange/26190-vchoosek), which is substantially faster than nchoosek when Np is large. Why is that so?
But Jan's VChooseK is designed manually and very carefully. This takes a large amount of work and tuning.
You cannot compare it with an automatic translation such as Coder's.
In my experience, a simple for-loop with arithmetic operations like yours is not worth converting to MEX. But if you have a complicated algorithm and are willing to spend time looking at available algorithms, etc., then you can save time.
I would not rely on Coder when time is important, and it is probably best to first evaluate what the bottleneck of your code is and focus on that part.
@Bruno Luong Just to be clearer, my objective is to make a multi-objective particle swarm optimizer faster. The bottleneck is the following line of code:
d= all(x<=y,2) & any(x<y,2)
which has the same output as the function written above and is really slow when x and y have many rows.
So in essence you would suggest creating a MEX for the whole function instead of creating it for the single bottleneck? It seems counterintuitive that I would get better results if there is no improvement for the bottleneck itself.
Your for-loop is pretty straightforward; unfortunately there is not much to optimize in it. You might try creating d as a logical array and assigning the boolean expression directly instead of branching with "if", and maybe reordering the tests so that fewer logical tests are performed on your data. Again, you have to write the MEX manually to be able to optimize it. Just forget about Coder doing the job for you.
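The suggestion of a logical output with direct assignment could look roughly like the sketch below. The small integer test data is an assumption chosen so that ties occur. Variant (b), which compares the two columns explicitly, is an extra alternative not proposed in the thread. Whether either variant is faster than the all/any line has to be measured on the actual data.

```matlab
% Test data (assumption: small integers so equal entries occur)
rng(0);
n = 1000;
x = randi(5,n,2);
y = randi(5,n,2);

% (a) Loop with a logical output and direct assignment, no if/else branch
d_loop = false(n,1);
for i = 1:n
    d_loop(i) = x(i,1)<=y(i,1) && x(i,2)<=y(i,2) && ...
                (x(i,1)<y(i,1) || x(i,2)<y(i,2));
end

% (b) Column-wise vectorized form, valid only for two-column inputs
d_cols = x(:,1)<=y(:,1) & x(:,2)<=y(:,2) & ...
         (x(:,1)<y(:,1) | x(:,2)<y(:,2));

% Reference: the line identified as the bottleneck
d_ref = all(x<=y,2) & any(x<y,2);

% Both variants should reproduce the reference result
assert(isequal(d_loop, d_ref));
assert(isequal(d_cols, d_ref));
```

Using false(n,1) instead of zeros(n,1) stores one byte per element rather than eight, which reduces memory traffic for long inputs.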
Ok thank you very much @Bruno Luong, you were very helpful.

 Accepted Answer

It is important to remember that under the hood, a MEX file is simply a function that calls a C/C++ (or sometimes Fortran) subroutine. So when we are comparing the execution time of a MATLAB script to that of a MEX file, we are comparing the amount of time it takes the MATLAB script to be interpreted to the amount of time it takes the generated C/C++ code to execute.
Now it is reasonable to think that compiled C code should execute faster than M code is interpreted. However, it has been many years since MATLAB has been a true interpreted language. Today it is a just-in-time (JIT) compiled language with a large library of pre-compiled routines that are optimized for the target that MATLAB is running on. With MEX code generation, we are generating portable C code. Sometimes the MATLAB libraries are even multi-threaded while the generated code is not. Generally speaking, when an application mostly exercises pre-compiled binaries in MATLAB, or things that the JIT compiler handles well, the more realistic expectation is for MATLAB to be faster than the generated C code. The times when we see large speed-ups tend to correspond to those functions which are still implemented as complicated MATLAB functions in MATLAB. One might expect, therefore, to see speedups with QUADGK or QUAD2D, for example, but not with FFT.
With a simple script, we are really just comparing the C compiler to the MATLAB JIT compiler. For instance, the "mod" function is executing the exact same binary code in MATLAB as it is in a MEX file generated with MATLAB Coder. There just is not any opportunity for speedup here.
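For readers who want to reproduce the generated MEX from the command line instead of the app, a sketch under stated assumptions follows. It assumes MATLAB Coder is installed and that Mex_for_fun.m from the question is on the path. Switching off IntegrityChecks and ResponsivenessChecks is not something the answer above mentions. These are settings of the MEX configuration object that add run-time checks to generated MEX code, so disabling them is one thing that might be worth trying before comparing timings. Disabling integrity checks removes out-of-bounds protection, so it should only be done with trusted inputs.

```matlab
% Guard (assumption): only run when codegen and the question's function exist
can_build = ~isempty(which('codegen')) && exist('Mex_for_fun','file') == 2;

if can_build
    cfg = coder.config('mex');          % MEX code generation configuration
    cfg.IntegrityChecks = false;        % drop run-time bounds/integrity checks
    cfg.ResponsivenessChecks = false;   % drop Ctrl+C responsiveness checks

    % Both inputs: double matrices with an unbounded number of rows, 2 columns
    ARGS = {coder.typeof(0,[Inf 2]), coder.typeof(0,[Inf 2])};

    % Generates Mex_for_fun_mex in the current folder
    codegen -config cfg Mex_for_fun -args ARGS
else
    disp('MATLAB Coder or Mex_for_fun.m not found; skipping MEX generation.');
end
```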

2 Comments

@Bruno Luong and I have been (politely) arguing about whether MATLAB is currently a "just-in-time (JIT) compiled language".
At one point I was told by MathWorks staff that MATLAB is no longer considered to be "just-in-time": that instead, as of the introduction of the Execution Engine, scripts and functions are now internally compiled at parse time, rather than just-in-time (JIT implies optimization when a section of code is executed).
But the public documentation does not make that clear, and Bruno's interpretation is that the current language still qualifies as being "JIT".
The timing tests that we do to resolve the matter tend to be ambiguous. It is common for the first execution of a loop (such as here in Answers) to take notably longer than other iterations. But only common: my timing experience shows that it is also common for the 3rd (typically, sometimes the 2nd or 4th) iteration of a loop to be the longest, with a lot of variability in the first 6 iterations, after which the maximum time tends to drop considerably.
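A per-call timing sketch of the kind described above might look like this. The data size, the number of calls, and the timed expression are assumptions for illustration; the measured values depend on the machine and the MATLAB release, so no expected output is shown.

```matlab
% Stand-in data (assumption: n-by-2 doubles)
rng(0);
x = rand(2e5,2);
y = rand(2e5,2);

nCalls = 10;               % number of repeated executions to time
t = zeros(nCalls,1);
for k = 1:nCalls
    tStart = tic;          % separate timer handle for each call
    d = all(x<=y,2) & any(x<y,2);
    t(k) = toc(tStart);
end

% Print the time of each call so early-call variability is visible
fprintf('call %2d: %.6f s\n', [1:nCalls; t.']);
```

Looking at the individual times rather than their sum makes it possible to see which call, if any, is the slow one.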
Thank you @Apeksha Bagrecha for clearing this up.
My observation on simple loops with expressions on scalars (which could be array elements) is that MATLAB has become extremely efficient, and even my equivalent manual MEX cannot beat it. I therefore assume that MATLAB compiles the for-loop code rather than calling a library (since calling a function is always slow). My assumption might be wrong, though.
In any case, it is certainly much harder to improve speed using MEX programming.

Release

R2022a
