improve and speed up parfor loop

Hello,
My code runs 10,000 iterations. Each iteration involves a Monte Carlo simulation using normal distributions, with 4,000,000 simulations. I tried to use parfor to speed up the code, but its run time is almost the same as that of a plain for loop.
Is there a way to speed up the code so that it actually benefits from the parfor loop?
Thanks,
Here is my code
clc;
clear;
close all;
...
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
...
A=readmatrix("x.csv");
runs = 4000000;
results=zeros(10000,1);
meanG=constant;
sdG=constant;
parfor j=1:x
    mean=A(j,1);
    sd=A(j,2);
    guss=A(j,3);
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
    record(j)=d;
end

1 Comment

If you can show a bit more and explain what this code does, maybe someone can help you.


Answers (2)

Matt J on 17 Apr 2020
Edited: Matt J on 17 Apr 2020
We can't see all the operations in your loop, but the ones we can see are pretty basic. Operations that common are probably already implemented to use a multicore CPU very efficiently, so there probably isn't much room for improvement with parfor. To get a clearer idea of how much improvement is possible, though, we would need to see screenshots of your CPU usage and the usage of all its cores (e.g., from the Task Manager, if you are on Windows).
Some of the randomization steps you are doing, though, look like they could be hoisted out of the loop, e.g.,
nSteps=numel(1:0.5:40); % 79 loop steps
B=normrnd(mean,sd,[nSteps,runs]); % one row of draws per loop step
for n=1:0.5:40
    F=equation
    ...
end
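A minimal sketch of the hoisting idea, assuming base MATLAB only. It relies on the fact that mu + sd*randn(sz) has the same distribution as normrnd(mu,sd,sz), so all draws can be generated in one call and indexed by row inside the loop. The values of mu, sd, and runs are hypothetical placeholders, and the mean of each row merely stands in for the calculation the thread does not show. runs is kept small here because a 79-by-4,000,000 double matrix would need roughly 2.5 GB per worker.

```matlab
% Hypothetical placeholder values; the thread does not give the real ones.
mu   = 5;        % stands in for A(j,1)
sd   = 2;        % stands in for A(j,2)
runs = 1000;     % the thread uses 4,000,000

steps  = 1:0.5:40;         % same grid as the inner loop
nSteps = numel(steps);     % 79 values

% One call produces every draw the inner loop needs.
% mu + sd*randn(...) has the same distribution as normrnd(mu, sd, ...).
Ball = mu + sd * randn(nSteps, runs);

rowMeans = zeros(nSteps, 1);
for k = 1:nSteps
    n = steps(k);              % loop value, in case the calculation needs it
    B = Ball(k, :);            % 1-by-runs draws for this step
    rowMeans(k) = mean(B);     % stand-in for the real calculation
end
```

If the loop usually hits its break after a few steps, generating all 79 rows up front wastes work and memory, so generating in smaller batches may be a better compromise.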

9 Comments

Thanks, that helps a lot. I am using a high-performance computing cluster. I am requesting 20 CPUs, and I can assign any amount of memory to them. I thought parfor would speed things up by a factor of 20 with that setup, but it did not.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
We need to see what percentage of CPU usage occurs when the ordinary for-loop is running, and what percentage is used on each of the 20 cluster CPUs when parfor is being used.
According to the cluster, it was 99% for both the parfor loop and the for loop. I am not sure what the problem is.
Do you share the cluster? Does the 99% usage represent your jobs, or other people's as well?
Yes, it represents only the 20 CPUs that I requested.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
But if other users are using the same CPUs, then you might be using only 10% of the 99%.
I do not think so. I am submitting the job as a batch job and requesting the resources that I need. The tasks I request should not be used by anyone else.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=10GB
#SBATCH --job-name=invertRandArray
#SBATCH --error=parallel.%J.err
#SBATCH --output=parallel.%J.out
Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
I don't know bash very well, but the nodes=1 suggests to me that you are not running on multiple CPUs. Or, if you are, your for-loop has access to them as well, just as if you were running on a single 20-core CPU. If this is the case, then once again your for loop and your parfor loop have access to the exact same computing hardware, and there is no guarantee that you will get significant speed-up.
It might tell us more if you show us the output of,
>> gcp
Never mind this part. Raymond has pointed out that your workers are obviously non-remote.


It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this. Try running MATLAB in single-thread mode (-singleCompThread) and then benchmark your code again.
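As a rough sketch of that benchmark without restarting MATLAB: maxNumCompThreads can limit the client to one computational thread for the duration of a test. The workload below is a hypothetical placeholder, not the OP's calculation, and whether it is multithreaded at all depends on the machine and the MATLAB release.

```matlab
% Placeholder workload: large element-wise operations of the kind that
% MATLAB may multithread implicitly. Not the OP's actual calculation.
work = @() sum(exp(randn(1, 2e6)) .* randn(1, 2e6));

nDefault = maxNumCompThreads;     % current thread limit
tic; work(); tDefault = toc;

prev = maxNumCompThreads(1);      % restrict to one computational thread
tic; work(); tSingle = toc;
maxNumCompThreads(prev);          % restore the previous setting

fprintf('threads=%d: %.3f s, threads=1: %.3f s\n', nDefault, tDefault, tSingle);
```

If the single-thread time is much longer than the default time, the plain for loop is already using many cores, which would explain why a pool of workers on the same node adds little.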
You might consider posting a bit more of your code so that we can provide more guidance for your parfor.
  1. As it's written, A is not a sliced input, it's a broadcast variable, which could impact performance.
  2. Is record(j) supposed to be results(j)?
  3. For a particular iteration of j, what happens if C is never greater than 10 (and d does not get defined)?
  4. Again, without all of the code, it's hard to be sure about the following recommendation, but I would consider refactoring your code as follows:
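parfor j = 1:x
    results(j) = unit_of_work(A, runs, meanG, sdG, j);
end

function d = unit_of_work(A, runs, meanG, sdG, j)
    % meanG and sdG are passed in because a function cannot see the caller's workspace.
    mean = A(j,1);
    sd = A(j,2);
    guss = A(j,3);
    d = NaN; % assumed default in case C never exceeds 10 (see point 3)
    for n = 1:0.5:40
        B = normrnd(mean, sd, [1,runs]);
        F=equation
        G = normrnd(F*meanG, F*sdG, [1,runs]);
        % Other calculation to calculate C
        if C > 10
            d=equation;
            break
        end
    end
end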
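To illustrate point 1, here is a minimal sketch of one way to make A a sliced input: index it in a single, consistent form (A(j,:)) inside the parfor body and then take the elements from that row. The matrix A, the value of runs, and the use of the sample mean as the result are hypothetical stand-ins, and randn is used in place of normrnd so the sketch needs no extra toolbox.

```matlab
% Hypothetical data standing in for readmatrix("x.csv").
A = [1 0.5 0; 2 0.5 0; 3 0.5 0; 4 0.5 0];
nRows   = size(A, 1);
runs    = 1000;                 % the thread uses 4,000,000
results = zeros(nRows, 1);

parfor j = 1:nRows
    row = A(j, :);              % single indexing form, so A is sliced
    mu  = row(1);
    sd  = row(2);
    B   = mu + sd * randn(1, runs);   % same distribution as normrnd(mu, sd, [1, runs])
    results(j) = mean(B);       % stand-in for the real calculation of d
end
```

For a matrix as small as 10000-by-3 the saving from slicing is probably minor, so this is more about good practice than a likely fix for the timing.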

4 Comments

Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this.
No, the OP has said that he is running on a cluster.
I thought he's running MATLAB on the cluster. You can still run local workers on a remote cluster. Local workers are "local" to where you're running the MATLAB client, not necessarily your desktop.
I see, but I think the OP's intention is to have non-local workers.
Doesn't appear that way. Notice the reference to local here:
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
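As a side note on that line, here is a hedged sketch of a more defensive way to read the worker count. str2num evaluates its input as code, and SLURM can report SLURM_TASKS_PER_NODE in forms such as 20(x2) on multi-node jobs, so this version reads only the leading integer with sscanf. The fallback of 1 worker is an assumption for runs outside SLURM, not something from the thread.

```matlab
% Read the SLURM variable; getenv returns '' when it is not set.
raw = getenv('SLURM_TASKS_PER_NODE');

% Take the leading integer only; sscanf stops at the first non-digit,
% so a value such as '20(x2)' still yields 20.
nWorkers = sscanf(raw, '%d', 1);

if isempty(nWorkers) || nWorkers < 1
    nWorkers = 1;               % hypothetical fallback outside SLURM
end

fprintf('Requesting %d workers\n', nWorkers);
% pool = parpool('local', nWorkers);   % as in the original script
```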

Asked on 17 Apr 2020
Commented on 21 Apr 2020
