Multiple optimization problems under parfor

Hi all,
I have encountered an issue when I tried to run 10000-by-9 optimizations under parfor, and I don't really know what went wrong.
Here's the code:
function [min,max]=opt_iden_set(params,constr,new_sol_array)
%input:
% params: a 9 by 1 vector of optimvar type
% constraints: M by 10 optimconstr type
%output: min max of each variable.
%min: M by 9
%max: M by 9
M=size(constr,1);
x0.q=0;
x0.d=0;
x0.t=0;
x0.g=0;
x0.ft=0;
x0.tb=0;
x0.vv_bar=1;
x0.phi_bar=2;
x0.rho_bar=0;
min=zeros(M,9);
max=zeros(M,9);
tic
poolobj = parpool('local');
disp("Finding identified sets...")
for i=1:M
    if i>1 && all((new_sol_array(i,:)==new_sol_array(i-1,:)),2)
        min(i,:)=min(i-1,:);
        max(i,:)=max(i-1,:);
    else
        constraints=constr(i,:);
        parfor p=1:9
            prob = optimproblem;
            prob.Constraints=constraints;
            options = optimoptions('fmincon',Display = 'none');
            %use fmincon, ref: https://www.mathworks.com/help/optim/ug/optimization-decision-table.html
            prob.Objective= params(p);
            prob.ObjectiveSense='minimize';
            [~,fval]= solve(prob,x0,Options=options);
            min(i,p)=fval;
            prob.ObjectiveSense='maximize';
            [~,fval]= solve(prob,x0,Options=options);
            max(i,p)= fval;
        end
    end
end
disp("Identified sets found!")
toc
delete(poolobj)
end
Here params is a vector of 9 optimvars, constr is a 10000-by-10 optimconstr array, and new_sol_array is simply the string version of constr so that I can compare rows. Each row of constr contains the constraints for a single optimization problem, which is the min/max of one element of params. I want to parallelize the optimization over the M rows, because there seem to be 10000-by-9 independent optimization problems.
However, I keep getting things like
OptimizationVariables appearing in the same OptimizationProblem must have distinct "Name" properties.
Make a new variable with a different "Name" property, or retrieve the original variable using the Variables property.
or
Warning: A worker aborted during execution of the parfor loop. The parfor loop will now run again on the remaining workers.
> In distcomp/remoteparfor/handleIntervalErrorResult (line 245)
In distcomp/remoteparfor/getCompleteIntervals (line 395)
In parallel_function>distributed_execution (line 746)
In parallel_function (line 578)
Error using distcomp.remoteparfor/getCompleteIntervals
The parallel pool that parfor was using has shut down. To start a new parallel pool, run your parfor code again or use parpool.
Exception in thread "CommGroup select thread com.mathworks.toolbox.distcomp.pmode.io.DirectCommunicationGroup@299d3663": java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at com.mathworks.toolbox.distcomp.pmode.io.DirectCommunicationGroup$ChannelHandle.<init>(DirectCommunicationGroup.java:1168)
at com.mathworks.toolbox.distcomp.pmode.io.DirectCommunicationGroup.doSelect(DirectCommunicationGroup.java:799)
at com.mathworks.toolbox.distcomp.pmode.io.DirectCommunicationGroup.run(DirectCommunicationGroup.java:664)
at java.lang.Thread.run(Thread.java:748)
Preserving jobs with IDs: 3 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile local. To create 'myCluster' use 'myCluster = parcluster('local')'.
Error using internal.matlab.desktop.editor.clearAndSetBreakpointsForFile
An internal runtime error occurred.
Warning: 5 worker(s) crashed while executing code in the current parallel pool. MATLAB may attempt to run the code again on the remaining workers of
the pool, unless an spmd block has run. View the crash dump files to determine what caused the workers to crash.
Warning: 5 worker(s) crashed while executing code in the current parallel pool. MATLAB may attempt to run the code again on the remaining workers of
the pool, unless an spmd block has run. View the crash dump files to determine what caused the workers to crash.
when I tried to move the parfor to the outermost for loop.
Does anyone know why? I could really use some help here. Thanks!

Accepted Answer

Raymond Norris on 2 Jun 2022
I'm assuming all of this works fine when you run a for-loop instead of a parfor loop.
To address the first warning, I'm not an Optim guy, but look at this to see if it'll get you started.
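One thing that may be worth trying follows the hint in the error text itself ("retrieve the original variable using the Variables property"). This is a minimal, untested-under-parfor sketch: instead of combining a separately passed `params(p)` with `constraints`, it takes the objective variable from `prob.Variables`, so the objective and the constraints refer to the same variable objects. The variables `x` and `y`, their bounds, and the single constraint are hypothetical stand-ins for one row of `constr`; whether this actually removes the error inside parfor is an assumption, not something confirmed in this thread.

```matlab
% Hypothetical toy problem standing in for one row of constr.
x = optimvar('x', 'LowerBound', 0, 'UpperBound', 1);
y = optimvar('y', 'LowerBound', 0, 'UpperBound', 2);
constraints = x + y <= 2.5;

prob = optimproblem;
prob.Constraints = constraints;

% Retrieve the variables that the problem already knows about,
% rather than reusing separately stored optimvar objects.
vars  = prob.Variables;      % struct with one field per variable name
names = fieldnames(vars);

lo = zeros(1, numel(names)); % minimum of each variable
hi = zeros(1, numel(names)); % maximum of each variable
for p = 1:numel(names)       % a plain for-loop here; parfor is the untested part
    prob.Objective = vars.(names{p});
    prob.ObjectiveSense = 'minimize';
    [~, lo(p)] = solve(prob);
    prob.ObjectiveSense = 'maximize';
    [~, hi(p)] = solve(prob);
end
```

Because the toy problem is linear, solve does not need an initial point here; the original nonlinear problem would still pass x0 and the fmincon options as in the question.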
To address the second warning, my guess is that one of the workers crashed because of out-of-memory issues. Notice the error message:
Exception in thread "CommGroup select thread com.mathworks.toolbox.distcomp.pmode.io.DirectCommunicationGroup@299d3663": java.lang.OutOfMemoryError: unable to create new native thread
To avoid deadlocks (e.g., where one worker is waiting on another worker that is no longer running), MATLAB stops the pool if a worker ever goes down. For parfor loops, where workers don't communicate with each other, an option is to start the parallel pool with the SpmdEnabled flag set to false, like so:
poolobj = parpool('local', 'SpmdEnabled',false);
For example
>> pool = parpool("local",4, "SpmdEnabled",true);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
>>
>> parfor idx = 1:4
if idx == 3, exit, else, pause(10), end
end
Error using distcomp.remoteparfor/getCompleteIntervals
An unexpected error occurred during PARFOR: Error in remote execution of parfor: Java is shutting down
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job might have
errored.
>> pool.NumWorkers
ans =
0
>> % Harden the pool
>> pool = parpool("local",4, "SpmdEnabled",false);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).
>>
>> parfor idx = 1:4
if idx == 3, exit, else, pause(10), end
end
Error using distcomp.remoteparfor/getCompleteIntervals
An unexpected error occurred during PARFOR: Error in remote execution of parfor: Java is shutting down
>> pool.NumWorkers
ans =
3
But that won't prevent the workers from crashing if they run out of memory. To address that, you need to understand (a) how much data is being passed back and forth in the parfor loop and (b) how much temporary data is being created within the parfor loop.
To see how much data is being passed back and forth, look at ticBytes and tocBytes. As for how much is being created within the parfor loop, switch back to the for-loop and watch your system monitor. who might also help. I would say that a single MATLAB can run, but when you have 2, 4, 8, etc. workers running, each consuming the same amount of memory, you're reaching your system's capacity.
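As a rough illustration of the ticBytes/tocBytes measurement, here is a minimal sketch. It assumes Parallel Computing Toolbox is available; `data` and `colSums` are hypothetical placeholders for whatever your own loop reads and writes, not names from the code in the question.

```matlab
pool = gcp;                    % use the current pool, or start one if none exists

data = rand(1000, 9);          % hypothetical input; sliced by column in the loop
colSums = zeros(1, 9);         % hypothetical output; sliced by element

ticBytes(pool);                % start counting bytes transferred to/from workers
parfor p = 1:9
    colSums(p) = sum(data(:, p));
end
bytes = tocBytes(pool);        % one row per worker: [sent to worker, received from worker]

disp(bytes)
```

Wrapping the real parfor loop the same way shows how much of `constr`, `params`, and `x0` is shipped to each worker on every pass through the outer loop.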

More Answers (1)

M Mirrashid on 5 Jun 2022

Release

R2022a
