Batch processing alternative parameterizations of a program that uses parfor loops

I have a script ('model.m') that iteratively solves a large system of equations using a parfor loop over 20 dimensions. Denote the solution by the array X. I would like to run this script N times under different parameterizations of a single exogenous variable, where the parameter space is contained in param_vec. Since I have access to a cluster with 500 cores, I would like to be able to run 25 alternative parameterizations simultaneously.
As a simple example, the inefficient way to compute my N solutions would be:
for it = 1:N
    param = param_vec(it);  % select parameter for iteration #it
    model                   % call script that iteratively solves the system of equations,
                            % where the solution method uses a parfor loop over 20 dimensions
    output_save{it} = X;    % store the output for iteration #it
end
This is obviously inefficient because each call to model.m uses only 20 workers and leaves 480 cores on the cluster idle. I'm aware that I cannot simply replace the for loop above with a parfor loop, because the parfor loop inside model.m would then be nested inside another parfor loop.
In theory, I could open 25 MATLAB instances, each with parpool(20), and run these alternative parameterizations manually. Obviously this would be highly impractical.
I am unfamiliar with batch processing, but my understanding is that opening 25 MATLAB instances as described above is a naive batch approach. I am wondering whether it is possible to automate this with MATLAB's built-in batch processing functions, or from the command prompt.

Answers (1)

Damian Pietrus on 24 Jan 2024
Hello Brandon,
Since you have access to a cluster, this would be a good fit for MATLAB Parallel Server. This will allow you to submit multiple batch jobs to the cluster at once, each being able to have its own pool of workers. You can check to see if it is installed/licensed on the cluster with the following steps:
  1. Open MATLAB (command line is fine)
  2. Run the 'ver' command
  3. Look for 'MATLAB Parallel Server' listed in the products. If it's there, proceed to the next step. If not, speak with your cluster admin about possibly installing it.
  4. Run 'license checkout MATLAB_Distrib_Comp_Engine'. If the licensing is fully set up, you'll get an answer of '1'.
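The steps above can also be scripted. This is only a minimal sketch: the variable names hasServer and licenseOK are made up for illustration, and it assumes the product appears in the ver output under the name 'MATLAB Parallel Server', as described in step 3.

```matlab
% List installed products and look for MATLAB Parallel Server (steps 2-3).
v = ver;
hasServer = any(strcmp({v.Name}, 'MATLAB Parallel Server'));

% Try to check out the license only if the product is listed (step 4).
% license('checkout', ...) returns 1 on success and 0 otherwise.
if hasServer
    licenseOK = license('checkout', 'MATLAB_Distrib_Comp_Engine');
else
    licenseOK = 0;
end

fprintf('Parallel Server installed: %d, license checked out: %d\n', ...
    hasServer, licenseOK);
```

If either value prints as 0, that is the point at which to speak with your cluster admin.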
Assuming Parallel Server is available, you'll then need to configure a cluster profile to make MATLAB aware of the cluster. This process changes depending on where you are submitting your jobs from (on the cluster or from your own machine) and what the cluster scheduler is. For that reason, you can reach out directly to me (dpietrus@mathworks.com) or support (support@mathworks.com) for steps on how to do that.
Once your cluster profile is configured, you can use the MATLAB batch command to submit multiple jobs to the scheduler at once. The basic format is something like this:
% Call the HPC cluster profile
c=parcluster('HPC_Profile_Goes_Here');
% Submit a single batch job with 20 workers
job1 = c.batch(@my_parallel_function, numOutputs, {input1, input2}, 'Pool', 20);
% Check job state
job1.State
% Fetch outputs if job is complete
myresults = job1.fetchOutputs
You can also submit bulk jobs with a simple function:
function jobs = submit_jobs
    % Call the HPC cluster profile
    c = parcluster('HPC_Profile_Goes_Here');
    sims = [54 162 324 648];
    for sidx = 1:length(sims)
        % Run code with different number of iterations
        jobs(sidx) = c.batch(@my_parallel_function, 1, {sims(sidx)}, 'Pool', 20);
    end
end
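Since model.m in the question is a script rather than a function, the same pattern can be sketched with the script form of batch. Treat this as a rough sketch under assumptions, not a tested recipe for your cluster: the values in param_vec are hypothetical, the profile name is the same placeholder as above, and it assumes model.m reads a variable named param and leaves its solution in X, as described in the question.

```matlab
% Placeholder profile name; replace it with your configured cluster profile.
c = parcluster('HPC_Profile_Goes_Here');

% Hypothetical parameter values; substitute your own param_vec.
param_vec = linspace(0.1, 0.9, 25);
N = numel(param_vec);

% Submit one batch job per parameter value. The 'Workspace' struct defines
% the variables that exist on the worker before the script model.m runs.
jobs = cell(1, N);
for it = 1:N
    ws = struct('param', param_vec(it));
    jobs{it} = batch(c, 'model', 'Workspace', ws, 'Pool', 20);
end

% Wait for each job to finish, then pull X out of its workspace.
output_save = cell(1, N);
for it = 1:N
    wait(jobs{it});
    S = load(jobs{it}, 'X');
    output_save{it} = S.X;
end
```

One thing to keep in mind when sizing the jobs: a batch job submitted with 'Pool', 20 occupies 21 workers, because one extra worker runs the script or function itself. With 500 cores, 25 such jobs would ask for 525 workers, so either some jobs will queue or you could use 'Pool', 19 to fit all 25 at once.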
This should hopefully be enough to get you started. If you do end up reaching out to support, please reference this post in your communication.
  2 Comments
Brandon on 24 Jan 2024
This has helped me to get started, but I am running into an error that's causing the batch job to crash.
Some context: The function that I am trying to batch process ('OLG.m') basically has three pieces: (1) load initial data; (2) run a parfor loop over multiple dimensions to obtain candidate solutions to objective functions; (3) consolidate the candidate solutions from part (2). The code iterates over (2)-(3) until a fixed point is found.
Piece (1) loads a variable called z_time, which is used in pieces (2) and (3). However, the error below seems to indicate that the error occurs in piece (3). I'm a bit confused how that's possible given the order in which that variable is used.
Error using parallel.Job/fetchOutputs
An error occurred during execution of Task with ID 1.
Caused by:
Error encountered while getting input data from the task.
Error using parallel.task/MJSTask/hGetProperty
Error: File: OLG.m Line: 1695 Column: 113
Invalid syntax for calling function 'z_time' on the path. Use a
valid syntax or explicitly initialize 'z_time' to make it a
variable.
Damian Pietrus on 26 Jan 2024
Do you have a sample of the code that you'd be able to share? I'm curious if the batch job is being called properly. I'd try submitting just one sample batch job outside of your main code loop to verify that it can at least run correctly and that you are getting the outputs that you require.
I'm also curious whether you could put the entirety of your code into one large job rather than submitting many smaller jobs.
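One possible cause worth ruling out, offered as an assumption since OLG.m is not shown: this message is produced when MATLAB parses the file, not when the line runs, which would explain why it points at piece (3) even though z_time is loaded in piece (1). It typically appears when a name is never explicitly assigned inside a function (for example, it is created by calling load without an output) and is then used with variable-only syntax such as indexing with end. A minimal sketch of the usual workaround, with a hypothetical helper last_z_time and made-up data:

```matlab
% Create a small MAT-file so the sketch is self-contained (made-up data).
dataFile = [tempname '.mat'];
z_time = [1 2 3];
save(dataFile, 'z_time');
clear z_time

last_z = last_z_time(dataFile);

function last_z = last_z_time(dataFile)
    % Hypothetical helper. Loading into a struct and assigning z_time
    % explicitly tells the parser that z_time is a variable here,
    % rather than a function or file of the same name on the path.
    S = load(dataFile, 'z_time');
    z_time = S.z_time;
    last_z = z_time(end);
end
```

If OLG.m loads its initial data with a bare load call, switching to this pattern for z_time would be a quick thing to try before digging further into the batch setup.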
