Parallel server dir() for files local to the server
Hello,
I wish to move computation from a local machine with .m files stored locally to a parallel compute server with the .m files stored on the server.
Processing the files sequentially on my local machine usually looks something like this:
Files = dir('C:\my_data'); % Retrieve all patients .m files names
for i=1:length(Files) %
load(strcat('C:\my_data,Files(i).name')) % Load each file in turn
% Put functions to run on data
end
I now want to move this compute to a parallel server. I have a Parallel Server license and the server is validated; I have also uploaded the files to the server.
However, I cannot figure out how to call the dir() command so that it queries the files on the server (they are about 1 TB in total, so too large to transfer to the remote server each time). I had thought it would look something like this:
Files = dir('~/home/user/Database/Physionet/training/'); % Rather than query locally, query the data on the server
However, the directory isn't found correctly. Can anyone explain how to point to this data on the parallel compute server? Or, if anyone has suggestions on better ways to do this, please let me know!
Kind regards,
Christopher
Accepted Answer
Raymond Norris
on 25 Apr 2022
For starters, you don't want to hard code files/paths in your code. Your code should be functions so that you can pass in root folder locations to where you want to read/write. I'll show you an example, but first a couple of questions.
How do you submit your code to the cluster? Are you using parpool or batch? For example:
c = parcluster('cluster');
pool = c.parpool(16);
Files = dir('~/home/user/Database/Physionet/training/');
parfor i=1:length(Files)
% Had a typo in your line. Also, will want to make sure Files(i).name
% is always a MAT-file (think at least about . and ..)
load(strcat('C:\my_data',Files(i).name))
...
end
Or
c = parcluster('cluster');
job = c.batch(@mycode,...,'Pool',16);
I'm guessing you want the former, but you're probably going to need the latter. It also depends on what you're going to do with the data after the parfor finishes (or while it's running). I have a thought, but you might need to update to R2022a.
13 Comments
Christopher McCausland
on 26 Apr 2022
Edited: Christopher McCausland
on 26 Apr 2022
Hi Raymond,
Thank you for the reply!
- I usually hard code paths for testing purposes; I have used uigetdir() in the past, but it can be a little annoying for lots of testing. I would really like to see how you would suggest doing it if there is a better way though!
In terms of the additional information, I should mention that I have used MATLAB for some time, but this is my first time using MATLAB with a cluster (as the processing was taking too long locally).
- To submit a job to the cluster I have been using Environment -> Parallel -> Select a default cluster -> my cluster and then running the job, as this seemed to handle the submission part?
- I hadn't been using parfor to loop around the files, as the load() command is incompatible with it; if you know of a way around this I would love to hear it. I have a large amount of computing resource, so if I can compute different patient files in parallel this should massively speed things up!
Sorry if these are all simple questions. The MATLAB documentation is very well maintained, but there isn't as much information on the Parallel Server end (I guess because all servers are different, and only a small percentage of the MATLAB community uses this functionality). I would love to see an 'idiots' guide to Parallel Server processing several hundred files through the same set of functions!
In terms of R2022a, I could get it put on the server, but this would take time. I would love to know the idea and possibly use it in the future.
In terms of what my processing pipeline looks like: I have 1000 eight-hour recordings of 13 channels @ 200 Hz. I want to take six of these channels, which represent brainwaves, and perform statistical analysis on them in the time and frequency domains. Currently I cycle through each of the 1000 files locally with for(), load each with load() into my workspace, put it through a set of custom functions to extract statistical properties, save some workspace variables with save(), and then move on to the next file, rinse and repeat. This takes quite some time, as you can imagine. If there is a better way to do this with a cluster I would love to know!
Thank you in advance, if you need any more information just let me know!
Christopher
Raymond Norris
on 27 Apr 2022
Here's how I would write your code
function TF = mccausland(brain_in_dir,brain_out_dir)
% BRAIN_IN_DIR is the directory of all the brainwave MAT-files.
% BRAIN_OUT_DIR is the directory to store all the resulting MAT-files.
% Assume everything works fine. Will return true (success) by default.
TF = true;
try
if nargin==0
% If no directories were provided, assume everything is in the
% current folder. This is helpful if we're running everything
% local on our machine and don't want to have to pass in the
% folders to read/write to each time.
brain_in_dir = pwd;
brain_out_dir = pwd;
end
brain_mat_files = dir(fullfile(brain_in_dir,'*.mat'));
if isempty(brain_mat_files)
% Didn't find any brainwave files (either BRAIN_IN_DIR is bad or
% there aren't any MAT-files in it). In either case, exit early.
error("Failed to find any brainwave files in ""%s"".", brain_in_dir)
end
% Ensure output directory exists. If not, create it first.
if exist(brain_out_dir)~=7 %#ok<EXIST>
disp("Creating folder:" + brain_out_dir)
[PASSED, emsg, eid] = mkdir(brain_out_dir);
if PASSED~=true
% Invalid BRAIN_OUT_DIR name or failed write permissions. In
% either case, exit early.
error(eid,emsg)
end
end
parfor bidx = 1:numel(brain_mat_files)
% BRAIN_FILE is a structure, containing all MAT-files. We
% specifically will need "brain_file.name" and "brain_file.folder".
brain_file = brain_mat_files(bidx);
% Operate on BRAIN_FILE. Write results to BRAIN_OUT_DIR.
unit_of_work(brain_file, brain_out_dir)
end
catch E
% Found an error. Display the last error and return failed.
disp(E.message)
TF = false;
end
end
function unit_of_work(bfile, brain_out_dir)
brain_file = fullfile(bfile.folder,bfile.name);
% Load brain file
load(brain_file) %#ok<LOAD>
% For this example, let's assume the variables "ch1" and "ch2" were
% stored in the MAT-file.
ch1 = rand(1,1000);
ch2 = rand(1,1000);
% Work with brainwaves / perform statistical analysis
...
% Save results (for example, channel variables)
% Need to determine unique output file name. In this case, we'll use
% the name of the input brainwave file and concatenate a suffix
[~, ifile] = fileparts(bfile.name);
rfile = fullfile(brain_out_dir,ifile + "_results");
save(rfile,"ch1","ch2")
end
This way you could run your code locally on a smaller set of brainwave files but then also run it on the cluster. To submit your job to the cluster, try the following
function job = submit_brainwave_job()
cluster = parcluster();
idir = "~/home/user/Database/Physionet/training/";
odir = "~/home/user/Database/Physionet/training/RESULTS";
job = cluster.batch(@mccausland,1,{idir,odir}, ...
"AutoAddClientPath",false, "CaptureDiary",true, ...
"CurrentFolder",".", "Pool",100);
end
Obviously, change the Pool size to something that best fits your cluster/algorithm.
Christopher McCausland
on 29 Apr 2022
Hi Raymond,
This is very lovely code. It's really interesting to see how MATLAB staff code and to pick up a few tips and tricks! Thank you so much.
I have a few questions, some of which are technical and some of which are more to understand your thought process. I am a student and I always try to learn as much as I can from people who know what they are doing, so I can improve myself.
1) In terms of the paths function mccausland, if I still have to pass in the variables brain_in_dir and brain_out_dir as a string, is this not still technically hard coding? I know I could use uigetdir() for this locally but not so much on the cluster.
I think having this as a function rather than a script will reduce the initial workspace transfer overhead to the cluster, which is good; is that why you have suggested moving to a function-based approach?
2) If mccausland is run without input you use pwd to look in the current directory (pretty smart!). Is this the preferred error handling technique for file paths etc.?
3) To run this, currently I have a script file which acts as a 'main' to call functions; would you suggest changing this to a function and just running from the command window instead?
4) What is the difference between batch and parpool? I have read the documentation, but I cannot figure out what the fundamental difference between the submission methods is.
5) For the function unit_of_work, is there any method to load multiple files and compute separate patients in parallel, or is this a terrible idea?
6) submit_brainwave_job(), that's a really clean way to do things! In terms of the function file being copied to the worker, will this also copy the 'helper functions'/additional required function files?
7) "AutoAddClientPath",false - This is because we are adding our own paths to the cluster storage with idir,odir?
8) "CurrentFolder","." - Will the current folder be okay on the local machine or should this be moved to the cluster? What does it mean for this to be the folder the script/function executes in?
Sorry for all the questions; I really appreciate you taking the time to answer them and share your knowledge! Once again, thank you for the code above too; it's so helpful to see how it's done professionally, and I have learnt a lot to apply to my own work moving forward!
Kind regards,
Christopher
Raymond Norris
on 29 Apr 2022
1) In terms of the paths function mccausland, if I still have to pass in the variables brain_in_dir and brain_out_dir as a string, is this not still technically hard coding? I know I could use uigetdir() for this locally but not so much on the cluster.
I think having this as a function rather than a script will reduce the initial workspace transfer overhead to the cluster, which is good; is that why you have suggested moving to a function-based approach?
It's OK to pass in file/folder names at the top level; you want to avoid hardcoding names in the bowels of the code. mccausland doesn't know anything about the folders it should read or write; it's just told via the input variables.
Functions are MUCH better than scripts here. Otherwise, batch will send EVERYTHING in your workspace and bring EVERYTHING back from the MATLAB running on the cluster (including what you sent over to begin with). I'm betting somewhere along the line you have (large?) temporary variables that aren't needed. Parameterizing your code allows you to bound what gets passed back and forth.
2) If mccausland is run without input you use pwd to look in the current directory (pretty smart!) Is this the preferred error handling technique for file paths etc.?
Obviously you could swap out pwd for any other local directory, but for illustration purposes, I'm using pwd. You could also call uigetdir if you don't pass in folder names, since you can assume you're running on your local machine.
3) To run this, currently I have a script file which acts as a 'main' to call functions, would you suggest changing this to a function and just running from the command window instead?
Yes, but I might need to see main.
4) What is the difference between batch and parpool? I have read the documentation, but I cannot figure out what the fundamental difference between the submission methods is.
parpool and batch are job launchers. parpool spawns a job from the current MATLAB client, and the MATLAB client (and the machine running it) needs to run continuously. batch is an asynchronous call that launches a job but doesn't require the MATLAB client (nor the machine running it) to run continuously; instead, batch adds +1 worker to act as the "proxy MATLAB client". Let me give you a couple of examples.
parpool
MATLAB is running on my local Windows machine. I start a local parallel pool of 4 workers. Once started, I can then call parfor/spmd/etc. MATLAB sends instructions/data to each of the workers, there's compute, and then the data comes back to MATLAB. For instance
parpool("local",4)
parfor idx = 1:N
A(idx) = rand;
end
plot(A)
While the parfor is running, my MATLAB client is blocked. If I need more than 4 workers, I'll create a new cluster profile and launch a larger size job on the cluster. For instance
parpool("pbs",400);
parfor idx = 1:N
A(idx) = rand;
end
plot(A)
I'm skipping over a lot here, because I just want to focus on the differences between parpool and batch. Again, while parfor is running, the MATLAB client can't run any other code.
batch
To avoid blocking MATLAB or if I want to run lots of jobs at the same time, I can bundle my code up and pass it to batch. In this case, one of the input arguments to batch is the size of the pool that should be launched later (not on my local machine). For instance
% Batch job will start 401 workers (400 + 1)
job = batch(@mycode,.., 'Pool',400);
Now the question is, how do I get back A that was assigned in mycode? When I ran the parallel pool, after the parfor, I could just reference A (A is automagically pulled back to the client MATLAB for me.) First, you need to ensure that the job (calling mycode) has finished running. When you call parfor in your client MATLAB, you either (A) know parfor was finished because it couldn't run plot until it was or (B) you have a visual cue that parfor is finished because the MATLAB prompt comes back. For batch, you need to use the job object to get the state and then you use the job object to "fetch" the results (for those familiar with the new R2022a ValueStore feature, I'm disregarding that for now).
% Wait for the job to finish running
job.wait
% Fetch the results
A = job.fetchOutputs{1};
% Use the results
plot(A)
One advantage batch has is that you can offload several jobs at once
job1 = batch(@mycode1,.., 'Pool',200);
job2 = batch(@mycode2,.., 'Pool',200);
The bottom line is that running parpool in your current MATLAB client gives a more natural flow by calling parfor/spmd/etc. directly. batch allows you to push everything off of the MATLAB client (adding an additional worker to the job), and provides an API to manage the job object.
NOTE: look at the batch documentation for an example workflow: https://www.mathworks.com/help/parallel-computing/run-a-batch-job.html
5) Function unit_of_work, is there any method to load multiple files and compute separate patients in parallel, or is this a terrible idea?
I need more information to follow this. Flesh out unit_of_work if that helps.
6) submit_brainwave_job(), that's a really clean way to do things! In terms of the function file being copied to the worker, will this also copy the 'helper functions'/additional required function files?
MATLAB will run a dependency checker on submit_brainwave_job. Every function you wrote or MAT-file you depend on will be attached to the job (parpool works the same way). When the dependency checker hits a MATLAB built-in function, it stops and goes back up the tree, because built-ins of course are already on the cluster. There's a bit of art here though. You're going to start running submit_brainwave_job many times. Each job will need to traverse the dependencies and attach to the job. You might find this costly after a while if there are a lot of files, especially ones that you don't modify. You might consider bundling these files up and placing them on the cluster, and then specifying AdditionalPaths. Only do this for "static" files -- ones that you don't change at all. If you're constantly changing the files locally on your machine, you're better off having MATLAB find them and then attach them to the job for you.
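A minimal sketch of the two approaches, assuming a hypothetical helper folder /home/user/matlab-helpers already copied to the cluster (the folder name and helper file names are illustrative, not from the thread):

```matlab
cluster = parcluster();
idir = "~/home/user/Database/Physionet/training/";
odir = "~/home/user/Database/Physionet/training/RESULTS";

% Option A: "static" helpers already live on the cluster; point the
% workers at them with AdditionalPaths instead of attaching the files.
job = cluster.batch(@mccausland, 1, {idir, odir}, ...
    "AdditionalPaths", "/home/user/matlab-helpers", ...
    "AutoAddClientPath", false, "Pool", 100);

% Option B: helpers change often on the local machine; let the
% dependency checker find and attach them automatically (the default),
% or list them explicitly with AttachedFiles.
job = cluster.batch(@mccausland, 1, {idir, odir}, ...
    "AttachedFiles", {"my_stats.m", "my_filter.m"}, ...
    "AutoAddClientPath", false, "Pool", 100);
```

Option A skips re-transferring unchanging files on every submission; Option B keeps the convenience of always running your latest local edits.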
7) "AutoAddClientPath",false - This is because we are adding our own paths to the cluster storage with idir,odir?
This is because our local machine has a different file system than the cluster, so we don't want to automatically add our client path to the path of the workers. For example, my Windows path (c:\work\matlab) can't be added to the worker path running on Linux (/mnt/home/raymond).
8) "CurrentFolder","." - Will the current folder be okay on the local machine or should this be moved to the cluster? What does it mean by this being the folder the script/function executes in?
CurrentFolder is where the workers should start on the cluster. I default to '.', which is a shortcut for my home directory. But you could also set it, for example, to /home/raymond/matlab/project-1. It's really important to understand where the workers are running because of (A) paths -- workers might not find a file in your home directory on the cluster, so you'll want the workers to start elsewhere and (B) if the workers save a file to the current directory, where exactly is the "current directory" on the cluster? Explicitly setting it in your call to batch gives you a clearer picture where you are writing to.
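If in doubt, a quick sketch for checking where a worker actually starts (pwd runs on the cluster, not locally):

```matlab
% Ask the cluster where a worker starts; useful before relying on
% relative paths in your own job submissions.
cluster = parcluster();
j = cluster.batch(@pwd, 1, {}, "CurrentFolder", ".");
j.wait
disp(j.fetchOutputs{1})   % the worker's current folder on the cluster
```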
Christopher McCausland
on 2 May 2022
Hi Raymond,
Thank you so much for taking the time again. You have been absolutely brilliant! My one final question is this: within mccausland you have the following block of code:
parfor bidx = 1:numel(brain_mat_files)
% BRAIN_FILE is a structure, containing all MAT-files. We
% specifically will need "brain_file.name" and "brain_file.folder".
brain_file = brain_mat_files(bidx);
% Operate on BRAIN_FILE. Write results to BRAIN_OUT_DIR.
unit_of_work(brain_file, brain_out_dir)
end
Am I correct in saying:
- This will process multiple .mat files simultaneously (basically replacing the original for loop, with the parallel nature meaning it processes faster)?
- How can this block 'load' .mat file data (I think that's what the variable brain_file is doing?) without using load()? Equally, I know you can't use load() in a parfor loop.
- Lastly, while not a concern currently, how would you do the same with, say, a .csv file?
I think this will answer all my questions, so I will accept the answer after this one and give you peace for now! Thank you so so much for taking the time to reply to all of these, the knowledge has been wonderful and so helpful! I am already starting to apply it to my own programming!
Kind regards,
Christopher
Raymond Norris
on 3 May 2022
Within mccausland you have the following block of code. I am correct in saying:
This will process multiple .mat files simultaneously (Basically replacing the original FOR loop but the parallel nature will mean it processes faster).
Correct. Each iteration will process a MAT-file. MATLAB will chunk up the MAT-file names and disperse them to the workers. It's then up to unit_of_work what to do with them. A brute-force approach would be for MATLAB to give each worker an equal number of MAT-files to process. However, MATLAB doesn't assume each iteration takes the same amount of time to process. Therefore, MATLAB will give a subportion to each. When a worker has processed all of its subiterations, MATLAB will give out more, so that ideally all workers are busy for the duration of the parfor.
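If you ever need to override this dynamic scheduling, parforOptions (R2019a+) lets you control the partitioning; a sketch (the subrange size of 10 is illustrative):

```matlab
% Sketch: hand out fixed subranges of 10 iterations at a time instead
% of letting MATLAB choose the subrange sizes.
pool = gcp;   % assumes a parallel pool is already running
opts = parforOptions(pool, "RangePartitionMethod", "fixed", ...
    "SubrangeSize", 10);
parfor (bidx = 1:100, opts)
    % process brain_mat_files(bidx) here, as in unit_of_work
end
```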
How can this block 'load' .mat file data (I think that's what the variable brain_file is doing?) without using load()? Equally I know you can't use load() in a parfor loop.
The following is how we select the brain file to use, but it's not what loads the brain file; that's done in unit_of_work:
brain_file = brain_mat_files(bidx);
It's helpful to understand why load/save are problematic in parfor-loops. In fact, your code can call load/save.
load
Take the following example:
parfor idx = 1:N
%%%%%%%% run on another machine %%%%%%%%
load XY
z = x .* y .* idx;
%%%%%%%% run on another machine %%%%%%%%
end
Of course the parfor block doesn't need to "run on another machine", but I want to emphasize a point here, which is: what would it take to run this code in a completely different process on a completely different machine (maybe a different OS)? We need the file XY and we need the "tokens" x, y, and idx. What do I mean by "token"? MATLAB evaluates the code and identifies the elements x, y, and idx to be either variables or functions. You see the code x and y and think they must be defined in XY.mat, but you wouldn't if the MAT-file was called MONDAY.mat. Likewise, MATLAB doesn't deduce from the MAT-file name that x and y are in it. If MATLAB doesn't know what a token is, it assumes it's a function (that can be resolved on the workers running on the other machine). MATLAB tells the workers that x and y are functions and that idx is a variable. Then the workers run the block of code, load XY, which in turn creates the variables x and y, contradicting what the workers were told by MATLAB.
There are a couple of ways to resolve this.
Solution #1
parfor idx = 1:N
%%%%%%%% run on another machine %%%%%%%%
data = load('XY');
z = data.x .* data.y .* idx;
%%%%%%%% run on another machine %%%%%%%%
end
I'm not really a fan of this, because that's not how I'd write it in a for-loop. But in a pinch, it'll work.
Solution #2
What I coded.
parfor idx = 1:N
%%%%%%%% run on another machine %%%%%%%%
unit_of_work(idx)
%%%%%%%% run on another machine %%%%%%%%
end
function unit_of_work(idx)
load XY
z = x .* y .* idx;
end
This also has its drawbacks. What if we need x and y later in the code? The way it's written scopes the variables to the subfunction. But the reason these work is that we're not introducing new variables into the parfor-loop on the fly (e.g., eval would also cause issues). In solution #1, we already know that data is a variable. In solution #2, we aren't introducing any new variables at all. All the code is pushed into our refactored code, unit_of_work (I just use that name; it can be called anything).
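If x and y really are needed after the loop, one sketch is to return them from the subfunction instead of leaving them scoped inside it (unit_of_work_out is a hypothetical variant; XY.mat holding scalars x and y is the assumption carried over from the example above):

```matlab
parfor idx = 1:N
    % The helper returns what the loop needs; nothing "poofs" into the
    % parfor workspace, so MATLAB can classify every variable.
    [x(idx), y(idx), z(idx)] = unit_of_work_out(idx);
end

function [x, y, z] = unit_of_work_out(idx)
% Hypothetical variant of unit_of_work that returns its results.
data = load("XY");
x = data.x;
y = data.y;
z = x .* y .* idx;
end
```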
save
This is a little trickier. Unlike load, you can't call save directly in a parfor-loop, but it too can be refactored and called in a subfunction, as in solution #2. save requires knowing something about the workspace that called it. Take the following example:
A = rand;
for idx = 1:N
save RESULT
end
What gets stored in RESULT? MATLAB looks at its workspace and finds A, idx, and N, which are all stored in RESULT. Now write this as a parfor
A = rand;
parfor idx = 1:N
%%%%%%%% run on another machine %%%%%%%%
save RESULT
%%%%%%%% run on another machine %%%%%%%%
end
MATLAB didn't send the workers any variables, and the workers can't reach back asking for variables, so how can they save A, idx, and N? Here's a workaround:
A = rand;
parfor idx = 1:N
%%%%%%%% run on another machine %%%%%%%%
unit_of_work(A,idx,N)
%%%%%%%% run on another machine %%%%%%%%
end
function unit_of_work(A,idx,N)
save RESULT A idx N
end
You probably want a unique MAT-file name -- refer back to what I showed in the code.
Two salient points here
- For parfor loops to work properly, they must (ultimately) behave and provide the same results as a for-loop, but quicker.
- Whenever you modify your code to parallelize it, go back and run it as a for-loop to ensure you haven't changed the behavior/output.
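One convenient sketch for that verification step: the optional second argument to parfor caps the number of workers, and a cap of 0 runs the loop serially in the client without touching the loop body:

```matlab
% Serial run for verification: M = 0 means "use at most 0 workers",
% so the iterations execute in the client, like a plain for-loop.
parfor (bidx = 1:numel(brain_mat_files), 0)
    unit_of_work(brain_mat_files(bidx), brain_out_dir)
end
```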
Lastly, while not a concern currently how would you do the same but with say a .csv file?
There are a couple of small changes and one larger change. To begin with, you'll want to pass in the file format of the source files. Let's make fext the file extension (and update submit_brainwave_job as well).
function TF = mccausland(brain_in_dir,brain_out_dir,fext)
and then use it here
brain_mat_files = dir(fullfile(brain_in_dir, "*." + fext));
Replace any references to "mat" in the code, for instance change the variable to
brain_files = dir(fullfile(brain_in_dir, "*." + fext));
The piece that requires more work is the file "reader" (and maybe "writer" if you want to write non MAT-files). Before, we knew it was MAT-files, so we just call load. Now we need to provide the workers with the proper function to call to read the data.
Here's a modified version
function TF = mccausland2(brain_in_dir,brain_out_dir,fext)
% BRAIN_IN_DIR is the directory of all the brainwave files.
% BRAIN_OUT_DIR is the directory to store all the resulting files.
% FEXT is the file extension of file we want to read.
% Assume everything works fine. Will return true (success) by default.
TF = true;
try
if nargin==0
% If no directories were provided, assume everything is in the
% current folder. This is helpful if we're running everything
% local on our machine and don't want to have to pass in the
% folders to read/write to each time.
%
% Default to reading and writing MAT-files.
brain_in_dir = pwd;
brain_out_dir = pwd;
fext = 'mat';
end
brain_files = dir(fullfile(brain_in_dir, "*." + fext));
if isempty(brain_files)
% Didn't find any brainwave files (either BRAIN_IN_DIR is bad or
% there aren't any files in it). In either case, exit early.
error("Failed to find any brainwave files in ""%s"".", brain_in_dir)
end
% Ensure output directory exists. If not, create it first.
if exist(brain_out_dir)~=7 %#ok<EXIST>
disp("Creating folder:" + brain_out_dir)
[PASSED, emsg, eid] = mkdir(brain_out_dir);
if PASSED~=true
% Invalid BRAIN_OUT_DIR name or failed write permissions. In
% either case, exit early.
error(eid,emsg)
end
end
% Define readers and writers for the brain files
switch fext
case 'mat'
helper_fcns.reader_fcn = @load;
helper_fcns.writer_fcn = @save;
case {'csv', 'txt'}
% Select the appropriate reader/writer from the list. Using an
% example here (readmatrix, writematrix)
% https://www.mathworks.com/help/matlab/import_export/supported-file-formats-for-import-and-export.html
helper_fcns.reader_fcn = @readmatrix;
helper_fcns.writer_fcn = @writematrix;
otherwise
error('Unsupported file format: %s', fext)
end
parfor bidx = 1:numel(brain_files)
% BRAIN_FILES is a structure, containing all brain files. We
% specifically will need "brain_file.name" and "brain_file.folder".
brain_file = brain_files(bidx);
% Operate on BRAIN_FILE. Write results to BRAIN_OUT_DIR. Provide
% helper functions for reading/writing brain files.
unit_of_work(brain_file, brain_out_dir, helper_fcns)
end
catch E
% Found an error. Display the last error and return failed.
disp(E.message)
TF = false;
end
end
function unit_of_work(bfile, brain_out_dir, hfcns)
brain_file = fullfile(bfile.folder,bfile.name);
% Read in brain file
hfcns.reader_fcn(brain_file)
%%%%% CAUTION %%%%%
% For this example, let's assume the variables "ch1" and "ch2" were
% stored in the file.
ch1 = rand(1,1000);
ch2 = rand(1,1000);
% Work with brainwaves / perform statistical analysis
...
% Write results (for example, channel variables)
% Need to determine unique output file name. In this case, we'll use
% the name of the input brainwave file and concatenate a suffix
[~, ifile] = fileparts(bfile.name);
rfile = fullfile(brain_out_dir,ifile + "_results");
hfcns.writer_fcn(rfile,"ch1","ch2")
end
OK, here's the tricky part (CAUTION). You need reader_fcn to return the same format of data so that you can work on the data "in the blind". By that I mean: load will load the variables stored in the MAT-file, but readmatrix will assign the data to your output variable. For instance:
% MAT-file (load individual variables into the workspace)
hfcns.reader_fcn("foo.mat");
% MAT-file (load all variables into the structure A)
A = hfcns.reader_fcn("foo.mat");
% CSV file (load all data into the variable A)
A = hfcns.reader_fcn("foo.csv");
In the first example, MATLAB automagically imports the variables ("ch1", "ch2", etc.). But for the CSV file, we store everything in the variable A. You need to adapt to both so that the rest of your code isn't switching between working with MAT-files and working with CSV files.
The webpage I listed above gives a whole slew of file formats and output types (cell arrays, tables, etc.). You'll want to give some thought to how best to design this so that your code is as flexible as possible. One option is to store the data in your MAT-files as a single table. Then, if your data is a structure, you know you can index into it to get the table. For example:
T = hfcns.reader(brain_file);
% T is either a struct containing a table (read from a MAT-file) or it is a
% table (read from a CSV file);
if isa(T, 'struct')
% The variable "T" is a struct that contains all the variables in the
% MAT-file. In this case, there's only one variable, also called "T"
% and is a table. The table T contains all the channels.
T = T.T;
end
% At this point now, regardless of how we read the data in, we have a
% variable T, which is of type table. The columns are the channels.
You'll want to do something similar for writing your results as well, if you don't want to write only MAT-files. This kind of abstraction allows you to operate on a whole slew of data formats (databases, HDF5, etc.) without unit_of_work ever knowing what it's working on.
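One sketch of such a writer, mirroring the table-based reader idea above (write_results and the writetable choice are illustrative, not from the thread):

```matlab
function write_results(fext, rfile, T)
% Hypothetical writer front-end: unit_of_work calls this with a table T
% and never branches on the file type itself.
switch fext
    case 'mat'
        save(rfile, "T");                    % MAT-file holds one table, T
    case {'csv', 'txt'}
        writetable(T, rfile + "." + fext);   % tables map naturally to text
    otherwise
        error("Unsupported file format: %s", fext)
end
end
```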
Christopher McCausland
on 3 May 2022
Hi Raymond,
This is absolutely brilliant, thank you so much for all the help! I have learnt so much from this and I really appreciate you taking the time to give such complete and informative answers! I am going to accept the answer now, as you have been brilliant and I know I've taken a lot of your time.
One final, final question (and then I will leave you in peace): you mentioned that you would write the for-loop in Solution #1 differently. If it's not too much trouble, could you show me how you would do it?
Many, many thanks!
Christopher
Raymond Norris
on 3 May 2022
I meant more that I wouldn't write the for-loop as such
for idx = 1:N
data = load('XY');
z = data.x .* data.y .* idx;
end
I would have had it
for idx = 1:N
load XY
z = x .* y .* idx;
end
The structure data, with the fields x and y, will be slightly larger than just the variables x and y. The second version is how you'd typically write the for-loop. Of course, all of this code is quite gibberish, since you wouldn't load the same MAT-file on every iteration of the for-loop. Here's a better example, where each iteration loads a different MAT-file and then stores the result in the array z:
for idx = 1:N
load(sprintf("XY_%d",idx))
z(idx) = x .* y .* idx;
end
Christopher McCausland
on 4 May 2022
Hi Raymond,
I understand what you mean now, thank you!
I lost access to the cluster for a few days while it was shut down for upgrades, so I am finally able to do more testing with the code you outlined. One issue I am facing is that I get the following error when running:
>> diary(ans)
--- Start Diary ---
--- End Diary ---
Task with properties:
ID: 1
State: finished
Function: @parallel.internal.cluster.executeFunction
Parent: Job 1
SchedulerID: 6720
StartDateTime: 04-May-2022 11:35:20
RunningDuration: 0 days 0h 0m 4s
Error: none
Warnings: Worker unable to add the following folders to the MATLAB search path at the start of the job:
<none>
This can occur when the worker has a different file system to the client. Try one of the following:
* Do not include these folders in the 'AdditionalPaths' parameter when creating a job.
* Do not include these folders in the 'AdditionalPaths' field of the cluster profile.
* Set the 'AutoAddClientPath' parameter to false when creating a job to prevent adding folders from your client's MATLAB search path.
Warning Stack: JobPathHelper>JobPathHelper.addAdditionalPaths (line 108)
dctEvaluateTask>iAddJobDependencies (line 473)
dctEvaluateTask>iEvaluateTask/nEvaluateTask (line 206)
dctEvaluateTask>iEvaluateTask (line 175)
dctEvaluateTask (line 81)
distcomp_evaluate_filetask_core>iDoTask (line 154)
distcomp_evaluate_filetask_core (line 52)
distcomp_evaluate_filetask (line 17)
Task with properties:
ID: 2
State: finished
Function: @parallel.internal.pool.poolWorkerFcn
Parent: Job 1
SchedulerID: 6720
StartDateTime: 04-May-2022 11:35:20
RunningDuration: 0 days 0h 0m 6s
Error: none
Warnings: Worker unable to add the following folders to the MATLAB search path at the start of the job:
<none>
This can occur when the worker has a different file system to the client. Try one of the following:
* Do not include these folders in the 'AdditionalPaths' parameter when creating a job.
* Do not include these folders in the 'AdditionalPaths' field of the cluster profile.
* Set the 'AutoAddClientPath' parameter to false when creating a job to prevent adding folders from your client's MATLAB search path.
Warning Stack: JobPathHelper>JobPathHelper.addAdditionalPaths (line 108)
dctEvaluateTask>iAddJobDependencies (line 473)
dctEvaluateTask>iEvaluateTask/nEvaluateTask (line 206)
dctEvaluateTask>iEvaluateTask (line 175)
dctEvaluateTask (line 81)
distcomp_evaluate_filetask_core>iDoTask (line 154)
distcomp_evaluate_filetask_core (line 52)
distcomp_evaluate_filetask (line 17)
I can tell that the warning is probably related to idir and odir not being added to the search path. However, I don't understand why they aren't, and the &lt;none&gt; entry makes it unclear whether they were added or not.
I was going to add these paths with 'AdditionalPaths', and I might give this a go anyway, although I can see the output log warning me not to do this. As the whole thing finishes in under ten seconds, it's not running as it should, but it returns as finished with these two warnings rather than errors, even though the code hasn't completed as expected.
I am hoping this is the last thing and I can leave you in peace soon!
Kind regards,
Christopher
Christopher McCausland
on 4 May 2022
For anyone following this, I found the source of the very weird error: I was using a preconfigured cluster profile. Within Validation -> Properties -> Files and Folders, the empty text field (which should only contain paths) had been filled in with &lt;none&gt;, hence the warning.
This seems to have been a red herring, however. Once it was fixed and the warning removed, the entire program still completes in ~10 seconds, with the output not saved to the odir file path. My current working theory is that idir is not seen either, and therefore, with nothing to process, the code completes in seconds. I can't figure out why idir and odir cannot be seen from the workers, though.
Kind Regards,
Christopher
Raymond Norris
on 4 May 2022
As noted, this is a warning (not an error) and is a red herring. You can reproduce it like so:
>> c = parcluster("local");
>> j = c.batch(@pwd,1,{},'AdditionalPaths','\\does\not\exist');
>> j.wait
>> j.Tasks(1)
ans =
Task with properties:
ID: 1
State: finished
Function: @parallel.internal.cluster.executeFunction
Parent: Job 26
StartDateTime: 04-May-2022 11:13:34
RunningDuration: 0 days 0h 0m 2s
Error: none
Warnings: Worker unable to add the following folders to the MATLAB search path at the start of the job:
\\does\not\exist
This can occur when the worker has a different file system to the client. Try one of the following:
* Do not include these folders in the 'AdditionalPaths' parameter when creating a job.
* Do not include these folders in the 'AdditionalPaths' field of the cluster profile.
* Set the 'AutoAddClientPath' parameter to false when creating a job to prevent adding folders from your client's MATLAB search path.
If the workers couldn't find the idir, then brain_files should be empty and MATLAB should throw the error message
Failed to find any brainwave files in "/path/to/files".
Next, we don't see any error from creating the output folder, so I'm assuming we get to the parfor-loop. What I would suggest is putting disp statements throughout the code to see how far things get (including in and after the parfor-loop), and displaying the size of brain_files -- is it the number of files you expected? Also display the name of the results file (it should be the full path) -- is it being stored where you think it is?
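A minimal sketch of that kind of instrumentation, assuming the hypothetical names from this thread (idir, odir, brain_files, and a placeholder processing step -- the actual script will differ):
```matlab
function process_brainwaves(idir, odir)
% Sketch only: idir/odir, the *.mat pattern, and the "analysis" step
% are assumptions based on this thread, not the real script.
brain_files = dir(fullfile(idir, '*.mat'));
disp("Found " + numel(brain_files) + " files in " + idir)
assert(~isempty(brain_files), ...
    'Failed to find any brainwave files in "%s".', idir)
if ~isfolder(odir)
    mkdir(odir)
end
parfor i = 1:numel(brain_files)
    data = load(fullfile(idir, brain_files(i).name)); %#ok<NASGU>
    result = brain_files(i).name;   % placeholder for the real analysis
    outfile = fullfile(odir, sprintf('result_%03d.mat', i));
    disp("Saving results to: " + outfile)   % confirm the full output path
    parsave(outfile, result)
end
disp('parfor-loop finished')
end

function parsave(fname, result)
% save() cannot be called directly inside a parfor body, so wrap it
save(fname, 'result')
end
```
The disp calls show up in the job diary, so you can see exactly where execution stops and whether the paths resolve on the workers.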
Christopher McCausland
on 10 May 2022
Hi Raymond,
As a final round-up for anyone following: to add directories local to the cluster to the workers' search path, the easiest method is Home -> Parallel -> Manage Clusters -> select the relevant cluster -> Properties -> Edit -> Files and Folders -> manually specify folders to add to the workers' search path, OR use the 'AdditionalPaths' parameter.
This adds the paths for the workers and is quite a nice way of doing things!
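For the per-job route, the call looks something like this -- a sketch, assuming a profile named 'myCluster' and the hypothetical process_brainwaves function from earlier in the thread (substitute your own names and paths):
```matlab
% Submit a job and make a server-side folder visible to the workers
c = parcluster('myCluster');            % placeholder profile name
j = c.batch(@process_brainwaves, 0, {idir, odir}, ...
    'AdditionalPaths', {'/home/user/Database/Physionet/training'});
wait(j)     % block until the job finishes
diary(j)    % show any disp output produced on the workers
```
Because 'AdditionalPaths' is set on the batch call, it applies only to this job and does not touch the cluster profile.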
Lastly, Raymond, thank you so much for all the help and being so patient, I really appreciate all the time and effort you put in, i've learnt a lot from you! The in depth answers were brillent!
Kind regards,
Christopher
Raymond Norris
on 10 May 2022
Keep in mind that if you place the additional folder names in the profile, they will be used for every job you submit to the cluster, whereas adding them to the call to batch explicitly sets them for that job only. In the case of adding paths, there's no overhead to speak of. However, wait until you need to debug a job where you can't understand why it fails, only to discover that you included another path (listed in the profile) that was shadowing your function. Listing the additional paths in the call to batch doesn't solve this issue, but it hopefully at least puts it in your face that you are adding /home/cmcausland/work/... to your job.
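One way to catch that kind of shadowing is to ask a worker which file it would actually run. A sketch, where 'myCluster' and 'my_function' are stand-ins for your own profile and function names:
```matlab
% Ask a worker which copy of a function is on top of its search path
c = parcluster('myCluster');                    % placeholder profile name
j = c.batch(@() which('my_function'), 1, {});   % 'my_function' is a stand-in
wait(j)
out = fetchOutputs(j);
disp(out{1})   % full path of the file the worker would actually execute
```
If the displayed path points somewhere unexpected, a folder listed in the profile (or in 'AdditionalPaths') is shadowing the function you meant to run.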