How to update variable within a matfile inside a parfor loop?

I have a very expensive loop that I'm trying to parallelize, and part of this loop involves updating an entry in a 4D array inside a matfile (I must save results to disk and access them through a matfile pointer due to RAM limitations). However, I get an error that says that the matfile pointer variable cannot be classified. As an illustrative example of what I'm trying to do, consider the code below:
testOut = []; % create variable to try and update
save('TestFile.mat', 'testOut', '-v7.3'); % save variable into accessible matfile
FileOut = matfile('TestFile.mat','Writable',true); % Set up pointer to matfile
nrows = 200; ncols = 200; nplanes = 32; nvolumes = 10; % Set up dimensions of 4D array
FileOut.testOut = single(zeros(nrows,ncols,nplanes,nvolumes)); % Set initial size of variable in matfile
parfor i = 1:nvolumes
    FileOut.testOut(:,:,:,i) = i; % artificial example, point is that I want to update variable using fourth dimension index
end
The error this code would report is:
Error: The variable FileOut in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".
Basically, the loop is performing an independent calculation on a 3D volume image each time, and each resulting volume image is saved in a 4D array using the fourth dimension to mark volume image number. However, I often have thousands of these images, and so the 4D array must be saved to disk and accessed from a matfile to avoid overloading the RAM. I'd like to adapt my code to use parfor, but I can't figure out how to get parfor to play nicely with the matfile pointer. Can anyone help me out here, please? I understand that sliced variables must be used with parfor, and that variables of the form I'm using here are not allowed, but I can't figure out a solution...

Accepted Answer

Edric Ellis on 18 Oct 2016
It might be possible to overcome the "slicing" problems you're seeing here - but you're still left with the fundamental underlying problem that you're trying to get multiple worker processes to write to the same file concurrently. That is never going to work well, as the writes will conflict and almost inevitably corrupt your file.
What I'd suggest is having each worker write to a temporary .mat file during the parfor loop, and then running a post-processing stage to collect the results. (I'm presuming here that computing the data to go into the file takes a long time, but accessing the data in the file is relatively inexpensive.) I'm going to use parallel.pool.Constant, introduced in R2015b, but the same can be achieved using the Worker Object Wrapper.
This example is a bit involved, but hopefully it's clear what's going on. You'll need to adapt things a little to get them to work with your multi-dimensional data.
%% Step 1: create a mat-file per worker using SPMD
spmd
    myFname = tempname(); % each worker gets a unique filename
    myMatfile = matfile(myFname, 'Writable', true);
end
%% Step 2: create a parallel.pool.Constant from the 'Composite'
% This allows the worker-local variable to be used inside PARFOR
myMatfileConstant = parallel.pool.Constant(myMatfile);
%% Step 3: run PARFOR
parfor idx = 1:100
    resultToSave = idx * 100;
    matfileObj = myMatfileConstant.Value;
    % Append into 'testOut', storing the index
    matfileObj.testOut(1, idx) = resultToSave;
    % Flag which entries this worker actually wrote
    matfileObj.gotResult(1, idx) = true;
end
%% Step 4: accumulate the results on the client
% Here we retrieve the filenames from the 'myFname' Composite,
% and use them to accumulate the overall result
outmatfile = matfile('out.mat', 'Writable', true);
for idx = 1:numel(myFname)
    workerFname = myFname{idx};
    workerMatfile = matfile(workerFname);
    workerOutSz = size(workerMatfile, 'testOut');
    for jdx = 1:workerOutSz(2)
        % Copy only the entries that this worker computed
        if workerMatfile.gotResult(1, jdx)
            outmatfile.out(1, jdx) = workerMatfile.testOut(1, jdx);
        end
    end
end
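For the 4D case in the question, one possible adaptation is sketched below. It is not the code from the answer above: instead of one file per worker held in a parallel.pool.Constant, it swaps in a simpler variant where each iteration writes its own small MAT-file, and the client then merges the volumes one at a time. The scratch folder, the `vol_%05d.mat` file names, the output file `TestOut4D.mat`, and the stand-in computation are all assumptions made for illustration. The sketch also assumes that the workers and the client see the same file system, as with a local pool.

```matlab
% Sketch only: folder, file names and the "computation" are placeholders.
nrows = 200; ncols = 200; nplanes = 32; nvolumes = 10;

scratchDir = tempname();   % hypothetical scratch folder for per-volume files
mkdir(scratchDir);

parfor i = 1:nvolumes
    % Stand-in for the real, expensive per-volume calculation
    vol = i * ones(nrows, ncols, nplanes, 'single');
    % One MAT-file per iteration, so no two workers ever share a file
    volFile = matfile(fullfile(scratchDir, sprintf('vol_%05d.mat', i)), 'Writable', true);
    volFile.vol = vol;
end

% Serial merge on the client: only one volume is held in memory at a time
outFile = matfile('TestOut4D.mat', 'Writable', true);   % hypothetical output name
outFile.testOut(nrows, ncols, nplanes, nvolumes) = single(0);   % set full size on disk
for i = 1:nvolumes
    volFile = matfile(fullfile(scratchDir, sprintf('vol_%05d.mat', i)));
    outFile.testOut(1:nrows, 1:ncols, 1:nplanes, i) = volFile.vol;
end
```

Because the file name encodes the loop index, no `gotResult` bookkeeping is needed in this variant. The trade-off is one small file per volume, which you would probably want to delete once the merge has finished.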
Aditya Nanda on 13 Apr 2021
Hi Edric! This is a great answer. I am struggling with a very similar issue; the only difference is that my "resultToSave" is multidimensional.
For instance, here is the code (this is Step 3 in the above example):
parfor idx = 1:100
    resultToSave = someFunction;
    matfileObj = myMatfileConstant.Value;
    % Append into 'testOut', storing the index
    matfileObj.testOut(1:n,1:t, idx) = resultToSave;
    matfileObj.gotResult(1, idx) = true;
end
The variable resultToSave is 3-dimensional (double). When I run it like this, I get the error:
Variable resultToSave has 2 dimensions thus indexing in third dimension is not possible.
Please help. I am not experienced with using spmd. Thanks
Edric Ellis on 19 Apr 2021
Hi @Aditya Nanda, please could you post a new question with some slightly more detailed reproduction steps that reproduce the problem in a standalone way. (Feel free to pop a link in a comment here so I'll get notified).


More Answers (1)

Jason Climer on 11 Apr 2018
It's worth noting that accumulating the results into a matfile instead of the local memory is prohibitively slow.
Timur Mokaev on 11 Apr 2018
Sure, but I guess here we are considering the case where the accumulated results may not fit in local memory. Also, if for some reason the whole numerical procedure crashes, one will lose all the results stored in local memory.
