How can I save a large structure array without performance issues?

Hello,
we are importing data into MATLAB with a .NET DLL. The data is stored in a structure like this:
for measure_id = 1:length(messungen)
    ...
    for signal = 1:signal_names.Length
        signal_name = signal_names(signal);
        ...
        for RPC_Nr = 1:step.Length
            for shot_idx ...
                M(measure_id).ETCurve(ETC_Nr).(sprintf('%s', char(signal_name)))(shot_idx).data = DATA;
            end
            M(measure_id).RPCurve(RPC_Nr).(sprintf('%s_t', char(signal_name))) = DATA;
            ...
        end
    end
    ...
    if measure_id < length(messungen)
        save('data.mat', 'M', '-v7.3')
    end
end
There are 6 different structure types at the "first level", such as M(1).ETCurve, M(1).RPCurve, M(1).TP, ...
With this structure it is easy to access the needed data, for example M(1).RPCurve(2).Q or M(1).ETCurve(2).U(100).data.
One measurement takes about 0.5 to 1.5 GB when saved to a MAT-file. After the 4th measurement, saving and loading the workspace takes some time, and after the 6th measurement it takes more than 30 minutes...
Does anyone have an idea why this gets so slow, and how I can improve the performance?
Thanks

5 Comments

I don't have a specific suggestion for you; I'll just point out that for data files this large you'll almost certainly want to manage them differently. In particular, due to RAM limitations, you should probably not be loading all this data into memory at once. There are alternative techniques for loading only part of a data set at a time. Start here.
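One such technique is the matfile function, which reads only the requested part of a variable from a MAT-file. The following is a minimal sketch, not the poster's code: the file name and the numeric variable A are hypothetical stand-ins. As far as I know, matfile cannot index into the fields of a struct array such as M, so this approach fits plain arrays or separately saved variables best.

```matlab
% Hypothetical example data; the file name and variable A are assumptions.
fileName = [tempname '.mat'];
A = rand(1000, 50);
save(fileName, 'A', '-v7.3');     % partial loading is efficient for v7.3 files

m = matfile(fileName);            % creates a handle, no data is read yet
firstRows = m.A(1:10, :);         % reads only rows 1 to 10 from disk

info = whos('-file', fileName);   % lists names and sizes without loading data
```

The whos call is a cheap way to see what a large file contains before deciding what to load.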
OK, RAM limitations are currently not a problem. We have 32 GB of RAM, a Xeon E3-1505M v5 and local SSD storage.
I checked the Task Manager in Windows 10: during saving the CPU load is high, while RAM and disk usage are low.
When I add the '-nocompression' flag to the save command, saving is about 20% faster.
Sorry, I shouldn't have said only "due to RAM limitations". Read-write and CPU limitations are just as relevant considerations for why you may want to consider an alternative approach to working with such large data files.
Having said that, if you're insistent on using this approach, you might try running the profiler to see what exactly is slowing things down. For example, is it the saving of the file? Or is it something in the construction of the variable M itself?
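A minimal sketch of such a profiler run is shown below. The workload is a small hypothetical stand-in; in practice the real import and save code would go between profile on and profile off.

```matlab
profile on

% Stand-in workload (assumption): replace with the real import and save code.
M = struct('data', rand(200));
fileName = [tempname '.mat'];
save(fileName, 'M', '-v7.3');

profile off
stats = profile('info');   % results as a struct, including a FunctionTable field
% profile viewer           % uncomment to open the interactive report
```

The report shows how the total time is split between building M and the call to save.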
To me it looks like you overwrite the variable named M in every iteration.
If each variable is smaller than 2 GB, version 7 ('-v7') is a faster alternative to '-v7.3'.
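One way to apply this is to check the in-memory size of the variable with whos and fall back to '-v7.3' only when needed. This is a sketch under assumptions: the variable contents and file name are hypothetical, and the 2^31-byte threshold is my reading of the 2 GB per-variable limit.

```matlab
M = struct('data', rand(100));    % stand-in for one imported measurement
fileName = [tempname '.mat'];     % hypothetical file name

info = whos('M');                 % info.bytes is the in-memory size of M
limitBytes = 2^31;                % assumed 2 GB per-variable limit of -v7

if info.bytes < limitBytes
    fmt = '-v7';
else
    fmt = '-v7.3';
end
save(fileName, 'M', fmt);
```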
Thank you, per isakson.
I changed the import so that it fills the variable M for one measurement at a time. Before saving, I copy it into a variable M1, M2, M3, ... instead of overwriting the whole array M(:).
This is much faster, and the save time scales linearly with the number of measurements.
...
M.RPCurve(RPC_Nr).(sprintf('%s_t', char(signal_name))) = DATA;
...
varName = sprintf('M%d', measure_idx);
eval([varName ' = M;']);
tic
if exist(workspace, 'file') == 2
    save(workspace, varName, '-append')
else
    save(workspace, varName, '-v7.3')
end
toc

 Accepted Answer

I changed the import so that it fills the variable M for one measurement at a time. Before saving, I copy it into a variable M1, M2, M3, ... and append that variable to the workspace file, instead of overwriting the whole array M(:).
This is much faster.
% import stuff
...
M.RPCurve(RPC_Nr).(sprintf('%s_t', char(signal_name))) = DATA;
...
% save measurement
varName = sprintf('M%d', measure_idx);
eval([varName ' = M;']);
if exist(workspace, 'file') == 2
    save(workspace, varName, '-append')
else
    save(workspace, varName, '-v7.3')
end
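The same per-measurement append can also be written without eval, using a dynamic field name together with the '-struct' option of save, which stores each field of a struct as its own variable. This is a sketch of an alternative, not the code from the answer: the file name, the wrapper struct S, and the stand-in data are assumptions.

```matlab
fileName = [tempname '.mat'];     % hypothetical file name

for measure_idx = 1:3
    % Stand-in for the real import of one measurement.
    M = struct('RPCurve', struct('Q', rand(1, 5) * measure_idx));

    S = struct();
    S.(sprintf('M%d', measure_idx)) = M;   % dynamic field name instead of eval

    if exist(fileName, 'file') == 2
        save(fileName, '-struct', 'S', '-append');   % adds M2, M3, ... to the file
    else
        save(fileName, '-struct', 'S', '-v7.3');     % first call creates the file
    end
end

% Load a single measurement back without touching the others.
loaded = load(fileName, 'M2');
M2 = loaded.M2;
```

Because each measurement is a separate variable in the file, load can fetch one of them by name, so reading cost no longer depends on how many measurements are stored.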

More Answers (0)

Release

R2018b
