Why is save/load write/read soooo slooooow?

I use save myworkspace at a point where I want to interrupt a long process, and some time later I do load myworkspace to restore the interrupt point and continue.
The typical MAT-file produced by this procedure is 18 GB. The save takes 1200 seconds; the load takes 350 seconds. The times are the same using a single HDD or an SSD cluster in RAID 0 configuration.
On the same computer I can use the OS to copy an 18 GB file in 65 seconds on the HDD, 30 seconds on the SSD.
So it appears that MATLAB save/load has some serious bottlenecks. Is there anything I can do differently? Is this something I should raise with MathWorks Support?

9 Comments

Cedric on 3 Sep 2013
Edited: Cedric on 3 Sep 2013
And how long does Excel take to open or save an 18 GB file? If you think about it, you'll realize that it takes some time to encode/decode data, which adds an overhead (one that can be quite significant) to basic disk I/O operations.
It doesn't seem reasonable that encoding should take this long. Maybe a lot of time is spent on data compression, even when little is gained by doing so. There should be a save function with extra smarts, or switches, to enable faster read/write times.
Cedric on 8 Sep 2013
Edited: Cedric on 8 Sep 2013
I am not absolutely certain (it depends on the MATLAB version and the save switches), but I think that the library underlying save/load is HDF5. You could try using it directly, which would give you better control.
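As a rough sketch of what "using it directly" could look like, here is an uncompressed write/read with MATLAB's built-in high-level HDF5 wrappers (the file and dataset names are just placeholders, and the array here is only a stand-in for your workspace variables):

```matlab
% Sketch: write one large array with MATLAB's HDF5 wrappers,
% chunked and uncompressed (no 'Deflate' option), bypassing the
% compression that save -v7.3 applies by default.
A = rand(1e4);                                  % example data, ~0.8 GB
h5create('mywork.h5', '/A', size(A), ...
    'Datatype', 'double', 'ChunkSize', [1024 1024]);
h5write('mywork.h5', '/A', A);

% Restore it later:
A = h5read('mywork.h5', '/A');
```

Each variable needs its own dataset, so this is more manual than a bare save MYWORK, but it lets you skip compression entirely.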
I am not sure how to proceed. Mostly I use save and load to dump and restore the MATLAB workspace, so, as you know, not a whole lot of brainy work is needed: simply execute save MYWORK, come back a day later, execute load MYWORK, and I am back where I stopped earlier.
Can I do that with HDF5 calls? I am not a frequent HDF5 user, so I am not sure how to follow up on your suggestion.
Anyway, thanks for the tip. Does it mean save is inefficient for regular data I/O in general, or is the problem only with saving a workspace to a MAT-file?
netCDF and HDF5 are the two major libraries/tools for storing and retrieving large datasets. Both are supported by MATLAB through sets of low- to high-level functions.
From there, you'll have to run a few tests saving/loading specific variables with regular save/load vs. the HDF5 functions (vs. netCDF), because I cannot tell you what improvement you'll get with your specific setup and data. A last option would be to build your own C/MEX export function. In any case, I suspect you'll have to work/learn/test quite a bit before you see a real improvement, because nothing is straightforward with 18 GB files.
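One way to run such a test on your own data might be along these lines (a sketch only; the variable and file names are placeholders, and the array should be replaced by one of your real workspace variables):

```matlab
% Sketch: time save -v7.3 against a raw, uncompressed HDF5 write
% of the same array, to see how much of the overhead is compression.
A = rand(5e3);                                  % ~200 MB test array

tic; save('test_v73.mat', 'A', '-v7.3'); t_save = toc;

tic;
h5create('test.h5', '/A', size(A));
h5write('test.h5', '/A', A);
t_h5 = toc;

fprintf('save -v7.3: %.1f s, h5write: %.1f s\n', t_save, t_h5);
```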
Thanks. Another reader pointed me to the function savefast on the File Exchange: 39721-save-mat-files-more-quickly. It does in fact use HDF5 calls, and I got 10-20x improvements on test data. So there is an answer, but only a partial one: the routine does not handle structures. When it sees a structure in the list of items, it passes it on to MATLAB's save.
I think the real answer is to ask MathWorks to include a '-nocompression' switch in a future release.
I think that even this partial answer will be useful information for the next person who faces this issue and finds your thread!


 Accepted Answer

The doc says:
save(filename, ..., version) saves to MAT-files in the specified version: '-v4', '-v6', '-v7', or '-v7.3'.
and:
'-v7.3' (R2006b or later): Version 7.0 features plus support for data items greater than or equal to 2 GB on 64-bit systems.
18 GB requires -v7.3, which is slow. -v6 is significantly faster and might be an alternative if you can split the data into chunks smaller than 2 GB.
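A minimal sketch of that -v6 route, assuming each variable can be kept under the 2 GB limit (the variable and file names are placeholders):

```matlab
% Sketch: save uncompressed v6 MAT-files, splitting a large matrix
% into column blocks so each saved variable stays under 2 GB.
A = rand(2e4, 2e4);                 % 3.2 GB: too big for one v6 variable
half1 = A(:, 1:1e4);                % ~1.6 GB each
half2 = A(:, 1e4+1:end);
save('work_part1.mat', 'half1', '-v6');
save('work_part2.mat', 'half2', '-v6');

% Restore and reassemble later:
load('work_part1.mat');
load('work_part2.mat');
A = [half1, half2];
```

The splitting and reassembly are manual bookkeeping, but v6 skips the compression pass that dominates v7.3 save times.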
