Saving very large data. How do I preallocate and stream to disk?
Hello all. I am working with a specific data set and could use a few tips. The data are images of objects moving in a movie. Because the data takes up so much hard drive space, we are looking for ways to reduce the storage it needs. Basically, we find the pixels of interest in each frame of the movie and save their positions and intensities as a sparse matrix. This seems to work well, as we can then recreate everything that we were actually interested in from the original movie. I'm not clear on the best way to save these consecutive sparse matrices. I can store them all in a cell array or structure, but since each frame has a different number of pixels of interest, I'm not sure how to preallocate for speed and memory. Additionally, since we are going through many frames of the movie (and multiple movies in parallel), it would be great to stream this data to the hard drive rather than keep it in MATLAB's memory. Thanks for any advice!
Answers (1)
Cedric
on 6 May 2014
Edited: Cedric
on 6 May 2014
I would avoid using sparse matrices unless you really need matrices for computing. Instead, I would store the indices and values in plain numeric arrays, which can be preallocated and grown block by block (e.g. in 100 MB increments). I would also avoid cell arrays and use a frame index instead, e.g.
X = [ 10, 5, 22, 1, 8, 9, ... ]
Y = [ 5, 8, 20, 30, 7, 6, ... ]
I = [ .... ]
F = [ 1, 1, 1, 2, 3, 3, ... ]
where F is the frame ID. This way I would deal only with the simplest and most efficient data structure available in MATLAB.
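To make the block-wise preallocation concrete, here is a minimal sketch. The block size, frame count, frame size, and the random "pixels of interest" are all made up for illustration; the real detection step would replace the three `randi`/`rand` lines. The block is deliberately small so that the growth branch actually runs.

```matlab
% Sketch only: all sizes and the synthetic data below are assumptions.
blockSize = 1e4;            % elements added per growth step (assumed)
nFrames   = 50;             % number of frames (assumed)
frameSize = [480, 640];     % [rows, cols] of one frame (assumed)
rng(0);                     % reproducible synthetic data

% Preallocate one block for each flat array.
X = zeros(blockSize, 1);    % column positions
Y = zeros(blockSize, 1);    % row positions
I = zeros(blockSize, 1);    % intensities
F = zeros(blockSize, 1);    % frame IDs
n = 0;                      % number of elements currently in use

for f = 1:nFrames
    % Stand-in for the real detection step.
    k  = randi(500);                    % pixels of interest in this frame
    xs = randi(frameSize(2), k, 1);
    ys = randi(frameSize(1), k, 1);
    vs = rand(k, 1);

    % Grow by whole blocks only when the next frame would not fit.
    while n + k > numel(X)
        X = [X; zeros(blockSize, 1)];   %#ok<AGROW>
        Y = [Y; zeros(blockSize, 1)];   %#ok<AGROW>
        I = [I; zeros(blockSize, 1)];   %#ok<AGROW>
        F = [F; zeros(blockSize, 1)];   %#ok<AGROW>
    end

    idx    = n+1 : n+k;
    X(idx) = xs;
    Y(idx) = ys;
    I(idx) = vs;
    F(idx) = f;
    n      = n + k;
end

% Trim the unused tail of the last block.
X = X(1:n);  Y = Y(1:n);  I = I(1:n);  F = F(1:n);
```

All pixels of frame `f` can then be pulled out with a logical mask, e.g. `X(F == f)`.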
If you want to optimize memory usage, you can also preallocate these arrays using the smallest suitable class (e.g. uint8 or uint16 for X and Y), but I suspect that it will introduce a time overhead. It will also prevent you from computing directly with the stored values: except for basic operations, you will have to cast the relevant blocks back to double first.
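A small sketch of that trade-off, using the first few example values from above and made-up intensities. The choice of `uint16`, `uint32`, and `uint8` is an assumption; pick whatever covers your frame size, frame count, and intensity range.

```matlab
% Sketch only: classes and intensity values are assumptions.
X = uint16([10, 5, 22, 1, 8, 9]);        % 2 bytes per element instead of 8
Y = uint16([ 5, 8, 20, 30, 7, 6]);
F = uint32([ 1, 1,  1,  2, 3, 3]);
I = uint8([200, 17, 45, 255, 3, 90]);    % made-up 8-bit intensities

% Integer arithmetic saturates instead of going negative,
% so this difference is clamped to 0.
d = X(1) - X(3);

% For anything beyond basic operations, cast the relevant block first.
sel = (F == 1);
xd  = double(X(sel));
dx  = xd - mean(xd);                     % centered positions, class double
```

The saturation in `d` is the kind of silent surprise that makes casting to double the safer habit before computing.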
Note: if you compare this approach with sparse matrices (as mentioned in the comments below), keep in mind that MATLAB sparse matrices can hold only double (or logical) values. You could also build a "smarter" way of encoding element positions (typically the one used internally for storing sparse matrices [ ref ]), but the encoding would take time. Still, it gives you the option of trading time efficiency for memory efficiency.
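As one simple illustration of trading time for memory (this is not the internal sparse-matrix scheme, just a linear-index encoding picked for the example), the two position arrays can be packed into a single index array with `sub2ind` and unpacked with `ind2sub`. The frame size below is an assumption.

```matlab
% Sketch only: frameSize is an assumed value.
frameSize = [30, 25];                    % [rows, cols]
X = [10, 5, 22,  1, 8, 9];               % column positions
Y = [ 5, 8, 20, 30, 7, 6];               % row positions

% Encode: one linear index per pixel instead of a (row, col) pair.
L = uint32(sub2ind(frameSize, Y, X));

% Decode: costs time on every read, which is the trade-off.
[Yd, Xd] = ind2sub(frameSize, double(L));
```

This stores one array of positions instead of two, at the price of an encode and decode step on every write and read.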
4 Comments
Cedric
on 6 May 2014
Edited: Cedric
on 6 May 2014
I edited my answer while you were commenting, so Jose-Luis wrote basically what is in the second part of my answer.
Whether it takes more space than sparse matrices depends on what you were doing with the sparse matrices. In fact, a sparse matrix is roughly a set of three vectors: non-zero values, row indices, and column start positions [ ref ].
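If a sparse matrix is needed for one frame at some point, it can be rebuilt on demand from the flat arrays. A minimal sketch, with an assumed frame size and made-up intensities:

```matlab
% Sketch only: frameSize and the intensity values are assumptions.
frameSize = [30, 25];                    % [rows, cols]
X = [10,   5,  22,   1,   8,   9];       % column positions
Y = [ 5,   8,  20,  30,   7,   6];       % row positions
I = [0.9, 0.4, 0.7, 0.2, 0.5, 0.6];      % made-up intensities
F = [ 1,   1,   1,   2,   3,   3];       % frame IDs

% Rebuild frame 1 as a sparse matrix (rows from Y, columns from X).
sel = (F == 1);
S   = sparse(Y(sel), X(sel), I(sel), frameSize(1), frameSize(2));

% Going the other way: recover positions and values from S.
[r, c, v] = find(S);
```

Note that `sparse` adds up the values of repeated (row, column) pairs, so this assumes each pixel appears at most once per frame.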
Cedric
on 6 May 2014
.. but wait a day or two before accepting any answer, because there is certainly room for discussion here.