Save file sizes not lining up
So I am attempting to save a relatively large workspace. Before realizing I could use the '-v7.3' flag, I wrote a script to chop up the struct MATLAB wouldn't save and save it in 5 slices. According to the warning, the size of this struct is over 2 GB. The peculiar part is this:
- The 'slice and save' method produces 6 files with a total size of about 1.3 GB, including the aforementioned 'over 2 GB' struct (5 of the pieces actually come out to under a GB).
- When I did use the '-v7.3' flag in save, it took 12 minutes to save and resulted in a file which was 10 GB. I have absolutely no idea where all this data is coming from.
Any insight would be appreciated.
0 Comments
Accepted Answer
dpb
on 18 Dec 2019
Any insight would depend upon knowing precisely what the "slice and dice" consisted of and the specifics of the struct.
There's overhead in a struct; the more fields it has, the more overhead is needed to keep track of them. I presume you saved the pieces as arrays from which the fields can be rebuilt; that would be one way I could see the total shrinking. save also has some overhead of its own, which is undoubtedly somewhat larger for higher-level data types than for simple arrays. That is counteracted by compression, at least for arrays; I'd presume it is also effective for the content of struct variables. So, there is no easy answer.
>> x=linspace(0,1,1001); % arbitrary double vector
>> whos x % memory footprint
Name Size Bytes Class Attributes
x 1x1001 8008 double
>> save x x
>> whos -file x % reports the in-memory size, not the size on disk
Name Size Bytes Class Attributes
x 1x1001 8008 double
>> dx=dir('x.mat') % file size < half actual in memory
dx =
struct with fields:
name: 'x.mat'
folder: 'C:\Users\Duane\Documents\MATLAB\Work'
date: '18-Dec-2019 13:46:01'
bytes: 3515
isdir: 0
datenum: 7.377775736226852e+05
>>
>> clear s % let's compare struct to array
>> s.x=x; % same content
>> whos s
Name Size Bytes Class Attributes
s 1x1 8184 struct
>> save s s % as before; the struct adds 8184-8008 = 176 bytes of in-memory overhead
>>
>> whos -file s.mat
Name Size Bytes Class Attributes
s 1x1 8184 struct
>> ds=dir('s.mat') % size on disk
ds =
struct with fields:
name: 's.mat'
folder: 'C:\Users\Duane\Documents\MATLAB\Work'
date: '18-Dec-2019 13:38:54'
bytes: 3542
isdir: 0
datenum: 7.377775686805556e+05
>> ds.bytes-dx.bytes % only 27 bytes of additional overhead on disk; most of the 176 is recovered
ans =
27
>> s.y=rand(size(s.x)); % let's add another field
>> whos s % exactly double in memory
Name Size Bytes Class Attributes
s 1x1 16368 struct
>> save s s % now save new struct w/ two fields...
>> ds2=dir('s.mat')
ds2 =
struct with fields:
name: 's.mat'
folder: 'C:\Users\Duane\Documents\MATLAB\Work'
date: '18-Dec-2019 13:42:18'
bytes: 11132
isdir: 0
datenum: 7.377775710416667e+05
>>
The last result shows a sizable jump to support the second field in the struct relative to one field: the file grew by 11132 - 3542 = 7590 bytes, close to the 8008 bytes the new field occupies in memory. Of course the random data likely contributes to that by not compressing as effectively, but I didn't pursue that difference.
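The compressibility question left open above can be isolated by saving two plain vectors of the same size, one regular and one random. The following is only a sketch: the variable names and the use of tempname for scratch files are illustrative choices, and '-v7' is passed explicitly on the assumption that the compressed version 7 format is the one of interest.

```matlab
% Sketch: compare on-disk size of regular vs. random data of equal length.
n = 1001;
xSmooth = linspace(0, 1, n);    % regular, compressible data
xRandom = rand(1, n);           % random data, expected to compress poorly

fSmooth = [tempname '.mat'];    % scratch file names (illustrative)
fRandom = [tempname '.mat'];
save(fSmooth, 'xSmooth', '-v7');    % version 7 format applies compression
save(fRandom, 'xRandom', '-v7');

dSmooth = dir(fSmooth);
dRandom = dir(fRandom);
fprintf('in memory : %d bytes each\n', 8*n);
fprintf('linspace  : %d bytes on disk\n', dSmooth.bytes);
fprintf('rand      : %d bytes on disk\n', dRandom.bytes);

delete(fSmooth);                % clean up scratch files
delete(fRandom);
```

Both vectors occupy the same 8008 bytes in memory, so any difference between the two file sizes comes from how well the contents compress rather than from struct bookkeeping.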
5 Comments
dpb
on 19 Dec 2019
A brainless test case with a "plain-jane" array in each field of a struct doesn't seem to exhibit a storage explosion owing solely to the number of field names, however. It seems as though something else must be at play. Perhaps the fields of the struct are themselves structs or some other complex data type?
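% script to explore storage explosion of number of fields in struct
N=200;
clear s sz
j=0;
for i=0:N
  s.(num2str(i,'F%03d'))=rand(100);   % add one 100x100 double field per pass
  if mod(i,10)==0                     % every 10th field, save and record the file size
    save s s
    d=dir('s.mat');
    j=j+1;
    sz(j)=d.bytes;
  end
end
[[1:j].' sz(:) [0;diff(sz(:))]]       % columns: sample number, file bytes, growth since last sample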
results in
>> strucsave
ans =
1 75900 0
2 832879 756979
3 1590020 757141
4 2347039 757019
5 3104090 757051
6 3861162 757072
7 4618163 757001
8 5375220 757057
9 6132333 757113
10 6889418 757085
11 7646468 757050
12 8403449 756981
13 9160600 757151
14 9917598 756998
15 10674682 757084
16 11431799 757117
17 12188796 756997
18 12945953 757157
19 13702855 756902
20 14460026 757171
21 15216907 756881
>>
>> whos s
Name Size Bytes Class Attributes
s 1x1 16115376 struct
>>
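One way to follow up on the guess that nested structs are involved is to repeat the experiment with fields that are themselves structs holding many small arrays, and to save the result in both the version 7 and version 7.3 formats. This is a sketch under assumptions: the sizes nOuter and nInner, the field names, and the scratch files are made up for illustration and say nothing about the original poster's data. The version 7.3 format is based on HDF5, and it may carry more per-item overhead when there are many small pieces, so the two file sizes are worth comparing.

```matlab
% Sketch: struct whose fields are themselves structs of small arrays.
nOuter = 50;                    % number of top-level fields (illustrative)
nInner = 20;                    % number of fields in each nested struct (illustrative)
s = struct();
for i = 1:nOuter
    inner = struct();
    for k = 1:nInner
        inner.(sprintf('G%02d', k)) = rand(1, 5);   % many small arrays
    end
    s.(sprintf('F%03d', i)) = inner;
end

info = whos('s');               % in-memory footprint, including struct overhead
f7  = [tempname '.mat'];
f73 = [tempname '.mat'];
save(f7,  's', '-v7');          % version 7: compressed, extension of version 5
save(f73, 's', '-v7.3');        % version 7.3: HDF5-based

d7  = dir(f7);
d73 = dir(f73);
fprintf('raw data  : %d bytes\n', nOuter*nInner*5*8);
fprintf('in memory : %d bytes\n', info.bytes);
fprintf('-v7 file  : %d bytes\n', d7.bytes);
fprintf('-v7.3 file: %d bytes\n', d73.bytes);

delete(f7);                     % clean up scratch files
delete(f73);
```

Scaling nOuter and nInner up while keeping the arrays small should show whether the file size tracks the amount of data or the number of separate items.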
Walter Roberson
on 19 Dec 2019
The version 5 .mat file format is documented, and the version 7 format is an extension of it (possibly with a couple more object types, but otherwise compatible). The version 7.3 format is completely different, though.
Unfortunately every struct entry needs to be stored separately, because it is permitted for S(1).field to be a different size and datatype from S(2).field.
Structs can be convenient for organizing storage, but they are not the most efficient form of data storage. Arrays are more efficient. You might even consider using tables instead of structs, as long as all of the fields with the same name are scalars of the same datatype: MATLAB stores a table as one cell array with an array for each variable, so there is only one datatype per variable instead of one datatype for each indexed instance of the variable.
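The per-element cost described here can be seen in memory by holding the same scalars three ways: as a numeric column, as a struct array with one scalar per element, and as a single table variable. This is a minimal sketch; the names vals, S, T and the field name a are invented for the comparison.

```matlab
% Sketch: same 1000 scalars as an array, a struct array, and a table.
n = 1000;
vals = rand(n, 1);                          % plain numeric column
S = struct('a', num2cell(vals));            % n-by-1 struct array, one scalar per element
T = table(vals, 'VariableNames', {'a'});    % table with a single variable 'a'

wA = whos('vals');
wS = whos('S');
wT = whos('T');
fprintf('array        : %d bytes\n', wA.bytes);
fprintf('struct array : %d bytes\n', wS.bytes);
fprintf('table        : %d bytes\n', wT.bytes);

% All three forms hold identical values.
sameStruct = isequal([S.a].', vals);
sameTable  = isequal(T.a, vals);
```

The struct array carries a separate header for every element, which is why its footprint exceeds the 8 bytes per value of the plain array; the table keeps the values in one array.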