Is it possible to create a sparse binary (.bin) file on disk?
I have a project where I would like to save my results to a binary (.bin) file that is stored on disk. Results need to be saved as they are generated (so that memory can be cleared), but the order in which these results are added to the binary file is not necessarily sequential (e.g., first I write to bytes 1-100, then 1001-1100, then 301-400, etc.).
In order to write non-sequentially to a binary file, I believe that file needs to be pre-allocated on the disk in some form or another. Is it possible to create a "sparse" binary file that has an area on disk set aside but which does not require writing zeros to every byte in the .bin file? I know how many bytes the file will take up when I am done saving to it, so this isn't a problem. Alternatively, is there a way for me to write non-sequentially to a binary file without pre-allocating it first?
Thanks.
Accepted Answer
More Answers (2)
Jan
on 13 Mar 2017
You can use this to expand (or shrink) a file efficiently: FEX: FileResize. It is twice as fast as appending zeros with fwrite.
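function InsertData(File, Data, Format, Pos)
% Write Data at byte offset Pos (0-based), growing the file first if needed.
fid = fopen(File, 'r+');
if fid == -1
   error('*** %s: Cannot open file: %s', mfilename, File);
end
fseek(fid, 0, 'eof');      % Spool to end
Len = ftell(fid);
if Pos > Len
   fclose(fid);            % Assumption: resizing is safer while the file is closed
   FileResize(File, Pos);  % FEX: FileResize
   fid = fopen(File, 'r+');
end
fseek(fid, Pos, 'bof');    % Move to the insertion position before writing
fwrite(fid, Data, Format);
fclose(fid);
end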
If multiple workers write to the same file... Hm. I'm not sure what happens when two workers access the same file and one writes into the section that the other is currently expanding.
What about inventing your own "sparse" file format?
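function InsertData(File, Data, Format, Pos)
% Append one record: a uint64 header followed by the data itself.
fid = fopen(File, 'a');
if fid == -1
   error('*** %s: Cannot open file: %s', mfilename, File);
end
% Pos is stored as well, so the block can be placed during post-processing
Header = [Pos, ndims(Data), size(Data)];
fwrite(fid, Header, 'uint64');
fwrite(fid, Data, Format);
fclose(fid);
end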
A function that reads these records, or that assembles the full file from them in a post-processing step, is equally easy to write. The file is then read or spooled block by block, but this should not be dramatically slower.
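As a rough illustration of that post-processing step, here is a minimal sketch. It assumes each record header is stored as uint64 values [Pos, ndims, size], where Pos is a 0-based byte offset, and that all payloads are uint8. The file name, the demo blocks, and the variable names are made up for the example.

```matlab
% Sketch only: append records in arrival order, then replay them into a
% dense array. The layout [Pos, ndims, size] + uint8 payload is an assumption.
logFile = fullfile(tempdir, 'records_demo.bin');      % hypothetical file name
if exist(logFile, 'file'), delete(logFile); end

% --- Writing: records are appended in whatever order results arrive ---
blocks = {1000, uint8(1:100); 300, uint8(101:200)};   % {byte offset, data}
fid = fopen(logFile, 'a');
if fid == -1
    error('Cannot open file: %s', logFile);
end
for k = 1:size(blocks, 1)
    pos  = blocks{k, 1};
    data = blocks{k, 2};
    fwrite(fid, [pos, ndims(data), size(data)], 'uint64');  % record header
    fwrite(fid, data, 'uint8');                             % record payload
end
fclose(fid);

% --- Post-processing: place every payload at its stored offset ---
dense = zeros(1, 0, 'uint8');
fid = fopen(logFile, 'r');
while true
    pos = fread(fid, 1, 'uint64=>double');
    if isempty(pos), break; end                  % no more records
    nd   = fread(fid, 1, 'uint64=>double');      % number of dimensions
    sz   = fread(fid, nd, 'uint64=>double');     % size vector
    n    = prod(sz);
    data = fread(fid, n, 'uint8=>uint8');
    dense(pos + (1:n)) = data;                   % gaps are filled with zeros
end
fclose(fid);

fprintf('Reassembled %d bytes\n', numel(dense));
```

The reader never seeks, so the record file can be arbitrarily fragmented. If the original array shape matters, `reshape(data, sz')` would restore it before placement.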
1 Comment
Anthony Barone
on 13 Mar 2017
Walter Roberson
on 10 Mar 2017
0 votes
Unfortunately, no.
The POSIX standard operation that allows for sparse files is to fseek() to a location past end of file and write data there; the file system is then permitted to leave "holes" in the parts where nothing has been written.
Unfortunately, in MATLAB, if you fseek() beyond the end of file, the location "sticks" at the end of file.
Therefore, in MATLAB, if you want to write to a scattered location, the general write procedure is:
- fopen() without the 't' (text) attribute (important!), with 'a' access (not 'w' or 'w+' or 'a+' for this purpose)
- fseek() to end of file
- ftell() to determine the position of the end of file, in bytes
- if the current end of file is before the place you need to be, fwrite() 0's to the place you need to be; otherwise fseek() to the place you need to be
- fwrite() the data you want
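The write steps above can be sketched roughly as follows. One deliberate deviation, stated as an assumption: the sketch opens the file with 'r+' instead of 'a', because append mode is generally expected to send every write to the end of the file regardless of fseek(), which would defeat the "otherwise fseek()" branch. Since 'r+' does not create a file, an empty one is created first. The file name and the demo blocks are invented.

```matlab
% Sketch only: write blocks at scattered 0-based byte offsets, padding with
% zeros whenever the target offset lies past the current end of file.
fname = fullfile(tempdir, 'scattered_demo.bin');      % hypothetical file name
fid = fopen(fname, 'w');                              % create an empty file
fclose(fid);

blocks = {1000, uint8(1:100); 300, uint8(101:200)};   % {byte offset, data}

fid = fopen(fname, 'r+');          % binary read/write, no 't' attribute
if fid == -1
    error('Cannot open file: %s', fname);
end
for k = 1:size(blocks, 1)
    pos  = blocks{k, 1};
    data = blocks{k, 2};
    fseek(fid, 0, 'eof');          % go to end of file
    len = ftell(fid);              % current file length in bytes
    if len < pos
        % End of file is before the target: pad with zeros up to pos
        fwrite(fid, zeros(1, pos - len, 'uint8'), 'uint8');
    else
        % Target region already exists: jump straight to it
        fseek(fid, pos, 'bof');
    end
    fwrite(fid, data, 'uint8');    % write the block itself
end
fclose(fid);

info = dir(fname);
fprintf('File length is now %d bytes\n', info.bytes);
```

Note that the padding is real data on disk, so this produces a dense file, not a sparse one.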
The general read procedure is:
- fopen() without the 't' (text) attribute (important!), with 'r' or 'a' or 'a+' access (not 'w' or 'w+') -- it is fine to keep the file open with 'a' access for reading and writing
- fseek() to the position you need to be
- ftell() to determine the position you ended up in, in bytes
- if the current position is before the place you need to be, the data has not been written yet, so act appropriately
- otherwise fread() the data, keeping in mind that you might encounter end of file if you were not consistent about the blocksize -- or even if the end of file happened to be exactly at the place you want to start reading
You can modify this procedure to test that the entire block of data is available before you read it.
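One possible form of that modification is sketched below: compare the requested range against the file length before reading, so a partially written block is never returned. The file name, the demo contents, and the request list are assumptions made for illustration. A length check cannot tell zero padding from real data, so a region that was padded for a later block would still look "available".

```matlab
% Sketch only: read a block only if the whole range [pos, pos+nBytes) exists.
fname = fullfile(tempdir, 'scattered_read_demo.bin');   % hypothetical file name
fid = fopen(fname, 'w');
fwrite(fid, uint8(1:50), 'uint8');                      % 50-byte demo file
fclose(fid);

requests = [40 20; 10 20];      % rows of [0-based byte offset, byte count]
results  = cell(size(requests, 1), 1);

fid = fopen(fname, 'r');        % binary read access, no 't' attribute
if fid == -1
    error('Cannot open file: %s', fname);
end
fseek(fid, 0, 'eof');
len = ftell(fid);               % file length in bytes
for k = 1:size(requests, 1)
    pos    = requests(k, 1);
    nBytes = requests(k, 2);
    if pos + nBytes > len
        results{k} = [];        % block not (fully) written yet
    else
        fseek(fid, pos, 'bof');
        results{k} = fread(fid, nBytes, 'uint8=>uint8');   % column vector
    end
end
fclose(fid);
```

Here the first request runs past the end of the 50-byte file and is reported as unavailable, while the second lies entirely inside it and is read.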
3 Comments
Anthony Barone
on 13 Mar 2017
Walter Roberson
on 13 Mar 2017
Anthony Barone
on 13 Mar 2017