How to read, reshape, and write large data?

Hello!
I have an m x n data matrix, data, which I want to transform into a single column vector, data(:), and write to an output file.
% read 100 rows of data
data = [];
for idx_row = 1:100
    A = fscanf(fileID, formatSpec);
    data = cat(1, data, A);
end
% Convert to int16
data = data*10^6;
data = int16(data);
% Write to file
fp = fopen([filepath 'data.dat'], 'wb');
fwrite(fp, data(:), 'int16');
fclose(fp);
The problem is that data is far too large to fit in memory (e.g. 100 x 1e10). Also, each row of the data is saved in a separate file, so I must read the rows separately.
I can read a single row, which works fine, but when I try to add more rows, the computer runs out of memory rather quickly. :(
Preallocating a large array to hold the data runs into the same out-of-memory problem:
data = nan(100,1e10)
Error using nan
Requested 100x10000000000 (7450.6GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive.
How can I make it work? Thanks in advance!
  2 Comments
Rik on 29 Jul 2021
If you don't have 8 TB of RAM, you can't create such a large array (and even if you did, it could still be a problem, as the memory needs to be contiguous). Using int16 to preallocate your array will help, but only by a factor of 4.
You will have to do this chunk by chunk.
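For a quick sanity check of that factor of 4, here is the arithmetic with the dimensions from the question (a back-of-envelope sketch, nothing more):
m = 100; n = 1e10;
bytesDouble = m*n*8;   % double: 8 bytes per element
bytesInt16  = m*n*2;   % int16:  2 bytes per element
fprintf('double: %.1f GB, int16: %.1f GB\n', bytesDouble/2^30, bytesInt16/2^30)
% double: 7450.6 GB, int16: 1862.6 GB -- both still far beyond typical RAM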
Chunru on 29 Jul 2021
8 TB is way too big for today's systems. However, the array size is not strictly limited by the RAM size; it is limited by the virtual memory size that the OS manages (which may use the hard disk as part of the memory hierarchy). Of course, speed will suffer when data are exchanged between RAM and hard disk very frequently.


Accepted Answer

Chunru on 29 Jul 2021
Edited: Chunru on 29 Jul 2021
You can read a small portion at a time and write it to the output file. This way you will not use a lot of memory.
blocksize = 1e6;
nfiles = 100;
for i = 1:nfiles
    % fileID(i) = fopen(...)   % open each of the input files here
end
fp = fopen([filepath 'data.dat'], 'wb');
data = zeros(nfiles, blocksize);
% nblocks = floor(nsamples/blocksize), where nsamples is the number of
% values per file; you may need special treatment for the last block
for iblock = 1:nblocks
    for i = 1:nfiles
        % For large files, use fwrite and fread for speed;
        % fscanf and fprintf are slow and take much more disk space.
        % ('double' below is an assumed input precision.)
        data(i, :) = fread(fileID(i), blocksize, 'double');  % read a block of data from each file
        % A = fscanf(fileID(i), formatSpec);
        % data = cat(1, data, A);
    end
    % write data: data(:) is column-major, i.e. sample 1 of every file,
    % then sample 2 of every file, and so on
    fwrite(fp, int16(data(:)*1e6), 'int16');
end
fclose all
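One way to handle that last, shorter block (a sketch only; nsamples, the number of values per file, and the 'double' input precision are assumptions, not given in the thread):
nlast = mod(nsamples, blocksize);   % samples left over after the full blocks
if nlast > 0
    tail = zeros(nfiles, nlast);
    for i = 1:nfiles
        tail(i, :) = fread(fileID(i), nlast, 'double');   % read the remainder
    end
    fwrite(fp, int16(tail(:)*1e6), 'int16');              % same interleaved write
end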
  2 Comments
Rik on 29 Jul 2021
The problem is that you need the first element from every file, then the second element from every file, etc.
And about the coding style: I would suggest using fclose(fp); instead of closing all open files. That habit will bite you when you do have multiple files open.
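A tiny toy example of that required ordering, and why the row-per-file layout above produces it (data(:) stacks a matrix column by column):
data = [1 2 3 4; 10 20 30 40; 100 200 300 400];   % 3 "files", 4 samples each
data(:)'   % 1 10 100 2 20 200 3 30 300 4 40 400
% i.e. sample 1 from every file first, then sample 2 from every file, ...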
Chunru on 29 Jul 2021
Instead of reading the first element from every file, we read a block of data from every file (obviously for speed). You don't need all the data from a single file before doing the partial reshaping. fclose all is a lazy shortcut here, as I was tired of writing another for loop to close all the files.
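For what it's worth, you can avoid both fclose all and an extra for loop with a one-liner (assuming fileID holds all the input file identifiers, as above):
arrayfun(@fclose, fileID);   % close every input file without an explicit loop
fclose(fp);                  % and close the output file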


More Answers (0)

Release

R2020b