Save a large array into equal-length .csv files?
Hi guys, I am trying to save an adjusted, very large data set into equal-length .csv files. I am using the following script from this link with my own database:
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);

%% Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;

%% Step 3 - use tall/write to emit .mat files
writeDir = tempname;
mkdir(writeDir);
write(writeDir, tt);

%% Step 4 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname;
mkdir(csvDir2);
parfor idx1 = 1 : N
    idx2 = 0;
    subds = partition(ds, N, idx1);
    while hasdata(subds)
        idx2 = 1 + idx2;
        fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
        writetable(read(subds), fname);
    end
end
I am adapting the script in Step 4 as follows, in order to specify that each .csv file has 20,000 rows:
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir, 'ReadSize', RequiredDataRowsPerFile);
It works to some degree, in that the setting does have an effect; however, it does not produce an equal distribution of rows across the .csv files (of course the last file will always be different).
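For context: 'ReadSize' is only an upper bound per call to read — a read can return fewer rows near a file or partition boundary, which is one likely reason the output files end up uneven. A possible workaround (a serial sketch, not tested against the data set above; the variable names reuse those from the script, and the buffering logic is my own assumption about what is wanted) is to accumulate rows across reads and only emit a file once exactly RequiredDataRowsPerFile rows are available:

```matlab
% Sketch: buffer rows across reads so that every output file
% (except possibly the last) holds exactly RequiredDataRowsPerFile rows.
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir);        % intermediate files written by tall/write
buffer = table();                % accumulates rows across reads
fileIdx = 0;
while hasdata(ds)
    buffer = [buffer; read(ds)]; %#ok<AGROW> % append this chunk
    % Flush full-size files while enough rows are buffered
    while height(buffer) >= RequiredDataRowsPerFile
        fileIdx = fileIdx + 1;
        fname = fullfile(csvDir2, sprintf('out_%06d.csv', fileIdx));
        writetable(buffer(1:RequiredDataRowsPerFile, :), fname);
        buffer(1:RequiredDataRowsPerFile, :) = [];   % drop written rows
    end
end
% Remainder: the only file allowed to be shorter
if ~isempty(buffer)
    fileIdx = fileIdx + 1;
    writetable(buffer, fullfile(csvDir2, sprintf('out_%06d.csv', fileIdx)));
end
```

This trades the parfor parallelism of Step 4 for exact file sizes, since equal-length output requires rows to cross partition boundaries.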
I would appreciate any help. Thanks
Tim