Best way to read in massive amount of data from csv files?
I am working with two CSV files, ~25 GB each. When I read in one file all at once, I get a vector of size 9.8 GB. I only have about 24 GB of RAM, and with two such vectors plus further computations it puts quite a strain on my computer. I was wondering whether it is better in this case to read the files piece by piece, going back for the next data segment each time, or to load all the data into memory at once. Either way I have to go through all the data, and timing is a consideration: at the moment it takes nearly 20 minutes for my computer to read one entire file into a vector. I imagine this time would only increase if I were to constantly go back and make more, albeit smaller, calls to csvread with row indexing?
Answers (1)
Stephane Dauvillier
on 25 Jun 2019
Hi,
If you have huge data files, you may want to look at datastore.
First, a datastore can be created over one or several files, or over a whole folder.
By default it does not process the data file by file but block by block, and it simply moves on to the next file when the current one is finished.
For columnar data, you can specify which columns to actually import (very effective if you only need some of the columns rather than all of them).
Do your files have the same number of columns and contain the same data (I mean, for instance, does column 1 in both files represent the same observation, like Name, age, height, ...)?
Look at the datastore documentation page.
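A minimal sketch of the block-by-block approach described above. The file name 'bigfile.csv' and the column name 'Value' are placeholders for your own data; the read size is just an example and can be tuned to your available RAM.

ds = tabularTextDatastore('bigfile.csv', ...
    'SelectedVariableNames', {'Value'}, ...  % import only the columns you need
    'ReadSize', 100000);                     % number of rows per block

total = 0;
n = 0;
while hasdata(ds)
    blk = read(ds);                 % returns a table with the selected columns
    total = total + sum(blk.Value); % accumulate whatever statistic you need
    n = n + height(blk);
end
runningMean = total / n;

Because only one block is in memory at a time, the peak memory use stays small even for 25 GB files, and the datastore handles moving from one file to the next if you pass it several file names or a folder.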