MATLAB Answers


Long time to load a 40 GB file

Asked by AA on 20 Oct 2017
Latest activity: commented on by Steven Lord on 20 Oct 2017
Hi, I recently bought a gaming laptop for €4000, with a powerful CPU and lots of RAM. However, loading a 30 GB MATLAB file from my hard drive, which consists of variables with huge cell array tables, still takes a lot of time. How can I make this process faster? Would datastore be useful?

AA
on 20 Oct 2017
It is on an SSD.
Jan
on 20 Oct 2017
Please mention the details: How long is "a lot of time"? What is the acceptable speed for you? Which MAT format is used?
Steven Lord
on 20 Oct 2017
There's some additional information that would be useful in trying to determine what's consuming most of the time and whether there's a better approach you can use.
  • Is this a text file, a binary file, a MAT-file, etc?
  • Do you need all the variables in the file at once or could you do what you need accessing each variable in turn, keeping only one at a time in memory?
  • Can you post a small sample (not all 30-40 GB, but 5-10 lines) of what the data looks like in the file, and say whether the file has a consistent format throughout or whether the format changes periodically?
  • Copy and paste the command that you're using to read in the file, or describe the interactive process you're using (for example, if you're using the Import Tool).
  • Can you clarify what "a lot of time" means: a couple minutes, half an hour, an hour, etc.?
  • Can you clarify what "lots of ram" means: how many GB?
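On the point about accessing each variable in turn: a MAT-file can be inspected without loading it, and its variables can then be loaded one at a time. The following is a minimal sketch, assuming each variable can be processed independently; the file `demo_bigdata.mat` and the variables `a` and `b` are hypothetical stand-ins, created only so the example runs. Replace the `numel` line with the real per-variable work.

```matlab
% Hypothetical demo file standing in for the real 30-40 GB MAT-file.
fname = fullfile(tempdir, 'demo_bigdata.mat');
a = rand(1000, 3);              % placeholder numeric variable
b = num2cell(rand(200, 2));     % placeholder cell array variable
save(fname, 'a', 'b');
clear a b

% List the variables in the file, with their in-memory sizes, without loading them.
info = whos('-file', fname);
for k = 1:numel(info)
    fprintf('%-10s %-8s %12d bytes\n', info(k).name, info(k).class, info(k).bytes);
end

% Load and process one variable at a time, so only one is held in memory.
varNames = {info.name};
totalElements = 0;
for k = 1:numel(varNames)
    S = load(fname, varNames{k});               % struct with a single field
    v = S.(varNames{k});
    totalElements = totalElements + numel(v);   % replace with the real per-variable work
    clear S v                                   % free memory before loading the next variable
end
fprintf('Processed %d variables, %d elements in total\n', numel(varNames), totalElements);
```

The `bytes` column from `whos('-file', ...)` also shows which variables dominate the load time. If the file was saved with `-v7.3`, `matfile` can additionally read parts of a numeric variable, but it cannot index into individual cells of a cell array.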


2 Answers

Answer by Edric Ellis on 20 Oct 2017

You don't mention how you're loading the data right now. If you have Parallel Computing Toolbox, you can read the data in parallel using parfor together with datastore. Or, better still, use tall arrays, which can automatically take advantage of Parallel Computing Toolbox.
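A minimal sketch of the tall-array route follows. It assumes the data has been exported to CSV files on disk. The folder `tall_demo`, the file names, and the variables `ID` and `Value` are hypothetical and are created only so the example runs. For other file types, `fileDatastore` with a custom `ReadFcn` is one alternative.

```matlab
% Hypothetical demo: write a few small CSV files to stand in for data
% that has been split across several files on disk.
folder = fullfile(tempdir, 'tall_demo');
if ~exist(folder, 'dir')
    mkdir(folder);
end
for k = 1:3
    T = table((1:100)', rand(100, 1), 'VariableNames', {'ID', 'Value'});
    writetable(T, fullfile(folder, sprintf('part%d.csv', k)));
end

% A datastore refers to the files without reading them into memory.
ds = tabularTextDatastore(fullfile(folder, 'part*.csv'));

% A tall array wraps the datastore; these operations are deferred.
tt = tall(ds);
meanValue = mean(tt.Value);
numRows = size(tt, 1);

% gather evaluates the deferred operations, reading the data in chunks.
[meanValue, numRows] = gather(meanValue, numRows);
fprintf('%d rows, mean Value = %.4f\n', numRows, meanValue);
```

If Parallel Computing Toolbox is installed, the tall computation can run on a parallel pool, depending on the parallel preferences. Without it, the same code runs serially and still processes the data in chunks rather than all at once.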



Answer by Jeremy Hughes on 20 Oct 2017

"cell array tables"
A good bet is that the cell arrays are the issue. Each cell of the array carries about 114 bytes of overhead. Without knowing anything about your data, except that you described them as tables, I suggest looking at the MATLAB data type table.
Data can be stored efficiently as tables for many common uses. I wish I could give a better answer, but I'd need to see what's in the file.
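To see the per-cell overhead described above, the same values can be stored three ways and compared with `whos`. This is a minimal sketch with hypothetical data (100,000 rows by 3 numeric columns, variable names `X`, `Y`, `Z`). The exact byte counts printed depend on the MATLAB version and platform.

```matlab
% Hypothetical data: 100,000 rows by 3 numeric columns.
n = 1e5;
M = rand(n, 3);

% The same values stored two other ways.
C = num2cell(M);                                        % one cell (with its own header) per value
T = array2table(M, 'VariableNames', {'X', 'Y', 'Z'});   % table with three numeric columns

% Compare in-memory sizes as reported by whos.
infoM = whos('M');
infoC = whos('C');
infoT = whos('T');
fprintf('numeric array: %12d bytes\n', infoM.bytes);
fprintf('cell array:    %12d bytes\n', infoC.bytes);
fprintf('table:         %12d bytes\n', infoT.bytes);

% A cell array holding one number per cell can be converted to a table
% whose variables are plain numeric columns.
T2 = cell2table(C, 'VariableNames', {'X', 'Y', 'Z'});
disp(class(T2.X))   % numeric column, not a cell
```

Converting the data to numeric columns once, then saving those, avoids paying the per-cell overhead every time the file is loaded.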
Cheers,
Jeremy
