"readcell()" command does not read my entire file
I am trying to read a 43,000 KB (~43 MB) CSV file with readcell.
The file mixes numbers and strings. When I read a smaller CSV file of the same type, it imports with no problem, but when I read the bigger file only part of it is imported.
How can I solve this issue?
11 Comments
Steven Lord
on 24 Jun 2025
Can you show the readcell command you ran that only read in part of the CSV file as well as the line in the CSV file where readcell stops importing the data (as well as a few lines before and after that line for context)? You won't be able to attach the whole CSV file since it's big, but showing what's around where readcell stops (or extracting those lines to a new, smaller file and attaching that file, in case there are non-printable characters) may help in determining what stops readcell from reading in all the data.
dpb
on 24 Jun 2025
"You won't be able to attach the whole CSV file since it's big,..."
With things like this, it's simply not possible to answer without the data because the problem is going to be data-related -- unless there's a memory issue but that usually will give indications if so.
The alternative to try is to first see how many lines it does read (and is the last line complete?) and then retry with
C1=readcell('yourfile.csv','FileType','text'); % read; expected to stop early
N=height(C1); % how many rows did it return?
C2=readcell('yourfile.csv','FileType','text','NumHeaderLines',N); % try to pick up from there
...
You can also try to subtract some number of lines from N to see about the idea of something in the file at the point of failure.
Or, alternatively, use readlines and then inspect the content around the point of failure...so many options...
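The readlines idea above can be sketched as follows; this is a hypothetical snippet assuming the partial read returned N rows (the variable N from the earlier experiment):

```matlab
% Inspect the raw text around the point where readcell stopped.
lines = readlines('yourfile.csv');              % one string per line of the file
fprintf('File contains %d lines\n', numel(lines))
disp(lines(max(1,N-2):min(numel(lines),N+3)))   % show a few lines of context around row N
% Look for stray quotes, embedded delimiters, or non-printable characters here.
```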
dpb
on 25 Jun 2025
Did you try any of the above experiments?
Normally, one would get an "Out of Memory" error if memory really were the issue, though I suppose it is possible otherwise.
What error, if any, do you get?
You can look into the MATLAB tools for large datasets including tall and supporting tools if memory really is the issue.
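If memory does turn out to be the constraint, a chunked read avoids holding the whole file in one variable. A minimal sketch, assuming the file has a header row and comma delimiters (adjust to suit):

```matlab
% Read the file in chunks with a datastore so the whole array
% never has to fit in memory at once.
ds = tabularTextDatastore('yourfile.csv','TextType','string');
ds.ReadSize = 10000;          % rows per chunk (tune to available memory)
while hasdata(ds)
    T = read(ds);             % returns a table for each chunk
    % ... process T here ...
end
```

A datastore can also feed a tall array directly if deferred evaluation is preferred.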
Walter Roberson
on 25 Jun 2025
There is an experiment you can do to determine whether the problem is due to file size, or due to file content.
Prepare a second version of the file with a bunch of the leading content removed, but still leaving in some. If you are able to read about as many records out of the second version as from the first version, then you are running out of memory. If instead the reading stops at the same content location, then you know that there is an issue with the content.
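This experiment can be scripted rather than done by hand; a hypothetical sketch (the 20000 is an arbitrary number of leading data rows to drop):

```matlab
% Build a second version of the file with leading content removed.
lines   = readlines('yourfile.csv');
trimmed = [lines(1); lines(20001:end)];   % keep the header row, drop 20000 leading data rows
writelines(trimmed,'trimmed.csv');
C = readcell('trimmed.csv');
% If height(C) grows by roughly the number of dropped rows, it's a memory issue;
% if the read stops at the same content, the problem is in the file itself.
```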
My earlier suggested experiment above with the second read should illustrate the same thing, particularly if as noted were to subtract a few lines from the value of N.
Although 43 MB doesn't seem terribly big, other than when you add in the overhead of cell arrays.
Walter Roberson
on 25 Jun 2025
Maybe somehow the user is running out of Java memory, and increasing the Java memory might solve / delay the problem ?
Wouldn't you expect the user would have received the "OutOfMemoryError: Java heap space" error if so? Or a regular "Out of Memory" error if the actual memory limit were hit?
I guess on the same track, the user should check Preferences > Workspace and see whether, perchance, the MATLAB array size limit is enabled and set to something less than 100% of RAM. Or, if it is checked, uncheck it and see whether it will work by swapping to disk space; on a modern machine using an SSD instead of a conventional drive, performance might not be too bad...
But it would be better to simply upload the few lines around the point of failure; surely there isn't anything so sensitive that, without context, anybody could make detrimental use of it. Then again, of course, "policy is policy" despite reason/logic.
Tevel
on 26 Jun 2025
dpb
on 26 Jun 2025
"Regarding the Java heap space and RAM allocation, I don't know how to do any of that."
Click on the "Preferences" icon in the toolstrip's "Environment" section and explore; there are all kinds of tweaks you can make there.
The Java heap memory setting is under "General", while the array size limit is under "Workspace".
" because I have switched to using "fopen" instead."
Of course, fopen by itself doesn't do anything except return a file handle; it takes other explicit code to actually read the file content. It would be interesting to see the full code used... I was going to suggest one could revert to lower-level I/O as an alternative, but lacking the file format that wasn't really much of an option.
It would be a very interesting exercise to understand whether MATLAB is indeed failing to read with readcell a file that it can read and store in memory otherwise; that would be very significant fodder for MathWorks in enhancing performance and finding/fixing wasteful memory use.
"Although 43MB doesn't seem terribly big other than when add in the overhead of cell arrays."
What are the dimensions of the CSV file -- how many variables, and of what type per field? Are the string data fields of varying length or some known size (or at least a known maximum)? How many rows would be typical?
It can be demonstrated about the overhead of a cell array for simple cases to get an estimate of how much memory should be required...
d=ones; md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 8, Cell: 112, Overhead: 104 bytes
d=ones(1,2); md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 16, Cell: 120, Overhead: 104 bytes
d=ones(2); md=whos('d');
c={d}; mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 32, Cell: 136, Overhead: 104 bytes
d=ones; d=[d d]; md=whos('d');
c=num2cell(d); mc=whos('c');
fprintf('Double: %d, Cell: %d, Overhead: %d bytes\n',md.bytes, mc.bytes, mc.bytes-md.bytes)
Double: 16, Cell: 224, Overhead: 208 bytes
From which one can deduce the cell array overhead is 104 bytes per cell element over the base data storage. The same can be shown for character arrays with 2 bytes/element instead of 8, of course.
Consequently, given today's typical memory footprint, an extra 104 bytes per cell element (N*104 bytes for an N-element cell array) could begin to add up with very long and wide files...
But to bring the same data into MATLAB as one variable would require the same overhead to put the disparate types into a cell array, so the internal footprint would be the same. @Tevel didn't tell/show us what alternate form was used with fopen; but if textscan can succeed where readcell fails, then there's a major flaw in readcell, since textscan must also return a cell array when data types are mixed.
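For comparison, the lower-level route via fopen/textscan looks roughly like this; the format string '%f%f%s' is a placeholder assumption and must be adjusted to the file's actual columns:

```matlab
% Hypothetical low-level alternative: textscan with an explicit format string.
fid = fopen('yourfile.csv','r');
C = textscan(fid,'%f%f%s','Delimiter',',','HeaderLines',1);
fclose(fid);
% C is a cell array with one cell per column; mixed types land in separate cells,
% so the per-element cell overhead discussed above is largely avoided.
```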
Answers (0)