Big data processing, Datastore function

Vincent Thevenot on 31 May 2015
Answered: Aaditya Kalsi on 1 Jun 2015
Hi,
I have to process large files with three tab-separated columns. The files contain several million rows, so loading them whole runs out of memory. I tried the "datastore" function, which works very well, but MATLAB returns an error when the file contains more than 594000 rows.
Here is the message:
Error using matlab.io.datastore.TabularTextDatastore/read (line 41)
The data in Files does not appear to be tabular, with the same number of fields in each row and in each column. Verify the Text Format and Advanced Text Format
Properties.
Error in test_datastore (line 17)
s=read(ds);
It looks like a format problem, but I tried different parts of the file, and MATLAB always returns this message once there are more than 594000 rows.
Here is my code (very simple, just to test the function):
ds=datastore('essai_RM7_1_test_3.txt','ReadVariableNames',0,'TextscanFormats',{'%q','%f','%f'},'RowDelimiter',' ');
ds.RowsPerRead = 100000;
count = 0;
while hasdata(ds)
s=read(ds);
count = count + 1
end
count
Here are some rows of the file:
24/04/2015 09:58:06.220351 -1.143072E-2 1.277841E-1
24/04/2015 09:58:06.220957 2.736964E-3 9.289337E-2
24/04/2015 09:58:06.221562 -7.244674E-3 3.169246E-2
24/04/2015 09:58:06.222167 2.487282E-2 -6.050338E-2
24/04/2015 09:58:06.222773 1.344811E-1 -1.312878E-1
24/04/2015 09:58:06.223378 7.464026E-2 -1.944335E-1
24/04/2015 09:58:06.223984 -6.966816E-2 -2.088179E-1
24/04/2015 09:58:06.224589 -5.196927E-2 -1.842140E-1
24/04/2015 09:58:06.225195 6.998909E-2 -1.819939E-1
So, has anybody encountered this kind of problem? Is there a different way to deal with such a big file? I have to perform various calculations (FFT, RMS, …).
Thanks in advance for your help

Answers (1)

Aaditya Kalsi on 1 Jun 2015
It seems like there is an issue with the data within the file at around row 594000. You could try:
while hasdata(ds)
[s, info]=read(ds);
disp(info); % DISPLAY CURRENT STATE
count = count + 1
end
This will tell you where the last successful read was.
I suspect that the second file is different from the first, and that this is the cause of the error you are seeing.
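If the displayed read state is not enough to pinpoint the bad data, one option is to scan the file line by line and count the fields in each row. The sketch below is only an illustration, not something taken from the datastore documentation: `findBadRows` is a hypothetical helper name, and it assumes the columns are separated by single tab characters with three fields per row, as described in the question. To stay self-contained it builds a small sample file with one deliberately truncated row; on the real data you would pass the actual file name instead.

```matlab
% Demo: write a small tab-delimited sample whose third row lacks a field.
% (Sample values are copied from the question; the truncation is artificial.)
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '24/04/2015 09:58:06.220351\t-1.143072E-2\t1.277841E-1\n');
fprintf(fid, '24/04/2015 09:58:06.220957\t2.736964E-3\t9.289337E-2\n');
fprintf(fid, '24/04/2015 09:58:06.221562\t-7.244674E-3\n');   % missing third field
fprintf(fid, '24/04/2015 09:58:06.222167\t2.487282E-2\t-6.050338E-2\n');
fclose(fid);

badRows = findBadRows(sampleFile, 3);   % expect three fields per row
disp(badRows);                          % row numbers that do not have 3 fields
delete(sampleFile);

function badRows = findBadRows(fileName, expectedFields)
% FINDBADROWS  Hypothetical helper: return the row numbers whose number of
% tab-separated fields differs from expectedFields. Reads one line at a
% time, so memory use does not grow with the file size.
    fid = fopen(fileName, 'r');
    assert(fid ~= -1, 'Could not open %s', fileName);
    cleanup = onCleanup(@() fclose(fid));   % close the file even on error

    tabChar = sprintf('\t');
    badRows = [];
    rowNumber = 0;
    line = fgetl(fid);
    while ischar(line)                      % fgetl returns -1 at end of file
        rowNumber = rowNumber + 1;
        numFields = numel(strfind(line, tabChar)) + 1;
        if numFields ~= expectedFields
            badRows(end+1) = rowNumber;     %#ok<AGROW>
        end
        line = fgetl(fid);
    end
end
```

Local functions at the end of a script need R2016b or later; on older releases, save `findBadRows` in its own file. If this reports rows near 594000, inspecting those lines in a text editor should show what breaks the tabular format (a truncated row, a stray header, or an empty line).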
