Applying timerange to multiple .txt files is very slow
Hi there,
I have a large set of .txt data files, which I read through a datastore. I then apply timerange to extract the data between specific dates and times. My script looks something like this:
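warning off
ds_loc = 'Z:\data\*.txt';
ds = datastore(ds_loc);
ds.ReadSize = 1000000;
ds.Delimiter = ' ';
ds.MultipleDelimitersAsOne = true;
ds.SelectedFormats(1) = {'%{dd/MM/yyyy HH:mm:ss}D'};
warning on
% create timetable (operations on tall arrays are deferred)
tt = tall(ds);
ttab = table2timetable(tt);
% build the range endpoints in the same day-first format as the data
fmt = 'dd/MM/yyyy HH:mm:ss';
strt_time = datetime('24/03/2018 10:00:00', 'InputFormat', fmt);
end_time  = datetime('25/03/2018 00:00:00', 'InputFormat', fmt);
S1 = timerange(strt_time, end_time);
% gather evaluates the deferred operations, which reads through the datastore
result = gather(ttab(S1,:));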
The script above takes a long time to run, and the runtime grows with the number of files in the datastore location (Z:\data). Is there a better way to do this?
dpb on 22 Aug 2018 (edited by dpb on 22 Aug 2018)
I was just commenting on the problem with the sequence of defining the date format. It seems that datastore reads some data (how much, I have no idea) to infer the format when the object is created, but you can't tell it the date format a priori; instead you have to set it afterwards through a property of the object. That would seem to mean that if it guesses wrong, it has to recompute or reread all that information, which is a waste of time. If it did guess right, at least that part is OK, but historically processing was significantly slower when no format was given than when one was. I don't know whether that effect holds here.
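A small sketch of why an explicit format matters beyond speed: when the dates are day-first, as the %{dd/MM/yyyy HH:mm:ss}D format in the question implies, the same text can parse to two different dates depending on the InputFormat supplied. The sample string is made up for illustration.

```matlab
% The same text parsed with two different explicit input formats
s = '03/04/2018 10:00:00';
d_dayfirst   = datetime(s, 'InputFormat', 'dd/MM/yyyy HH:mm:ss');  % 3 April 2018
d_monthfirst = datetime(s, 'InputFormat', 'MM/dd/yyyy HH:mm:ss');  % 4 March 2018

% Inspect the components rather than relying on the display format
fprintf('day-first:   day %d, month %d\n', day(d_dayfirst), month(d_dayfirst));
fprintf('month-first: day %d, month %d\n', day(d_monthfirst), month(d_monthfirst));
```

Stating the format once and reusing it for both the data and the range endpoints avoids depending on whatever the parser would infer.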
As far as speeding up the retrieval, I don't have any real suggestions; I haven't had the opportunity to use any of the large-data tools "in anger," so I don't know their idiosyncrasies at all.
Just how big are the files, and how many are there? Might it turn out to be faster to simply loop through them explicitly, rather than paying for the overhead of the datastore object's behind-the-scenes machinery?
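The explicit-loop idea could look roughly like the sketch below. It assumes a hypothetical file layout of one record per line, "dd/MM/yyyy HH:mm:ss value" (a date, a time, and one numeric column); the real files may differ. To keep it runnable as-is, it writes two small sample files to a temporary folder. Each file is read with textscan, filtered with a logical mask, and only the matching rows are kept, so nothing outside the range is accumulated.

```matlab
% Build two small sample files (hypothetical layout) in a temporary folder
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'a.txt'), 'w');
fprintf(fid, '24/03/2018 09:00:00 1.0\n24/03/2018 10:30:00 2.0\n');
fclose(fid);
fid = fopen(fullfile(folder, 'b.txt'), 'w');
fprintf(fid, '24/03/2018 23:00:00 3.0\n25/03/2018 00:00:00 4.0\n');
fclose(fid);

fmt = 'dd/MM/yyyy HH:mm:ss';
t0 = datetime('24/03/2018 10:00:00', 'InputFormat', fmt);
t1 = datetime('25/03/2018 00:00:00', 'InputFormat', fmt);

files = dir(fullfile(folder, '*.txt'));
parts = cell(numel(files), 1);   % one filtered timetable per file
for k = 1:numel(files)
    fid = fopen(fullfile(files(k).folder, files(k).name), 'r');
    % whitespace-delimited fields: date text, time text, numeric value
    C = textscan(fid, '%s %s %f');
    fclose(fid);
    % join date and time with a space, then parse with the explicit format
    t = datetime(strcat(C{1}, {' '}, C{2}), 'InputFormat', fmt);
    % half-open interval [t0, t1), matching timerange's default 'openright'
    keep = t >= t0 & t < t1;
    parts{k} = timetable(t(keep), C{3}(keep), 'VariableNames', {'Value'});
end
result = sortrows(vertcat(parts{:}));   % sort the combined rows by time
rmdir(folder, 's');                     % clean up the sample files
```

Whether this beats the datastore/tall route depends on file sizes and counts, so timing both on a representative subset is the safest way to decide.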
Do all the files have the same form, or does the index vector have to be updated for each file? It appears that timerange assumes a fixed index across the whole population.
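One way to check this in a small case: a single timerange object can be applied to timetables whose row times differ in both count and value. In the sketch below (two hypothetical per-file timetables), the same S1 selects rows by comparing against each timetable's own row times.

```matlab
fmt = 'dd/MM/yyyy HH:mm:ss';
S1 = timerange(datetime('24/03/2018 10:00:00', 'InputFormat', fmt), ...
               datetime('25/03/2018 00:00:00', 'InputFormat', fmt));

% Two hypothetical per-file timetables with different lengths and row times
tA = datetime({'24/03/2018 09:00:00'; '24/03/2018 10:00:00'}, 'InputFormat', fmt);
tB = datetime({'24/03/2018 12:00:00'; '24/03/2018 18:00:00'; '25/03/2018 00:00:00'}, 'InputFormat', fmt);
TTA = timetable(tA, [1; 2], 'VariableNames', {'Value'});
TTB = timetable(tB, [3; 4; 5], 'VariableNames', {'Value'});

% S1 is resolved against each timetable's own row times
selA = TTA(S1, :);   % keeps the 10:00:00 row (the start time is included)
selB = TTB(S1, :);   % keeps 12:00 and 18:00; 00:00 on the 25th is excluded
```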