Converting partial strings using datetime
5 views (last 30 days)
Show older comments
I am working with a large CSV datafile (16GB) and one column includes the date and time, for example:
{'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)'}
I'm currently using code like this to import from the CSV:
% formats for importing data from CSV
tsFormats = {'%q'};
% make datastore and tall array object for data
ds = datastore("input.csv", ...
"TreatAsMissing", [".","NA"],'Delimiter', ',',...
"DatetimeLocale","en_GB",'TextscanFormats',tsFormats);
tall_ds = tall(ds);
% add datetime field
tall_ds.Datetime = datetime(tall_ds.timestamp,...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Standard Time)''',...
'TimeZone','Australia/Sydney');
However, not all of the times are Eastern Standard time (some are Eastern Daylight Time) and in future not all data will be in this locale/timezone (i.e. they might be in Australia/Perth).
The format string I have used results in NaT for Australian Eastern Daylight Time, so I re-do the conversion for all rows that have NaT instead of a valid datetime.
% re-do for NaT datetimes
NaTs = isnat(tall_ds.Datetime); % find NaTs
tall_ds.Datetime(NaTs) = datetime(tall_ds.timestamp(NaTs),...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Daylight Time)''',...
'TimeZone','Australia/Sydney');
Is there a 'better' way to do this, going forward? Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data. (I know MATLAB minimises passes through the data when using gather, but it seems silly to have to do this isnat call.)
Ideally I'd probably just cut the string after the timezone offset using something like extractbefore.
2 Comments
Siddharth Bhutiya
on 7 Aug 2023
I think since you are any way ignoring the text that contains the name, your approach of using extractBefore to trim that part out before conversion seems like the best way to tackle this. Extract the timestamp before "(" and then use a single datetime call to convert all the timestamps. Something like the code below
str = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
str = extractBefore(str,"(");
dt = datetime(str,InputFormat='eee MMM d yyyy H:mm:ss ''GMT''Z',TimeZone='Australia/Sydney')
Accepted Answer
Stephen23
on 7 Aug 2023
Edited: Stephen23
on 7 Aug 2023
"Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data."
Then work in UTC:
C = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
C = regexprep(C,"\s*\(.+$","")
D = datetime(C, "InputFormat","eee MMM d u H:m:s 'GMT'Z", "Timezone","UTC")
More Answers (0)
See Also
Categories
Find more on Data Type Conversion in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!