retrieve data containing date and time from text file

Question

0 votes

Hi all,

I encountered the following strange problem when trying to retrieve date and time and other data from a text string.

The following runs OK.

>>tmp = '460018898104779  6d0502fb0be52d  2018-09-27 23:59:48';
>>test2 = textscan(tmp, '%f%14c%{yyyy-MM-dd HH:mm:ss}D', 'Delimiter',{'\n'});

But when I append two more doubles to the text string and run

>>tmp = '460018898104779  6d0502fb0be52d  2018-09-27 23:59:48  113.944081  22.504949';
>>test2 = textscan(tmp, '%f%14c%{yyyy-MM-dd HH:mm:ss}D%f%f', 'Delimiter',{'\n'});

It gives the following error message:

>>Error using textscan
>>Unable to read the DATETIME data with the format 'yyyy-MM-dd HH:mm:ss'. If the data is not a time, use %q to get text data.

Could somebody help to clarify what has gone wrong? Thank you very much!

Bayern

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Walter Roberson on 29 Sep 2018

0 votes

You should use Delimiter '\t' rather than \n

8 Comments
Show 6 older comments Hide 6 older comments

Walter Roberson on 30 Sep 2018

Open in MATLAB Online

\n is newline. Neither of those strings contained newline, so you were effectively setting delimiter to end of string.

%f without a width reads as many characters as it can until what it sees is no longer a valid number, and then if not at end of line backs up to the last place that was the end of a valid number. %f recognizes both d and e as exponents, but only when followed by digits or + or - followed by digits for the exponent; otherwise it backs up. The tab in your string is not a valid digit so it knows to end reading the number there. But if you look at your posting, because tabs do not show up in the display, it looks like your line starts with

4600188981047796d0502fb0be52d

in which case it would read right through the 4600188981047796d0502 and then encounter the 'f', decide that f was not valid because you were already in an exponent (the 'd') and so would parse the number as 4600188981047796e+502 which overflows to infinity. It would then be left pointing to the 'f' in the buffer in that case.

Anyhow, after the %f you are positioned at the tab and are processing %14c . Because you did not explicitly set the Whitespace to exclude tab, the %c operation will skip forward over any whitespace at that point, and will read 14 characters starting from that point, which would leave you positioned at the second tab.

Now you hit the %D specification with an included blank. %D format with an included blank acts a bit oddly with respect to the Delimiter and Whitespace options. In all of the formats except %c or %[], when you encounter any character in the Delimiter list or the Whitespace list, the scanning of the field is to stop. But for %D format, it looks like the rules have been changed (not sure which release) so that scanning can continue if you encounter Whitespace but not if you encounter Delimiter -- and Delimiter normally includes space. So

textscan('2018-09-27 23:59:48', '%{yyyy-MM-dd HH:mm:ss}D')

is an error because the default Delimiter includes space and so scanning stops at the space between the two parts, and the part of the date collected does not match the format. And

textscan('2018-09-27 23:59:48', '%{yyyy-MM-dd HH:mm:ss}D', 'whitespace', '\n')

fails as well, because space is still in Delimiters list. But

textscan('2018-09-27 23:59:48', '%{yyyy-MM-dd HH:mm:ss}D', 'delimiter', '\n')

works because space is no longer in delimiter so somehow it does not terminate scanning of the %D field.

Though to be honest, my hypothesis does not explain why

textscan('2018-09-27 23:59:48', '%{yyyy-MM-dd HH:mm:ss}D', 'whitespace', '')

works.

Anyhow, with Delimiter set to \n, scanning of the %D is not stopping with the tab after the time portion and it is continuing on and pulling in the numeric field and the tab after that and the numeric field after that too, until it hits end of string, and then sending all of that to be matched against the %D

With Delimiter set to \t then it sees the tab after the time portion, stops scanning, passes in the date and space and time to the %D, gets a match, and is left positioned at the tab for processing of further fields.

dpb on 4 Oct 2018

Edited: dpb on 4 Oct 2018

Open in MATLAB Online

On the last example, it works because the one time textscan doesn't add the gratuitous blank into the whitespace string is when it is specified as the empty string and there's a format scan field given. Hence, the whitespace in the above does not include a blank and there's only one field so don't need one (a delimiter, that is).

Modify the example a little to have two fields to try to parse as:

s='3.14,2018-09-27 23:59:48';
>> textscan(s, '%f%{yyyy-MM-dd HH:mm:ss}D', 'whitespace', '')
Error using textscan
Unable to read the DATETIME data with the format 'yyyy-MM-dd HH:mm:ss'. If the data is not a time, use %q to
get text data.

fails expectedly because there's no delimiter to separate the two from each other.

textscan(s, '%f%{yyyy-MM-dd HH:mm:ss}D', 'whitespace', ',')
Error using textscan
Unable to read the DATETIME data with the format 'yyyy-MM-dd HH:mm:ss'. If the data is not a time, use %q to
get text data.

This is the conundrum--you need to specify the delimiter but textscan thinks it's smarter than you are so it adds a blank to the comma and then your point above kicks in.

We can show the above is the problem by not having a blank but another delimiter in the string--

s(s==' ')=char(9);  
textscan(s, '%f%{yyyy-MM-dd HH:mm:ss}D', 'whitespace', ',')
ans =
1×2 cell array
  {[3.1400]}    {[2018-09-27 23:59:48]}
>> ans{:}
ans =
  3.1400
ans = 
datetime
 2018-09-27 23:59:48
>>

and voila! now since there isn't a blank the date field is again parsed. I'm a little nonplussed myself here though that didn't have to fix up the %D format string; it still worked with the tab instead of blank once we got past the input parsing to return the full string.

It still would have been so much easier if TMW had kept the Fortran FORMAT model for formatted i/o instead of C's model.

bayern science on 4 Oct 2018

Many thanks for digging deep into the format dpb! You have made the thread more helpful!

dpb on 4 Oct 2018

NB: I corrected original mistake of 'whitespace' for 'delimiter' in first paragraph regarding textscan propensity to add the blank even if not user-provided.

Sign in to comment.

retrieve data containing date and time from text file

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

8 Comments
Show 6 older comments Hide 6 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

retrieve data containing date and time from text file

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

8 Comments Show 6 older comments Hide 6 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

8 Comments
Show 6 older comments Hide 6 older comments