textscan doesn't stop at blank space in txt file

Josh Tome on 20 Oct 2022
Edited: dpb on 22 Oct 2022
Hi, I'm trying to import data from a txt file using the textscan function. I thought it was supposed to stop at the first blank space it sees, but it seems to be grabbing data beyond the blank space. My Group1 should stop at the first blank space before "Events", but it includes "Events", "100", and "Subject".
I'm using the following code so far:
[file_list, path_n] = uigetfile('.txt','Select the Files to Process','Multiselect','on');
fidi = fopen(file_list);
Group1 = textscan(fidi, '%s %s %s %f %s %s','HeaderLines',3, 'Delimiter','\t');
Attached is the txt file data:
dpb on 21 Oct 2022
Edited: dpb on 21 Oct 2022
"...the blank space doesn't match the "%s" specifier (or so I believe),"
Well, that isn't a correct assumption either; a blank is a valid character like any other. However, unless told otherwise with the optional 'Whitespace' named parameter, blanks are considered whitespace and are ignored or treated as delimiters, except inside quoted strings, where they are significant.
Again the textscan doc Algorithms section states--
"When matching data to a text conversion specifier, textscan reads until it finds a delimiter or an end-of-line character."
But the format spec was '%s %s %s %f %s %s', which gets reapplied over and over until it either fails or reaches the end of the file. In this case it kept finding text for the %s fields and a numeric it could convert, so it read on; only the records after that fail.
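To see the cycling behaviour in isolation, here is a minimal sketch on made-up, tab-delimited text passed to textscan as a character vector. The sample rows and the three-field format are invented for illustration and are not taken from walking_01.txt. The point is only that a blank line does not end the scan: the format is applied again to whatever text comes next.

```matlab
% Made-up sample: two data rows, a blank line, then the next section header.
txt = sprintf('PluginGait\tCadence\t116.39\nPluginGait\tStride Time\t1.031\n\nEvents\n');

% The format is reapplied until a conversion fails or the input ends.
C = textscan(txt, '%s %s %f', 'Delimiter', '\t');

% The first column picks up "Events" even though it follows a blank line.
disp(C{1})
hasEvents = any(strcmp(C{1}, 'Events'));
```

The lengths of the other columns after the partial last record are deliberately not relied on here.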
Another alternative when parsing with textscan, if such a break is known to be in the file, is to just accept the error, resynch the file pointer to the next expected record, and then carry on with the format string for the next section. This can be tricky if the file doesn't have fixed-length records, as in the example: fgetl will advance to the next EOL, but depending upon file content, that may not include all of the next record to be scanned, and backing up to the previous end of record isn't easily supported in stream files. In this particular file, however, the failure is in the header line, so that would work, and you could subsequently get the second group in the same open with textscan as
fidi = fopen(file_list);
fmt = [repmat('%s',1,3) '%f' repmat('%s',1,2)];   % first group: 3 text, 1 numeric, 2 text columns
G1 = textscan(fidi,fmt,'HeaderLines',3,'Delimiter','\t','CollectOutput',1);
fmt = [repmat('%s',1,3) '%f' '%s'];               % second group: one fewer trailing text column
fgetl(fidi);                                      % resynch to BOL of next header group
G2 = textscan(fidi,fmt,'Delimiter','\t','CollectOutput',1);
fclose(fidi);                                     % release the file handle when done
Personally, I'd still opt for higher level parsing tools instead of having to then put the above into something useful...
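One way to sketch the "higher level" route is to find the first blank line yourself and hand textscan only the lines before it. The sample text below is a made-up stand-in for the file (two header lines, five columns), so the header count and format string are assumptions that would need adjusting for the real walking_01.txt.

```matlab
% Made-up stand-in for the file contents (tab-delimited); not the real data.
txt = sprintf(['Gait Cycle\n' ...
               'Subject\tContext\tName\tValue\tUnits\n' ...
               'PluginGait\tLeft\tCadence\t116.39\tsteps/min\n' ...
               'PluginGait\tLeft\tStride Time\t1.031\ts\n' ...
               '\n' ...
               'Events\n' ...
               'Subject\tContext\tName\tTime (s)\n']);

L = splitlines(string(txt));                      % one string per line
iBlank = find(strlength(strtrim(L)) == 0, 1);     % index of the first blank line

% Keep only the data rows: skip the 2 header lines, stop before the blank line.
block = join(L(3:iBlank-1), newline);

% Parse just that block, so nothing from the "Events" section can leak in.
C = textscan(char(block), '%s %s %s %f %s', 'Delimiter', '\t');
```

With a real file, readlines(filename) would take the place of the splitlines call.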
Walter Roberson on 21 Oct 2022
All textscan formats other than %c and %[] skip leading whitespace as defined by the Whitespace option (or the default list of whitespace characters if no option was passed). And %c is perfectly happy to read a space.
If you need a space to be rejected then you have two possibilities:
  • pass Whitespace option that does not include space; or
  • use %[^ ], taking into account that it would happily gobble a number and return it as a character vector
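A small sketch of the difference between %s and %c described above, using a made-up three-character input. It only illustrates the default whitespace handling; the two options in the list are not exercised here.

```matlab
% %s skips leading whitespace and stops at it, so the blank separates two tokens.
S = textscan('a b', '%s');      % S{1} holds the tokens 'a' and 'b'

% %c does not skip whitespace: it reads any single character, the blank included.
K = textscan('a b', '%c');      % K{1} is a char column
sawBlank = any(K{1} == ' ');    % true if the blank was read as data
```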


Answers (1)

dpb on 20 Oct 2022
fn=websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt'); % download once, reuse below
opt=detectImportOptions(fn, ...
'numheaderlines',2, ...
'readvariablenames',1, ...
'delimiter','\t', ...
'expectednumvariables',6, ...
'missingrule','fill');
opt.VariableTypes(1)={'char'};
tG=readtable(fn,opt);
ix=find(contains(tG.Subject,'Events'),1); % first row that belongs to the "Events" section
tG=tG(1:ix-1,:); % keep only the rows of the first section
[head(tG);tail(tG)]
ans = 16×6 table
       Subject         Context                Name                 Value         Units              Description       
    ______________    _________    _________________________    _______    _____________    ________________________
    {'PluginGait'}    {'Left' }    {'Cadence'              }     116.39    {'steps/min'}    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Walking Speed'        }     1.3038    {'m/s'      }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Stride Time'          }      1.031    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Step Time'            }      0.551    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Off'    }     12.609    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Contact'}     46.557    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Foot Off'             }     62.076    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Single Support'       }       0.35    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Single Support'       }      0.391    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Double Support'       }      0.309    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Stride Length'        }     1.3652    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Length'          }      0.651    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Width'           }    0.19855    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Limp Index'           }     1.0249    {0×0 char   }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'GDI'                  }     75.701    {0×0 char   }    {'Gait Deviation Index'}
    {'PluginGait'}    {'Right'}    {'GDI'                  }     72.639    {0×0 char   }    {'Gait Deviation Index'}
Got to thinking -- each of the first two sections would make a great table, and each can be imported directly in part. Unfortunately, readtable isn't set up to read from memory...but I thought it worth showing an import options object and what it can do.
"In anger" (as my old Scottish power plant testing engineer friend used to say) I'd still probably first read the file in in toto and use that to find the sections and then parse them.
The first two sections are pretty easy; not so sure about the "Devices" section -- the "Moment" section also looks ok although appears empty in this dataset.
dpb on 20 Oct 2022
Edited: dpb on 22 Oct 2022
SECTIONS={'Gait Cycle','Events','Devices'};
F=readlines(websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt'));
ix=find(startsWith(F,SECTIONS))
ix = 3×1
     1
    33
    51
Gives the section starting locations for internal parsing -- or use those to limit the ranges read using readtable from the file itself.
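For the internal-parsing route, a minimal sketch of turning those start indices into per-section blocks of lines. The string array F below is made up to stand in for the readlines output, and the variable names stop and sec are just illustrative.

```matlab
SECTIONS = ["Gait Cycle", "Events", "Devices"];

% Made-up lines standing in for readlines() on the real file.
F = ["Gait Cycle"; "row a"; "row b"; ""; "Events"; "row c"; ""; "Devices"; "row d"];

ix   = find(startsWith(F, SECTIONS));     % line number where each section starts
stop = [ix(2:end)-1; numel(F)];           % last line belonging to each section

sec = cell(numel(ix), 1);
for k = 1:numel(ix)
    blk    = F(ix(k)+1:stop(k));          % lines after the section title
    sec{k} = blk(strlength(blk) > 0);     % drop the blank separator lines
end
```

Each sec{k} can then be parsed on its own, for example by joining it with newline and passing it to textscan. Note that a data line that happens to begin with one of the section names would also be picked up by startsWith.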
