Looking for an alternative to regexp.

Bob Thompson on 23 Mar 2021
Edited: Stephen23 on 25 Mar 2021
I'm looking for an alternative way to parse through strings to find bits of information, or for a way to use regexp that doesn't give me nested cells. I'm tired of dealing with the nested cells.
I've got a string that contains node numbers and locations. I would like to capture all of the node numbers, and then put them into a double array. I can identify and extract the numbers with regexp, but any time I use regexp with tokens I end up with cells inside of cells for a reason that I don't entirely understand. Am I doing something to create the extra layer of cells, or is there another command that can parse and extract the information I want?
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(\d+)','tokens');
The nodes variable contains a 1x5 cell array, where each cell contains a 1x1 cell, which in turn contains the node number as a character vector.
  2 Comments
Stephen23 on 24 Mar 2021
Edited: Stephen23 on 25 Mar 2021
Tokens are always returned in a cell array whose size equals the number of tokens (scalar in your case, because you specified only one token). If multiple matches are enabled (the default), then every output is additionally nested in a cell array whose size equals the number of matches made, so you get nested cell arrays of tokens.
FYI, if you only need to match the regular expression exactly once, then you can specify the 'once' option and the outputs are not nested in cell arrays. This does not apply to your example, but is useful in other cases.
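The nesting described above, and the effect of 'once', can be seen on a small example. This is only a sketch: the short input string and the variable names (str, pat, allTok, onceTok, flat, vals) are made up for illustration and are not the poster's data.

```matlab
% Hypothetical short input with two matches; not the poster's data.
str = 'Nearestnodeis12atXNearestnodeis345atY';
pat = 'Nearestnodeis(\d+)';

% Default (all matches): the outer cell has one element per match,
% and each element is itself a 1x1 cell holding that match's single token.
allTok = regexp(str, pat, 'tokens');
inner  = allTok{1};       % 1x1 cell containing '12'
first  = allTok{1}{1};    % char vector '12'

% 'once': only the first match is returned, so the outer layer disappears.
onceTok = regexp(str, pat, 'tokens', 'once');   % 1x1 cell containing '12'

% Concatenating the inner cells removes the nesting for all matches.
flat = [allTok{:}];       % 1x2 cell of char vectors
vals = str2double(flat);  % numeric row vector
```

With one token per match, [allTok{:}] is usually all that is needed before calling str2double.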
As well as concatenating the output data or using named tokens as the answers below show, you can also use a look-behind assertion and return the matched string (no nested cell arrays), which makes post-processing much simpler:
nodes = regexp(repeatstrings,'(?<=Nearestnodeis)\d+','match')
nodes = 1×5 cell array
{'7736664'} {'7736664'} {'7736664'} {'7736664'} {'7736664'}
vec = str2double(nodes)
vec = 1×5
7736664 7736664 7736664 7736664 7736664
Bob Thompson on 24 Mar 2021
Thanks, I definitely think this is smoother than what I usually attempt.

Answers (2)

Star Strider on 23 Mar 2021
See if adding either:
Out = cell2mat([nodes{:}].')
or:
Out = str2num(cell2mat([nodes{:}].'))
to the posted code provides the desired result.
Note that str2num is not generally recommended, because it evaluates the text with eval; however, it works when str2double produces an unacceptable result.
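One caveat is that cell2mat builds a char matrix, so it needs every token to have the same number of characters. As a hedged alternative (not part of the original answer), the flattened cell can be passed straight to str2double, which converts each element separately. The input string below is hypothetical and chosen so that the two node numbers differ in length.

```matlab
% Hypothetical input whose node numbers have different lengths.
str = 'Nearestnodeis12atXNearestnodeis345atY';
nodes = regexp(str, 'Nearestnodeis(\d+)', 'tokens');

% [nodes{:}] flattens the nested cells into a 1x2 cell of char vectors;
% str2double converts each element, so equal lengths are not required.
% The transpose gives a column, matching the layout used in the answer.
Out = str2double([nodes{:}]).';
```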

Walter Roberson on 23 Mar 2021
singlestring = 'nxyzs=74xyz[0]:-2.0447000e+010.0000000e+001.8288000e+00Nearestnodeis7736664atadistanceof4.6823094e-03locatedat-2.0451682e+012.2396341e-161.8288000e+00';
repeatstrings = repmat(singlestring,1,5);
nodes = regexp(repeatstrings,'Nearestnodeis(?<NN>\d+)','names');
str2double({nodes.NN})
ans = 1×5
7736664 7736664 7736664 7736664 7736664
  3 Comments
Walter Roberson on 23 Mar 2021
(?<WORD>PATTERN)
creates a named token; whatever is matched by PATTERN gets stored in a struct field named WORD, as text. But even though it is called a "named token", oddly enough to get back the struct, you have to ask for "names" instead of for "tokens".
You get back a struct array, with one entry for each time the overall pattern matches (in this case, one for each time Nearestnodeis is followed by a sequence of digits). So you get a 1 x 5 struct in this case, each entry with a field named as indicated, NN. As usual with struct arrays, you can pull out all of the entries using struct expansion inside a {}, creating a cell array of character vectors, and then you can convert them all at once using str2double() on the cell array.
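The same idea extends to more than one named token per match. The following is a sketch only: the shortened input string, the second token name dist, and its character-class pattern are assumptions made for illustration, not something from the original post.

```matlab
% Shortened, hypothetical version of the poster's string, repeated twice.
str = repmat('Nearestnodeis7736664atadistanceof4.6823094e-03locatedat', 1, 2);

% Two named tokens: NN (digits) and dist (a number in exponent notation).
pat = 'Nearestnodeis(?<NN>\d+)atadistanceof(?<dist>[\d.eE+-]+)locatedat';

% One struct array element per match, with fields NN and dist (both text).
s = regexp(str, pat, 'names');

% Struct expansion inside {} gives a cell array of char vectors per field,
% which str2double converts in one call.
nodeNums = str2double({s.NN});
dists    = str2double({s.dist});
```

Each field stays aligned by match index, so nodeNums(k) and dists(k) come from the same occurrence of the pattern.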
Bob Thompson on 24 Mar 2021
Thanks for the explanation. I do like structures better than cells, most of the time.
