regexp - match regular expression question
Hi all,
In the MATLAB help for the function regexp, I'm trying to understand what the vertical line (i.e. | ) means in the pattern below. The example comes directly from MATLAB's help text, shown after typing 'help regexp'.
The help documentation indicates:
"|" means Match subexpression before or after the "|"
What does this mean exactly? At the moment, I'm thinking 'which is it?'. I was expecting that a match would be either 'before' or 'after', not something described as both at once. And even if it really means 'match before OR after', what does that mean in practice? What does "|" actually represent?
Thanks in advance.
str = 'John Davis; Rogers, James';
pat = '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)';
n = regexp(str, pat, 'names')
2 Comments
Stephen23
on 30 Sep 2016
The | is an alternation ("or") operator: each match uses exactly one of the alternatives on either side of it. Here is an example of how it works, tested on a string with four slightly different "words":
>> regexp('a123z a%%%z a1%3z a__z','a(\d+|%+)z','match')
ans =
'a123z' 'a%%%z'
The pattern matches all sequences that start with a, end with z, and contain in between either only digits or only % symbols. The third "word" in the string does not match because it contains both digits and % symbols; the fourth contains only underscores, so it does not match either. Now let's alter the regex and use two |, which allows digits, % symbols, or underscores:
>> regexp('a123z,a%%%z,a1%3z,a__z','a(\d+|%+|_+)z','match')
ans =
'a123z' 'a%%%z' 'a__z'
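One detail the examples above rely on is that | has the lowest precedence of the regex operators, so the parentheses decide how much of the pattern each alternative covers. The following is a small sketch of that point; the variable names are only for illustration. Without the group, the pattern splits into the two whole alternatives a\d+ and %+z, which is also how the | in the original question divides the pattern into a "First Last" half and a "Last, First" half.

```matlab
str = 'a123z a%%%z';

% With a group: the | only chooses between \d+ and %+,
% while the leading a and trailing z are required either way.
grouped = regexp(str, 'a(\d+|%+)z', 'match')
% grouped = {'a123z', 'a%%%z'}

% Without a group: the | splits the whole pattern,
% so this means "a\d+" OR "%+z".
ungrouped = regexp(str, 'a\d+|%+z', 'match')
% ungrouped = {'a123', '%%%z'}
```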
Bonus: if you want a convenient way to test and experiment with regular expressions, you can try my FEX submission:
Accepted Answer
Star Strider
on 30 Sep 2016
Edited: Star Strider
on 30 Sep 2016
When I’ve used the ‘|’ (‘or’) operator, I’ve used it to match either of the two (or more) sub-expressions in the expression string. In this instance, if a name contains a comma, the first word is labelled as the last name and the second word as the first name. If there is no comma, it does the reverse. The presence or absence of a comma in the target string determines which sub-expression returns the result: a name with a comma returns nothing for the sub-expression without a comma, and a name without a comma returns nothing for the sub-expression with one.
If you want to see how this works in practice, try it with only one sub-expression (and without the ‘|’ operator). That’s the easiest (and most instructive) way to see how a particular syntax works.
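As a sketch of that experiment, the pattern from the question can be split at the ‘|’ and each half run on its own. The names patA, patB, a, b, and both are illustrative and not part of the original example.

```matlab
str  = 'John Davis; Rogers, James';
patA = '(?<first>\w+)\s+(?<last>\w+)';    % "First Last" (no comma)
patB = '(?<last>\w+),\s+(?<first>\w+)';   % "Last, First" (comma)

% Each half alone finds only the name written in its own format.
a = regexp(str, patA, 'names')   % a.first = 'John',  a.last = 'Davis'
b = regexp(str, patB, 'names')   % b.first = 'James', b.last = 'Rogers'

% Joined with |, each name is matched by whichever half fits it,
% so the result is a 1x2 struct array holding both names.
both = regexp(str, [patA '|' patB], 'names')
```

Here both(1) holds John Davis and both(2) holds James Rogers, with first and last assigned correctly in each case even though the two names are written in opposite orders.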
EDIT — Clarified an ambiguity in the original.