How do I use REGEXP to match multi-digit values?
I've been reviewing the MATLAB Programmer's Guide in hopes of finding a solution to a current problem: How do I use REGEXP to match multi-digit values?
From earlier this year, I have the following commands:
str = 'Part ID: 1 or Part ID: 2 or Part ID: 5 or Part ID: 10';
exp = 'Part ID:\s+([25])\W';
tokens = regexp(str, exp, 'tokens');
Part_data = reshape(str2double([tokens{:}]), 1, []).'
These commands work well when I want to match single-digit values such as 2 and 5.
Now, let's use the following updated string and associated commands:
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';
exp = 'Part ID:\s+([220])\W';
tokens = regexp(str, exp, 'tokens');
Part_data = reshape(str2double([tokens{:}]), 1, []).'
This results in Part_data = NaN
I get the same result when trying to match the value of 112.
If possible, how do I use REGEXP to match only these two multi-digit values?
1 Comment
Cedric
on 6 Nov 2013
But what is the purpose? It seems to me that even if you were doing it right, you would essentially get 220, which is not a parameter or data (as you already know it). If you just want to see if there is a part 220, STRFIND might be more efficient. If you want to extract data that follow the header 'Part ID: 220', you should explain a bit more about the structure of the data, because it might be possible to extract it with a single call to REGEXP.
Accepted Answer
Azzi Abdelmalek
on 6 Nov 2013
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';
exp = '(?<=Part ID:) 220';
m = regexp(str, exp, 'match')
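One detail worth noting about the pattern above: the lookbehind keeps 'Part ID:' out of the match, but the space before 220 is still part of it, so the returned text is ' 220'. A minimal sketch of turning that match into a number and a presence flag follows; the names value and has_part are hypothetical and chosen only for illustration.

```matlab
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';

% The lookbehind excludes 'Part ID:' from the match, but the leading
% space is matched, so m{1} is ' 220' rather than '220'.
m = regexp(str, '(?<=Part ID:) 220', 'match');

% Trim the whitespace before converting the matched text to a number.
value = str2double(strtrim(m));

% True when at least one 'Part ID: 220' entry was found.
has_part = ~isempty(m);
```

Because nothing in this pattern constrains what follows 220, it would also match the start of a longer ID such as 2201.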
More Answers (1)
Kelly Kearney
on 6 Nov 2013
In a regular expression, [220] means "match a single 2, 2, or 0". Eliminate the brackets if you want to match a specific sequence:
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';
exp = 'Part ID:\s+(220)\W';
tokens = regexp(str, exp, 'tokens');
Part_data = reshape(str2double([tokens{:}]), 1, []).'
If you want to match multiple sequences, use something like this:
exp = 'Part ID:\s+(220|252)\W';
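Here is a minimal sketch of the two points above, applied to the two values from the question (112 and 220). It also covers one caveat: the trailing \W needs a character after the number, so an ID at the very end of the string (106 here) would not match. Assuming a negative lookahead (?!\d) is acceptable in place of \W, the pattern also works at the end of the string. The names sameSet, pat, ids, and last_id are illustrative only.

```matlab
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';

% A character class matches ONE character, so [220] is the same set as [02].
sameSet = isequal(regexp('2020', '[220]', 'match'), ...
                  regexp('2020', '[02]',  'match'));

% Alternation inside a group matches whole digit sequences.
% (?!\d) asserts that no digit follows, without consuming a character.
pat    = 'Part ID:\s+(112|220)(?!\d)';
tokens = regexp(str, pat, 'tokens');
ids    = str2double([tokens{:}]);

% Unlike \W, the lookahead also matches an ID at the end of the string.
last_tokens = regexp(str, 'Part ID:\s+(106)(?!\d)', 'tokens');
last_id     = str2double([last_tokens{:}]);
```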
4 Comments
Kelly Kearney
on 6 Nov 2013
There might be a way to do this with a single regular expression, but I'd just capture them all and then filter out the ones you don't want:
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';
exp = 'Part ID:\s+(\d+)';
tokens = regexp(str, exp, 'tokens');
Part_data = str2double([tokens{:}]);
Part_data = Part_data(Part_data ~= 106);
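The same capture-then-filter idea can be turned around to keep a list of wanted IDs instead of excluding unwanted ones. This is a sketch only; the wanted list and the name all_ids are hypothetical and chosen for illustration.

```matlab
str = 'Part ID: 112 or Part ID: 220 or Part ID: 252 or Part ID: 106';

% Capture every numeric ID, then convert the tokens to a numeric row vector.
tokens  = regexp(str, 'Part ID:\s+(\d+)', 'tokens');
all_ids = str2double([tokens{:}]);

% Keep only the IDs that appear in the wanted list (illustrative values).
wanted    = [112 220];
Part_data = all_ids(ismember(all_ids, wanted)).';
```

Filtering numerically keeps the regular expression simple, and the wanted list can change without touching the pattern.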