Extracting the second number after the comma within parentheses

I have a string
"Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)";
I am looking to extract only the contents after the comma in the second set of parentheses.
So the required output would be 5.976000e+005.
My code is
XX="Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)";
TOC=strrep(XX,'Toc(Clock Data Ref Time)','');
TOC=regexp(TOC, '(?<=\()[^)]*(?=\))', 'match')
which returns 37350,5.976000e+005 s.
But how do I extract the number after the comma?
Thank you.

 Accepted Answer

Assuming that the number will always be in scientific notation, then the following should work.
TOC = regexp(str,'(?<=\(.*?,)\d\.\d+e\+\d+(?=.*?\))','match');
% A rough description of the pattern the expression describes
% (?<=\(.*?,) Look-behind assertion: the match must be preceded by an open parenthesis, then any (possibly empty) run of characters, then a comma.
% \d\.\d+e\+\d+ A single digit followed by a period, followed by 1 or more consecutive digits, followed by e+,
% followed by 1 or more consecutive digits.
% This pattern must precede the following look-ahead assertion.
% (?=.*?\)) Look-ahead assertion: the match must be followed by any (possibly empty) run of characters and then a closed parenthesis.
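A minimal usage sketch, assuming the input is stored as a character vector in a variable named str (the answer does not define str, so that name and its contents are taken from the question). It applies the expression above and converts the matched text to a number with str2double.

```matlab
% Assumed input: the string from the question, stored as a character vector.
str = 'Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)';

% With the 'match' option, regexp returns a cell array holding every match.
TOC = regexp(str, '(?<=\(.*?,)\d\.\d+e\+\d+(?=.*?\))', 'match');

% Convert the first (here, only) match from text to a double.
value = str2double(TOC{1});
```

Because 'match' returns a cell array, the result needs the TOC{1} indexing before it can be used as plain text or converted.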

5 Comments

Could you briefly explain how you used regexp up to the comma and extracted the number, and how you omitted the "s" (seconds)?
Like I said twice, regexp() is very complicated and cryptic - it's like its own language. So that's why I presented you with a simpler and more intuitive option of using strfind(). However, if you want to learn regexp(), have at it. It's probably a useful skill to have.
IA is correct, regular expressions are cryptic and can be challenging to use at best. However, they can also be incredibly powerful. I did briefly characterize what the expression I used is doing, but will try to add a bit more detail.
(?<=test)expr This is a lookaround (look-behind) assertion that matches expr only where it is preceded by text matching test. Here I am defining test as "\(.*?,". Since parentheses are metacharacters and I want a literal open parenthesis, I have to precede it with a backslash. The period is a metacharacter that denotes any single character (including whitespace), and it is followed by an asterisk and a question mark. The asterisk and question mark are types of quantifiers, where the asterisk represents 0 or more times consecutively and the question mark is 0 or 1 times (i.e., optional). The asterisk is also considered a greedy quantifier, but when followed by a question mark it becomes lazy (read more in the provided links). Lastly, I finish with a comma. Essentially, this test expression says that I want to look for an open parenthesis, then ignore all characters until I get to a comma, then look for the expr expression.
I use "\d\.\d+e\+\d+" as my expr. Backslash 'd' is one way of representing any single digit; however, twice I follow this with a plus. A plus is another quantifier and represents 1 or more times consecutively, thus "\d+" says look for 1 or more consecutive digits. The backslash preceding the period represents a literal period, the backslash preceding the plus represents a literal plus, and "e" is a literal e. Therefore, the expr expression says look for a single digit, followed by a period, followed by 1 or more digits, followed by e, followed by a plus, and followed by 1 or more digits. If it is possible that you may encounter values less than one, you might want to replace the original expr with "\d\.\d+e[-+]\d+", which will look for either a minus or a plus after the e.
expr(?=test) This is another lookaround assertion, but it looks ahead: expr matches only where it is followed by text matching test. Here I am defining test as ".*?\)", which is a lazy expression that looks for any number of single characters until a closed parenthesis is found. Again, since parentheses are metacharacters, I have to precede it with a backslash to find a literal closed parenthesis.
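To make the greedy/lazy distinction and the [-+] variant concrete, here is a small sketch. The sample texts are made up for illustration and are not from the question.

```matlab
% Hypothetical sample text with two parenthesised groups.
txt = 'a(1) b(2)';

% Greedy: .* takes as much as possible, so the match runs to the last ')'.
greedy = regexp(txt, '\(.*\)', 'match', 'once');

% Lazy: .*? takes as little as possible, so the match stops at the first ')'.
lazy = regexp(txt, '\(.*?\)', 'match', 'once');

% [-+] accepts either sign in the exponent (hypothetical value below one).
small = regexp('(37350,5.976000e-005 s)', '(?<=,)\d\.\d+e[-+]\d+', 'match', 'once');
```

The 'once' option returns the first match directly as a character vector instead of wrapping it in a cell array.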
Here are a few links that are good resources for learning regular expressions and MATLAB-specific syntax.
https://www.youtube.com/watch?v=7DG3kCDx53c (This is the first of several videos in a series)
Thanks, I will go through it.
The regexp documentation is more focussed on the function itself, for detailed documentation on regular expressions in MATLAB read this (and the links at the bottom of that page):

More Answers (3)

Slightly modified version following IA's direction, using the newer string parsing functions that can do much of what regexp expressions are often used for:
XX = "Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)";
TOC=str2double(extractBetween(XX,',',' '));
>> TOC
TOC =
597600
>>
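If the data is an array rather than a single string, the same call should apply element-wise. The sketch below assumes a string array (double-quoted string literals need R2017a or later) and uses a made-up second row purely for illustration.

```matlab
% Hypothetical string array; the second row is invented for illustration.
XX = ["Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)"; ...
      "Toc(Clock Data Ref Time) : 0x91E7 (37351,5.976160e+005 s)"];

% extractBetween processes each element; here every row contains exactly
% one comma, so one substring comes back per row.
TOC = str2double(extractBetween(XX, ',', ' '));
```

This relies on every row containing exactly one comma; rows with several commas would need a more specific start or end pattern.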

2 Comments

Thanks for letting us know about that function. +1 vote. I'd never heard of it.
Seems like they really beefed up and simplified the string handling with extractBetween(), endsWith(), startsWith(), etc. About time. regexp() is just too complicated for most people.
dpb
dpb on 20 Jan 2020
Edited: dpb on 20 Jan 2020
I discovered them when exploring the new strings class back when it was introduced.
They're not that easy to find on their own, however; the "See Also" links don't include them in many logical places, such as under any of the historical strfind and strcmp routines, nor even with string itself. They're listed under a topic "Search and Replace Text", but it's a long and arduous road to even get to that link from the top level.
I've made the suggestion documentation needs more links to help make them visible but so far hasn't made it to the top of the list (which I reckon must be miles long)...

If your format is fixed (the same every time), you can do it much, much more simply, and less cryptically, by avoiding regexp() and simply using indexing:
XX = "Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)";
% Convert from string to character array.
XX = char(XX);
% Extract known, fixed part of string from between 42 and 54, inclusive
% (positions for the string exactly as written above, which is 57 characters long).
TOC = XX(42:54) % This is a character array. Use str2double() if you want a number.

4 Comments

I think the part "Toc(Clock Data Ref Time) :" is fixed all the time, but the data after that may change sometimes. The format is the same, though: the numerical value comes within the parentheses.
Then try it this way:
XX = "Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)";
% Convert from string to character array.
XX = char(XX);
% Find comma and ' s'. Assume they will definitely be there. Check if in doubt.
commaLocation = strfind(XX, ',')
sLocation = strfind(XX, ' s')
% Extract between those two locations.
TOC = XX((commaLocation+1):(sLocation-1))
Small doubt :). It works for this particular string, but my data is an array. Why does it not work over the full length of the array, unlike regexp?
You gave XX as a string class variable, not as a character array, so that's why I had to use XX=char(XX). If your data is already a character array just delete that line. Indexing does not reach the characters of a string because a string scalar is a 1-by-1 array, so parentheses index whole string elements rather than individual characters. Otherwise, attach XX in a .mat file if you're still interested in pursuing the strfind() method.
save('answers.mat', 'XX');
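For array data, one possible way to carry the strfind() approach over is to loop over the elements. The sketch below assumes the data is a cell array of character vectors (the actual type was never shown in the thread), and the second entry is made up for illustration.

```matlab
% Assumed data layout: a cell array of character vectors (the second entry
% is invented for illustration).
XX = {'Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)', ...
      'Toc(Clock Data Ref Time) : 0x91E7 (37351,5.976160e+005 s)'};

TOC = zeros(numel(XX), 1);
for k = 1:numel(XX)
    thisLine = XX{k};
    % Locate the first comma and the first ' s' that follows it.
    commaLocation = strfind(thisLine, ',');
    sLocation = strfind(thisLine, ' s');
    commaLocation = commaLocation(1);
    sLocation = sLocation(find(sLocation > commaLocation, 1));
    % Extract the text between the two locations and convert it to a number.
    TOC(k) = str2double(thisLine((commaLocation+1):(sLocation-1)));
end
```

The loop uses only strfind, find and str2double, so it should also run on older releases that lack the string class.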

Simply match all text from the comma to the whitespace:
>> str = 'Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)';
>> regexp(str,'(?<=,)\S+','match')
ans =
'5.976000e+005'
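The same expression can be applied to a whole cell array of character vectors in one call. This is a sketch with a made-up second entry; the variable names tok and val are hypothetical.

```matlab
% Hypothetical cell array; the second entry is invented for illustration.
str = {'Toc(Clock Data Ref Time) : 0x91E6 (37350,5.976000e+005 s)', ...
       'Toc(Clock Data Ref Time) : 0x91E7 (37351,5.976160e+005 s)'};

% With 'once', regexp returns the first match per element as a cell array
% of character vectors, the same size as the input.
tok = regexp(str, '(?<=,)\S+', 'match', 'once');

% str2double converts a cell array of character vectors element by element.
val = str2double(tok);
```

Both calls accept cell arrays directly, so no explicit loop is needed.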

3 Comments

Precisely what the extractBetween function does... :)
I am using MATLAB 2014, so extractBetween is not supported there.
dpb
dpb on 21 Jan 2020
Edited: dpb on 21 Jan 2020
Then the substring selection str(i1:i2) will work. Of course, one can roll one's own version of extractBetween that way as well, although regexp() may then well be a better choice.

Asked: 20 Jan 2020
Edited: 21 Jan 2020