Regular Expression to extract bigram

string = 'ab bc cd ef gh ij kl'
What regular expression will extract the bigrams from the given string?
I am using this code:
regexp(string,'\w* \w*','match');
The output comes out as: 'ab bc' 'cd' 'ef' 'gh' 'ij' 'kl'
while the output I am expecting is:
  • 'ab bc'
  • 'bc cd'
  • 'cd ef'
  • 'ef gh'
  • 'gh ij'
  • 'ij kl'
  2 Comments
Walter Roberson on 26 Sep 2013
I believe the term is "bi-gram".
If the string was
'abc defg'
would you want the result to be
ab bc c<space> <space>d de ef fg
or
ab de
or
ab bc de ef fg
?
Or does it only need to work on letter pairs?
arun on 26 Sep 2013
Yes, you are right, but I want to do it at the word level. When I write the regexp as:
regexp(string,'\w+ \w+','match')
the output is:
ans =
'ab bc' 'cd ef' 'gh ij'
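
The result above follows from how regexp scans: matches do not overlap, so once 'ab bc' has been matched the search resumes after 'bc', and 'bc cd' can never be returned. A minimal sketch that makes this visible by asking for the start and end indices of each match (the variable name str is my own choice, used instead of string to avoid shadowing the built-in of that name in newer releases):

```matlab
% Show that regexp matches are non-overlapping.
str = 'ab bc cd ef gh ij kl';

% 'start' and 'end' return the index of the first and last character
% of every match, in the order the options are requested.
[s, e] = regexp(str, '\w+ \w+', 'start', 'end');

% Each match begins only after the previous one has ended,
% so every second bigram is skipped and the trailing 'kl' is unmatched.
disp([s; e])   % s = [1 7 13], e = [5 11 17]
```

Because each word is consumed by the match that contains it, a single 'match' call with this pattern cannot return all six bigrams.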


Accepted Answer

Azzi Abdelmalek on 26 Sep 2013
Edited: Azzi Abdelmalek on 26 Sep 2013
EDIT
Is this what you want?
string = 'ab bc cd ef gh ij kl'
regexp(string,'\s+','split');
  3 Comments
Azzi Abdelmalek on 26 Sep 2013
string = 'ab bc cd ef gh ij kl'
out = regexp(string,'\s+','split');   % split the string into words
% join each word with the word that follows it
cellfun(@(x,y) [x ' ' y], out(1:end-1)', out(2:end)', 'UniformOutput', false)
arun on 26 Sep 2013
Thanks @Azzi Abdelmalek for your answer.


More Answers (1)

Andrei Bobrov on 26 Sep 2013
z=regexp(string,'\w*','match')
strcat(z(1:end-1),{' '},z(2:end))
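
The same split-then-pair idea extends to groups of more than two words. The following is only a sketch under my own assumptions, not something proposed in this thread: the names str, n, words and ngrams are hypothetical, and n = 3 is picked purely for illustration.

```matlab
% Sketch: word-level n-grams by splitting on whitespace and joining
% every run of n consecutive words.
str = 'ab bc cd ef gh ij kl';
n = 3;                                          % hypothetical n-gram size

words = regexp(strtrim(str), '\s+', 'split');   % 1-by-7 cell array of words
numGrams = numel(words) - n + 1;                % number of n-grams available

ngrams = cell(1, max(numGrams, 0));             % empty if there are too few words
for k = 1:numGrams
    % strjoin concatenates words k..k+n-1 with single spaces
    ngrams{k} = strjoin(words(k:k+n-1), ' ');
end
```

With n = 2 this produces the same six bigrams as the answers above.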
