How to search nucleotide sequences with regexp?
Hello everyone,
I am trying to search a large list of 23,322 DNA sequences for this sequence:
XTTATTATTATTATTATTATTATTY
Here T and A are the usual bases, and X and Y each stand for exactly one base (A, C, T, or G). I am looking for the (TTA)7TT repeat core and want to find out which bases immediately flank it.
So I am using the regular expression:
[ACTG]{1,1}TTATTATTATTATTATTATTATT[ACTG]{1,1}
And I get 30 results. When I instead search for each combination of flanking bases separately and sum those results, using regular expressions like these:
ATTATTATTATTATTATTATTATTA
GTTATTATTATTATTATTATTATTG
CTTATTATTATTATTATTATTATTC
TTTATTATTATTATTATTATTATTT
and so on, I get 47 results. The first regular expression should find all of these matches in one go, but apparently it does not, so I think I made an error in constructing it. If there are any regular expression masters out there, I would greatly appreciate your help.
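One possible explanation, offered as a sketch and not a confirmed diagnosis: regular expression engines usually return non-overlapping matches. If a sequence contains more than seven TTA repeats in a row, two valid hits can overlap, and the single combined pattern reports only the first, while separate searches for each flanking pair each report their own hit. The Python example below illustrates this on a made-up test sequence (`seq` is hypothetical, not taken from the real data) and uses a lookahead so that overlapping hits are counted too. MATLAB's `regexp` documents a similar lookahead operator, `(?=expr)`, but whether this accounts for the 30 versus 47 gap in the actual data is an assumption.

```python
import re

# The (TTA)7TT repeat core from the question.
CORE = "TTA" * 7 + "TT"

# Pattern as in the question: one flanking base on each side of the core.
# Each match consumes its characters, so overlapping hits are skipped.
consuming = re.compile(f"[ACGT]{CORE}[ACGT]")

# Same pattern inside a zero-width lookahead: nothing is consumed, so the
# engine can report a hit at every start position. The two groups capture
# the flanking bases X and Y.
overlapping = re.compile(f"(?=([ACGT]){CORE}([ACGT]))")

# Hypothetical test sequence: G, eight TTA repeats, then TTC.
seq = "G" + "TTA" * 8 + "TTC"

n_consuming = len(consuming.findall(seq))
hits = [(m.start(), m.group(1), m.group(2)) for m in overlapping.finditer(seq)]

print(n_consuming)  # 1
print(hits)         # [(0, 'G', 'A'), (3, 'A', 'C')]
```

Here the combined pattern finds only the G...A hit, because that match already covers most of the A...C hit that starts three bases later. Separate searches for G...A and A...C would each return one result, which is the kind of difference that could make the manual sum larger than the single-pattern count.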