Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
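The enumeration above can be sketched as a sliding window of width n followed by a frequency count. This is a minimal Python sketch rather than a MATLAB submission. The function name `hifreq_ngram` is hypothetical, and the sketch does no whitespace handling.

```python
from collections import Counter


def hifreq_ngram(s, n):
    """Return the most frequent length-n substring of s (hypothetical helper)."""
    if n <= 0 or n > len(s):
        raise ValueError("n must be between 1 and len(s)")
    # Slide a window of width n over s; there are len(s) - n + 1 windows,
    # one starting at every position where a full n-gram still fits.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(grams)
    # most_common(1) returns a one-element list of (ngram, count).
    # The problem guarantees a unique maximum, so ties are not handled here.
    return counts.most_common(1)[0][0]


print(hifreq_ngram('AACTGAACG', 3))  # prints AAC
```

Because every window starts at a valid index and ends inside the string, this approach produces exactly the seven trigrams listed above and no padded fragments.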
This problem was originally inspired by a MATLAB Newsgroup discussion.
Note that spaces should be ignored, or else test suites 3 and 5 fail.
good use of 'hankel' function
cool solution
Sorry about this, but I got stuck and I want to learn how to do it. After looking at several solutions, I found my mistake and was able to create my own solution :)
What happens if the test suite changed in the future?
This solution is not correct in general: the way hankel is used here generates n-1 fake fragments.
Clever usage of the Hankel matrix. I don't automatically think of the Hankel for this application, but it really works well. Thanks - I've learned something
What's the point of a 'solution' like this? It passes the test suite, but in what way was it interesting for you to write it?