Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
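The sliding-window enumeration described above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, and not a reference solution; the function name hifreq is borrowed from the problem statement, and the use of collections.Counter is a choice made for this sketch.

```python
from collections import Counter


def hifreq(s, n):
    """Return the most frequent n-gram of s, considering every start position."""
    # Slide a window of width n over s; there are len(s) - n + 1 windows.
    ngrams = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(ngrams)
    # most_common(1) returns a one-element list holding (ngram, count).
    return counts.most_common(1)[0][0]


print(hifreq("AACTGAACG", 3))  # AAC
```

The sketch relies on the stated guarantee that exactly one n-gram has the highest frequency, and it assumes the string has at least n characters; a shorter string yields no windows and raises an IndexError.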
This problem was originally inspired by a MATLAB Newsgroup discussion.
Note that spaces must be ignored; otherwise test suites 3 and 5 fail.
Good use of the 'hankel' function.
cool solution
Sorry about this, but I got stuck and I want to learn how to do it. After looking at several solutions, I found my mistake and was able to create my own solution :)
What happens if the test suite changes in the future?
This solution is not correct in general: the way hankel is used here generates n-1 fake fragments.
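The solution under discussion is not shown here, so the following is only a guess at the issue being described. It assumes the solution takes the first n columns of a square Hankel matrix built from the string. MATLAB's single-argument hankel fills everything below the anti-diagonal with zeros, so the last n-1 rows would run past the end of the string. The helper hankel_rows is hypothetical and emulates that layout in Python, with a visible pad character standing in for the zeros.

```python
def hankel_rows(s, n, pad="_"):
    """Emulate the first n columns of a square Hankel matrix built from s.

    Row i holds s[i], s[i+1], ..., s[i+n-1]. Positions past the end of s
    are filled with pad (an assumption standing in for the matrix zeros).
    """
    m = len(s)
    return [
        "".join(s[i + j] if i + j < m else pad for j in range(n))
        for i in range(m)
    ]


n = 3
rows = hankel_rows("AACTGAACG", n)
real = rows[:len(rows) - (n - 1)]  # windows that lie fully inside the string
fake = rows[len(rows) - (n - 1):]  # the n-1 padded rows

print(real)  # ['AAC', 'ACT', 'CTG', 'TGA', 'GAA', 'AAC', 'ACG']
print(fake)  # ['CG_', 'G__']
```

Under this reading, counting every row would include the padded fragments as if they were n-grams, which is harmless only when none of them ties or beats the true winner; dropping the last n-1 rows before counting avoids the problem.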
Clever usage of the Hankel matrix. I don't automatically think of the Hankel for this application, but it really works well. Thanks - I've learned something
What's the point of a 'solution' like this? It passes the test suite, but in what way was it interesting for you to write it?