I don't know anything about DNA.
- How is the text file organized?
Is it just one long string of A, T, G, and C characters?
- Are you trying to find how often, and at which positions, sequences of length K occur? Any sequence of that length, that is?
- What is a maximal repeat?
- What is a substring in your case? Just a part of the text file?
Are you loading the whole text file into a string?
And yes, feel free to provide the code and the text file. Otherwise I have to reinvent your specialized hash container. What are you using: a hash_multimap or something similar, because of the duplicate sequences?
You can also mail it to my_email_addy; I will then take care of it for this forum.