Code Newbie
News     Forums     Search     Members     Sign Up    

My Code Newbie
Username

Password

Articles/Snippets
ASP Classic
ASP.NET
C
C#
C++
HTML / CSS
Java
Javascript
Linux / BSD
Perl
PHP
Python
Ruby
SQL
VB 6
VB.NET

C.N. Friends
  Planet Rome

Link to Us!
Code Newbie
  Code Newbie
    forums
Go Back   Code Forums > Application and Web Development > Standard C, C++
User Name
Password

Reply
 
LinkBack Thread Tools Display Modes
Old 02-09-2005, 05:18 AM   #1 (permalink)
jodders
Registered User
 
Join Date: Feb 2005
Posts: 1
jodders is on a distinguished road
Hashing problem

I hope this is the right area to post this problem i am having. Its a complicated program and im just got a brain freeze and in a situation where i dont how iam going to achieve this methodology. This explanation is going to be pretty long!

Im using a hashing method with a hashtable which looks for repeats from a given text file. These repeats are DNA sequences! So these are, A,T,G,C.



Basically,the reason to use hashing in repeats is because once you have found the repeats in the substring, you can extend those repeats to look for maximal repeats.

Ive worked my arse off trying to get the damn hashing method to work to how and almost there but i dont know how to use it to extend those repeats.I have a feeling the way i have done it,may not make it possible. Im at the last hurdle and having trouble jumping over it.


At the moment,i have added user input and allowed the person to specify:
1.the length of the repeat he wants e.g 4 in this example
2.the top n repeats he wants to show (eg.top 20) repeats.


Code:
Code: The results show: Number of repeats in order: AAAA with 48 repeats 72 140 407 408 409 410 411 412 413 414 415 421 422 423 424 433 490 501 560 561 562 709 710 717 718 719 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 Number of repeats in order: TTTT with 25 repeats 49 129 194 219 220 221 222 228 229 230 231 356 357 460 461 537 538 548 549 555 556 1018 1079 1243 1244 Number of repeats in order: ATTT with 17 repeats 20 128 161 282 321 355 436 442 459 927 981 985 1041 1078 1128 1219 1250 Number of repeats in order: CAAA with 15 repeats 88 99 121 175 270 406 420 432 500 716 803 871 922 1147 1201 Number of repeats in order: TTTG with 15 repeats 50 195 223 232 239 322 516 626 917 986 1042 1220 1227 1245 1270 Number of repeats in order: TATT with 15 repeats 2 19 127 281 316 363 441 458 543 980 984 1032 1040 1077 1127 Number of repeats in order: TTTA with 14 repeats 0 212 265 437 462 550 557 928 982 1064 1080 1089 1129 1159 Number of repeats in order: CTTT with 13 repeats 211 264 515 536 554 625 1017 1063 1088 1158 1226 1242 1269 Number of repeats in order: AAAT with 13 repeats 73 100 303 434 563 711 894 1011 1138 1148 1171 1202 1279 Number of repeats in order: TTTC with 12 repeats 21 130 162 283 358 403 443 495 539 809 1019 1251
SUBLEN = AAAA, k =4;
the integers show the position of the SUBLEN on the string with how many repeats have been found.

My problem i think i m going to have is that K has always been specified by the user,so there is going to be always K length. So the results show what the user has entered as k =4. The way i have calculated this,is my hashtable size is going to be always: 4^k....

The reason its 4,is because im working with A,T,G,C..


so if k = 2, the hashtable size is: 16
k = 3, the hashtable size is: 64
k = 4,the hashtable size is: 256

and so on... Hopefully this has made it a little bit clearer of the foundations of this program...


So i have two ways of thinking to implement this:
1) Quite tricky thing to do,but Instead of having k = 4 or any number, is there a way to go through the whole txt without specifying what k is? I think im going to have a problem with the SIZE part.

e.g Code:


#define SUBLEN 7
#define SIZE 16384



SIZE is the size of the hashtable. I dont have faith in this option at all because what happens with you get k = 20? the size isnt worth thinking about!


2) The other way which i thought might work could be,if k =4 or any integer,
maybe i could add a cycle to extend the repeats until the matches i found were greater or equal to 2.


This is a pretty complicated program and i need some help on how i can do this.

I can provide code if that helps to explain things.

kind regards

joHnnY
__________________
jodders is offline   Reply With Quote
Old 02-09-2005, 01:51 PM   #2 (permalink)
Valmont
[code][/code] enforcer
 
Valmont's Avatar
 
Join Date: Mar 2003
Location: Netherlands
Posts: 1,545
Valmont is on a distinguished road
I don't know anything about DNA.

- How is the text file organized?
Is it just one long string of ATGC's?
- Are you trying to find a number of - and certain - position(s) of string sequences of length K. Any sequence that is?
- What is a maximal repeat?
- What is a substring in your case? Just a part of the text file?
Are you loading the whole text file in a string?

And yes, feel free to provide the code and the text-file. Otherwise I have to re-invent your specialized hash container. What are you using, a hash_multimap or something because of duplicate sequences?

You can also mail it to my_email_addy, then I will take care of it for this forum.
__________________
Valmont is offline   Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem Assignment (Urgent help req.) Boltress Standard C, C++ 0 01-12-2005 07:59 AM
looping problem maria_arif MS Technologies ( ASP, VB, C#, .NET ) 1 11-29-2004 08:07 AM
hashing help saiz66 Standard C, C++ 2 06-28-2004 01:39 AM
Help debugging a power problem Belisarius Lounge 0 10-25-2003 04:44 PM
This is a windows/C problem UnderWing Standard C, C++ 6 03-28-2003 06:17 AM


All times are GMT -8. The time now is 05:55 PM.


Powered by vBulletin Version 3.6.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.0.0 RC8





Copyright © 2000-2006, Milano Interactive
Web Hosting provided by Portal 360 Web Hosting
Open Circle