Final Report/Thesis 2019
==Task 2: Artificially recover DNA file==
===Aims===
In this task, the project group aims to artificially recover Somerton Man's DNA file so that it satisfies the minimum SNP requirement of GEDmatch's one-to-many tool (2000 SNPs for each chromosome), and to find out how many people are related to Somerton Man's DNA kit.
===Methods===
The recovery work was done by developing multiple programs in C++. In general, the recovery work replaces empty SNPs with available SNPs so that each chromosome reaches a fixed amount of 2000 SNPs. Several simple recovery algorithms were implemented. The first, called the random algorithm, replaces each empty SNP with a random pair of bases as its genotype. The second approach replaces each empty genotype with a homozygous pair (AA, GG, TT or CC), which gives four further algorithms.

In addition, if no DNA kit in the database matches Somerton Man's DNA, recovering more of the empty SNPs in Somerton Man's DNA could serve as a backup plan. With the recovery algorithms introduced above, the project team can recover more SNPs in Somerton Man's DNA reference file and try the one-to-many tool on those kits.
===Results and discussion===
With the developed programs, multiple artificial DNA kits with 2000 SNPs in each chromosome were created. Unfortunately, all of these kits returned 0 matches against the public database, which means these artificial DNA kits are not related to any kit in the GEDmatch database.
[[File:zero_match.png|thumb|600px|center|Figure 9: match results of artificial DNA (empty SNPs replaced with random pairs up to 2000 SNPs in each chromosome)]]
Kits in which a larger number of empty SNPs were replaced with homozygous pairs or random pairs were then created, but none of these files found related DNA kits in the database. Even the DNA kits with all empty SNPs recovered could not find a matching DNA result. It is important to note that all five recovery strategies (the random algorithm and the four homozygous algorithms) were implemented.
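The five recovery strategies described under Methods can be sketched in C++ roughly as follows. This is a minimal illustration, not the project's actual program: the <code>Snp</code> record, the <code>--</code> marker for an empty genotype, and the names <code>Fill</code>, <code>makeGenotype</code> and <code>recover</code> are all assumptions made for the example, as is the reading that empty SNPs are filled only until each chromosome reaches the target count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// One row of a raw DNA file. The field names and the "--" marker for an
// empty SNP are assumptions for this sketch, not the project's file format.
struct Snp {
    std::string rsid;
    int chromosome;
    std::string genotype;  // e.g. "AG", or "--" when the SNP is empty
};

// The five strategies: one random fill and four homozygous fills.
enum class Fill { Random, AA, CC, GG, TT };

bool isEmpty(const Snp& s) { return s.genotype == "--"; }

// Produce a replacement genotype for one empty SNP.
std::string makeGenotype(Fill strategy, std::mt19937& rng) {
    static const char bases[] = {'A', 'C', 'G', 'T'};
    switch (strategy) {
        case Fill::AA: return "AA";
        case Fill::CC: return "CC";
        case Fill::GG: return "GG";
        case Fill::TT: return "TT";
        case Fill::Random: break;
    }
    // Random strategy: draw each of the two bases independently.
    std::uniform_int_distribution<int> pick(0, 3);
    std::string g;
    g += bases[pick(rng)];
    g += bases[pick(rng)];
    return g;
}

// Fill empty SNPs, in file order, until every chromosome has `target`
// non-empty SNPs or has no empty SNPs left. Returns how many were filled.
std::size_t recover(std::vector<Snp>& kit, Fill strategy,
                    std::size_t target, std::mt19937& rng) {
    assert(target > 0 && "target must be positive");

    // Count the SNPs that already carry a genotype on each chromosome.
    std::map<int, std::size_t> called;
    for (const Snp& s : kit) {
        if (!isEmpty(s)) ++called[s.chromosome];
    }

    std::size_t filled = 0;
    for (Snp& s : kit) {
        if (!isEmpty(s)) continue;
        std::size_t& n = called[s.chromosome];
        if (n >= target) continue;  // this chromosome already meets the target
        s.genotype = makeGenotype(strategy, rng);
        ++n;
        ++filled;
    }
    return filled;
}
```

Under these assumptions, a call such as <code>recover(kit, Fill::Random, 2000, rng)</code> would correspond to the random algorithm at the 2000-SNP threshold, and passing a target larger than any chromosome's SNP count would fill every empty SNP, as in the kits with all empty SNPs recovered.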
As GEDmatch is the most commonly used public DNA database, it contains a huge number of DNA kits. According to the website, the total number of kits managed by the GEDmatch database is 1363427157376. Therefore, it is very unlikely that no DNA kit in the database is related to Somerton Man. This means that the quality of Somerton Man's DNA file is too low for the one-to-many tool, and that implementing simple recovery algorithms is pointless.
===Conclusion===
There are clearly too many empty SNPs in Somerton Man's DNA reference file. The recovery algorithms introduced were too simple and cannot increase the chance of finding Somerton Man's relatives in the GEDmatch database. With DNA data that contains only approximately 2% available SNPs, it is nearly impossible to find any DNA kits related to Somerton Man.