Cipher Cracking 2019
==Specific tasks==
We have supplied you with a file of about 700,000 rs and i numbers and the corresponding SNPs for the Somerton Man. Unfortunately, there is a very high drop-out rate. The idea is to ground-truth this data, characterize it, and squeeze out of it any information you can. We want to find out what the data can tell us and also what it definitely fails to tell us. We need to know both.

Here are some ideas to get you started:
* Write a script to count the rs and i numbers in each chromosome. Count the base pairs in each chromosome. Tabulate these results. Do this quickly.
* Create a synthetic "random human" file to get some idea of what random SNPs do. Upload this in research mode on Gedmatch Genesis and test its characteristics. Does it actually link to any humans?
* Create a synthetic human that, say, has all A's, or a short sequence of SNPs that is repeated periodically. How do these files behave?
* Upload the Somerton Man in research mode. Find out which chromosomes are under the minimum required number of base pairs. Try many different ways of padding these base pairs with dummy data in order to meet the minimum. You are looking for the padding method that influences the results the least.
* Using a known good DNA reference file, deliberately drop out those SNPs that are missing in the Somerton Man. As you know the ground truth in this case, you can investigate which padding methods failed and which produced partially good results.
* Another padding trick is to introduce homozygous pairs (AA, GG, TT, CC) in various places to beef up the number of SNPs. These segments might distort the relationship estimates, but if you keep them short, they might not make a difference. You can test this on the good DNA file that has been deliberately stripped.
* Beyond padding by just the minimum amount, you can investigate whether you can pad by more by finding SNPs that are common to males or have a high likelihood in males. You can test this on good DNA files.
* You can go to a SNP database that is indexed by rs and i numbers and find the effect each SNP has (e.g. whether it affects eye colour, a disease, or whatever). You can then create your own file containing all the descriptions for the Somerton Man. Then you can write your own software to data-mine this file for interesting physiological features, characteristics, and diseases.
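
As a possible starting point for the counting task and the deliberate drop-out task, here is a minimal Python sketch. It assumes a file layout that this page does not actually specify: tab-separated lines of the form id, chromosome, position, genotype, with <code>#</code> starting a comment line and <code>--</code> marking a dropped-out SNP. The function names <code>count_snps</code>, <code>called_ids</code> and <code>mask_reference</code> are made up for illustration. Adjust the parsing if the supplied file differs.

```python
from collections import defaultdict


def _fields(line):
    """Return the four data fields of a line, or None for comments/bad rows.

    ASSUMPTION: lines look like 'id<TAB>chromosome<TAB>position<TAB>genotype'
    and '#' starts a comment. The real file may use a different layout.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split("\t")
    return parts[:4] if len(parts) >= 4 else None


def count_snps(lines):
    """Tally rs ids, i ids and no-calls for each chromosome."""
    counts = defaultdict(lambda: {"rs": 0, "i": 0, "other": 0, "no_call": 0})
    for line in lines:
        parts = _fields(line)
        if parts is None:
            continue
        snp_id, chrom, _pos, genotype = parts
        row = counts[chrom]
        if snp_id.startswith("rs"):
            row["rs"] += 1
        elif snp_id.startswith("i"):
            row["i"] += 1
        else:
            row["other"] += 1
        if "-" in genotype:  # ASSUMPTION: '-' marks a dropped-out call
            row["no_call"] += 1
    return dict(counts)


def called_ids(lines):
    """Return the set of SNP ids that have a real (non-dropped) genotype."""
    ids = set()
    for line in lines:
        parts = _fields(line)
        if parts is not None and "-" not in parts[3]:
            ids.add(parts[0])
    return ids


def mask_reference(reference_lines, target_lines):
    """Blank out every reference SNP that is missing or dropped in the target.

    The result is a known-good file with the same drop-out pattern as the
    target, so that padding methods can be scored against the ground truth.
    """
    keep = called_ids(target_lines)
    masked = []
    for line in reference_lines:
        parts = _fields(line)
        if parts is None:
            masked.append(line.rstrip("\n"))  # pass comments through unchanged
            continue
        if parts[0] not in keep:
            parts[3] = "--"
        masked.append("\t".join(parts))
    return masked
```

To use it, read each file with <code>open(path).read().splitlines()</code>, pass the Somerton Man lines to <code>count_snps</code> to build the per-chromosome table, and pass the reference lines followed by the Somerton Man lines to <code>mask_reference</code> to produce the stripped test file. Counting base pairs per chromosome is not covered here, since it depends on how the position column is to be interpreted.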