Difference between revisions of "Cipher Cracking 2019"

Revision as of 11:39, 31 March 2019

1 Supervisors
2 Honours students
3 Project guidelines
4 General project description
5 Specific tasks
6 Deliverables
- 6.1 Semester 1
- 6.2 Semester 2
7 Weekly progress and questions
8 Approach and methodology
9 Possible extension
10 Expectations
11 Relationship to possible career path
12 See also
13 References and useful resources
14 Back

Supervisors

Honours students

Project guidelines

Project Handbook

General project description

This project is about human identification. In times of war and natural disaster, there is a high demand for the ability to identify human remains. In policing, the need to forensically identify remains is also critical. Human identification also extends to the living: for example, reuniting an adopted child with a birth family. Children who do not know their origins for other reasons (eg. human trafficking of babies) can also benefit.

The idea of this project is to advance the techniques of human identification by focusing on a difficult case that is too hard for current techniques. You can read the details about the dead body and the circumstances here.

We also want you to bring the skills of an electrical engineer to bear on e-forensics and bioinformatics, and see if you can apply them to an important area of the case, ie. analyzing degraded DNA sequences.

Specific tasks

We have supplied you with a file of about 700,000 rs numbers and the corresponding SNP base pairs for the Somerton Man. Unfortunately there's a very high drop-out rate. The idea is to ground truth this data, characterize it, and squeeze out of it any information you can. We want to find out what the data can tell us and also what it definitely fails to tell us. We need to know both. Here are some ideas to get you started:

Write a script to count the rs numbers in each chromosome. Count the base pairs in each chromosome. Tabulate these results. Do this quickly.
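As a starting point, the counting task might be sketched as below. This is a minimal sketch that assumes the supplied file is tab-separated with four columns (rs number, chromosome, position, genotype), that comment lines start with `#`, and that "base pairs" means called bases (A, C, G or T) as opposed to no-calls. All of these are assumptions, so check them against the real file; the function name `count_per_chromosome` is made up for illustration.

```python
from collections import Counter
import io

def count_per_chromosome(lines):
    """Return (snp_counts, called_base_counts), both keyed by chromosome."""
    snps = Counter()
    bases = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, chrom, _pos, genotype = fields[:4]
        if rsid.lower() == "rsid":
            continue  # skip an uncommented header row, if present
        snps[chrom] += 1
        # Count only real bases, so no-call symbols such as '-' are ignored.
        bases[chrom] += sum(1 for b in genotype if b in "ACGT")
    return snps, bases

# Tiny made-up sample standing in for the real file (assumed layout).
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t100\tAA\n"
    "rs2\t1\t200\t--\n"
    "rs3\t2\t300\tCT\n"
    "rs4\tX\t400\tG\n"
)
snp_counts, base_counts = count_per_chromosome(sample)
for chrom in snp_counts:
    print(chrom, snp_counts[chrom], base_counts[chrom])
```

For the real file, pass an open file handle instead of `sample`. A single pass with `Counter` keeps the run time linear in the number of rows, which should be quick for roughly 700,000 lines.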

Create a synthetic "random human" file to get some idea what random base pairs do. Upload this in research mode on Gedmatch Genesis and test its characteristics. Does it actually link to any humans?

Create a synthetic human that, say, has all A's, or a short sequence of base pairs that is repeated periodically. How do these files behave?
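The synthetic files described above could be generated along these lines. This is only a sketch: it assumes the same four-column tab-separated layout as the supplied file and reuses rs numbers and positions taken from a real file. The helper names are hypothetical. Note that drawing each base uniformly from A, C, G and T ignores the fact that real SNPs usually have only two alleles, so treat the "random human" here as a crude baseline rather than a realistic genome.

```python
import random
from itertools import cycle

def random_genotypes(n, seed=0):
    """n genotypes with both bases drawn uniformly from ACGT (crude baseline)."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    return [rng.choice("ACGT") + rng.choice("ACGT") for _ in range(n)]

def constant_genotypes(n, base="A"):
    """n identical homozygous genotypes, e.g. all 'AA'."""
    return [base * 2] * n

def periodic_genotypes(n, pattern=("AA", "CC", "GG", "TT")):
    """n genotypes obtained by repeating a short pattern over and over."""
    source = cycle(pattern)
    return [next(source) for _ in range(n)]

def to_rows(sites, genotypes):
    """Combine (rsid, chromosome, position) sites with genotypes into file rows."""
    return [
        "\t".join((rsid, chrom, str(pos), gt))
        for (rsid, chrom, pos), gt in zip(sites, genotypes)
    ]

# Made-up sites; in practice take these from a real genotype file.
sites = [("rs1", "1", 100), ("rs2", "1", 200), ("rs3", "2", 300)]
rows = to_rows(sites, periodic_genotypes(len(sites), ("AC", "GT")))
print("\n".join(rows))
```

Keeping the generators separate from `to_rows` makes it easy to swap in a different synthetic pattern without touching the file-writing step.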

Upload the Somerton Man in research mode. Find out which chromosomes are under the minimum required number of base pairs. Try many different ways of padding these base pairs with dummy data in order to meet the minimum. You are looking for the padding method that influences the results the least.
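One way to organise the padding experiments is to keep the padding rule pluggable, as in the sketch below. It assumes rows are already parsed into (rs number, chromosome, position, genotype) tuples and that a no-call is written `--`. The threshold `MIN_CALLED` is a placeholder chosen for the toy data, not the actual Gedmatch Genesis minimum, which you will need to establish yourself. The function names are hypothetical.

```python
from collections import Counter

NO_CALL = "--"   # assumed no-call marker
MIN_CALLED = 3   # placeholder threshold, NOT the real Gedmatch minimum

def called_per_chromosome(rows):
    """Count called (non no-call) SNPs per chromosome."""
    counts = Counter()
    for _rsid, chrom, _pos, genotype in rows:
        counts[chrom] += 0 if genotype == NO_CALL else 1
    return counts

def chromosomes_below(rows, minimum=MIN_CALLED):
    """Chromosomes whose called-SNP count is under the minimum."""
    counts = called_per_chromosome(rows)
    return sorted(c for c, n in counts.items() if n < minimum)

def pad(rows, filler, chromosomes):
    """Replace no-calls on the given chromosomes with filler(rsid)."""
    padded, n_padded = [], 0
    for rsid, chrom, pos, genotype in rows:
        if genotype == NO_CALL and chrom in chromosomes:
            genotype = filler(rsid)
            n_padded += 1
        padded.append((rsid, chrom, pos, genotype))
    return padded, n_padded

# Made-up rows for illustration only.
rows = [
    ("rs1", "1", 100, "AA"), ("rs2", "1", 200, "--"),
    ("rs3", "1", 300, "CT"), ("rs4", "1", 400, "GG"),
    ("rs5", "2", 100, "--"), ("rs6", "2", 200, "TT"),
    ("rs7", "2", 300, "--"),
]
short = chromosomes_below(rows)
padded, n_padded = pad(rows, lambda rsid: "AA", set(short))
print(short, n_padded, chromosomes_below(padded))
```

This version pads every no-call on a short chromosome. To pad by only the minimum amount, stop once the chromosome reaches the threshold. Each padding method you want to compare becomes a different `filler` function.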

Using a known good DNA reference file, deliberately drop out those SNPs that are missing in the Somerton Man. As you know the ground truth in this case, you can investigate which padding methods failed and which produced partial results.
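The deliberate drop-out step might look like the sketch below. It assumes both files have been loaded into dictionaries mapping rs number to genotype and that a no-call is written `--`. The function names and the local `concordance` score are illustrative assumptions; the score is only a quick sanity check and is not a substitute for comparing the relationship estimates that Gedmatch reports.

```python
NO_CALL = "--"  # assumed no-call marker

def is_called(genotype):
    """True if the genotype consists only of real bases."""
    return bool(genotype) and all(b in "ACGT" for b in genotype)

def apply_dropout(reference, target):
    """Blank every reference SNP that is absent or a no-call in the target.

    Returns (stripped, truth): the degraded reference, and the original
    genotypes of the SNPs that were removed.
    """
    stripped, truth = {}, {}
    for rsid, genotype in reference.items():
        if is_called(target.get(rsid, NO_CALL)):
            stripped[rsid] = genotype
        else:
            stripped[rsid] = NO_CALL
            if is_called(genotype):
                truth[rsid] = genotype  # keep the ground truth for scoring
    return stripped, truth

def concordance(padded, truth):
    """Fraction of dropped SNPs whose padded genotype equals the truth.

    Allele order is ignored, so 'AG' and 'GA' count as a match.
    """
    if not truth:
        return 1.0
    hits = sum(
        sorted(padded.get(rsid, NO_CALL)) == sorted(genotype)
        for rsid, genotype in truth.items()
    )
    return hits / len(truth)

# Made-up data: 'reference' is the known good file, 'target' the degraded one.
reference = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "GA"}
target = {"rs1": "AG", "rs2": "--", "rs4": "--"}  # rs3 is absent entirely
stripped, truth = apply_dropout(reference, target)

# Example padding rule: fill every no-call with 'AG'.
padded = {r: ("AG" if g == NO_CALL else g) for r, g in stripped.items()}
print(stripped, truth, concordance(padded, truth))
```

Because `truth` records exactly what was removed, every padding method can be scored against the same ground truth before any file is uploaded.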

Another padding trick is to introduce opposite homozygotes in various places to beef up the number of pairs. These segments might distort the relationship estimates, but if you keep them to smaller segments, they might not make a difference. You can test this out on the good DNA file that has been deliberately stripped.

Further to padding by just the minimum amount, you can then investigate whether you can pad by more, by finding base pairs that are common to males or have a high likelihood in males. You have good DNA files you can test this on.

You can go to an rs number database that describes the effect of each pair (eg. whether it affects eye colour, a disease, or whatever). You can then create your own file containing all the rs descriptions for the Somerton Man. Then you can write your own software to datamine this file for interesting physiological features, characteristics, and diseases.
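The datamining step could begin with a simple join and keyword search, as sketched below. It assumes the descriptions have already been exported from whichever rs number database you choose into a mapping from rs number to free text. The rs numbers and descriptions shown are placeholders made up for illustration, not real annotations, and the function names are hypothetical.

```python
def annotate(genotypes, descriptions):
    """Join rsid -> genotype with rsid -> description.

    SNPs without a description are skipped.
    """
    return {
        rsid: (genotype, descriptions[rsid])
        for rsid, genotype in genotypes.items()
        if rsid in descriptions
    }

def search(annotated, keyword):
    """Return the rs numbers whose description mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        rsid
        for rsid, (_genotype, text) in annotated.items()
        if keyword in text.lower()
    )

# Placeholder data only: neither the ids nor the texts are real annotations.
genotypes = {"rsA": "AG", "rsB": "CC", "rsC": "--"}
descriptions = {
    "rsA": "placeholder text mentioning eye colour",
    "rsB": "placeholder text mentioning a disease",
}
annotated = annotate(genotypes, descriptions)
print(search(annotated, "Eye Colour"))
```

Before interpreting any hit, check that the genotype is actually called, since a no-call at an annotated SNP says nothing about the trait.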

Deliverables

Semester 1

Start Project Work (Week 1)
Proposal seminar (Week 5)
- File:141 Proposal Seminar2019.pdf
Progress report (Week 12) - only one report needed in wiki format
- File:141 Progress Report2019.pdf

Semester 2

Final seminar (Week 10)
- File:141 Final Seminar2019.pdf
Final report (Week 11) - only one report needed in wiki format
- Final Report/Thesis 2019
Poster (Week 12) - one poster only needed
- File:141 Poster2019.pdf
Project exhibition 'expo' (Week 12)
CD or stick containing your whole project directories (Week 13)
YouTube video (Week 13) - add the URL to this wiki
- https://www.youtube.com/insert_here

Weekly progress and questions

This is where you record your progress and ask questions. Make sure you update this every week.

Cipher cracking 2019 weekly progress

Approach and methodology

We expect you to take a structured approach to both the validation of last year's results, and the writing of the software. You should carefully design the big-picture high-level view of the software modules, and the relationships and interfaces between them. Think also about the data transformations needed.

Possible extension

If there is time, we can get hold of the original FASTQ file and perform statistical tests on it. You can also attempt statistical imputation to infer missing base pairs.

Expectations

We don't really expect you to find the killer or identify the Somerton Man, though that would be cool if you do and you'll become very famous overnight. To get good marks we expect you to show a logical engineering approach to squeezing information out of the data, and using known good files to always groundtruth each idea.

It is perfectly acceptable to have a long list of ideas that didn't work, provided they are carefully tested in a structured way. Finding things that don't work is part of the scientific process. When we don't know what is supposed to work or not, being able to eliminate ideas that don't work is still very exciting and worthwhile.

Relationship to possible career path

Whilst the project is fascinating as you'll learn about a specific murder case—and we do want you to have a lot of fun with it—the project does have a hard-core serious engineering side. It will familiarize you with techniques in information theory, probability, statistics, programming, bioinformatics, and datamining. It will also improve your software skills. So go ahead and have fun with this, but keep your eye on the bigger engineering picture and try to build up an appreciation of why these techniques are useful to our industry. Now go find that killer...this message will self-destruct in five seconds :-)

References and useful resources

If you find any useful external links and resources, list them here:

The Tamam Shud case
The PALEOMIX protocol used by this project File:NPO schubert2014.pdf
Guanchen Li's Master's thesis File:Thesis a1652167 Guanchen Li.pdf
Gedmatch Genesis

Back

@@ Line 15: / Line 15: @@
 The idea of this project is to advance the techniques of human identification by focusing on a difficult case that that is too hard for current techniques.  You can read the details about the dead body and the circumstances [http://en.wikipedia.org/wiki/Taman_Shud_Case here.]
-We also want you to bring the skills of an electrical engineer to bear on the area of e-forensics and bioinformatics see if you can apply these to other areas of the case (eg. analyzing degraded DNA sequences).
+We also want you to bring the skills of an electrical engineer to bear on the area of e-forensics and bioinformatics see if you can apply these to an important area of the case, ie. analyzing degraded DNA sequences.
 ==Specific tasks==
