Cipher Cracking 2013
Revision as of 20:48, 18 March 2013
Supervisors
Honours students
Project guidelines
General project description
In this project you will attempt to solve a possible murder that took place in Adelaide in 1948. The crime remains unsolved to this day, but you can use engineering to bring our knowledge closer to the killer. You can read the details about the dead body and the circumstances here [1].
Associated with the dead body was this secret code:
- MRGOABABD
- MTBIMPANETP
- MLIABOAIAQC
- ITTMTSAMSTGAB
(See the original photograph, as there may be an extra line, and some of the M's may be W's. Some people also think that the last "I" is really a "V". Also, the last "G" is probably really a "C".) To this day, code crackers have been unable to decrypt it.
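Because several letters have more than one plausible reading, any analysis of the code should ideally be repeated for every reading. The Python sketch below enumerates them under loudly stated assumptions: every M is treated as possibly a W (the text only says "some"), "the last I" and "the last G" are taken to mean the last occurrence in reading order (line by line, left to right), and the possible extra line is ignored. The names `CODE_LINES`, `ambiguous_positions` and `all_readings` are hypothetical.

```python
from itertools import product

# The code as transcribed above, one string per line.
CODE_LINES = ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]

def ambiguous_positions(lines):
    """Map (line, column) -> possible letters for every ambiguous position.

    Assumptions: every M may be a W; "the last I" and "the last G" mean the
    last occurrence in reading order (line by line, left to right).
    """
    positions = [(r, c, ch) for r, line in enumerate(lines) for c, ch in enumerate(line)]
    options = {(r, c): ("M", "W") for r, c, ch in positions if ch == "M"}
    # max() over (line, column) tuples picks the last occurrence in reading order.
    last_i = max((r, c) for r, c, ch in positions if ch == "I")
    last_g = max((r, c) for r, c, ch in positions if ch == "G")
    options[last_i] = ("I", "V")
    options[last_g] = ("G", "C")
    return options

def all_readings(lines):
    """Yield every reading of the code, starting with the transcription as given."""
    options = ambiguous_positions(lines)
    keys = sorted(options)
    for choice in product(*(options[k] for k in keys)):
        grid = [list(line) for line in lines]
        for (r, c), ch in zip(keys, choice):
            grid[r][c] = ch
        yield ["".join(row) for row in grid]
```

Under these assumptions the "last I" is the first letter of the last line and the "last G" is the eleventh letter of that line; together with the six M's this gives 2^8 = 256 readings, each of which can be fed to the statistical or decryption tests.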
We also want you to bring the skills of an electrical engineer to bear on the area of e-forensics and see if you can apply these to other areas of the case (eg. graphical reconstruction of the dead man's face).
Specific tasks
Here are the remaining tasks resulting from previous work. You may want to focus on a subset of these:
- Critically review the statistical analysis of the letters. See if you can extend it (e.g. by testing another language previous students missed, and by checking whether they included all possibilities of the ambiguous letters). Is the conclusion of the previous team correct?
- Download the searchable PDF file of the copy of the Rubaiyat of Omar Khayyam that closely matches the dead man's copy. Create an ASCII file with the raw text. Use this as a one-time pad that directly substitutes the letters of the alphabet a-z. The book contains 75 quatrains (four-line poems), each containing about 140 letters, so the whole book contains about [math]75 \times 140 = 10{,}500[/math] letters. As we don't know where in the book the one-time pad starts, start at the beginning and step forward one letter at a time. You'll then end up with more than 10,000 decrypts. Write a script that searches all the decrypts for the 20 most common words, to narrow them down to a few possible results that can be examined by eye.
- Extend the CipherGUI 2011 software that was created by the previous team. See if you can add more ciphers to the collection. Use it to eliminate more ciphers and enter your conclusions here: Cipher Cross-off List. Be critical and be prepared to question and recheck some of the items already on the list.
- A previous team created a web crawler and search engine that searches for keywords with wildcards, as Google does not allow this. The purpose is to check for common repeated expressions on the WWW whose initial letters match the letters in the code. If the code is an initialism, this will give us a clue as to some likely content. See also http://yacy.de/en/index.html. Create a proper web interface for the web crawler that was written last year and make it easier to use. See if you can run it and get a few example string matches.
- Because the web crawler would take years to trawl the web, write another interface that searches pre-indexed web data. Talk to the Computer Science department to find sources of user-accessible indexed web data.
- Use computer graphics to reconstruct and undistort the face of the dead man. What would he look like if he were alive? To do this you need to buy a 3D scanner and scan in the man's face from a plaster bust that is in a museum. An example of the type of graphics software you can use to manipulate the scanned image is 123D.
- Create an entertaining YouTube video of your whole project.
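For the statistical review task, one simple check is a Pearson chi-squared statistic comparing the code's letter counts with the first-letter frequencies of a reference text in a candidate language, which fits the hypothesis that the code is an initialism. The sketch below is one way to do this, not the previous team's method; the reference text is whatever corpus you supply (e.g. a Project Gutenberg book), and the smoothing constant and accent folding are assumptions. `initial_letter_probs` and `chi_squared` are hypothetical names.

```python
import re
import unicodedata
from collections import Counter
from string import ascii_uppercase

def initial_letter_probs(text, smoothing=0.5):
    """Estimate P(a word starts with each letter A-Z) from a reference text.

    Accented letters are folded to A-Z (e.g. e-acute -> e) so other Latin-script
    languages can be tested. Additive smoothing (an assumed constant) stops
    letters that never start a word from getting probability zero, which would
    make chi_squared() divide by zero; keep smoothing > 0 when using it there.
    """
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[A-Za-z]+", folded)
    counts = Counter(w[0].upper() for w in words)
    total = sum(counts.values()) + smoothing * len(ascii_uppercase)
    return {ch: (counts[ch] + smoothing) / total for ch in ascii_uppercase}

def chi_squared(code_letters, probs):
    """Pearson chi-squared of the code's letter counts against expected probabilities."""
    observed = Counter(ch for ch in code_letters.upper() if ch in probs)
    n = sum(observed.values())
    return sum((observed[ch] - n * p) ** 2 / (n * p) for ch, p in probs.items())

# The code as transcribed above, run together (44 letters).
CODE = "MRGOABABD" "MTBIMPANETP" "MLIABOAIAQC" "ITTMTSAMSTGAB"
```

Usage: compute `chi_squared(CODE, initial_letter_probs(reference_text))` for each language's reference text; a lower statistic means a closer fit. With only 44 letters over 26 categories many expected counts fall below 5, so the textbook chi-squared distribution is a poor guide to significance; a Monte Carlo comparison (repeatedly drawing 44 letters from each language's distribution) is safer.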
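The one-time pad sweep described in the task list can be sketched in Python as below. The text does not say exactly how a pad letter "substitutes" a code letter, so this assumes a Vigenère-style rule, plaintext = (cipher - key) mod 26; swap in another combining rule if a different one is intended. The word list is an illustrative top-20 list (rankings differ between corpora), and `pad_letters`, `decrypt`, `score` and `sweep` are hypothetical names. Only words of three or more letters are scored, because "A", "I" and two-letter words turn up by chance in almost any string.

```python
import re

# The code as transcribed above, run together (44 letters).
CODE = "MRGOABABD" "MTBIMPANETP" "MLIABOAIAQC" "ITTMTSAMSTGAB"

# Illustrative top-20 English word list (an assumption; rankings vary by corpus).
TOP_WORDS = ["THE", "OF", "AND", "TO", "A", "IN", "IS", "IT", "YOU", "THAT",
             "HE", "WAS", "FOR", "ON", "ARE", "WITH", "AS", "I", "HIS", "THEY"]

def pad_letters(raw_text):
    """Reduce the pad text (e.g. the ASCII Rubaiyat file) to upper-case A-Z only."""
    return re.sub(r"[^A-Z]", "", raw_text.upper())

def decrypt(code, key):
    """Assumed rule: plaintext letter = (cipher letter - key letter) mod 26."""
    return "".join(chr((ord(c) - ord(k)) % 26 + ord("A")) for c, k in zip(code, key))

def score(text, words=TOP_WORDS, min_len=3):
    """Length-weighted count of common words found anywhere in a spaceless decrypt."""
    return sum(len(w) * text.count(w) for w in words if len(w) >= min_len)

def sweep(code, pad, top=10):
    """Try every start offset in the pad, stepping by one letter; return the best decrypts."""
    results = []
    for start in range(len(pad) - len(code) + 1):
        plain = decrypt(code, pad[start:start + len(code)])
        results.append((score(plain), start, plain))
    # Highest score first; ties broken by earliest start offset.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top]
```

With about 10,500 pad letters and a 44-letter code there are roughly 10,457 start offsets, consistent with the estimate of more than 10,000 decrypts. Usage: `sweep(CODE, pad_letters(raw_text))`, where `raw_text` is the contents of the ASCII file made from the PDF; the top few results are then checked by eye.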
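The wildcard search that the web crawler task relies on can be prototyped locally with a regular expression: a phrase matches if consecutive words start with the code letters in order. This is a sketch using Python's standard `re` module on text you have already fetched, not the 2011 crawler's implementation; `initialism_pattern` and `find_initialisms` are hypothetical names, and treating apostrophes as part of a word is an assumption.

```python
import re

def initialism_pattern(letters):
    """Compile a regex matching consecutive words whose initials spell `letters`.

    A word is taken to be letters plus apostrophes (an assumption); any run of
    other characters (spaces, punctuation) separates words. Case is ignored.
    """
    word_tail = r"[A-Za-z']*"
    parts = [re.escape(ch) + word_tail for ch in letters]
    return re.compile(r"\b" + r"[^A-Za-z']+".join(parts) + r"\b", re.IGNORECASE)

def find_initialisms(text, letters):
    """Return every non-overlapping phrase in `text` whose initials spell `letters`."""
    return [m.group(0) for m in initialism_pattern(letters).finditer(text)]
```

For example, `find_initialisms(page_text, "MRGOABABD")` returns every run of nine words whose initials spell the first line of the code; running it once per ambiguous reading covers the M/W variants.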
Weekly progress and questions
This is where you record your progress and ask questions. Make sure you update this every week.
Approach and methodology
We expect you to take a structured approach to both the validation of last year's results and the writing of the software. You should carefully design the big-picture, high-level view of the software modules and the relationships and interfaces between them. Think also about the data transformations needed: you will start off with HTML web pages, and in the end will need some MATLAB graphs.
Possible extension
If you knock off this project too easily and are looking for a harder code-cracking problem to try your software out on, you can progress to analyzing another famous unsolved mystery: the Voynich Manuscript.
Expectations
We don't really expect you to find the killer, though if you do, that would be cool and you'll become famous overnight. To get good marks we expect you to show a logical approach to finding patterns from the code on the web, and to any other attempts to crack the code. Running the web crawler for many hours is unrealistic, but we would like you to run it in short bursts to find some examples of the type of matches it comes up with.
Relationship to possible career path
Whilst the project is fascinating, as you'll learn about a specific murder case (and we do want you to have a lot of fun with it), the project does have a hard-core serious engineering side. It will familiarize you with techniques in information theory, probability, statistics, encryption, decryption, and data mining. It will also improve your software skills. The project will involve writing software that trawls for patterns on the World Wide Web (exploiting it as a huge database). This will force you to learn about search engines and databases, and the new tools you develop may lead to new IP in the area of data mining and also make you rich/famous. The types of jobs where these skills are useful are in computer security, communications, or digital forensics. The types of industries that will need you are: the software industry, e-finance industry, e-security, IT industry, Google, telecoms industry, ASIO, ASIS, defence industry (e.g. DSD), etc. So go ahead and have fun with this, but keep your eye on the bigger engineering picture and try to build up an appreciation of why these techniques are useful to our industry. Now go find that killer... this message will self-destruct in five seconds :-)
See also
- Final Report 2012
- Semester A Progress Report 2012
- Cipher Cross-off List
- CipherGUI 2011 (Download)
- Web Crawler 2011 (Download)
- 2012 Final Seminar (YouTube)
- The scanning process (YouTube)
- The fusion process (YouTube)
- The final model (YouTube)
- Code Cracking: Who Murdered the Somerton Man? 2012 (YouTube)
- Timeline of the Taman Shud Case
- List of people connected to the Taman Shud Case
- List of facts on the Taman Shud Case that are often misreported
- List of facts we do know about the Somerton Man
- The Taman Shud Case Coronial Inquest
- Letter frequency plots
- Structural Features of the Code
- Markov models
- Primary source material on the Taman Shud Case
- Secondary source material on the Taman Shud Case
- Transition probabilities from selected texts
- Listed poems from The Rubaiyat of Omar Khayyam
- Using the Rubaiyat of Omar Khayyam as a one-time pad
- Using the King James Bible as a one-time pad
- Using the Revised Standard Edition Bible as a one-time pad
- Transitions within words
References and useful resources
If you find any useful external links, list them here:
- The taman shud case
- Edward Fitzgerald's translation of the Rubaiyat of Omar Khayyam (رباعیات عمر خیام) by Omar Khayyam
- Adelaide Uni Library e-book collection
- Project Gutenberg e-books
- Foreign language e-books
- UN Declaration of Human Rights - different languages
- Statistical debunking of the 'Bible code'
- One-time pads
- Analysis of criminal codes and ciphers
- Code breaking in law enforcement: A 400-year history
- Evolutionary algorithm for decryption of monoalphabetic homophonic substitution ciphers encoded as constraint satisfaction problems