==Introduction==
[[Image:ProjectBreakdown.png|thumb|400px|right|Project Breakdown]]
For over 60 years, the identity of the Somerton Man and the meaning behind the code have remained a mystery. This project aims to provide an engineer's perspective on the case, using analytical techniques to decipher the code and a computer-generated reconstruction of the victim to assist in his identification. Ultimately, the aim is to crack the code and solve the case. The project was broken down into two main aspects of focus, Cipher Analysis and Identification, and then further into subtasks to be accomplished.

The first aspect of the project focuses on analysis of the code, through the use of cryptanalysis and mathematical techniques as well as programs designed for mass data analysis. The mathematical techniques looked at confirming and expanding on the statistical analysis and cipher cross-off from previous years. The mass data analysis expanded on the existing Web Crawler, pattern matcher and Cipher GUI, improving the layout and functionality of these applications.

The second aspect focuses on creating a 3D reconstruction of the victim's face, using the bust as the template. It is hoped that the 3D model can be used to help identify the Somerton Man. This aspect of the project is a new focus that has not previously been worked on.

The techniques and programs we have been developing are designed to be general, so that they could easily be applied to other cases or used in situations beyond the aims of this project.

===Motivation===
For over 60 years, this case has captured the public's imagination: an unidentified victim, an unknown cause of death, a mysterious code, and conspiracy theories surrounding the story - each wilder than the last. As engineers, our role is to solve the problems that arise, and this project gives us the ideal opportunity to apply our problem-solving skills to a real-world problem, something different from the usual "use resistors, capacitors and inductors" approach in electrical & electronic engineering. The techniques and technologies developed in the course of this project are broad enough to be applied to other areas; pattern matching, data mining, 3D modelling and decryption are all useful in a range of fields, and may even be of use in other criminal investigations such as the (perhaps related) [http://en.wikipedia.org/wiki/Taman_Shud#Marshall_case Joseph Saul Marshall case].

===Project Objectives===
The ultimate goal of this project is to help the police solve the mystery. Whilst this is a rather lofty and likely unachievable goal, by working towards it we may be able to provide useful insights into the case and develop techniques and technologies that are applicable beyond the scope of this single case. We have taken a multi-pronged approach to give a broader perspective and draw in different aspects that, taken in isolation, may be meaningless but, taken as a whole, become significant. The two main focuses of this project are cracking the code found in the Rubaiyat and trying to identify the victim. Breaking the code-cracking task down further, the web crawler and pattern matcher are used to trawl the internet for meaning in the letters of the code, while the code itself is interrogated directly with various ciphers to try to find out what is written.
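As a small illustration of what interrogating the code with a cipher involves, the sketch below applies every Caesar shift to the reading of the letters arrived at by the 2009 project (see Previous Studies) and prints the candidates for inspection. This is only a minimal Python sketch of one of the simplest ciphers considered, not the project's CipherGUI described later; any legible words in the output would still have to be judged by the reader.
<pre>
# Reading of the code's letters arrived at by the 2009 project
# (ambiguous characters resolved one way; spaces mark the line breaks).
CODE = "WRGOABABD WTBIMPANETP MLIABOAIAQC ITTMTSAMSTCAB"

def caesar_shift(text, shift):
    """Shift each letter `shift` places through the alphabet,
    leaving non-letters (the spaces between lines) untouched."""
    shifted = []
    for ch in text:
        if ch.isalpha():
            shifted.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        else:
            shifted.append(ch)
    return "".join(shifted)

# Print all 25 non-trivial shifts so they can be scanned for legible words.
for shift in range(1, 26):
    print("%2d: %s" % (shift, caesar_shift(CODE, shift)))
</pre>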
In order to be able to identify the victim so long after his death, we need to recreate how he might have looked when he was alive: firstly by making a three-dimensional reconstruction from a bust taken after he was autopsied, and then by manipulating the resulting model to create a more life-like representation, complete with the hair, skin and eye colours recorded in the coroner's report.

===Previous Studies===
====Previous Public Investigations====
[[Image:The_Code.png|thumb|300px|right|The code found in the back of the Rubaiyat linked to the Somerton Man.]]
During the course of the police investigation, the case became ever more mysterious. Nobody came forward to identify the dead man, and the only possible clues to his identity showed up several months later, when a suitcase that had been checked in at Adelaide Railway Station was handed in. The suitcase had been deposited on 30 November, and it was linked to the Somerton Man through a type of thread not available in Australia but matching that used to repair one of the pockets of the clothes he was wearing when found. Most of the clothes in the suitcase were similarly missing their labels, leaving only the name "T. Keane" on a tie, "Keane" on a laundry bag, and "Kean" (no 'e') on a singlet. However, these proved unhelpful, as a worldwide search found that there was no "T. Keane" missing in any English-speaking country. It was later noted that these three labels were the only ones that could not be removed without damaging the clothing.<ref name="TamamShudWiki">''Tamam Shud Case'', Wikimedia Foundation Inc., http://en.wikipedia.org/wiki/Taman_Shud_Case</ref>

At the time, code experts were called in to try to unravel the text, but their attempts were unsuccessful. In 1978, the Australian Defence Force analysed the code and came to the conclusion that:
* There are insufficient symbols to provide a pattern
* The symbols could be a complex substitute code, or the meaningless response to a disturbed mind
* It is not possible to provide a satisfactory answer<ref name="InsideStory">''Inside Story'', presented by Stuart Littlemore, ABC TV, 1978.</ref>

What does not help in cracking the code is the ambiguity of many of the letters. As can be seen, the first letters of both the first and second (third?) lines could be read as either an 'M' or a 'W', the first letter of the last line as either an 'I' or a 'V', and there is the floating 'C' at the end of the penultimate line. There is also some confusion about the 'X' above the 'O' in the penultimate line - whether or not it is part of the code - and about the relevance of the second, crossed-out line. Was it a mistake on the part of the writer, or was it an attempt at underlining (as the later letters seem to suggest)? Many amateur enthusiasts have since attempted to decipher the code, but with the ability to "cherry-pick" a cipher to suit the individual, it is possible to read any number of meanings into the text.

====Previous Honours Projects====
This is the fourth year for this particular project, and each of the previous groups has contributed in their own way, providing a useful foundation for the studies undertaken this year and into the future.

=====2009=====
The original group from 2009, Andrew Turnbull and Denley Bihari, focused on the nature of the "code". In particular, they looked at whether it even warranted investigation - could it just have been the random ramblings of an intoxicated mind? Through surveying people, both drunk and sober, they established that the code's letter frequencies differed significantly from those generated by the population sample, and were therefore unlikely to be random. (A minimal sketch of this kind of frequency comparison is given below.)
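The comparison is essentially a goodness-of-fit test between the observed letter counts of the code and a reference distribution. The Python sketch below is illustrative only and is not the 2009 group's code: it uses approximate English-text letter frequencies as the reference (the 2009 group used frequencies gathered from their own surveys) and a simple chi-square statistic, where a larger value indicates a poorer fit to the reference distribution.
<pre>
from collections import Counter

# Reading of the code's letters arrived at by the 2009 project.
CODE = "WRGOABABDWTBIMPANETPMLIABOAIAQCITTMTSAMSTCAB"

# Approximate relative letter frequencies for English text
# (illustrative values only - not the survey data).
REFERENCE = {
    'A': 0.082, 'B': 0.015, 'C': 0.028, 'D': 0.043, 'E': 0.127,
    'F': 0.022, 'G': 0.020, 'H': 0.061, 'I': 0.070, 'J': 0.002,
    'K': 0.008, 'L': 0.040, 'M': 0.024, 'N': 0.067, 'O': 0.075,
    'P': 0.019, 'Q': 0.001, 'R': 0.060, 'S': 0.063, 'T': 0.091,
    'U': 0.028, 'V': 0.010, 'W': 0.024, 'X': 0.002, 'Y': 0.020,
    'Z': 0.001,
}

def chi_square(text, reference):
    """Chi-square goodness-of-fit statistic between the letter counts
    observed in `text` and the counts expected under `reference`."""
    counts = Counter(text)
    n = len(text)
    statistic = 0.0
    for letter, p in reference.items():
        observed = counts.get(letter, 0)
        expected = p * n
        statistic += (observed - expected) ** 2 / expected
    return statistic

print(chi_square(CODE, REFERENCE))
</pre>
The same routine can be run against any candidate reference distribution, such as the survey-derived letter frequencies.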
Given the suggestion that the letters had some significance, they set about establishing what that significance was. They considered the possibility that the letters represented the initial letters of an unordered list, and this was shown to have some merit. They also tested the theory that the letters were part of a transposition cipher - where the order of the letters is simply swapped around, as in an anagram - but the relative frequencies of the letters (no 'e's whatsoever, for example) suggested this was unlikely. After discounting transposition ciphers, they looked at the other main group of ciphers: substitution ciphers, where the letters of the original message are replaced with entirely different ones. They were able to discount the Vigenère, Playfair, Alphabet Reversal and Caesar ciphers, and examined the possibility that the code used a one-time pad with the Rubaiyat or the Bible as the key text, though these were eliminated as candidates. Finally, they attempted to establish the language the code was written in; by scoring a set of mainly Western European languages using Hidden Markov Modelling, they found that English was the best match. Based on this, and allowing for the ambiguity in some of the letters, they concluded that the most likely intended sequence of letters in the code is:
:WRGOABABD WTBIMPANETP MLIABOAIAQC ITTMTSAMSTCAB<ref name="Final Report 2009">Turnbull, A. and Bihari, D., ''Final Report 2009'', https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2009:_Who_killed_the_Somerton_man%3F</ref>

=====2010=====
In the following year, Kevin Ramirez and Michael Lewis-Vassallo verified that the letters were indeed unlikely to be random by surveying a larger number of people, both sober and drunk, to improve the accuracy of the letter frequency analysis. They further investigated the idea that the letters represented an initialism, testing sequences of letters taken from the code against a wider range of texts. They discovered that the Rubaiyat itself gave an intriguingly low match, suggesting that the code may have been constructed that way intentionally; from this they proposed that the code was possibly an encoded initialism based on the Rubaiyat.

The main aim for the group in 2010 was to write a web crawler and text-parsing algorithm generalised enough to be useful beyond the scope of the course. The text parser was able to take in a text or HTML file and find specific words and patterns within that file, and could be used to work through a large directory quickly. The web crawler was used to pass text directly from the internet to the text parser to be checked for patterns. Given the vast amount of data available on the internet, this made for a fast method of accessing a large quantity of raw text to be statistically analysed. It built upon an existing crawler, allowing the user to input a URL directly or a range of URLs from a text file, mirror the website text locally, and then automatically run it through the text-parsing software.<ref name="Final Report 2010">Lewis-Vassallo, M. and Ramirez, K., ''Final Report 2010'', https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2010</ref>
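The kind of initialism matching the text parser performs can be sketched in a few lines: reduce a page of text to the initial letters of its words, then search those initials for a fragment of the code. The Python below is only an illustrative sketch of that idea, not the 2010 code; the function names and the sample sentence are invented for the example, while the fragment "MLIAB" is taken from the code itself.
<pre>
import re

def initial_letters(text):
    """Reduce running text to the uppercase initial letters of its words."""
    words = re.findall(r"[A-Za-z']+", text)
    return "".join(word[0].upper() for word in words)

def find_initialism(text, fragment):
    """Return the word positions at which `fragment` (e.g. a piece of the
    code) occurs as consecutive word initials in `text`."""
    initials = initial_letters(text)
    return [match.start() for match in re.finditer(re.escape(fragment.upper()), initials)]

# Illustrative usage on an invented sentence whose word initials contain "MLIAB".
sample = "My life is a blur of meetings and long afternoons."
print(find_initialism(sample, "MLIAB"))   # -> [0]
</pre>
Exact-phrase or regular-expression matching amounts to running the same search over the raw text itself rather than over the string of initials.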
=====2011=====
In 2011, Steven Maxwell and Patrick Johnson proposed that an answer to the code may already exist; it simply has not yet been linked to the case. To this end, they further developed the web crawler so that it could pattern-match directly - against an exact phrase, an initialism or a regular expression - and pass any matches on to the user. When they tested their "WebCrawler", they did find a match to a portion of the code, "MLIAB". This turned out to be a poem with no connection to the case, but it demonstrates the effectiveness and potential of the WebCrawler.

They also looked to expand the range of ciphers that were checked, writing a program that performed the encoding and decoding of a range of ciphers automatically, allowing a wide variety of ciphers to be tested in a short period of time. This also led them to use a "Cipher Cross-Off List" to keep track of which possibilities had been discounted, which were untested, and which had withstood testing and remained candidates. Using their "CipherGUI" and this cross-off list, they were able to test over 30 different ciphers and narrow down the possibilities significantly: only a handful of ciphers were inconclusive, and given the period in which the Somerton Man was alive, more modern ciphers could automatically be discounted.<ref name="Final Report 2011">Johnson, P. and Maxwell, S., ''Final Report 2011'', https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2011</ref>

===Structure of this Report===
With so many disparate objectives, each a self-contained project in itself, the rest of this report has been divided into sections dedicated to each aspect. There is then a section on how the project was managed, including the project breakdown, timeline and budget, before the threads are brought back together in the conclusion, which also indicates potential future directions this project could take.