Thomas Stratfold

June 1, 2012

Introduction

This progress report is a summary of the work that Aidan Duffy and I have achieved over the first semester of our Final Year Honours Project for the School of Electrical and Electronic Engineering, under the supervision of Derek Abbott and Matthew Berryman.

The overall goal of this project is to decipher the code found in association with the Somerton Man, and to use this to identify the victim and ultimately solve the case. In order to do this we must first determine what the code is: what cipher (if any) was used, what language it was written in, or whether it is simply a series of random letters.

The secondary aspect of the project is the identification of the Somerton Man himself. This is a new addition to the project and will be achieved by creating a 3D model of the victim.

The techniques and programs we have been using have been designed to be general, so that they can easily be applied to other cases and to situations beyond the aims of this project.

Background

The Case

At 6:30am on December 1st 1948, a man was found deceased on Somerton Beach, South Australia, resting against the rock wall at the top of the beach. The victim carried no form of identification, and his fingerprints and dental records did not match any international registries. The only items on his body were some cigarettes, chewing gum, a comb, an unused train ticket and a used bus ticket.

The report from the autopsy identified that the man's stomach and kidneys were congested and that there was excess blood in his liver; this suggested that his death was unnatural and most likely caused by an unknown poison. In 1994, during a review of the case, it was suggested that the death fits that of the poison digitalis.

A month and a half later, a suitcase believed to belong to the victim was found left at Adelaide Railway Station. However, none of its contents provided any further clues to the identity of the man or his killer.

The Code

One other item was found on the victim's body: inside a sewn-up pocket of his trousers was a small piece of paper, torn from a book, with the words "Tamam Shud". Translated from Persian this means "ended" or "finished", and the phrase appears on the last page of a book called The Rubaiyat of Omar Khayyam. On November 30th 1948, a man in Glenelg found a copy of this book left on the back seat of his car; testing later confirmed that the paper found on the victim matched this book.

In the back of this book, written in pencil, were five lines of capital letters, with the second line crossed out:

WRGOABABD
MLIAOI
WTBIMPANETP
MLIABOAIAQC
ITTMTSAMSTGAB

The similarity between the second line and the fourth line indicates that a mistake was made, which increases the likelihood that the lines are in fact a code. However over the years no one has yet been able to determine the meaning or purpose of the code.

Over the years there have been multiple attempts at identifying and deciphering the code, but there are yet to be any accepted results. One notable attempt was made in 1978 by the Australian Defence Force, who conducted an analysis of the code and stated that there were not enough symbols to provide a pattern, and that the symbols could be a complex substitution code or a meaningless response of a disturbed mind; ultimately they were unable to provide a satisfactory answer.

This is the motivation behind this project: more than 60 years have passed, along with three years' worth of projects run by the University, yet the code still remains unsolved.

Previous Years' Work

This is now the fourth year of the project, and the previous three groups have provided some valuable insight into the case; this is the basis on which we have built.

In 2009, the group established that the letters were not random, that the code was not a transposition cipher, and that the code was consistent with representing the first letters of words.

In 2010, the group continued along the first-letter path and compared the code against particular texts. However, they were unsuccessful, with the largest matches coming from The Rubaiyat. They also developed a simple web application and pattern matcher, designed to download and search the contents of web pages looking for patterns, which were then compared with the Somerton code.

In 2011, the group expanded the web application and created the web crawler that would search the Internet by itself. They also focused on various ciphers and cryptographic techniques that may have been used to generate the code.

All three groups also worked on a Cipher Cross-off List; this list contains ciphers and encryptions that have been systematically tested and crossed off using frequency distributions and decoding methods. There are currently more than 30 ciphers that have been ruled out, with the method used to discount each of them recorded on the list.

Group Members

This year the project consists of myself, Thomas Stratfold, a Bachelor of Engineering (Telecommunications) student, and Aidan Duffy, a double degree Bachelor of Engineering (Electrical and Electronic Engineering) and Bachelor of Economics student. With the new aspect of the project, the 3D reconstruction, we decided to work together; for the other aspects we decided to divide the work, with Aidan focusing on the web crawler while I focus on the language analysis and the cipher cross-off list.

2012 Progress

This year the main focus of the project is on the identification of the victim through a 3D reconstruction of a bust taken months after the victim's death. The other focus of this year is on verifying and improving on the work of previous years; this has been done through expanding the web crawler and testing the code against more languages.


3D Reconstruction

The 3D reconstruction is a new aspect of the project, which we have been exploring in the hope of producing a reconstructed image of the victim's face and finally identifying him.

The first part of this aspect of the project involves scanning the bust, creating a model on the computer, and modifying the model to remove distortion and correct any changes to the face caused by the post-mortem process. After this, colour will be added to the model so that it looks more realistic, which will make it easier to identify the victim.

The first part of the semester involved determining how we would create the model; we began by testing a few readily available 3D modelling packages, such as 123D and PM Scanner. Neither of these was very efficient or easy to use, and the example results did not appear to be of very high quality. As a result we decided to instead use a low-cost "David Laserscanner", an easy-to-use kit that allows for 3D modelling. Since the kit uses a 5 mW line laser, which has the potential to cause harm if handled incorrectly, we were required to create a Risk Assessment and Standard Operating Procedure (included in the appendix) before we were allowed to purchase the kit.

My contribution to this aspect of the project involved meeting, along with Aidan, with one of the school's research engineers, Henry Ho, and creating the risk assessment that will be used in conjunction with the laser scanner.

We are still waiting for the David Laserscanner kit to arrive, so we have been unable to make any further progress in this area.


Web Crawler

The web crawler created in 2011 is already functional; however, it takes a very long time to scan web pages, and as a result would take months to check even a fraction of the Internet. The idea behind this aspect of the project was to modify the crawler so that it would use pre-indexed data, the data that search engines use to speed up searches.
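
As an illustration of why the page-by-page approach is slow, the following is a minimal sketch of the basic crawl-and-check idea, written here using the jsoup library. It is not the 2011 crawler itself; the seed URL, the page limit and the pattern check are placeholders only.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleCrawler {

    // Placeholder: a real check would compare initial letters against the Somerton code.
    static boolean matchesPattern(String text) {
        return text.toUpperCase().contains("TAMAM SHUD");
    }

    public static void main(String[] args) throws Exception {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        queue.add("https://example.org/");              // assumed seed URL
        int pagesFetched = 0;

        while (!queue.isEmpty() && pagesFetched < 50) {  // small limit for the sketch
            String url = queue.poll();
            if (!visited.add(url)) continue;             // skip pages already seen
            try {
                Document doc = Jsoup.connect(url).get(); // download and parse one page
                pagesFetched++;
                if (matchesPattern(doc.text())) {
                    System.out.println("Possible match on: " + url);
                }
                for (Element link : doc.select("a[href]")) {
                    queue.add(link.attr("abs:href"));    // enqueue outgoing links
                }
            } catch (Exception e) {
                // skip pages that fail to download or parse
            }
        }
    }
}

Every page must be downloaded and parsed individually, which is what makes the approach so slow and motivates the move to pre-indexed data.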

Aidan did the majority of the work conducted in this area.

Language Verification

In previous years there has not been a lot of focus on what language the code could be written in. In 2009 the group tested 10 languages and came to the assumption that English fit best. This year we have decided to verify this assumption by testing more languages and performing further analysis. This has been the main focus of my contribution to the project.

The analysis began with a review of previous years' work; previously, languages were tested using a frequency analysis of the most-used letters, which was compared with the code. To verify this, we first went through the dictionary and counted the number of words that begin with each letter of the alphabet. This was then converted into a frequency distribution and compared with the Somerton code. The results were not very promising, as there was very little agreement between the two sets of frequencies. The results are shown in Figure 1 of the appendix.
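
As a rough illustration of this step, the sketch below (not the original program; the class name and word-list file name are assumed) counts how many dictionary words begin with each letter and converts the counts into percentages.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class DictionaryInitialLetters {
    public static void main(String[] args) throws IOException {
        int[] counts = new int[26];
        // Assumed input: a plain-text word list, one word per line.
        List<String> words = Files.readAllLines(Paths.get("words.txt"));
        for (String word : words) {
            word = word.trim().toUpperCase();
            if (word.isEmpty()) continue;
            char first = word.charAt(0);
            if (first >= 'A' && first <= 'Z') {
                counts[first - 'A']++;               // tally words starting with this letter
            }
        }
        int total = 0;
        for (int c : counts) total += c;
        for (int i = 0; i < 26; i++) {
            double freq = 100.0 * counts[i] / total; // percentage of words per initial letter
            System.out.printf("%c: %.2f%%%n", (char) ('A' + i), freq);
        }
    }
}

The resulting 26 percentages form the distribution that is compared against the letter frequencies of the code itself.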

Considering that previous years had mainly focused on Western-European languages, and records had shown the victim was most likely of Eastern-European descent, we decided to expand the analysis and consider more languages. I was able to find a very useful web site which has the Tower of Babel bible passage translated into over a hundred different languages. Using this site, I obtained 85 different languages with approximately 1000 characters in each translation. This allowed a frequency analysis to be performed by running a Java program, which I modified to count the number of times each letter appeared in the texts. The analysis was then repeated, this time counting the number of times each letter appeared as the initial letter of a word.
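
The following is a minimal sketch of the kind of letter count involved, assuming one language's passage is stored in a plain-text file (the file name is illustrative); it is not the modified program itself and simply ignores characters outside A to Z. The initial-letter variant is the same idea applied only to the first character of each word.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LetterFrequency {
    public static void main(String[] args) throws IOException {
        // Assumed input: one language's Tower of Babel passage as plain text.
        String text = new String(Files.readAllBytes(Paths.get("babel_english.txt"))).toUpperCase();
        int[] counts = new int[26];
        int total = 0;
        for (char ch : text.toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {            // ignore spaces, digits and punctuation
                counts[ch - 'A']++;
                total++;
            }
        }
        for (int i = 0; i < 26; i++) {
            System.out.printf("%c: %.2f%%%n", (char) ('A' + i), 100.0 * counts[i] / total);
        }
    }
}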

The results from this were quite interesting. We compared the initial-letter frequency results with the Somerton code by computing the difference in frequency and the standard deviation. The results showed that English was the third-closest match, behind two Philippine languages, Ilocano and Tagalog. This helps to support the theory that the code is in English, since English compares so closely with the code. A table of these results is included in the appendix.
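
One way to score each language against the code, assuming both distributions are stored as percentage arrays for A to Z, is sketched below. The exact metric used to produce the table may differ; this only illustrates combining per-letter differences into a mean absolute difference and a standard deviation, where smaller values indicate a closer match.

public class FrequencyComparison {

    // Returns {mean absolute difference, standard deviation of the differences}.
    static double[] compare(double[] languageFreq, double[] codeFreq) {
        double[] diff = new double[26];
        double sumAbs = 0.0, sum = 0.0;
        for (int i = 0; i < 26; i++) {
            diff[i] = languageFreq[i] - codeFreq[i];
            sumAbs += Math.abs(diff[i]);
            sum += diff[i];
        }
        double mean = sum / 26.0;
        double var = 0.0;
        for (double d : diff) {
            var += (d - mean) * (d - mean);
        }
        return new double[] { sumAbs / 26.0, Math.sqrt(var / 26.0) };
    }

    public static void main(String[] args) {
        // Toy distributions for illustration only.
        double[] uniform = new double[26];
        java.util.Arrays.fill(uniform, 100.0 / 26.0);
        double[] skewed = new double[26];
        skewed[0] = 50.0;                            // half of all letters are 'A'
        for (int i = 1; i < 26; i++) skewed[i] = 50.0 / 25.0;
        double[] score = compare(uniform, skewed);
        System.out.printf("mean |diff| = %.2f, std dev = %.2f%n", score[0], score[1]);
    }
}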

Further analysis needs to be done in this area; the best approach would be to take the top 20 languages and perform further frequency analysis using longer texts to refine the results. Using the data we have collected, we will also be constructing a Language Cross-Off List, similar to the cipher version, which will explain why we have discounted each of the languages tested.

Overall Progress

Overall we have made a solid start on the project and have started to get back on track with our original schedule. Currently the only delay we are facing comes from access to the laser scanner, so we have been unable to achieve much in the 3D reconstruction. This will be our main focus for next semester, along with expanding the web crawler to be more user-friendly and, hopefully, able to use indexed data to search for patterns on the Internet. We will continue looking into the languages and will also try to analyse and disprove more ciphers from the list.

References and useful resources

- Frequency analysis of texts

- Previous work - Languages tested

- Previous work - Web based application

- Previous work - Web crawler - Pattern Matcher - Cipher GUI

Appendix

Frequency Analysis

Figure 1 - Results from analysis of dictionary

Figure 2 - Results from analysis of initial letters from text


Standard Operating Procedure for the use of the DAVID-Laserscanner
