Final Report 2011
Contents
- 1 Under Construction
- 2 Introduction
- 3 Background Theory
- 4 Structural and Statistical Investigation
- 5 Cipher Investigation
- 6 CipherGUI
- 7 Pattern Matcher
- 8 Web Crawler
- 9 System Integration
- 10 Web Crawler Investigation
- 11 Future Development
- 12 Project Management
- 13 Project Outcomes
- 14 References
- 15 See also
- 16 References and useful resources
- 17 Back
Under Construction
Construction Notespace:
- Due to be finished 23/10/11 11:59pm
- Following structure is according to my report (mostly) so if the content you're uploading isn't consistent with the title just change the title.
- Open question: should Project Outcomes come before or after Project Management?
Executive Summary
Introduction
History
The Case
The Code
Technological Progress
Previous Studies
Project Objectives
At the beginning of the project two broad objectives were established. These were:
- Comprehensive Cipher Analysis
- Create the ability to run custom searches of the web
In the Comprehensive Cipher Analysis section the aim is to examine as many ciphers as possible and determine if each cipher can be ruled out as being used in encrypting the Somerton Man Code. This process intends to contribute to the ongoing cipher examination of the code.
The second objective is to create the ability to search and analyse the vast amount of data available on the web with greater control than a typical internet search engine provides. The reasoning is that, given the amount of data available on the web, the true meaning of the code could already be written somewhere. By exhaustively searching for distinct patterns evident in the code, it may therefore be possible to directly identify parts of the underlying message. This objective also includes the design aim of making the software flexible enough to accept many different search patterns, giving it applications beyond the scope of the Somerton Man investigation.
It should be noted that the project does not set out to crack the code or solve the case. The code has not been solved in over six decades of attempts, so while the project hopes to shed some light on the meaning behind the code, its success does not hinge on the code being cracked or the case being solved.
Extended Objectives
Group meetings throughout the course of the project identified areas in which the project objectives could be extended.
In the Cipher Analysis section a lot of software was being written to investigate whether individual ciphers were linked to the code. Rather than archive this software after each cipher was examined, an objective was set to reuse it by creating a centralised cipher analysis tool that implements numerous ciphers in an intuitive way.
Because the web search software was designed to have numerous applications, a need for a user-friendly interface was identified. This led to the objective of creating an interactive, user-friendly GUI from which to run the search mechanism and present its results.
Finally, with these useful software applications, the goal was set to release them to the public by making them available on the project's wiki page.
Structure of Remainder of Report
Background Theory
Cipher Analysis
Web Crawler
Structural and Statistical Investigation
Concept
Because the code was found in the back of a copy of the Rubaiyat of Omar Khayyam, a book of poems, there remains a suspicion that the code is somehow linked to the contents of the Rubaiyat. If this were the case, a cipher analysis might not even be necessary. This theory has been investigated by testing three hypotheses through statistical and structural analysis of the poems in the Rubaiyat.
Hypotheses
- The code is an initialism of a poem in the Rubaiyat
  - Based on previous studies indicating an English initialism, and on the fact that the code has four lines that are not crossed out while each poem is a quatrain (four-line poem).
- The code is related to the initial letters of each word, line or poem
  - Based on previous studies indicating an English initialism.
- The code is generally related to text in the Rubaiyat
  - Based on the links between the Rubaiyat and the code.
Technical Challenges
The two main challenges in this analysis revolve around the source material.
- Code Ambiguities
- Sample Size
Code ambiguities refer to the difficulty of determining which letters some of the handwritten symbols in the code represent, a challenge created by the untidy handwriting. Sample size refers to the issues caused by the limited sample of 44 letters available from the code.
Design of Tests
The approach to testing these hypotheses varied, although each used Java code for text parsing and statistics gathering. The first hypothesis was tested by statistically analysing the structure of the Rubaiyat poems and comparing it to the structure of the Somerton Man code. The second and third hypotheses were tested by using software to analyse letter frequencies in the poems and comparing the results to the letter frequencies of the Somerton Man code. For Hypothesis 2, frequency data was gathered on the first letter of each poem, the first letter of each line and the initial letter of each word. For Hypothesis 3, frequency data was gathered on all letters in the Rubaiyat.
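The initial-letter counting described for Hypothesis 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the class name `InitialLetterCounter`, the `Level` enum and the representation of the Rubaiyat as a list of poems (each a list of lines) are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts initial letters at poem, line or word level. */
final class InitialLetterCounter {

    /** Which initials to count: first letter of each poem, each line or each word. */
    enum Level { POEM, LINE, WORD }

    // Assumption: a word starts with a letter and may contain apostrophes.
    private static final Pattern WORD = Pattern.compile("[A-Za-z][A-Za-z']*");

    /** Counts upper-cased initial letters; each poem is given as a list of lines. */
    static Map<Character, Integer> count(List<List<String>> poems, Level level) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (List<String> poem : poems) {
            boolean firstLineOfPoem = true;
            for (String line : poem) {
                Matcher m = WORD.matcher(line);
                boolean firstWordOfLine = true;
                while (m.find()) {
                    char initial = Character.toUpperCase(m.group().charAt(0));
                    boolean wanted = level == Level.WORD
                            || (level == Level.LINE && firstWordOfLine)
                            || (level == Level.POEM && firstWordOfLine && firstLineOfPoem);
                    if (wanted) {
                        counts.merge(initial, 1, Integer::sum);
                    }
                    firstWordOfLine = false;
                }
                // Only a line that contained a word uses up the "first line" slot.
                if (!firstWordOfLine) {
                    firstLineOfPoem = false;
                }
            }
        }
        return counts;
    }
}
```

Calling `count(poems, Level.LINE)` would give the line-initial counts, which could then be normalised and compared against the letter counts of the code.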
Results of Tests
Hypothesis 1: The code is an initialism of a poem. Statistics were gathered on the number of words in each line (first, second, third, fourth) of each poem. The statistics gathered include the mean number of words in each line, the standard deviation, the maximum number of words in a line and the minimum. The results categorized by line number in a Rubaiyat poem are shown in the table below, followed by the statistics from the Somerton Man’s code.
Words per line in the Rubaiyat poems:
Line | Mean | Std Dev | Max | Min |
First | 8.00 | 1.15 | 10 | 5 |
Second | 7.69 | 1.20 | 10 | 5 |
Third | 7.88 | 1.06 | 10 | 5 |
Fourth | 7.87 | 1.31 | 10 | 5 |

Letters per line in the Somerton Man code:
Line | Number of Letters |
First | 9 |
Second | 11 |
Third | 11 |
Fourth | 13 |
The important result is the maximum number of words in the poem lines. In every line position the maximum is 10 words across all 75 poems contained in the Rubaiyat. Because an initialism contributes exactly one letter per word, a code line derived from a poem line could have at most 10 letters. However, the code has 11, 11 and 13 letters in its second, third and fourth lines respectively, each over the maximum. These results allow Hypothesis 1 to be ruled out, leading to the conclusion that the code is not an initialism of a Rubaiyat poem.
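The Hypothesis 1 check can be illustrated with a short Java sketch. The class and method names are hypothetical, words are assumed to be separated by whitespace, and the standard deviation is assumed to be the population form (the report does not say which form was used).

```java
import java.util.List;

/** Hypothetical sketch of the Hypothesis 1 check; names are illustrative only. */
final class LineLengthCheck {

    /** Counts whitespace-separated words in one line of a poem. */
    static int wordCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns {mean, standard deviation, max, min} of the word counts of the lines. */
    static double[] stats(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no lines given");
        }
        double sum = 0;
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            int n = wordCount(line);
            sum += n;
            max = Math.max(max, n);
            min = Math.min(min, n);
        }
        double mean = sum / lines.size();
        double squares = 0;
        for (String line : lines) {
            double d = wordCount(line) - mean;
            squares += d * d;
        }
        // Assumption: population standard deviation (divide by N, not N - 1).
        double stdDev = Math.sqrt(squares / lines.size());
        return new double[] {mean, stdDev, max, min};
    }

    /** An initialism has one letter per word, so the letter count must lie in [minWords, maxWords]. */
    static boolean couldBeInitialism(int codeLetters, int minWords, int maxWords) {
        return codeLetters >= minWords && codeLetters <= maxWords;
    }
}
```

With the tabulated minimum of 5 and maximum of 10 words, `couldBeInitialism(9, 5, 10)` is true for the first code line, while the calls for 11 and 13 letters return false.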
Hypothesis 2: The code is related to the initial letters of each word, line or poem. Letter frequency data was gathered on the first letter of each poem, of each line and of each word. This data is plotted against average English initial frequencies and the code letter distribution.
A link between poem initials or line initials and the code can be trivially ruled out: there is a ‘G’ in the code, but no line or poem in the entire Rubaiyat starts with a ‘G’. A link between all initial letters in the Rubaiyat and the code is more difficult to rule out. There is a generally good correlation between English initials and initials in the Rubaiyat (graphed in light blue), as might be expected, but there are significant discrepancies when compared to the code, such as the code clearly having a greater proportion of A’s, B’s and M’s. While a link cannot be ruled out because of the small sample size of the code (44 letters), for the purposes of this project a link has been judged unlikely.
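The "missing letter" argument used for poem and line initials amounts to a set-membership test, sketched below. The class name `MissingInitials` is hypothetical, and the set of available initials is assumed to be supplied in upper case by whatever parser produced it.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: finds code letters that never occur as an initial in the source text. */
final class MissingInitials {

    /**
     * Returns the letters of the code that are absent from the set of initials.
     * A non-empty result means the code cannot be built from those initials alone.
     */
    static Set<Character> lettersMissingFrom(String codeLetters, Set<Character> availableInitials) {
        Set<Character> missing = new TreeSet<>();
        for (char c : codeLetters.toUpperCase(Locale.ROOT).toCharArray()) {
            // Spaces and other non-letter symbols are ignored.
            if (Character.isLetter(c) && !availableInitials.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }
}
```

A single missing letter is enough to reject a direct link, which is why this test is far less sensitive to the small sample size than a frequency comparison.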
Hypothesis 3: The code is generally related to the text in the Rubaiyat. This hypothesis was tested by adapting the Java text parser code to generate letter frequency plots for all letters in the Rubaiyat poems. The results are displayed in the graph below.
While there is very good correlation between the Rubaiyat poems and English text, the letter frequency of the code is substantially different, with significantly larger proportions of M’s, A’s and B’s. Again the sample size of 44 letters for the code restricts our ability to make a conclusion, but for our purposes there is enough evidence to discount a link.
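A letter frequency comparison of this kind can be sketched in Java as follows. The report compares the distributions graphically. The numeric distance used here (half the sum of absolute differences between relative frequencies) is an assumption added for illustration and is not a measure the project is stated to have used. The class name is also hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: relative letter frequencies and a simple distance between them. */
final class LetterFrequency {

    /** Relative frequency of A..Z in the text, ignoring all other characters. */
    static double[] relative(String text) {
        int[] counts = new int[26];
        int total = 0;
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
                total++;
            }
        }
        double[] freq = new double[26];
        if (total == 0) {
            return freq; // no letters: all frequencies stay zero
        }
        for (int i = 0; i < 26; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    /** Half the sum of absolute differences: 0 for identical distributions, 1 for disjoint ones. */
    static double distance(double[] p, double[] q) {
        double sum = 0;
        for (int i = 0; i < 26; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return sum / 2;
    }
}
```

With only 44 letters in the code, any such distance is noisy, so it would support a judgement of likelihood and not a firm conclusion.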
Conclusions
The rejection of these three hypotheses indicates there is no direct (unencrypted) link between the code and the Rubaiyat, subject to the weaknesses introduced by the assumptions made about ambiguous letters and by the small sample size. This result, combined with the 2009 and 2010 results indicating the code was not random[1][2], led to the conclusion that the project did require a comprehensive cipher analysis. It should be noted that this conclusion does not rule out all links between the code and the Rubaiyat, only unencrypted links.
Cipher Investigation
Concept
Previous Work
Technical Challenges
Methodology
Results
Cipher | Test techniques | Status | Student |
ADFGVX | Structural analysis by inspection | Disproven | Steven |
Affine | Direct decryption | Disproven | Patrick |
CipherGUI
Concept
Technical Challenges
Design
Implementation
Testing
Pattern Matcher
Concept
Previous Work
Technical Challenges
Design
Implementation
Testing
Web Crawler
Concept
Previous Work
Technical Challenges
Design
Implementation
Testing
System Integration
Concept
Previous Work
Technical Challenges
Design
Implementation
Testing
Web Crawler Investigation
Concept
Technical Challenges
Design
Results
Future Development
Cipher Analysis
Web Crawler
Project Management
Timeline
Role Allocation
Review Process
Budget
Risk Management
Project Outcomes
Significance and Innovations
Strengths and Weaknesses
Conclusion
References
- [1] https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2009:_Who_killed_the_Somerton_man%3F
- [2] https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_Report_2010
See also
- Glossary
- Cipher Cross-off List
- Stage 1 Design Document 2011
- Stage 2 Progress Report 2011
- Final Seminar Recording 2011 Part 1
- Final Seminar Recording 2011 Part 2
- Final Seminar Recording 2011 Part 3
References and useful resources
- Final Report 2010
- Final report 2009: Who killed the Somerton man?
- Timeline of the Taman Shud Case
- The taman shud case
- Edward Fitzgerald's translation of رباعیات عمر خیام by عمر خیام
- Adelaide Uni Library e-book collection
- Project Gutenberg e-books
- Foreign language e-books
- UN Declaration of Human Rights - different languages
- Statistical debunking of the 'Bible code'
- One time pads
- Analysis of criminal codes and ciphers
- Code breaking in law enforcement: A 400-year history
- Evolutionary algorithm for decryption of monoalphabetic homophonic substitution ciphers encoded as constraint satisfaction problems