Final Report 2011

1 Under Construction
- 1.1 Executive Summary
2 Introduction
3 Background Theory
- 3.1 Cipher Analysis
- 3.2 Web Crawler
4 Structural and Statistical Investigation
5 Cipher Investigation
6 CipherGUI
7 Pattern Matcher
8 Web Crawler
9 System Integration
10 Web Crawler Investigation
11 Future Development
- 11.1 Cipher Analysis
- 11.2 Web Crawler
12 Project Management
13 Project Outcomes
14 References
15 See also
16 References and useful resources
17 Back

Under Construction

Construction Notespace:

Due to be finished 23/10/11 11:59pm
Following structure is according to my report (mostly) so if the content you're uploading isn't consistent with the title just change the title.
- questions.. Project Outcomes before/after Management

Executive Summary

Introduction

History

The Case

The Code

The code found in the back of the Rubaiyat linked to the Somerton Man.

Technological Progress

Previous Studies

Project Objectives

At the beginning of the project several broad objectives were established. These were:

Comprehensive Cipher Analysis
Create an ability to custom search the web

In the Comprehensive Cipher Analysis section the aim is to examine as many ciphers as possible and determine if each cipher can be ruled out as being used in encrypting the Somerton Man Code. This process intends to contribute to the ongoing cipher examination of the code.

The second objective is to create the ability to custom search and analyse the vast amount of data available on the web, with greater control than the average internet search engine provides. The reasoning is that, given the amount of data available on the web, the true meaning of the code could already be written somewhere. Thus, by exhaustively searching for distinct patterns evident in the code, it may be possible to directly identify parts of the underlying message. This objective also includes the design aim of making the software flexible enough to accept many different search patterns, giving it applications beyond the scope of the Somerton Man investigation.

It should be noted that the project does not set the objective of cracking the code or solving the case. The code has not been solved in over six decades of attempts, so while the project hopes to shed some light on the meaning behind the code, its success does not hinge on the code being cracked or the case being solved.

Extended Objectives

Group meetings throughout the course of the project identified areas in which the project objectives could be extended.

In the Cipher Analysis section a lot of software was being written for the process of investigating cipher links to the code. Rather than archive this software after each cipher was examined, an objective was set to utilise it by creating a centralised cipher analysis tool intuitively implementing numerous ciphers.

With the aim of designing web search software with numerous applications, a need for a user-friendly interface was identified. This led to the objective of creating an interactive, user-friendly GUI from which to run the search mechanism and present results.

Finally, with these useful software applications, the goal was set to release them to the public by making them available on the project's wiki page.

Structure of Remainder of Report

Background Theory

Cipher Analysis

Web Crawler

Structural and Statistical Investigation

Concept

Given that the code was found in the back of a book of poems, the Rubaiyat of Omar Khayyam, there remains suspicion that the code is somehow linked to the contents of the Rubaiyat. If this were the case, a cipher analysis may not even be necessary. This theory has been investigated by testing three hypotheses through statistical and structural analysis of the poems in the Rubaiyat.

Hypotheses

Hypothesis 1: The code is an initialism of a poem in the Rubaiyat
- Based on previous studies indicating an English initialism and the fact that the code has four lines that are not crossed out, while each poem is a quatrain (four-line poem).
Hypothesis 2: The code is related to the initial letters of each word, line or poem
- Based on previous studies indicating an English initialism.
Hypothesis 3: The code is generally related to text in the Rubaiyat
- Based on the links between the Rubaiyat and the code.

Technical Challenges

The two main challenges in this analysis revolve around the source material.

Code Ambiguities
Sample Size

Code ambiguities refers to the difficulty of determining which letters some of the handwritten symbols in the code represent, a challenge created by the untidy handwriting. Sample size refers to the issues caused by the limited sample of 44 letters available for analysis from the code.

Design of Tests

The approach to testing these hypotheses varied, although each test used Java code for text parsing and statistics gathering. The first hypothesis was tested by statistically analysing the structure of the Rubaiyat poems and comparing it to the structure of the Somerton Man code. The second and third hypotheses were tested by using software to analyse letter frequencies in the poems and comparing the results to the letter frequencies of the Somerton Man code. For Hypothesis 2, frequency data was gathered on the first letter of each poem, the first letter of each line and the initial letter of each word. For Hypothesis 3, the frequencies of all letters in the Rubaiyat were analysed in the same way.
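The letter-frequency gathering described above can be sketched as follows. This is a minimal illustration only, not the project's actual parser: the class name `LetterFrequency` and its methods are hypothetical, and the sample text in `main` is simply the opening line of the Rubaiyat.

```java
import java.util.Locale;

/** Hypothetical helper (not the project's code): counts how often each letter A-Z occurs. */
final class LetterFrequency {

    /** Returns a 26-element array; index 0 holds the count of 'A', index 25 the count of 'Z'. */
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++; // digits, spaces and punctuation are ignored
            }
        }
        return counts;
    }

    /** Converts raw counts to relative frequencies (all zeros when there are no letters). */
    static double[] toRelative(int[] counts) {
        int total = 0;
        for (int n : counts) {
            total += n;
        }
        double[] freq = new double[counts.length];
        if (total == 0) {
            return freq;
        }
        for (int i = 0; i < counts.length; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    public static void main(String[] args) {
        // Sample input only; the real analysis would read the full text of the poems.
        double[] freq = toRelative(countLetters("Awake! for Morning in the Bowl of Night"));
        for (int i = 0; i < 26; i++) {
            System.out.printf("%c %.3f%n", (char) ('A' + i), freq[i]);
        }
    }
}
```

Working with relative frequencies rather than raw counts is what allows a 44-letter sample to be compared with a much longer text.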

Results of Tests

Hypothesis 1: The code is an initialism of a poem. Statistics were gathered on the number of words in each line (first, second, third, fourth) of each poem. The statistics gathered include the mean number of words in each line, the standard deviation, the maximum number of words in a line and the minimum. The results categorized by line number in a Rubaiyat poem are shown in the table below, followed by the statistics from the Somerton Man’s code.

Table 1: Words per Line in Rubaiyat Poems

Line	Mean	Std Dev	Max	Min
First	8.00	1.15	10	5
Second	7.69	1.20	10	5
Third	7.88	1.06	10	5
Fourth	7.87	1.31	10	5

Table 2: Letters per Line in Code

Line	Number of Letters
First	9
Second	11
Third	11
Fourth	13

The important result is the maximum number of words in the poem lines. Each line category has a maximum number of words of 10 across all of the 75 poems contained in the Rubaiyat. However, the code has 11, 11 and 13 letters in its second, third and fourth lines respectively, each over the maximum. These results allow Hypothesis 1 to be ruled out, giving the conclusion that the code is not an initialism of a Rubaiyat poem.
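The Hypothesis 1 check can be illustrated with a short sketch. The class `LineStats` and its methods are hypothetical names, and the use of the population standard deviation is an assumption, since the report does not state which form was used. The values in `main` are the code line lengths from Table 2 and the maximum from Table 1.

```java
import java.util.List;

/** Hypothetical sketch of the Hypothesis 1 check: words per poem line versus letters per code line. */
final class LineStats {

    /** Counts whitespace-separated words in one line of verse. */
    static int wordCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns {mean, population standard deviation, max, min} for a non-empty list of counts. */
    static double[] summarise(List<Integer> counts) {
        double sum = 0;
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int n : counts) {
            sum += n;
            max = Math.max(max, n);
            min = Math.min(min, n);
        }
        double mean = sum / counts.size();
        double squares = 0;
        for (int n : counts) {
            squares += (n - mean) * (n - mean);
        }
        return new double[] { mean, Math.sqrt(squares / counts.size()), max, min };
    }

    /** True if any code line has more letters than the longest poem line has words. */
    static boolean exceedsMaximum(int[] codeLineLengths, int maxWordsPerLine) {
        for (int n : codeLineLengths) {
            if (n > maxWordsPerLine) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] codeLines = { 9, 11, 11, 13 };                   // Table 2
        System.out.println(exceedsMaximum(codeLines, 10));     // prints true
    }
}
```

If the code were an initialism of a poem, each code line could contain at most as many letters as the corresponding poem line has words, which is why a single line over the maximum is enough to reject the hypothesis.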

Hypothesis 2: The code is related to the initial letters of each word, line or poem. Letter frequency data was gathered on the first letter of each poem, of each line and of each word. This data is plotted against average English initial frequencies and the code letter distribution.

A link between poem initials or line initials and the code can be trivially ruled out. There is a ‘G’ in the code but no line or poem starts with a ‘G’ in the entire Rubaiyat. A link between all initial letters in the Rubaiyat and the code is more difficult to rule out. There is a generally good correlation between English initials and initials in the Rubaiyat (graphed in light blue) as might be expected, but there are significant discrepancies when compared to the code, such as the code clearly having a greater proportion of A’s, B’s and M’s. While a link cannot be ruled out due to the small sample size of the code (44 letters), for the purposes of this project a link has been ruled unlikely.
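The "trivial" exclusion for line initials amounts to a set-membership test, sketched below. The class `InitialLetters` and its methods are hypothetical, and the two lines of verse in `main` are a small sample rather than the full text used in the project.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: which letters of the code never start a line of the text? */
final class InitialLetters {

    /** Collects the upper-cased first letter of every line that contains a letter. */
    static Set<Character> lineInitials(String text) {
        Set<Character> initials = new TreeSet<>();
        for (String line : text.split("\\R")) {
            for (char c : line.toCharArray()) {
                if (Character.isLetter(c)) {
                    initials.add(Character.toUpperCase(c));
                    break; // only the first letter of the line matters
                }
            }
        }
        return initials;
    }

    /** Returns the letters in codeLetters that do not occur in the set of initials. */
    static Set<Character> missingFrom(Set<Character> initials, String codeLetters) {
        Set<Character> missing = new TreeSet<>();
        for (char c : codeLetters.toUpperCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetter(c) && !initials.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "Awake! for Morning in the Bowl of Night\n"
                + "Has flung the Stone that puts the Stars to Flight";
        Set<Character> initials = lineInitials(sample);
        System.out.println(missingFrom(initials, "G")); // prints [G]
    }
}
```

A non-empty result means at least one code letter cannot have come from a line initial, which is the argument applied to the letter 'G' above.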

Hypothesis 3: The code is generally related to the text in the Rubaiyat. This hypothesis was tested by adapting the Java text parser code to generate letter frequency plots for all letters in the Rubaiyat poems. The results are displayed in the graph below.

While there is very good correlation between the Rubaiyat poems and English text, the letter frequency of the code is substantially different, with significantly larger proportions of M’s, A’s and B’s. Again the sample size of 44 letters for the code restricts our ability to make a conclusion, but for our purposes there is enough evidence to discount a link.
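One simple way to make such a comparison explicit is to list the letters whose relative frequency in one sample exceeds that in a reference text by more than a chosen margin. The sketch below is an assumption about how this could be done, not the method used in the project; the class `FrequencyComparison`, the margin parameter and the numbers in `main` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: find letters over-represented in a sample relative to a reference. */
final class FrequencyComparison {

    /**
     * Both arrays hold relative frequencies for A-Z (index 0 = 'A').
     * A letter is reported when sample - reference is strictly greater than margin.
     */
    static List<Character> overRepresented(double[] sample, double[] reference, double margin) {
        if (sample.length != 26 || reference.length != 26) {
            throw new IllegalArgumentException("expected 26 frequencies in each array");
        }
        List<Character> letters = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            if (sample[i] - reference[i] > margin) {
                letters.add((char) ('A' + i));
            }
        }
        return letters;
    }

    public static void main(String[] args) {
        double[] sample = new double[26];
        double[] reference = new double[26];
        // Made-up values purely to exercise the method; not data from the report.
        sample['M' - 'A'] = 0.10;
        reference['M' - 'A'] = 0.02;
        System.out.println(overRepresented(sample, reference, 0.05)); // prints [M]
    }
}
```

With only 44 letters in the code, a single letter changes a relative frequency by more than two percentage points, so any margin would need to be chosen with that sampling noise in mind.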

Conclusions

The rejection of these three hypotheses indicates there is no direct (unencrypted) link between the code and the Rubaiyat, setting aside the weaknesses introduced by the assumptions made about ambiguous letters and by the small sample size. This result, combined with the 2009 and 2010 results indicating the code was not random [1][2], led to the conclusion that the project did require a comprehensive cipher analysis. It should be noted that this conclusion does not rule out all links between the code and the Rubaiyat, only unencrypted links.

Cipher Investigation

Concept

Previous Work

Technical Challenges

Methodology

Results

Cipher	Test techniques	Status	Student
ADFGVX	Structural analysis by inspection	Disproven	Steven
Affine	Direct decryption	Disproven	Patrick
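The table lists "direct decryption" as the test technique for the Affine cipher without describing the procedure. One plausible reading, sketched below as an assumption, is to decrypt the text under every valid key and inspect the candidates. The class `AffineCipher` and its methods are hypothetical, and the ciphertext in `main` is an illustrative string, not the Somerton Man code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of brute-force affine decryption over the 26-letter alphabet. */
final class AffineCipher {
    static final int M = 26; // alphabet size

    /** Returns the multiplicative inverse of a modulo 26, or -1 if a is not coprime with 26. */
    static int inverse(int a) {
        for (int x = 1; x < M; x++) {
            if (Math.floorMod(a * x, M) == 1) {
                return x;
            }
        }
        return -1;
    }

    /** Decrypts with D(y) = a^-1 * (y - b) mod 26; non-letters pass through unchanged. */
    static String decrypt(String cipherText, int a, int b) {
        int inv = inverse(a);
        if (inv < 0) {
            throw new IllegalArgumentException("a must be coprime with 26");
        }
        StringBuilder out = new StringBuilder();
        for (char c : cipherText.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                int x = Math.floorMod(inv * (c - 'A' - b), M);
                out.append((char) ('A' + x));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Produces one candidate plaintext for each of the 12 x 26 = 312 valid keys. */
    static List<String> allDecryptions(String cipherText) {
        List<String> candidates = new ArrayList<>();
        for (int a = 1; a < M; a++) {
            if (inverse(a) < 0) {
                continue; // skip values of a that share a factor with 26
            }
            for (int b = 0; b < M; b++) {
                candidates.add(decrypt(cipherText, a, b));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // "IHHWVC" is "AFFINE" encrypted with the illustrative key a = 5, b = 8.
        System.out.println(decrypt("IHHWVC", 5, 8)); // prints AFFINE
    }
}
```

Because the key space holds only 312 keys, every candidate can be generated and examined, which makes an exhaustive check practical for this cipher.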

CipherGUI

Concept

Technical Challenges

Design

Implementation

Testing

Pattern Matcher

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

Web Crawler

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

System Integration

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

Web Crawler Investigation

Concept

Technical Challenges

Design

Results

Future Development

Cipher Analysis

Web Crawler

Project Management

Timeline

Role Allocation

Review Process

Budget

Risk Management

Project Outcomes

Significance and Innovations

Strengths and Weaknesses

Conclusion

References

References and useful resources

Back

[1] https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2009:_Who_killed_the_Somerton_man%3F

[2] https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_Report_2010

