Final Report 2011

Revision as of 20:29, 22 October 2011 by A1162034

Under Construction

Construction Notespace:

  • Due to be finished 23/10/11 11:59pm
  • The following structure is based (mostly) on my report, so if the content you're uploading isn't consistent with a section title, just change the title.
    • Question: should Project Outcomes go before or after Project Management?

Executive Summary

Introduction

History

The Case

The Code

The code was found in the back of the copy of the Rubaiyat linked to the Somerton Man.

Technological Progress

Previous Studies

Project Objectives

Extended Objectives

Structure of Remainder of Report

Background Theory

Cipher Analysis

Web Crawler

Structural and Statistical Investigation

Concept

Given that the code was found in the back of a book of poems, the Rubaiyat of Omar Khayyam, there remains a suspicion that the code is somehow linked to the contents of the Rubaiyat. If this were the case, a cipher analysis might not be necessary at all. This theory was investigated by testing three hypotheses through statistical and structural analysis of the poems in the Rubaiyat.

Hypotheses

  1. The code is an initialism of a poem in the Rubaiyat
    • Based on previous studies indicating an English initialism, and on the fact that the code has four (un-crossed-out) lines while each poem is a quatrain (a four-line poem).
  2. The code is related to the initial letters of each word, line or poem
    • Based on previous studies indicating an English initialism.
  3. The code is generally related to text in the Rubaiyat
    • Based on the links between the Rubaiyat and the code.

Technical Challenges

The two main challenges in this analysis revolve around the source material.

  1. Code Ambiguities
  2. Sample Size

Code ambiguities are the difficulties in determining which letters some of the handwritten symbols in the code represent, a challenge created by the untidy handwriting. Sample size refers to the limitations imposed by having only 44 letters in the code to analyse.
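To give a rough sense of the sample-size problem, the standard error of an estimated letter proportion under a simple binomial model (an assumption made here for illustration; the report does not state that it used this model) is worked below for a hypothetical letter that makes up about 8% of reference text:

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
  = \sqrt{\frac{0.08 \times 0.92}{44}}
  \approx 0.041
```

With n = 44, a proportion of 0.08 is therefore only known to within roughly ±0.08 at 95% confidence (1.96 × 0.041), i.e. anywhere from almost 0% to about 16%. The normal approximation behind this figure is itself crude when only about 3.5 occurrences of the letter are expected (0.08 × 44).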

Design of Tests

The approach to testing each hypothesis differed, although all three used Java text-parsing and statistics-gathering code. The first hypothesis was tested by statistically analysing the structure of the Rubaiyat poems and comparing it with the structure of the Somerton Man code. The second and third hypotheses were tested by using software to analyse letter frequencies in the poems and comparing the results with the letter frequencies of the code. For Hypothesis 2, frequency data was gathered on the first letter of each poem, the first letter of each line and the initial letter of each word. For Hypothesis 3, the frequencies of all letters in the Rubaiyat were analysed in the same way.
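The frequency-gathering step can be sketched in Java as below. This is a minimal illustration, not the project's actual parser: it assumes (hypothetically) that the poems are supplied as plain text with one poem line per text line and a blank line between poems, and it only counts the ASCII letters A to Z. The class name `InitialFrequencies` is invented for this example.

```java
/**
 * Hypothetical sketch: counts poem initials, line initials, word initials and
 * all letters in plain text where each poem line is on its own text line and
 * poems are separated by blank lines. Index 0 is 'A', index 25 is 'Z'.
 */
final class InitialFrequencies {
    final int[] poemInitials = new int[26];
    final int[] lineInitials = new int[26];
    final int[] wordInitials = new int[26];
    final int[] allLetters = new int[26];

    InitialFrequencies(String text) {
        boolean startOfPoem = true;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                startOfPoem = true; // a blank line marks the start of the next poem
                continue;
            }
            boolean startOfLine = true;
            for (String word : line.trim().split("\\s+")) {
                boolean startOfWord = true;
                for (char c : word.toCharArray()) {
                    char u = Character.toUpperCase(c);
                    if (u < 'A' || u > 'Z') {
                        continue; // skip punctuation, e.g. the apostrophe in 'Tis
                    }
                    int i = u - 'A';
                    allLetters[i]++;
                    if (startOfWord) { wordInitials[i]++; startOfWord = false; }
                    if (startOfLine) { lineInitials[i]++; startOfLine = false; }
                    if (startOfPoem) { poemInitials[i]++; startOfPoem = false; }
                }
            }
        }
    }
}
```

Each array can then be divided by its total to give proportions for plotting against reference English frequencies and the code's own letter distribution.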

Results of Tests

Hypothesis 1: The code is an initialism of a poem. Statistics were gathered on the number of words in each line position (first, second, third and fourth) across the poems: the mean, standard deviation, maximum and minimum number of words. The results, grouped by line position, are shown in Table 1, followed by the number of letters in each line of the Somerton Man’s code in Table 2.

Table 1: Words per Line in Rubaiyat Poems
Line      Mean   Std Dev   Max   Min
First     8.00   1.15      10    5
Second    7.69   1.20      10    5
Third     7.88   1.06      10    5
Fourth    7.87   1.31      10    5

Table 2: Letters per Line in Code
Line      Number of Letters
First     9
Second    11
Third     11
Fourth    13

The key result is the maximum number of words per line. Across all 75 poems in the Rubaiyat, no line in any position contains more than 10 words. However, the code has 11, 11 and 13 letters in its second, third and fourth lines respectively, each exceeding this maximum. Because an initialism requires one word for each letter, Hypothesis 1 can be ruled out: the code is not an initialism of a Rubaiyat poem.
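The Hypothesis 1 test can be sketched in Java as follows. This is an illustrative reconstruction, not the project's code: the class and method names are hypothetical, each poem is assumed to be a `String[4]` of whitespace-separated words, and the population standard deviation is used because the report does not say which variant produced Table 1.

```java
/**
 * Hypothetical sketch of the Hypothesis 1 test. Each poem is a String[4]
 * holding its four lines; words are assumed to be separated by whitespace.
 */
final class QuatrainStats {
    /** Returns {mean, population std dev, max, min} of the word counts at line position pos (0 to 3). */
    static double[] wordsPerLine(String[][] poems, int pos) {
        double sum = 0, sumSq = 0;
        int max = Integer.MIN_VALUE, min = Integer.MAX_VALUE;
        for (String[] poem : poems) {
            int words = poem[pos].trim().split("\\s+").length;
            sum += words;
            sumSq += (double) words * words;
            max = Math.max(max, words);
            min = Math.min(min, words);
        }
        double mean = sum / poems.length;
        // population variance = E[x^2] - mean^2, clamped at 0 against rounding error
        double sd = Math.sqrt(Math.max(0.0, sumSq / poems.length - mean * mean));
        return new double[] {mean, sd, max, min};
    }

    /**
     * An initialism needs one word per letter, so if any code line has more
     * letters than the longest poem line in the same position, no poem can match.
     */
    static boolean ruledOutByLength(String[][] poems, int[] codeLineLengths) {
        for (int pos = 0; pos < codeLineLengths.length; pos++) {
            if (codeLineLengths[pos] > wordsPerLine(poems, pos)[2]) {
                return true;
            }
        }
        return false;
    }
}
```

Passing the code’s line lengths from Table 2, {9, 11, 11, 13}, against poems whose longest line in every position has 10 words makes `ruledOutByLength` return true, mirroring the argument above. Note that this is only a necessary-condition check; it does not search for an exact word-count match.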

Hypothesis 2: The code is related to the initial letters of each word, line or poem. Letter frequency data was gathered on the first letter of each poem, of each line and of each word, and plotted against average English word-initial letter frequencies and the letter distribution of the code.

Figure: All, Line and Poem Initials

A link between poem initials or line initials and the code can be ruled out trivially: the code contains a ‘G’, but no line or poem in the entire Rubaiyat starts with a ‘G’. A link between all word initials in the Rubaiyat and the code is more difficult to rule out. As might be expected, there is generally good correlation between English initials and the initials in the Rubaiyat (graphed in light blue), but there are significant discrepancies when these are compared with the code, which clearly has a greater proportion of A’s, B’s and M’s. Although a link cannot be ruled out given the small sample size of the code (44 letters), for the purposes of this project it has been deemed unlikely.
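The “trivial” rule-out amounts to a set-membership check, sketched below in Java. The class name is hypothetical; `initialCounts` is assumed to be a 26-element count array (index 0 for ‘A’) such as the line-initial or poem-initial counts gathered by a text parser, and `code` must be one chosen reading of the ambiguous handwritten letters.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: list every code letter that never appears as an
 * initial in the reference text. A single such letter is enough to rule out
 * a direct initialism of that kind.
 */
final class MissingInitials {
    /** initialCounts has 26 entries, index 0 for 'A'; code is one chosen reading of the ambiguous letters. */
    static List<Character> lettersNeverUsedAsInitial(String code, int[] initialCounts) {
        List<Character> missing = new ArrayList<>();
        for (char c : code.toUpperCase().toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore spaces and any non-letter marks in the transcription
            }
            if (initialCounts[c - 'A'] == 0 && !missing.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }
}
```

Because a single absent letter is decisive, this test is far less sensitive to the small sample size than the frequency comparison, although it does still depend on how the ambiguous letters are read.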

Hypothesis 3: The code is generally related to the text in the Rubaiyat. This hypothesis was tested by adapting the Java text parser to generate letter frequency plots for all letters in the Rubaiyat poems. The results are displayed in the graph below.

Figure: All initials

While the letter frequencies of the Rubaiyat poems correlate very well with those of English text, the letter frequencies of the code are substantially different, with significantly larger proportions of M’s, A’s and B’s. Again, the code’s sample size of 44 letters limits our ability to draw a firm conclusion, but for our purposes there is enough evidence to discount a link.
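The comparison behind this paragraph can be sketched in Java: normalise both sets of counts to proportions and rank letters by how much the code over-represents them. This is a hypothetical illustration (class and method names are invented) of a simple proportion difference, not a formal significance test and not necessarily how the report’s graphs were produced.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: normalises two sets of letter counts to proportions and
 * returns the letters that the code over-represents most relative to a
 * reference (for example, all letters of the Rubaiyat).
 */
final class FrequencyComparison {
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (char c : text.toUpperCase().toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            }
        }
        return counts;
    }

    private static double[] proportions(int[] counts) {
        int total = 0;
        for (int n : counts) total += n;
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = total == 0 ? 0.0 : (double) counts[i] / total;
        }
        return p;
    }

    /** Sorts letters by (code proportion - reference proportion), largest first, and returns the top k. */
    static char[] largestExcess(int[] codeCounts, int[] referenceCounts, int k) {
        double[] code = proportions(codeCounts);
        double[] ref = proportions(referenceCounts);
        Integer[] letters = new Integer[26];
        for (int i = 0; i < 26; i++) letters[i] = i;
        // stable sort, so letters with equal excess keep alphabetical order
        Arrays.sort(letters, (a, b) -> Double.compare(code[b] - ref[b], code[a] - ref[a]));
        char[] top = new char[k];
        for (int i = 0; i < k; i++) top[i] = (char) ('A' + letters[i]);
        return top;
    }
}
```

With only 44 letters, each occurrence in the code shifts a proportion by about 2.3 percentage points (1/44), so a ranking like this shows where the distributions differ rather than proving that the difference is significant.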

Conclusions

Setting aside the weaknesses arising from the assumptions needed for the ambiguous letters and from the small sample size, the rejection of these three hypotheses indicates that there is no direct (unencrypted) link between the code and the Rubaiyat. This result, combined with the 2009 and 2010 findings that the code is not random[1][2], led to the conclusion that the project did require a comprehensive cipher analysis. Note that this conclusion does not rule out all links between the code and the Rubaiyat, only unencrypted ones.

Cipher Investigation

Concept

Previous Work

Technical Challenges

Methodology

Results

CipherGUI

Concept

Technical Challenges

Design

Implementation

Testing

Pattern Matcher

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

Web Crawler

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

System Integration

Concept

Previous Work

Technical Challenges

Design

Implementation

Testing

Web Crawler Investigation

Concept

Technical Challenges

Design

Results

Future Development

Cipher Analysis

Web Crawler

Project Management

Timeline

Role Allocation

Review Process

Budget

Risk Management

Project Outcomes

Significance and Innovations

Strengths and Weaknesses

Conclusion

References

  1. https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2009:_Who_killed_the_Somerton_man%3F
  2. https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_Report_2010

See also

References and useful resources