Final Report 2011
==Structural and Statistical Investigation==
===Concept===
An analysis of the text of the Rubaiyat of Omar Khayyam was conducted to investigate whether the code had been derived directly from the collection of poems in this book. Three investigative directions were followed, each a statistical analysis of the structure of the poems. The results of this preliminary analysis were used to decide whether a subsequent cipher investigation was required in 2011.

One of the three conclusions reached in the 2009 honours project was that the letters in the code are consistent with an English initialism. In acknowledgement of the connection between the code and the Rubaiyat of Omar Khayyam, three hypotheses were derived to statistically investigate initialisms within this book. The cipher investigation component of the 2011 project began with this analysis because, in accordance with the project objectives stipulated above, its results determined whether a cryptologic cipher investigation was actually necessary. The three hypotheses considered were:

===Hypotheses===
# The code is an initialism of a poem in the Rubaiyat
#* Based on previous studies indicating an English initialism and on the fact that the code has four lines that are not crossed out, while each poem is a quatrain (four-line poem).
# The code is related to the initial letters of each word, line or poem
#* Based on previous studies indicating an English initialism.
# The code is generally related to text in the Rubaiyat
#* Based on the links between the Rubaiyat and the code.

===Technical Challenges===
The two main challenges in this analysis both arose from the source material:
# Code ambiguities
# Sample size

Code ambiguities are the difficulties in determining which letters some of the handwritten symbols in the code represent, a challenge created by the untidy handwriting. Sample size refers to the issues caused by the limited sample of 44 letters available for analysis from the code.
===Design of Tests===
The approach to testing these hypotheses varied, although each test used Java code for text parsing and statistics gathering. The first hypothesis was tested by statistically analysing the structure of the Rubaiyat poems and comparing it with the structure of the Somerton Man code. The second and third hypotheses were tested by using software to analyse letter frequencies in the poems and comparing the results with the letter frequencies of the Somerton Man code. For Hypothesis 2, frequency data was gathered on the first letter of each poem, the first letter of each line and the first letter of each word. For Hypothesis 3, frequency data was gathered on all letters in the Rubaiyat.

===Results of Tests===
'''Hypothesis 1: The code is an initialism of a poem.'''

Statistics were gathered on the number of words in each line (first, second, third, fourth) of each poem: the mean number of words, the standard deviation, the maximum and the minimum. The results, categorised by line number within a Rubaiyat poem, are shown in the table below, followed by the corresponding figures for the Somerton Man’s code.
<center>'''Table 1: Words per Line in Rubaiyat Poems'''</center>
{| border="1" cellspacing="0" style="text-align:center; margin: 1em auto 1em auto"
|- style="color:white; background:MidnightBlue; font-weight:bold"
| width="60" | Line || width="60" | Mean || width="60" | Std Dev || width="60" | Max || width="60" | Min
|-
| style="color:white; font-weight:bold; background:#4682B4" | First || 8.00 || 1.15 || 10 || 5
|- style="background:#DCDCDC"
| style="color:white; font-weight:bold; background:#4682B4" | Second || 7.69 || 1.20 || 10 || 5
|-
| style="color:white; font-weight:bold; background:#4682B4" | Third || 7.88 || 1.06 || 10 || 5
|- style="background:#DCDCDC"
| style="color:white; font-weight:bold; background:#4682B4" | Fourth || 7.87 || 1.31 || 10 || 5
|}

<center>'''Table 2: Letters per Line in Code'''</center>
{| border="1" cellspacing="0" style="text-align:center; margin: 1em auto 1em auto"
|- style="color:white; background:MidnightBlue; font-weight:bold"
| width="100" | Line || width="150" | Number of Letters
|-
| style="color:white; font-weight:bold; background:#4682B4" | First || 9
|- style="background:#DCDCDC"
| style="color:white; font-weight:bold; background:#4682B4" | Second || 11
|-
| style="color:white; font-weight:bold; background:#4682B4" | Third || 11
|- style="background:#DCDCDC"
| style="color:white; font-weight:bold; background:#4682B4" | Fourth || 13
|}

The important result is the maximum number of words in the poem lines. An initialism takes one letter from each word, so a line of the code cannot contain more letters than the corresponding poem line contains words. Across all 75 poems in the Rubaiyat, every line position has a maximum of 10 words. However, the code has 11, 11 and 13 letters in its second, third and fourth lines respectively, each over this maximum. These results allow Hypothesis 1 to be ruled out, giving the conclusion that the code is not an initialism of a Rubaiyat poem.
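The project's Java parser is not reproduced in this report, so the following is only a minimal sketch of how per-line word statistics of this kind could be gathered and compared with the code. The class name <code>LineWordStats</code>, its method names, the use of the population standard deviation and the sample quatrain are all assumptions made for illustration; only the code line lengths (9, 11, 11, 13) and the maximum of 10 words come from the results above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's actual parser.
class LineWordStats {

    /** Counts whitespace-separated words in one line of verse. */
    static int countWords(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Word counts of the line at lineIndex (0 to 3) across all poems. */
    static List<Integer> countsForLine(List<String[]> poems, int lineIndex) {
        List<Integer> counts = new ArrayList<>();
        for (String[] poem : poems) {
            counts.add(countWords(poem[lineIndex]));
        }
        return counts;
    }

    /** Returns {mean, standard deviation, max, min} for a non-empty list. */
    static double[] summarise(List<Integer> counts) {
        double sum = 0;
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int c : counts) {
            sum += c;
            max = Math.max(max, c);
            min = Math.min(min, c);
        }
        double mean = sum / counts.size();
        double squares = 0;
        for (int c : counts) {
            squares += (c - mean) * (c - mean);
        }
        // Assumption: population standard deviation (divide by n).
        double stdDev = Math.sqrt(squares / counts.size());
        return new double[] {mean, stdDev, max, min};
    }

    /** True if any code line has more letters than the longest poem line has words. */
    static boolean exceedsMax(int[] codeLineLengths, int maxWordsPerLine) {
        for (int n : codeLineLengths) {
            if (n > maxWordsPerLine) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample input only: one quatrain standing in for the full set of poems.
        List<String[]> poems = new ArrayList<>();
        poems.add(new String[] {
            "Awake! for Morning in the Bowl of Night",
            "Has flung the Stone that puts the Stars to Flight:",
            "And Lo! the Hunter of the East has caught",
            "The Sultan's Turret in a Noose of Light."
        });
        for (int i = 0; i < 4; i++) {
            double[] s = summarise(countsForLine(poems, i));
            System.out.printf("line %d: mean=%.2f sd=%.2f max=%.0f min=%.0f%n",
                    i + 1, s[0], s[1], s[2], s[3]);
        }
        // Line lengths of the code and the maximum word count, as reported in the tables.
        int[] codeLineLengths = {9, 11, 11, 13};
        System.out.println("Code exceeds maximum: " + exceedsMax(codeLineLengths, 10));
    }
}
```

Splitting on whitespace treats punctuation as part of the adjacent word, which is sufficient for counting words but would need refinement for hyphenated or elided forms.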
'''Hypothesis 2: The code is related to the initial letters of each word, line or poem.'''

Letter frequency data was gathered on the first letter of each poem, of each line and of each word. This data is plotted against average English initial-letter frequencies and the letter distribution of the code.

[[Image:CombindeInitialPlots.png|650px|center|All, Line and Poem Initials]]
<center>'''Figure 5 - Letter frequency of initial letters in the Rubaiyat of Omar Khayyam'''</center>

A link between poem initials or line initials and the code can be trivially ruled out: there is a ‘G’ in the code, but no line or poem in the entire Rubaiyat starts with a ‘G’. A link between the initial letters of all words in the Rubaiyat and the code is more difficult to rule out. As might be expected, there is a generally good correlation between English initials and initials in the Rubaiyat (graphed in light blue), but there are significant discrepancies when these are compared with the code, which clearly has a greater proportion of A’s, B’s and M’s. Although a link cannot be ruled out because of the small sample size of the code (44 letters), for the purposes of this project a link has been judged unlikely.

'''Hypothesis 3: The code is generally related to the text in the Rubaiyat.'''

This hypothesis was tested by adapting the Java text parser code to generate letter frequency plots for all letters in the Rubaiyat poems. The results are displayed in the graph below.

[[Image:FullLetterFreqPlot.png|650px|center|All letters]]
<center>'''Figure 6 - Letter frequency of all letters in the Rubaiyat of Omar Khayyam'''</center>

While there is very good correlation between the Rubaiyat poems and English text, the letter frequency of the code is substantially different, with significantly larger proportions of M’s, A’s and B’s. Again, the sample size of 44 letters for the code restricts our ability to draw a firm conclusion, but for our purposes there is enough evidence to discount a link.
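The frequency gathering behind Hypotheses 2 and 3 can be sketched in the same hedged way. The class <code>LetterFrequency</code> and its methods are hypothetical names, not the project's code, and the strings passed to them would be the Rubaiyat text and a transcription of the code, neither of which is reproduced here. The <code>missingFrom</code> helper illustrates the kind of check used for the ‘G’ argument above: it lists letters that occur in one distribution but never in the other.

```java
import java.util.Locale;

// Hypothetical sketch; not the project's actual parser.
class LetterFrequency {

    /** Relative frequency (0 to 1) of A-Z over every letter in the text. */
    static double[] allLetters(String text) {
        int[] counts = new int[26];
        int total = 0;
        for (char ch : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
                total++;
            }
        }
        return normalise(counts, total);
    }

    /** Relative frequency of the first A-Z letter of each whitespace-separated word. */
    static double[] wordInitials(String text) {
        int[] counts = new int[26];
        int total = 0;
        for (String word : text.toUpperCase(Locale.ROOT).split("\\s+")) {
            for (char ch : word.toCharArray()) {
                if (ch >= 'A' && ch <= 'Z') {
                    counts[ch - 'A']++;
                    total++;
                    break; // only the first letter of the word is counted
                }
            }
        }
        return normalise(counts, total);
    }

    /** Converts raw counts to proportions; all zeros if no letters were seen. */
    static double[] normalise(int[] counts, int total) {
        double[] freq = new double[26];
        if (total == 0) {
            return freq;
        }
        for (int i = 0; i < 26; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    /** Letters that occur in the code distribution but never in the reference. */
    static String missingFrom(double[] code, double[] reference) {
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < 26; i++) {
            if (code[i] > 0 && reference[i] == 0) {
                missing.append((char) ('A' + i));
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Placeholder inputs only; not the actual code or the full Rubaiyat text.
        String sampleCode = "GAB";
        String sampleText = "Awake! for Morning in the Bowl of Night";
        double[] codeFreq = allLetters(sampleCode);
        double[] initialFreq = wordInitials(sampleText);
        System.out.println("Letters in code but not in initials: "
                + missingFrom(codeFreq, initialFreq));
    }
}
```

Line and poem initials could be gathered by applying the same counting to the first letter of each line or of each poem instead of each word. With only 44 letters in the code, a single occurrence changes a letter's proportion by more than two percentage points, which is why the comparisons above are stated cautiously.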
===Conclusions===
The rejection of these three hypotheses indicates that there is no direct unencrypted link between the code and the Rubaiyat, subject to the weaknesses introduced by the assumptions made about ambiguous letters and by the small sample size. This result, combined with the finding of the [[Final report 2009: Who killed the Somerton man?|2009]] and [[Final Report 2010|2010]] Honours Projects that the code was not random<ref name=FinalReport2009>''Final Report 2009'', Bihari, Denley and Turnbull, Andrew, 2009, https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_report_2009:_Who_killed_the_Somerton_man%3F</ref><ref name=FinalReport2010>''Final Report 2010'', Ramirez, Kevin and Lewis-Vassallo, Michael, 2010, https://www.eleceng.adelaide.edu.au/personal/dabbott/wiki/index.php/Final_Report_2010</ref>, led to the conclusion that a comprehensive cipher analysis was required in the 2011 project. It should be noted that these results do not rule out all connections between the code and the Rubaiyat, only unencrypted ones.