Cipher Cross-off List

From Derek
Revision as of 23:46, 19 September 2011 by A1162034 (Talk | contribs)


Purpose

Previous studies into the Tamam Shud case have concluded that the mysterious code left behind is not just a random sequence of letters; it is in fact a code. This raises the question: which cipher was used to encrypt it? This page aims to address that question. The Cipher Cross-off List is a place to list the cipher schemes that have been identified as potentially being used to create the Somerton Man's code. As part of our project, we will methodically investigate many of these listed ciphers to see if we can rule them out as being used in the encryption of the code.

Cipher Cross-off List

Substitution Ciphers

First Order Substitution Ciphers

Stream Ciphers

Substitution and Transposition Ciphers

Reasoning

The following section contains the explanations and/or proofs behind the ruled-out ciphers.

Random Sequence of Letters

As part of their work, the students undertaking this project in both 2009 and 2010 conducted surveys of both sober and intoxicated people to see if the letter frequencies obtained were similar to the letter frequencies evident in the Somerton Man's code. Neither group's survey results were consistent with the code, and it was subsequently concluded that the code is not simply a random sequence of letters. The relevant sections of the previous groups' work can be seen at the following links:

Anagram/Transposition Cipher

By looking at letter frequency plots of various languages against the code, and by identifying other anomalies such as the existence of a 'Q' but no 'U' in the code, the Honours students in 2009 concluded the code did not use a Transposition Cipher alone. The relevant section of their report can be seen here.

ADFGVX Cipher

The ADFGVX Cipher dates from 1918 (its five-letter predecessor, ADFGX, was introduced in March of that year). It was used primarily by the German Army during World War One. The technique used for encryption produces a ciphertext that contains only the English letters A, D, F, G, V and X, hence the name. The ADFGVX Cipher uses transposition and substitution with bipartite fractionation. Given the lengthy methodology used for encryption and decryption, full details have not been provided. Further information can be accessed here.

A review of the Somerton Man code showed that it contains 16 different letters of the English alphabet. Thus any cipher methodology that produces a ciphertext with fewer than 16 different letters can be trivially ruled out. This includes the ADFGVX Cipher. It was also considered that false letters may have been used to mask the ADFGVX Cipher. To test this, the relative frequency plot for the Somerton Man code was examined. A graph of the relative frequencies of letters in the Somerton Man code compared with the relative English letter frequencies is shown below. There is no evidence that the six letters expected in an ADFGVX ciphertext are more prevalent in the Somerton Man code than in the English language. The ADFGVX Cipher using a weak mask was consequently ruled out of the investigation. Furthermore, apart from the letter “A”, the relative frequencies of the six letters are low, and three of them (“F”, “V” and “X”) do not appear at all. Thus it was also concluded that a strong mask was not used. The ADFGVX Cipher was discounted from all further investigation following these results.

Frequency Analysis of code and English
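The letter counts used in this argument can be reproduced with a few lines of code. The following is only a minimal sketch in Python (not the project's own program); the names CODE_LINES, counts and adfgvx_counts are illustrative assumptions, and the four code lines are the transcription given on this page.

```python
from collections import Counter

# The four lines of the code as transcribed on this page.
CODE_LINES = ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]

# Count every letter across the whole code.
counts = Counter("".join(CODE_LINES))

# Number of different letters that appear in the code.
distinct_letters = len(counts)

# How often each of the six ADFGVX letters appears (0 if absent).
adfgvx_counts = {letter: counts.get(letter, 0) for letter in "ADFGVX"}

print(distinct_letters)
print(adfgvx_counts)
```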

Affine Cipher

Like the Shift Cipher, the Affine Cipher is a mono-alphabetic substitution cipher. It is commonly categorised as a block cipher with a length of 1. Each letter of the plaintext is therefore encrypted independently from the other letters.

Encryption method:

[math] e_k(x) = ax + b [/math]

Decryption method:

[math] d_k (y) = a^{-1}(y - b) [/math]

The rules that were examined for encryption and decryption are shown above, where x represents a given plaintext letter, y represents the corresponding ciphertext letter, and all arithmetic is performed modulo 26. Given that there are 26 possible values for b within the modulo 26 English alphabet and 12 possibilities for a, the total number of possible keys for the Affine Cipher is 12 × 26 = 312. The limitation imposed on a is a consequence of requiring a multiplicative inverse within the modulo 26 domain: a must share no common factor with 26.

In 2011, a Java program was written to test all 312 possible key variations. The output from the program has been added to the Cipher Cross-off List wiki page and is available here. The results did not present any understandable text, and as such, the Affine Cipher has been ruled out of any future investigation.
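The exhaustive key search described above can be sketched as follows. This is not the 2011 Java program; it is a small Python illustration under the assumptions A = 0, ..., Z = 25, and the function names (affine_keys, affine_encrypt, affine_decrypt, brute_force) are hypothetical. The modular inverse uses pow(a, -1, m), which requires Python 3.8 or later.

```python
from math import gcd

CODE_LINES = ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]

def affine_keys(m=26):
    """All (a, b) pairs for which a has an inverse modulo m."""
    return [(a, b) for a in range(1, m) if gcd(a, m) == 1 for b in range(m)]

def affine_encrypt(text, a, b, m=26):
    """e_k(x) = ax + b (mod m), for upper-case letters only."""
    return "".join(chr((a * (ord(c) - 65) + b) % m + 65) for c in text)

def affine_decrypt(text, a, b, m=26):
    """d_k(y) = a^{-1}(y - b) (mod m), for upper-case letters only."""
    a_inv = pow(a, -1, m)  # modular inverse (Python 3.8+)
    return "".join(chr(a_inv * (ord(c) - 65 - b) % m + 65) for c in text)

def brute_force(lines=CODE_LINES):
    """Candidate plaintext for every key, for inspection by eye."""
    return {(a, b): [affine_decrypt(line, a, b) for line in lines]
            for a, b in affine_keys()}
```

Calling brute_force() returns one candidate decryption per key, which then has to be read by a person (or scored against a dictionary) to decide whether any of them is meaningful.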

Alphabet Reversal Cipher

The Alphabet Reversal cipher is a substitution cipher where A becomes Z, B becomes Y, C becomes X etc. This leads to the following encoding and decoding key (read in vertical order):

Plaintext Alphabet:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher Alphabet   :   Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

Thus, for example, "HELLO" becomes "SVOOL". This cipher was tested on the Tamam Shud code by the 2011 group. A small Java program was written that takes input from the command line or from a text file and outputs it in alphabet-reversed form. Running a file containing the code through the program turns the input of:

MRGOABABD
MTBIMPANETP
MLIABOAIAQC
ITTMTSAMSTGAB

into

NITLZYZYW
NGYRNKZMVGK
NORZYLZRZJX
RGGNGHZNHGTZY

As can be seen, no meaning can be deciphered from the alphabet-reversed text and thus we can rule out the Alphabet Reversal Cipher as being used in encrypting the Somerton Man's code.
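The substitution above is simple enough to reproduce in a few lines. The following Python sketch is an illustration only (the original tool was a Java program that is not reproduced here); the name alphabet_reversal is an assumption.

```python
import string

# Map A->Z, B->Y, ..., Z->A.
REVERSAL = str.maketrans(string.ascii_uppercase, string.ascii_uppercase[::-1])

def alphabet_reversal(text):
    """Apply the alphabet reversal; the same call both encodes and decodes."""
    return text.upper().translate(REVERSAL)

for line in ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]:
    print(alphabet_reversal(line))
```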

Atbash Cipher

The Atbash Cipher was developed for the Hebrew alphabet, but can be modified to work with the English alphabet. In that form it is identical to the Alphabet Reversal Cipher and is also a special case of the Affine Cipher (a = 25, b = 25). Both of these have already been ruled out.

Auto-Key Cipher

The auto-key cipher is a stream cipher. Stream ciphers use a different key for every block as opposed to using the same key repeatedly as is the case with block ciphers. In the case of the Auto-Key cipher, the letters of the message are used as the input key stream apart from the first key letter, which is normally chosen. The encryption and decryption methodologies that were tested are shown below, where k represents the initial key, x_i represents a given plaintext letter and y_i represents the corresponding ciphertext letter.

Encryption method:

[math]e_k(x_i)= x_i + k_i = \begin{cases} x_1 + k & i = 1 \\ x_i + x_{i-1} & i \gt 1 \end{cases}[/math]

Decryption method:

[math]d_k(y_i)= y_i - k_i = \begin{cases} y_1 - k & i = 1 \\ y_i - x_{i-1} & i \gt 1 \end{cases}[/math]


Notably, only addition is specified in the encryption formula above; however, the 2011 project team identified that a variation on the methodology could have been used. Specifically, subtraction could have been used in place of addition. Both were considered in the investigation, as well as alternation between addition and subtraction.

Furthermore, since the input key stream depends on the message being encoded, the line order of the Somerton Man code is also important. Different line orders yield different message text, and uncertainty about the crossed-out line in the code was a concern in testing the Auto-Key methodology. To ensure thoroughness, different scenarios were considered. These were:

  • The original line order as it appears under the Alphabet Reversal Cipher section
  • Swapping the order of the second and third lines within the original code. This is to consider the circumstance where a mistake was made when generating the code and the third line is a replacement for the crossed out line.
  • Each line separately with the same key used each time.

A Java program was written to test each of the above scenarios with all three possible variations of the decryption formula. Examination of the program’s output file by one of the project team members revealed there was no meaningful English text and as a consequence the Auto-Key Cipher has been ruled out as the technique used to generate the code of the Somerton Man. The output file has been uploaded and is available here.
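To make the formulas above concrete, here is a minimal Python sketch of the additive and subtractive variants (it is not the project's Java program and does not cover the alternating variant). The function names and the sign parameter are assumptions introduced for illustration; letters are numbered A = 0, ..., Z = 25 and k is the numeric initial key.

```python
A = ord("A")

def autokey_encrypt(plain, k, sign=+1):
    """y_1 = x_1 + sign*k, then y_i = x_i + sign*x_{i-1} (mod 26)."""
    out, key = [], k
    for ch in plain:
        x = ord(ch) - A
        out.append(chr((x + sign * key) % 26 + A))
        key = x  # the current plaintext letter keys the next one
    return "".join(out)

def autokey_decrypt(cipher, k, sign=+1):
    """Inverse of autokey_encrypt for the same k and sign."""
    out, key = [], k
    for ch in cipher:
        x = (ord(ch) - A - sign * key) % 26
        out.append(chr(x + A))
        key = x  # the recovered plaintext letter keys the next one
    return "".join(out)

def all_candidates(cipher, sign=+1):
    """One candidate plaintext per initial key k = 0..25."""
    return [autokey_decrypt(cipher, k, sign) for k in range(26)]
```

The different line-order scenarios listed above correspond to different inputs: the four lines joined in their original order, joined with the second and third lines swapped, or each line passed separately.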

Baconian Cipher

The Baconian Cipher is a binary-style code using only two letters or formatting variations. The code can also be hidden amongst apparently meaningless text. For example, consider the message "test" encoded using A and B representing binary values.

"test" then becomes BAABB AABAA BAABA BAABB if we use simple letter ordering. This can easily be hidden, for example, Bread can be baked about any ample bastion. Can it be a blessing around a hobby?

For another variation, instead of using A and B, we could use bolded and unbolded letters to represent the binary.

It is clear the Baconian Cipher wasn't used in the Somerton Man Code. Firstly, there is no obvious formatting, and secondly, the most frequently occurring letters are "A", "T" and "M", which occur 8, 6 and 6 times respectively. This hardly leaves room for much more than a two-letter message.
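The A/B encoding of "test" in the example above can be reproduced with the sketch below. It assumes the "simple letter ordering" used in that example, i.e. a 26-letter alphabet with A = 0, ..., Z = 25 written as five binary digits (the classical 24-letter version, in which I/J and U/V share codes, would give different groups). The function name bacon_encode is hypothetical.

```python
def bacon_encode(message):
    """Encode each letter as 5 binary digits, written with A for 0 and B for 1."""
    groups = []
    for ch in message.upper():
        if "A" <= ch <= "Z":
            bits = format(ord(ch) - ord("A"), "05b")
            groups.append(bits.replace("0", "A").replace("1", "B"))
    return " ".join(groups)

print(bacon_encode("test"))
```

Since every plaintext letter needs five carrier symbols, a 44-letter carrier text could hide at most eight letters, even before the letter-frequency argument above is applied.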

Beaufort Cipher

Coming soon!... or is it???

Bifid Cipher

The Bifid Cipher was first published in 1901. It combines a Polybius square with transposition to achieve fractionation, so that each ciphertext character depends on two plaintext characters, as in the Playfair Cipher assessed in 2009. Further information about the Bifid Cipher methodology can be found here.

To test the Bifid Cipher mechanism, a known plaintext was encoded, and a letter frequency analysis of the resulting ciphertext was compared with that of the Somerton Man code. A graph of the relative frequency of each English alphabet letter is shown below. The absence of the letter “J” in the Bifid Cipher results is in accordance with the encryption methodology, in which the letters “I” and “J” are both represented by “I”.

Bifid Cipher Frequency Analysis

Comparison of the Bifid encryption results with the Somerton Man code shows only a weak correlation. The Bifid Cipher results are spread across all possible ciphertext letters, with a deviation significantly smaller than that of the Somerton Man code. These results were considered sufficient to conclude that the Bifid Cipher mechanism had not been used to generate the Somerton Man code, although the conclusion is not definitive given the small sample size of the Somerton Man code. An interesting observation is that the letter “J” is absent in both results.

Book Cipher

The Book Cipher requires the sender and recipient of the message to have identical copies of a book or other text. The cipher works by replacing words of the plaintext with index numbers corresponding to each word's location in the cipher book. For example, if the word "and" appears as the 15th word in the book, any occurrence of "and" can be replaced by the number 15 in the ciphertext. Another variation of the book cipher involves coding individual letters via the book, using the index of words starting with the relevant letter. Since the Somerton Man code is made of letters rather than numbers, we can rule out the Book Cipher.

Dvorak Encoding

Dvorak Encoding is a monoalphabetic substitution cipher based on the Dvorak Keyboard, a keyboard layout invented in 1936 with the goal of reducing finger travel distance to increase typing rates. The encoding works by simply typing the plaintext into a Qwerty keyboard using the Dvorak layout. As can be seen from the picture below, this would transform "QWERTY" into "',.PYF".

Dvorak Keyboard and Qwerty Keyboard

Decrypting Dvorak Encoding simply involves reversing the previous step - typing the ciphertext into a Dvorak Keyboard using Qwerty layout.

A slight variation on Dvorak Encoding is to type the plaintext into a Dvorak Keyboard using the Qwerty layout and decrypt by typing into a Qwerty Keyboard using the Dvorak layout. Both methods have been tested on the Somerton Man code, both producing no logical output (shown below). Thus Dvorak Encoding has been crossed off the list.

Plain Dvorak (De)Coding

MPIRAXAXE
MYXCMLAB.YL
MNCAXRACA'J
CYYMYOAMOYIAX

Alternate Dvorak (De)Coding

MOUSANANH
MKNGMRALDKR
MPGANSAGAXI
GKKMK;AM;KUAN
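Both outputs above can be regenerated from a key-for-key mapping between the two layouts. The Python sketch below is an illustration only; the strings QWERTY and DVORAK list the three main letter rows of each layout in the same physical key order (an assumption that ignores the number row and the keys to the right of the letter rows), and the function names are hypothetical.

```python
# The same 30 physical keys, row by row, as labelled on each layout.
QWERTY = "qwertyuiopasdfghjkl;zxcvbnm,./"
DVORAK = "',.pyfgcrlaoeuidhtns;qjkxbmwvz"

TO_DVORAK = str.maketrans(QWERTY, DVORAK)
TO_QWERTY = str.maketrans(DVORAK, QWERTY)

def plain_dvorak(text):
    """Press the Qwerty-labelled keys while the Dvorak layout is active."""
    return text.lower().translate(TO_DVORAK).upper()

def alternate_dvorak(text):
    """Press the Dvorak-labelled keys while the Qwerty layout is active."""
    return text.lower().translate(TO_QWERTY).upper()

for line in ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]:
    print(plain_dvorak(line), alternate_dvorak(line))
```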

Four-square Cipher

The Four-square Cipher is similar to the Playfair Cipher and the Two-square Cipher in that it is a digraph cipher - it encrypts letters in pairs. This means that the output code should contain an even number of letters. In the case of the Somerton Man's code, the lines consist of 9, 11, 11 and 13 letters - no even numbers. This would indicate that a simple digraph encryption technique, including the Four-square Cipher, has not been used. The exception would be if null (padding) characters had been added to the end of each line.

Hill Cipher

The Hill Cipher was invented by Lester S. Hill in 1929. It is a polygraphic substitution cipher based on linear algebra. The encryption and decryption methodologies that were tested are defined by the formulas shown below.

Encryption method:

[math] e_A(x) = xA [/math]

Decryption method:

[math] d_A(y) = yA^{-1} [/math]

The matrix A is required to be invertible within the alphabet used; for English this means invertible modulo 26.

Given that it is a block cipher, the results of the Index of Coincidence method were extremely helpful for analysis of the Hill Cipher. The 2011 investigation concluded that, since the code was generated by hand, encryption key matrix sizes of 2x2 and 3x3 were the most practically feasible. However, reflection on the findings of the Index of Coincidence method showed the most likely block lengths were 3 and 7. A block length of two, corresponding to a 2x2 encryption key matrix, was deemed unlikely. Furthermore, a block length of two would yield a digraph cipher, meaning the ciphertext would be generated in pairs. The Somerton Man code contains lines consisting of 9, 11, 11 and 13 letters; no line has an even number of letters. From this and the results of the Index of Coincidence method, it was concluded that the Hill Cipher using a dimension-two encryption key could be ruled out as the source of the Somerton Man code. As only the first of the four lines has a length that is a multiple of three, a Hill Cipher using a dimension-three encryption key was also ruled out.

Homophonic Substitution Ciphers

Homophonic Substitution Ciphers are substitution ciphers that use multiple symbols for more common letters (such as 'e') in an attempt to confuse cryptanalysts trying to crack the code through frequency analysis. For example, if 'e's occur 14% of the time, use 14 different symbols for 'e' during the encryption process. There are two reasons we can rule out a homophonic substitution cipher. Firstly, this sort of cipher requires a cipher alphabet of much greater size than the plaintext alphabet. From what we can see, the Somerton Man code simply uses the English alphabet. Secondly, homophonic substitution ciphers theoretically produce flat frequency distributions. The Somerton Man Code's letter frequency is clearly not flat, as can be seen below.

Letter frequencies of the code

Keyword Cipher

The Keyword Cipher is a simple mono-alphabetic substitution cipher, created by listing a keyword under the plaintext alphabet then filling in the remaining letters. Any repeated letters in the keyword are ignored. For example, using the keyword SOMERTON, the translation from the plaintext alphabet to the cipher alphabet becomes:

Plaintext Alphabet:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher Alphabet   :   S O M E R T N A B C D F G H I J K L P Q U V W X Y Z
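The construction of the cipher alphabet shown above can be sketched in Python as follows. This is an illustrative sketch only; the function names keyword_alphabet and keyword_encrypt are assumptions.

```python
import string

def keyword_alphabet(keyword):
    """Keyword letters first (repeats dropped), then the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def keyword_encrypt(text, keyword):
    """Substitute each plaintext letter using the keyword cipher alphabet."""
    table = str.maketrans(string.ascii_uppercase, keyword_alphabet(keyword))
    return text.upper().translate(table)

print(" ".join(keyword_alphabet("SOMERTON")))
```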

There is a very good possibility that a Keyword Cipher was used to generate the Somerton Man's code. The reasoning behind this viewpoint follows.

Firstly, note how towards the end of the alphabets above, the Cipher Alphabet and the Plaintext Alphabet become very similar due to the nature of the keyword encoding. With this in mind, examining the following plot of the code letter frequencies against English letter frequencies has a fascinating result - towards the end of the alphabet (S,T,U,V,W,X,Y,Z) there is good correlation between the code and English letter frequencies. However, at the start of the alphabet there are vast discrepancies.

Letter frequencies of the Code against the English language.

This result is consistent with a Keyword-encoded cipher, where the keyword at the start of the cipher alphabet makes for significant displacements in letter substitutions, whereas, as the keyword ends and the remainder of the alphabet fills in to complete the cipher alphabet, the letter substitution displacements become closer and closer together, as evidenced by the example Plaintext and Cipher alphabets shown above.

While the letter frequency plots show there is a chance that the code was formed using the Keyword Cipher, it is difficult to confirm or rule it out. This is due to the limited number of letters in the Somerton Man's code we have to work with. Usually we would perform frequency analysis on the code to determine the keyword; however, with only 44 letters to work with, frequency analysis will not be accurate.

If the Keyword Cipher was used in encrypting the code, we can make guesses as to what sort of keywords could have been used. The effect of a keyword in transforming the plaintext alphabet to the ciphertext alphabet is:

  • Create a random sequence at the beginning (according to the keyword) - e.g. plaintext A,B,C,D,E,F,G --> ciphertext K,E,Y,W,O,R,D
  • Shuffle the rest of the alphabet along at a decreasing rate as keyword letters are filtered out.

From this, we can guess that the keyword would contain no letters later than "T", as the following letters in the alphabet have a good frequency match to normal English frequencies - i.e. we want the shuffling along to be finished. Another guess we can make is that either the keyword is 4 letters long or the fifth letter is an 'A'. Either of these circumstances would cause ciphertext 'A' to line up with plaintext 'E', converting the code-frequency dominant A's into English frequency dominant plaintext E's.

Nihilist Cipher

The Nihilist Cipher utilises a Polybius square with a keyword to provide a mixed alphabet. The square is used to convert each plaintext letter as well as a second keyword different to the first, into a two digit number. The ciphertext is obtained by adding the plaintext letter values and the corresponding keyword letter values. The keyword numbers are repeated as required. An example of the Nihilist Cipher methodology can be seen here.

As the Nihilist Cipher mechanism produces a ciphertext consisting of only numerical values, use for the Somerton Man code can be trivially disproven given that the Somerton Man code is formed from only letters.

Null Cipher

The mysterious code found marked in the back of the Rubaiyat.

The Null Cipher is a form of Steganography - which involves hiding the real message rather than making it unintelligible as with substitution and transposition ciphers. In the Null Cipher, the plaintext of the message is disguised by blending it in with a large amount of 'null' or non-cipher text. For example, we can hide the message Hidden Words in the initial letters of Horrors in Derek's den entice naive wanderers. Ominous remains describe sacrifices.
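The initial-letter example above is one of the simplest null cipher conventions, and extracting it takes a single line of code. The Python sketch below is illustrative; the function name initial_letters is an assumption, and real null ciphers may use any other agreed rule for selecting the significant letters.

```python
def initial_letters(text):
    """Read off the first character of every whitespace-separated word."""
    return "".join(word[0] for word in text.split()).upper()

cover = ("Horrors in Derek's den entice naive wanderers. "
         "Ominous remains describe sacrifices.")
print(initial_letters(cover))
```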

Null Cipher ciphertext usually reads as easily understandable text, so that it is not evident there is a hidden message inside, and it is usually long because of the null letters padding out the text. The Somerton Man's code is relatively short, so it does not appear to be a typical Null Cipher. However, there is always a chance that a small message, for example a place name, is encoded in the text. There are many ways a message could be encoded; for example, if we look at the code in the picture to the right, there are several letters with distinctive markings that could form part of the plaintext: 'O' and 'C' in the third line and 'S' and 'S' in the fourth line.

While the Null Cipher cannot be completely ruled out, it is unlikely.

Rotary telephone dial face. Sourced from http://www.arctos.com/dial/

Number-Based Ciphers

Number codes include those such as simple letter-number substitutions and Phone Keypad encryption systems. All number-based ciphers can easily be discounted as the Somerton Man code is formed of letters, with no numbers.

Rotary Telephone Dial Cipher

Given the prevalence of rotary dial telephones in 1948 a cipher was considered that used their faces. A template of a typical rotary dial telephone can be seen to the left showing both the available numbers and corresponding letters. Importantly, the letter “Q” is absent. The code of the Somerton Man contains an instance of the letter “Q”, thus the face of a rotary dial telephone could not have been used as an encryption mechanism. As a consequence any cipher using the face of a rotary dial telephone has been disproven and will not be further investigated.

One Time Pad

The One-Time pad has been investigated by the 2009 group. They experimented using the Rubaiyat of Omar Khayyam (which is closely linked with the case) and the King James Bible (common at the time) and found no key to de-ciphering the code. Their conclusions can be seen here: 2009 One-Time Pad Conclusion

Pigpen Cipher

Pigpen Cipher is formed by replacing letters with relevant part of structure (sourced from Wikipedia)

The Pigpen Cipher is a simple geometric substitution cipher, known for its use by the Freemasons in the 1700s to keep their records secret. The cipher substitutes letters according to the diagram to the right. The resulting ciphertext is formed of the geometric parts of the structure corresponding to each letter - thus it consists of partial squares, triangles and dots rather than letters. The Somerton Man code is formed of letters and thus we can rule the Pigpen Cipher out.

Playfair Cipher

The 2009 students concluded that the cipher used was not likely to be the Playfair Cipher based on an empirical test they performed. Their conclusion can be seen here: 2009 Playfair Cipher Conclusion

The Playfair Cipher was also re-examined in 2011. The cipher was popularised by Baron Playfair of St Andrews, but it was actually invented in 1854 by Sir Charles Wheatstone, a pioneer of the telegraph after whom the Wheatstone bridge is named.

The Playfair Cipher is a digraph substitution cipher in which pairs of plaintext letters are replaced by pairs of ciphertext letters. The cipher involves the creation of a 5x5 square (I and J share the same cell) using a keyword. For example, using the keyword Playfair we could create the following square:

P L A Y F
I R B C D
E G H K M
N O Q S T
U V W X Z

For each pair of letters, a rectangle is traced out in the Playfair square, and the plaintext pair is replaced by the letters in the alternate corners of the rectangle. For example, if we encoded TH (as shown below) we would return the coded pair QM.

P L A Y F
I R B C D
E G|H K M|
N O|Q S T|
U V W X Z

There are three special cases that must be considered -

  • if the plaintext pair is the same letter repeated, an X is usually inserted between them
  • if the plaintext pair share the same row, the letters are replaced by those immediately to the right
  • if the plaintext pair share the same column, they are replaced by those directly below

Cracking the Playfair Cipher is most easily done by frequency analysis - analyse which digraphs occur most commonly in English (or the chosen language) and compare this to the distribution of digraphs in the ciphertext. Unfortunately, digraph frequency analysis requires a large amount of data to compare. The 22 pairs available in the Somerton Man's code do not provide sufficient data for this (although there are three "AB" pairs). This means we must try to analyse it in different ways.

There are two factors that point to the Playfair Cipher not being used in creating the Somerton Man code.

  • The cipher tends to have a flattening effect on the frequency distribution of individual letters. The Somerton Man code has three letters (A, M, T) with frequencies well above 10%. 20 samples of English text (each the same size as the code) were taken and encrypted. Of these, just 2 encrypted samples (i.e. 1/10) had three letters with frequencies of 10% or greater. The other samples all had 0, 1 or 2 letters above the 10% mark. While this doesn't rule it out, it does make the use of the Playfair Cipher less likely.
  • Due to the nature of the cipher, there are never encrypted double letter pairs - i.e. you never see "EE" or "HH" paired letters. Looking at the pairing of the Somerton Man code, we see a "TT" towards the end. This would appear to rule out the Playfair Cipher as being used in forming the code.

MR GO AB AB DM TB IM PA NE TP ML IA BO AI AQ CI TT MT SA MS TG AB
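The square construction, the rectangle rule and the double-letter check can be sketched together in Python. This is an illustrative sketch rather than the code used by either project group; all function names are assumptions, and the row and column rules wrap around the edge of the square, which is the usual convention but is not stated explicitly above.

```python
CODE_LINES = ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]

def playfair_square(keyword):
    """5x5 square: keyword letters first, then the rest of the alphabet (J folded into I)."""
    seen = []
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if "A" <= ch <= "Z" and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]

def encrypt_pair(square, a, b):
    """Encrypt one digraph using the row, column and rectangle rules."""
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:  # same row: letters immediately to the right (wrapping)
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:  # same column: letters directly below (wrapping)
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    return square[ra][cb] + square[rb][ca]  # rectangle: swap the columns

def digraphs(text):
    """Split a string into consecutive letter pairs."""
    return [text[i:i + 2] for i in range(0, len(text) - 1, 2)]

square = playfair_square("Playfair")
pairs = digraphs("".join(CODE_LINES))
doubles = [p for p in pairs if p[0] == p[1]]
print(encrypt_pair(square, "T", "H"), doubles)
```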

Rail Fence Cipher

The Rail Fence Cipher (or zigzag cipher) was identified as a possibility because of the four distinct lines indicating it could be a 4-rail cipher. A Rail Fence Cipher involves writing out the unencrypted message in a zigzag and then reading it in rows to form the encrypted version. For example, take "Rail Fence Cipher" in a 3-rail cipher:


R   F   E   H    
 A L E C C P E   
  I   N   I   R  

The encrypted form is therefore: RFEH ALECCPE INIR.
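The zigzag write-out in this example can be sketched in Python as follows. The function name rail_fence_encrypt is an assumption, non-letters are simply dropped, and at least two rails are assumed.

```python
def rail_fence_encrypt(message, rails=3):
    """Write the letters in a zigzag over `rails` rows and return the rows."""
    letters = [c for c in message.upper() if "A" <= c <= "Z"]
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for c in letters:
        rows[row].append(c)
        if row == 0:              # bounce off the top rail
            step = 1
        elif row == rails - 1:    # bounce off the bottom rail
            step = -1
        row += step
    return ["".join(r) for r in rows]

print(" ".join(rail_fence_encrypt("Rail Fence Cipher", 3)))
```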

We can discount the Rail Fence Cipher as being used in the Tamam Shud code for several reasons. Firstly, it is simply a transposition cipher and previous studies have shown the letter frequency plot is not consistent with a transposition. The presence of a 'Q' in the code without a 'U' also indicates it is unlikely to be a transposition cipher. The final indicator comes from testing the code itself:

   M     R     G     O     A     B     A     B     D 
  M T   B I   M P   A N   E T   P     
 M   L I   A B   O A   I A   Q C      
I     T     T     M     T     S     A     M     S     T     G     A     B

As we can see, the zigzags do not form recognisable words, and there are extra letters overflowing from the top and bottom lines.

Shift Cipher

The shift cipher is a mono-alphabetic substitution cipher. Each letter is shifted by the same amount within the alphabet that is used, and the modulo operator ensures any shift remains within the alphabet. One implementation of the shift cipher is famously known as the Caesar Cipher, but while the Caesar Cipher uses only one value for the key, the following examination explores all available options. The encryption and decryption methodologies are shown below, where x represents a given plaintext letter, y represents the corresponding ciphertext letter and k is the key, which is restricted by the size of the alphabet used.

Encryption method:

[math]e_k(x) = y = x + k [/math]

Decryption method:

[math]d_k(y) = x = y - k [/math]

The shift cipher was tested in 2011 with Java code that cycled through all 25 key options within the English alphabet (the 26 possible shifts, less the zero-shift case). The results have been uploaded and can be found here. Since the results show no understandable text, the shift cipher has been removed from future project investigations.

It is worth noting that the results of this test will be confirmed with the future implementation of a testing procedure for the Affine Cipher corresponding to the case of a = 1.

Templar Cipher

The Templar Cipher is a variation of the Pigpen Cipher and was used by the Knights Templar some time after their founding in 1118. Letters of the plaintext are substituted with symbols in accordance with the diagram below. The letter “J” is encoded with the same symbol as the letter “I”. As ciphertexts generated using the Templar Cipher consist of only symbols and the Somerton Man code consists of only letters, the Templar Cipher can be removed from further investigation.

Templar Cipher alphabet to symbol conversion diagram (sourced from Wikipedia)

Trifid Cipher

The Trifid Cipher was invented in 1901 following publication of the Bifid Cipher. It extends the Bifid Cipher into a third dimension, which achieves a fractionation in which each ciphertext character depends on three plaintext characters. Further information about the Trifid Cipher and an example of the encryption methodology can be found here. As the Trifid Cipher requires 27 ciphertext characters, the full stop was used for the additional character, as in the reference material.

Since the Somerton Man code did not contain any characters beyond the traditional English alphabet, the Trifid Cipher mechanism could not be trivially discounted. Testing therefore followed the same procedure as for the Bifid Cipher: a known plaintext was encoded, and a letter frequency analysis of the resulting ciphertext was compared with that of the Somerton Man code. The relative frequency of each English alphabet letter is shown in the graph below, with the “Dot” letter representing the 27th ciphertext character.

Trifid Cipher Frequency Analysis

The Trifid Cipher shows an approximately even distribution across all ciphertext letters. The Somerton Man code in comparison is sporadic, with the proportions of the letters “A”, “B”, “M” and “T” much larger. From these results it was decided that the Trifid Cipher had not been used to generate the Somerton Man code; however, as was the case with the Bifid Cipher, the small sample size of the Somerton Man code prevents a definitive conclusion being reached.

Trithemius Cipher

The Trithemius Cipher is a polyalphabetic substitution cipher described by Johannes Trithemius in the early 16th century. It uses the Trithemius table, which is shown below and consists of the 26 letters of the English alphabet forming the first line followed by 25 rows of the same letters, each shifted one place to the left relative to the row above (25 different shift ciphers).

Trithemius Cipher encryption table

Encryption of the plaintext message is achieved by using the top line as the plaintext guide and selecting a letter below from the same column. The first plaintext letter uses the first row of the table, the second letter uses the second row of the table and so on. If the message has more than 26 letters, encryption returns to the first row at the 27th letter. An example is the message “THIS IS SECRET”, which would be encoded as: “TIKV MX YLKAOE”.
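Because row i of the table is simply a shift by i - 1 places, the table lookup can be replaced by arithmetic. The Python sketch below is an illustration only (not the project's Java code); the function name trithemius and its decrypt flag are assumptions, and only letters advance the row counter so that spaces are passed through unchanged.

```python
def trithemius(text, decrypt=False):
    """Shift the n-th letter by n - 1 places (mod 26); reverse the shift to decrypt."""
    out, i = [], 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = -i if decrypt else i
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
            i += 1  # move down one row of the table per letter
        else:
            out.append(ch)  # keep spaces and punctuation as they are
    return "".join(out)

print(trithemius("THIS IS SECRET"))
```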

The Trithemius Cipher has been tested with Java code that deciphers the Somerton Man code directly. The results contained no understandable plaintext; thus the Trithemius Cipher has been ruled out of further investigations.

Two-square Cipher

The Two-square Cipher is similar to the Playfair Cipher in that it is a digraph cipher - it encrypts letters in pairs. This means that the output code should occur in even numbers. In the case of the Somerton Man's code, the lines consist of 9, 11, 11 and 13 letters - no even numbers. This would indicate that a simple digraph encryption technique such as the Two-square Cipher has not been used.

VIC Cipher

The VIC Cipher was a cipher scheme issued by the Soviet Union. The version that was examined in the following investigation was the one adapted to the English language thus coinciding with the Somerton Man code. Further information about implementation of the VIC Cipher is available here. Use of the VIC Cipher to generate the Somerton Man code can be trivially disproved as its formula outputs ciphertext consisting only of numerical blocks of length five while the Somerton Man code contains only letters.

For completeness, a conversion between numbers and letters was considered. Two cases were examined. The first assumed a two-digit number representing each letter in the code, and the second used the conventional representation of Z26 with A = 0, B = 1, etc. Both instances failed to produce a numerical representation whose length was a multiple of five, which would be inherent in the use of the VIC Cipher system. The possibility that dummy variables could have been used to pad the size was dismissed as too remote, and it was decided no change would be made to the original conclusion that the VIC Cipher was not used.

As mentioned above, the VIC Cipher scheme that was investigated was the version adapted to the English language, so there is an opportunity for future exploration of alternative languages.

Vigenere Cipher

Investigation of the Vigenere Cipher scheme in 2009 ruled that it had not been used to produce the code of the Somerton Man. Upon reviewing the findings, the 2011 project team concluded that further enquiries needed to be pursued before the Vigenere Cipher could be dismissed. An additional benefit of this investigation was the opportunity to use results to narrow testing of the Hill Cipher, covered below.

As mentioned in the 2009 Final Report, the Vigenere Cipher is a block cipher with the block length determined by the length of the keyword used. The 2009 investigation considered the keyword “LEMON” and thus their statistical evaluation considered a block length of five. They did not consider if another block length was used. The investigation in 2011 attempted to identify if there were any likely block lengths besides size five for the Somerton Man code and test the Vigenere Cipher in these instances. The block length testing results were not constrained to the Vigenere Cipher alone. As block length was considered statistically, the findings are applicable to any other block cipher mechanism.
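The link between keyword length and block length can be illustrated with a short Python sketch of Vigenere encryption. This is a minimal sketch and not the code used by either project team. The function name and the plaintext “ATTACKATDAWN” are hypothetical choices made for illustration; only the keyword “LEMON” comes from the 2009 investigation.

```python
def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Shift each letter by the keyword letter at the same position mod len(keyword)."""
    key_shifts = [ord(k) - ord("A") for k in keyword.upper()]
    out = []
    i = 0  # index over letters only, so spaces do not consume key letters
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            shift = key_shifts[i % len(key_shifts)]
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# With a five-letter keyword, letters five positions apart share the same shift.
print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))
```

Encrypting a run of the letter A simply reproduces the repeating keyword, which makes the period of five easy to see.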

The mechanism used to identify likely block lengths was the Index of Coincidence. It is defined as the probability that any two randomly chosen characters within a text (be that plaintext or ciphertext) are the same letter. Calculation of the probability for the modulo 26 case is shown below, where [math]f_i[/math] denotes the number of occurrences of the i-th letter of the input text alphabet and [math]n[/math] represents the number of letters in the message.

[math]I_c(x) = \frac{\displaystyle\sum_{i=1}^{26} f_i(f_i - 1)}{n(n - 1)}[/math]
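As an illustration, the formula can be computed in a few lines of Python. This is a sketch rather than the project's MATLAB script. The function name is hypothetical, and returning 0.0 for texts of fewer than two letters is an assumption made to avoid division by zero.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Sum of f_i * (f_i - 1) over all letters, divided by n * (n - 1)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # assumption: undefined case treated as zero
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


# "AABB": two letters occurring twice each, so (2*1 + 2*1) / (4*3) = 1/3.
print(index_of_coincidence("AABB"))
```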

For a random string of text the Index of Coincidence is [math]I_c(x) = 26(1/26)^2 \approx 0.038[/math], and for a string of English text [math]I_c(x) \approx 0.065[/math]. By dividing the ciphertext into blocks and calculating the Index of Coincidence for each block, the results can be compared to the English case. If every block gives a value close to the English value above, the number of blocks used is a likely block length. For this method to succeed, the division into blocks must be equivalent to writing the ciphertext column by column into a matrix of m rows, where m is the number of blocks, so that each row forms one block. An example of this is shown below for three blocks and the ciphertext “SECRET MESSAGE HIDDEN”.


Block 1 => S R M S E D N

Block 2 => E E E A H D

Block 3 => C T S G I E
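The division shown above can be reproduced with a short Python sketch. The function name is hypothetical, and spaces are assumed to be dropped before the text is divided, as in the example.

```python
def split_into_blocks(ciphertext: str, m: int) -> list[str]:
    """Block i holds every m-th letter, starting from the i-th letter."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    return ["".join(letters[i::m]) for i in range(m)]


for number, block in enumerate(split_into_blocks("SECRET MESSAGE HIDDEN", 3), start=1):
    print(f"Block {number} => {' '.join(block)}")
```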


For the Somerton Man code the Index of Coincidence method was used to test all possible block lengths. With the code of length 44 this corresponded to the integer set 1 to 44. A MATLAB script was written to automate the testing. The results showed that if a block cipher was used to generate the Somerton Man code, lengths of 3 and 7 were most likely used.
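A scan of this kind might look like the Python sketch below. It is not the MATLAB script itself, which is not reproduced on this page. Summarising each block length by the mean Index of Coincidence across its blocks is an assumption made here for brevity, as is treating blocks of fewer than two letters as zero. All function names are hypothetical, and the sample text is an artificial string with period two, not the Somerton Man code.

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    """Index of Coincidence; blocks of fewer than two letters count as 0.0."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def mean_ic_by_block_length(ciphertext: str) -> dict[int, float]:
    """For each block length m from 1 to n, average the IC over the m blocks."""
    letters = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    results = {}
    for m in range(1, len(letters) + 1):
        blocks = [letters[i::m] for i in range(m)]
        results[m] = sum(index_of_coincidence(b) for b in blocks) / m
    return results


# Artificial sample with an obvious period of two.
scores = mean_ic_by_block_length("ABABABABABAB")
best = max(scores, key=scores.get)
print(best, scores[best])
```

Block lengths approaching the length of the text leave only one or two letters per block, so their scores carry little weight for a text as short as 44 letters.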

Frequency analysis of the Vigenere Cipher was conducted using both likely block length candidates with code words of “SOS” and “WOOMERA” respectively. For consistency the same English text was used in each instance. A graph comparing the results to the Somerton Man code is shown below.

Frequency Analysis of Vigenere Cipher Cases

Comparison between the two Vigenere Cipher cases shows strong correlation. Only four letters have a respective difference in relative frequency exceeding 50%. Conversely, the Somerton Man code possesses characteristics that differ significantly from both the three-letter and seven-letter cases. Of the sixteen letters that appear in the Somerton Man code, only the letter “o” can be favourably compared to the Vigenere cases. The remaining fifteen letters are not consistent with either case.

These results are not completely conclusive given the short length of the Somerton Man code. They are, however, sufficient for the 2011 project team to remove the Vigenere Cipher from further enquiry.

See also

References and useful resources

If you find any useful external links, list them here:
