Difference between revisions of "Transition probabilities from selected texts"

From Derek
(added "See also" section)
(More languages and wider variety of texts for existing languages)
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
The Somerton Man's code (without the extra line) is 44 characters long. If the text were purely random, with each of the 26 letters equally likely (probability 1/26), the probability of obtaining this particular 44-letter string would be (1/26)^44 = 5.51027E-63. This serves as the null probability against which the Markov probabilities below are compared.
 
  
==First Order Transition Probabilities==
+
Transitions with zero observed probability (p = 0) have been corrected to p = 0.0001 so that the Markov probability remains non-zero. The number of such corrections is reported as "Corrected Zeroes".
  
English (1984 - George Orwell)
+
The HMMER score<ref>ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf Page 43</ref> is the base-2 logarithm of the ratio of the Markov probability to the null probability, (1/26)^44.
<br/>Markov Probability: 1.4641414719132793E-67
+
<br/>Corrected Zeroes:  1
+
  
French (Les Orientales - Victor Hugo)
+
<!--
<br/>Markov Probability: 1.1571661202766258E-70
+
This software output has been formatted in html for quick entry into the project wiki
<br/>Corrected Zeroes:   2
+
-->
 
+
==First order==
Vigenere Cipher (1984 - George Orwell, Keyword LEMON)
+
===All letters===
 +
<br/>(..\Texts\English.txt) All Letters
 +
<br/>Markov Probability: 4.196215910162246E-70
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        -23.646530132315654
 +
<br/>
 +
<br/>(..\Texts\French.txt) All Letters
 +
<br/>Markov Probability: 4.562440416695874E-74
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        -36.8135257053655
 +
<br/>
 +
<br/>(..\Texts\German.txt) All Letters
 +
<br/>Markov Probability: 1.6093650169064557E-82
 +
<br/>Corrected Zeroes:    3
 +
<br/>HMMER Score:        -64.89226460456342
 +
<br/>
 +
<br/>(..\Texts\Spanish.txt) All Letters
 +
<br/>Markov Probability: 1.7633169297716054E-83
 +
<br/>Corrected Zeroes:   1
 +
<br/>HMMER Score:        -68.08239247668342
 +
<br/>
 +
<br/>(..\Texts\Italian.txt) All Letters
 +
<br/>Markov Probability: 2.8938109466376915E-92
 +
<br/>Corrected Zeroes:    9
 +
<br/>HMMER Score:        -97.26506645809364
 +
<br/>
 +
<br/>(..\Texts\Portuguese.txt) All Letters
 +
<br/>Markov Probability: 1.0172950778991843E-77
 +
<br/>Corrected Zeroes:    1
 +
<br/>HMMER Score:        -48.94437749826932
 +
<br/>
 +
<br/>(..\Texts\Dutch.txt) All Letters
 +
<br/>Markov Probability: 2.139309827314818E-84
 +
<br/>Corrected Zeroes:    1
 +
<br/>HMMER Score:        -71.12546693519374
 +
<br/>
 +
<br/>(..\Texts\Swedish.txt) All Letters
 +
<br/>Markov Probability: 4.7024882053115854E-77
 +
<br/>Corrected Zeroes:    2
 +
<br/>HMMER Score:        -46.735691382905166
 +
<br/>
 +
<br/>(..\Texts\Vigenere - 1984.txt) All Letters
 
<br/>Markov Probability: 1.646391769425068E-70
 
<br/>Corrected Zeroes:   0
+
 +
<br/>HMMER Score:        -24.99631136880728
 +
<br/>
 +
<br/>(Outputs\Playfair.out) All Letters
 +
<br/>Markov Probability: 5.213910076344393E-69
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        -20.01132524791302
 +
<br/>
 +
===Initial letters===
 +
<br/>(..\Texts\English.txt) Initial Letters
 +
<br/>Markov Probability: 5.755746003335865E-56
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        23.316377212971148
 +
<br/>
 +
<br/>(..\Texts\French.txt) Initial Letters
 +
<br/>Markov Probability: 1.960919656262944E-61
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        5.153264236029422
 +
<br/>
 +
<br/>(..\Texts\German.txt) Initial Letters
 +
<br/>Markov Probability: 7.441017498436695E-73
 +
<br/>Corrected Zeroes:    1
 +
<br/>HMMER Score:        -32.78590341697
 +
<br/>
 +
<br/>(..\Texts\Spanish.txt) Initial Letters
 +
<br/>Markov Probability: 3.204888821639082E-63
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        -0.7818480694202504
 +
<br/>
 +
<br/>(..\Texts\Italian.txt) Initial Letters
 +
<br/>Markov Probability: 6.089831612262369E-59
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        13.431992337096512
 +
<br/>
 +
<br/>(..\Texts\Portuguese.txt) Initial Letters
 +
<br/>Markov Probability: 2.2658166834014361E-60
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        8.683693049158332
 +
<br/>
 +
<br/>(..\Texts\Dutch.txt) Initial Letters
 +
<br/>Markov Probability: 6.9654365323502316E-68
 +
<br/>Corrected Zeroes:    1
 +
<br/>HMMER Score:        -16.27154908302583
 +
<br/>
 +
<br/>(..\Texts\Swedish.txt) Initial Letters
 +
<br/>Markov Probability: 2.6806903224847996E-64
 +
<br/>Corrected Zeroes:    2
 +
<br/>HMMER Score:        -4.361445908011734
 +
<br/>
 +
==Second order==
 +
===All letters===
 +
<br/>(..\Texts\English.txt) All Letters
 +
<br/>Markov Probability: 4.2148901763982914E-92
 +
<br/>Corrected Zeroes:    9
 +
<br/>HMMER Score:        -96.72254209077907
 +
<br/>
 +
<br/>(..\Texts\French.txt) All Letters
 +
<br/>Markov Probability: 1.9814441750241465E-90
 +
<br/>Corrected Zeroes:    9
 +
<br/>HMMER Score:        -91.16762862009702
 +
<br/>
 +
<br/>(..\Texts\German.txt) All Letters
 +
<br/>Markov Probability: 5.919358467581905E-105
 +
<br/>Corrected Zeroes:    14
 +
<br/>HMMER Score:        -139.41766153806188
 +
<br/>
 +
<br/>(..\Texts\Spanish.txt) All Letters
 +
<br/>Markov Probability: 3.342953425806875E-98
 +
<br/>Corrected Zeroes:    11
 +
<br/>HMMER Score:        -116.98848244535708
 +
<br/>
 +
<br/>(..\Texts\Italian.txt) All Letters
 +
<br/>Markov Probability: 6.083262400057097E-116
 +
<br/>Corrected Zeroes:    21
 +
<br/>HMMER Score:        -175.9194661728711
 +
<br/>
 +
<br/>(..\Texts\Portuguese.txt) All Letters
 +
<br/>Markov Probability: 4.2738731323579313E-94
 +
<br/>Corrected Zeroes:    13
 +
<br/>HMMER Score:        -103.34634923820269
 +
<br/>
 +
<br/>(..\Texts\Dutch.txt) All Letters
 +
<br/>Markov Probability: 5.327306011536052E-112
 +
<br/>Corrected Zeroes:    17
 +
<br/>HMMER Score:        -162.82319287449383
 +
<br/>
 +
<br/>(..\Texts\Swedish.txt) All Letters
 +
<br/>Markov Probability: 1.1823873333660746E-91
 +
<br/>Corrected Zeroes:    10
 +
<br/>HMMER Score:        -95.23440631711085
 +
<br/>
 +
<br/>(..\Texts\Vigenere - 1984.txt) All Letters
 +
<br/>Markov Probability: 1.669944098510842E-92
 +
<br/>Corrected Zeroes:    8
 +
<br/>HMMER Score:        -98.05823732223358
 +
<br/>
 +
<br/>(Outputs\Playfair.out) All Letters
 +
<br/>Markov Probability: 7.389612924665265E-86
 +
<br/>Corrected Zeroes:    6
 +
<br/>HMMER Score:        -75.98096976556741
 +
<br/>
 +
===Initial letters===
 +
<br/>(..\Texts\English.txt) Initial Letters
 +
<br/>Markov Probability: 1.0496288884966237E-55
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        24.18318171171002
 +
<br/>
 +
<br/>(..\Texts\French.txt) Initial Letters
 +
<br/>Markov Probability: 8.83846402382603E-70
 +
<br/>Corrected Zeroes:    3
 +
<br/>HMMER Score:        -22.571823368605646
 +
<br/>
 +
<br/>(..\Texts\German.txt) Initial Letters
 +
<br/>Markov Probability: 1.6757717490935368E-89
 +
<br/>Corrected Zeroes:    10
 +
<br/>HMMER Score:        -88.08742718870606
 +
<br/>
 +
<br/>(..\Texts\Spanish.txt) Initial Letters
 +
<br/>Markov Probability: 8.869612541473453E-71
 +
<br/>Corrected Zeroes:    3
 +
<br/>HMMER Score:        -25.888676055317255
 +
<br/>
 +
<br/>(..\Texts\Italian.txt) Initial Letters
 +
<br/>Markov Probability: 1.0949986922100798E-57
 +
<br/>Corrected Zeroes:    0
 +
<br/>HMMER Score:        17.600375336401736
 +
<br/>
 +
<br/>(..\Texts\Portuguese.txt) Initial Letters
 +
<br/>Markov Probability: 9.986807259493189E-70
 +
<br/>Corrected Zeroes:    4
 +
<br/>HMMER Score:        -22.395595515749598
 +
<br/>
 +
<br/>(..\Texts\Dutch.txt) Initial Letters
 +
<br/>Markov Probability: 1.9749729570078271E-69
 +
<br/>Corrected Zeroes:    2
 +
<br/>HMMER Score:        -21.41185805018986
 +
<br/>
 +
<br/>(..\Texts\Swedish.txt) Initial Letters
 +
<br/>Markov Probability: 9.55816081723446E-69
 +
<br/>Corrected Zeroes:    4
 +
<br/>HMMER Score:        -19.13695790770939
 +
<br/>
  
German (Traumdeutung - Sigmund Freud)
 
<br/>Note: does not account for Eszett (sharp s) character
 
<br/>Markov Probability: 3.8662593620911806E-73
 
<br/>Corrected Zeroes:  1
 
  
English Initial Letters (1984 - George Orwell)
+
==References==
<br/>Markov Probability: 1.9187432339606176E-56
+
<References/>
<br/>Corrected Zeroes:  0
+
 
+
French Initial Letters (Les Orientales - Victor Hugo)
+
<br/>counting words like l'hopital as two words (''le'' followed by ''hopital''):
+
<br/>Markov Probability: 7.809561685705767E-61
+
<br/>Corrected Zeroes:  0
+
<br/>discounting the ''l''' (only consider the ''hopital'')
+
<br/>Markov Probability: 1.1841007473332175E-60
+
<br/>Corrected Zeroes:  0
+
 
+
 
+
German Initial Letters (Traumdeutung - Sigmund Freud)
+
<br/>Note: does not account for Eszett (sharp s) character. Though I don't think a word can ever start with this character
+
<br/>Markov Probability: 4.29592233581315E-64
+
<br/>Corrected Zeroes:  1
+
 
+
==Second Order Transition Probabilities==
+
 
+
English (1984 - George Orwell)
+
<br/>Markov Probability: 2.115089006082431E-43
+
<br/>Corrected Zeroes:  14
+
 
+
German (Traumdeutung - Sigmund Freud)
+
<br/>Note: does not account for Eszett (sharp s) character
+
<br/>Markov Probability: 3.79644909538402E-35
+
<br/>Corrected Zeroes:  21
+
 
+
French (Les Orientales - Victor Hugo)
+
<br/>Markov Probability: 4.429249667204738E-34
+
<br/>Corrected Zeroes:  18
+
 
+
Vigenere (English - 1984 - Orwell)
+
<br/>Markov Probability: 1.6699440985106574E-60
+
<br/>Corrected Zeroes:  8
+
 
+
English Initial Letters (1984 - George Orwell)
+
<br/>Markov Probability: 7.009981410871232E-53
+
<br/>Corrected Zeroes:  2
+
 
+
German Initial Letters (Traumdeutung - Sigmund Freud)
+
<br/>Note: does not account for Eszett (sharp s) character
+
<br/>Markov Probability: 2.908650572588623E-32
+
<br/>Corrected Zeroes:  17
+
 
+
Les Orientales - Victor Hugo.txt
+
<br/>Not counting ''l''' as a word (but counting the word contracted with it):
+
<br/>Markov Probability: 1.0762921500526206E-40
+
<br/>Corrected Zeroes:  12
+
<br/>Counting the ''l''' as one word and the other contracted word as another word:
+
<br/>Markov Probability: 2.970787716759867E-41
+
<br/>Corrected Zeroes:  12
+
  
 
==See also==
 
 
*[[Cipher Cracking 2009]]
 
 
*[[Markov models]]
 
 +
 +

Latest revision as of 16:48, 2 October 2009

The Somerton Man's code (without the extra line) is 44 characters long. If the text were purely random, with each of the 26 letters equally likely (probability 1/26), the probability of obtaining this particular 44-letter string would be (1/26)^44 = 5.51027E-63. This serves as the null probability against which the Markov probabilities below are compared.
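
The null probability can be reproduced with a few lines of Python. This is only a sketch of the arithmetic stated above; the constant names are illustrative and do not come from the project software.

```python
import math

CODE_LENGTH = 44     # letters in the code, without the extra line
ALPHABET_SIZE = 26   # each letter assumed equally likely

# Probability of one specific 44-letter string under the uniform model.
null_probability = (1 / ALPHABET_SIZE) ** CODE_LENGTH

# The same quantity in log base 2, which is convenient for scoring later.
null_log2 = -CODE_LENGTH * math.log2(ALPHABET_SIZE)

print(f"{null_probability:.5E}")
print(f"{null_log2:.3f}")
```

The first printed value should agree with the 5.51027E-63 quoted above to the precision shown.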

Transitions with zero observed probability (p = 0) have been corrected to p = 0.0001 so that the Markov probability remains non-zero. The number of such corrections is reported as "Corrected Zeroes".
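
A minimal sketch of a first-order calculation with this zero correction is given below. It is not the project's software: the function names are hypothetical, only the unaccented letters A to Z are kept, and the probability is taken as the product of the transition probabilities alone (how the original program treats the first letter is not stated here, so that is an assumption).

```python
from collections import Counter, defaultdict

ZERO_CORRECTION = 0.0001  # replacement value for p = 0, as stated in the text


def transition_probabilities(training_text):
    """Estimate first-order letter transition probabilities from a text."""
    # Assumption: keep unaccented A-Z only; everything else is dropped.
    letters = [c.upper() for c in training_text if c.isascii() and c.isalpha()]
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(letters, letters[1:]):
        pair_counts[prev][nxt] += 1
    probs = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        probs[prev] = {nxt: n / total for nxt, n in counts.items()}
    return probs


def markov_probability(sequence, probs):
    """Return (probability, corrected_zeroes) for a letter sequence."""
    probability = 1.0
    corrected_zeroes = 0
    for prev, nxt in zip(sequence, sequence[1:]):
        p = probs.get(prev, {}).get(nxt, 0.0)
        if p == 0.0:
            # Unseen transition: substitute the small fixed probability.
            p = ZERO_CORRECTION
            corrected_zeroes += 1
        probability *= p
    return probability, corrected_zeroes


# Toy example: the transition B -> C never occurs in the training text.
toy_probs = transition_probabilities("ABABAC")
toy_result = markov_probability("ABC", toy_probs)
```

For the "Initial Letters" variants, the same functions could be applied to a string made of the first letter of each word instead of the full text.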

The HMMER score[1] is the base-2 logarithm of the ratio of the Markov probability to the null probability, (1/26)^44.
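
The score defined above can be sketched as follows. The function name and its default arguments are illustrative; the input value is the first-order English "All Letters" Markov probability reported on this page.

```python
import math


def hmmer_score(markov_probability, length=44, alphabet_size=26):
    """log2(Markov probability / null probability), with null = (1/alphabet_size)^length."""
    null_log2 = -length * math.log2(alphabet_size)
    return math.log2(markov_probability) - null_log2


# First-order English, all letters (value taken from the results on this page).
english_score = hmmer_score(4.196215910162246e-70)
print(round(english_score, 4))
```

A negative score means the Markov model assigns the code a lower probability than the uniform random model does; a positive score means it assigns a higher one.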

First order

All letters


(..\Texts\English.txt) All Letters
Markov Probability: 4.196215910162246E-70
Corrected Zeroes: 0
HMMER Score: -23.646530132315654

(..\Texts\French.txt) All Letters
Markov Probability: 4.562440416695874E-74
Corrected Zeroes: 0
HMMER Score: -36.8135257053655

(..\Texts\German.txt) All Letters
Markov Probability: 1.6093650169064557E-82
Corrected Zeroes: 3
HMMER Score: -64.89226460456342

(..\Texts\Spanish.txt) All Letters
Markov Probability: 1.7633169297716054E-83
Corrected Zeroes: 1
HMMER Score: -68.08239247668342

(..\Texts\Italian.txt) All Letters
Markov Probability: 2.8938109466376915E-92
Corrected Zeroes: 9
HMMER Score: -97.26506645809364

(..\Texts\Portuguese.txt) All Letters
Markov Probability: 1.0172950778991843E-77
Corrected Zeroes: 1
HMMER Score: -48.94437749826932

(..\Texts\Dutch.txt) All Letters
Markov Probability: 2.139309827314818E-84
Corrected Zeroes: 1
HMMER Score: -71.12546693519374

(..\Texts\Swedish.txt) All Letters
Markov Probability: 4.7024882053115854E-77
Corrected Zeroes: 2
HMMER Score: -46.735691382905166

(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.646391769425068E-70
Corrected Zeroes: 0
HMMER Score: -24.99631136880728

(Outputs\Playfair.out) All Letters
Markov Probability: 5.213910076344393E-69
Corrected Zeroes: 0
HMMER Score: -20.01132524791302

Initial letters


(..\Texts\English.txt) Initial Letters
Markov Probability: 5.755746003335865E-56
Corrected Zeroes: 0
HMMER Score: 23.316377212971148

(..\Texts\French.txt) Initial Letters
Markov Probability: 1.960919656262944E-61
Corrected Zeroes: 0
HMMER Score: 5.153264236029422

(..\Texts\German.txt) Initial Letters
Markov Probability: 7.441017498436695E-73
Corrected Zeroes: 1
HMMER Score: -32.78590341697

(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 3.204888821639082E-63
Corrected Zeroes: 0
HMMER Score: -0.7818480694202504

(..\Texts\Italian.txt) Initial Letters
Markov Probability: 6.089831612262369E-59
Corrected Zeroes: 0
HMMER Score: 13.431992337096512

(..\Texts\Portuguese.txt) Initial Letters
Markov Probability: 2.2658166834014361E-60
Corrected Zeroes: 0
HMMER Score: 8.683693049158332

(..\Texts\Dutch.txt) Initial Letters
Markov Probability: 6.9654365323502316E-68
Corrected Zeroes: 1
HMMER Score: -16.27154908302583

(..\Texts\Swedish.txt) Initial Letters
Markov Probability: 2.6806903224847996E-64
Corrected Zeroes: 2
HMMER Score: -4.361445908011734

Second order

All letters


(..\Texts\English.txt) All Letters
Markov Probability: 4.2148901763982914E-92
Corrected Zeroes: 9
HMMER Score: -96.72254209077907

(..\Texts\French.txt) All Letters
Markov Probability: 1.9814441750241465E-90
Corrected Zeroes: 9
HMMER Score: -91.16762862009702

(..\Texts\German.txt) All Letters
Markov Probability: 5.919358467581905E-105
Corrected Zeroes: 14
HMMER Score: -139.41766153806188

(..\Texts\Spanish.txt) All Letters
Markov Probability: 3.342953425806875E-98
Corrected Zeroes: 11
HMMER Score: -116.98848244535708

(..\Texts\Italian.txt) All Letters
Markov Probability: 6.083262400057097E-116
Corrected Zeroes: 21
HMMER Score: -175.9194661728711

(..\Texts\Portuguese.txt) All Letters
Markov Probability: 4.2738731323579313E-94
Corrected Zeroes: 13
HMMER Score: -103.34634923820269

(..\Texts\Dutch.txt) All Letters
Markov Probability: 5.327306011536052E-112
Corrected Zeroes: 17
HMMER Score: -162.82319287449383

(..\Texts\Swedish.txt) All Letters
Markov Probability: 1.1823873333660746E-91
Corrected Zeroes: 10
HMMER Score: -95.23440631711085

(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.669944098510842E-92
Corrected Zeroes: 8
HMMER Score: -98.05823732223358

(Outputs\Playfair.out) All Letters
Markov Probability: 7.389612924665265E-86
Corrected Zeroes: 6
HMMER Score: -75.98096976556741

Initial letters


(..\Texts\English.txt) Initial Letters
Markov Probability: 1.0496288884966237E-55
Corrected Zeroes: 0
HMMER Score: 24.18318171171002

(..\Texts\French.txt) Initial Letters
Markov Probability: 8.83846402382603E-70
Corrected Zeroes: 3
HMMER Score: -22.571823368605646

(..\Texts\German.txt) Initial Letters
Markov Probability: 1.6757717490935368E-89
Corrected Zeroes: 10
HMMER Score: -88.08742718870606

(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 8.869612541473453E-71
Corrected Zeroes: 3
HMMER Score: -25.888676055317255

(..\Texts\Italian.txt) Initial Letters
Markov Probability: 1.0949986922100798E-57
Corrected Zeroes: 0
HMMER Score: 17.600375336401736

(..\Texts\Portuguese.txt) Initial Letters
Markov Probability: 9.986807259493189E-70
Corrected Zeroes: 4
HMMER Score: -22.395595515749598

(..\Texts\Dutch.txt) Initial Letters
Markov Probability: 1.9749729570078271E-69
Corrected Zeroes: 2
HMMER Score: -21.41185805018986

(..\Texts\Swedish.txt) Initial Letters
Markov Probability: 9.55816081723446E-69
Corrected Zeroes: 4
HMMER Score: -19.13695790770939


References

  1. ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf Page 43

See also

Cipher Cracking 2009
Markov models