Transition probabilities from selected texts
Latest revision as of 16:48, 2 October 2009
The Somerton Man's code (without the extra line) is 44 characters long. If the text were purely random, with each of the 26 letters equally likely at every position, the probability of obtaining this particular 44-character string would be (1/26)^44 = 5.51027E-63. This null probability is the baseline against which the Markov probabilities below are compared.
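The null probability can be checked with a short calculation. The following is a minimal Python sketch, not the software that produced the figures on this page; the names `ALPHABET_SIZE`, `CODE_LENGTH`, `null_probability` and `null_log2_probability` are chosen here purely for illustration.

```python
import math

ALPHABET_SIZE = 26   # letters A-Z, each assumed equally likely
CODE_LENGTH = 44     # length of the code without the extra line


def null_probability(length: int = CODE_LENGTH, alphabet: int = ALPHABET_SIZE) -> float:
    """Probability of one specific string under the uniform random model."""
    return (1.0 / alphabet) ** length


def null_log2_probability(length: int = CODE_LENGTH, alphabet: int = ALPHABET_SIZE) -> float:
    """The same quantity as a base-2 logarithm, which is safer for long strings."""
    return -length * math.log2(alphabet)


print(f"{null_probability():.5E}")
print(f"{null_log2_probability():.3f}")
```

Working in logarithms is optional for a 44-character string, but it becomes necessary once probabilities get small enough to underflow a double-precision float.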
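Transitions that never occur in a source text (p = 0) were corrected to p = 0.0001 so that the Markov probability remains non-zero. The number of transitions corrected in this way is listed with each result as "Corrected Zeroes".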
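As an illustration of the zero correction, here is a hedged Python sketch of a first-order calculation. It is not the original program. The function names, the decision to keep only the letters A-Z (so accented letters are dropped), and the choice to multiply transitions only, without a separate starting probability for the first letter, are all assumptions made for this example.

```python
from collections import Counter, defaultdict

ZERO_CORRECTION = 0.0001  # replacement value for unseen transitions, as stated above


def transition_table(text: str) -> dict:
    """Estimate first-order probabilities P(next letter | current letter)."""
    # Assumption: only A-Z are kept; everything else is discarded.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = defaultdict(Counter)
    for current, following in zip(letters, letters[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def markov_probability(sequence: str, table: dict,
                       correction: float = ZERO_CORRECTION):
    """Return (probability, corrected_zeroes) for the transitions in `sequence`.

    Assumption: only the transitions are multiplied; the first letter gets no
    separate starting probability.
    """
    probability = 1.0
    corrected = 0
    for current, following in zip(sequence, sequence[1:]):
        p = table.get(current, {}).get(following, 0.0)
        if p == 0.0:
            # Unseen transition: substitute the small correction value.
            p = correction
            corrected += 1
        probability *= p
    return probability, corrected


# Toy usage: the transition B -> C never occurs in the training text.
table = transition_table("ABABAC")
print(markov_probability("ABC", table))
```

Each corrected transition multiplies the result by 0.0001, so the correction count goes a long way towards explaining how low a given probability is.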
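The HMMER score (see ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf, page 43) is the base-2 logarithm of the ratio of the Markov probability to the null probability, (1/26)^44. A positive score means the Markov model fits the code better than the purely random model does; a negative score means it fits worse.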
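The score definition can be checked against the figures listed below. This is a small Python sketch under the definition just given; the names `NULL_LOG2` and `hmmer_score` are illustrative and do not come from the original software.

```python
import math

NULL_LOG2 = -44 * math.log2(26)  # log2 of the null probability (1/26)^44


def hmmer_score(markov_probability: float, null_log2: float = NULL_LOG2) -> float:
    """Base-2 log of (Markov probability / null probability)."""
    return math.log2(markov_probability) - null_log2


# English, first order, all letters (page value: -23.646530132315654)
print(hmmer_score(4.196215910162246e-70))
# English, first order, initial letters (page value: 23.316377212971148)
print(hmmer_score(5.755746003335865e-56))
```

Subtracting the logarithms rather than dividing the raw probabilities gives the same result and avoids forming a very small intermediate ratio.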
First order
All letters
(..\Texts\English.txt) All Letters
Markov Probability: 4.196215910162246E-70
Corrected Zeroes: 0
HMMER Score: -23.646530132315654
(..\Texts\French.txt) All Letters
Markov Probability: 4.562440416695874E-74
Corrected Zeroes: 0
HMMER Score: -36.8135257053655
(..\Texts\German.txt) All Letters
Markov Probability: 1.6093650169064557E-82
Corrected Zeroes: 3
HMMER Score: -64.89226460456342
(..\Texts\Spanish.txt) All Letters
Markov Probability: 1.7633169297716054E-83
Corrected Zeroes: 1
HMMER Score: -68.08239247668342
(..\Texts\Italian.txt) All Letters
Markov Probability: 2.8938109466376915E-92
Corrected Zeroes: 9
HMMER Score: -97.26506645809364
(..\Texts\Portuguese.txt) All Letters
Markov Probability: 1.0172950778991843E-77
Corrected Zeroes: 1
HMMER Score: -48.94437749826932
(..\Texts\Dutch.txt) All Letters
Markov Probability: 2.139309827314818E-84
Corrected Zeroes: 1
HMMER Score: -71.12546693519374
(..\Texts\Swedish.txt) All Letters
Markov Probability: 4.7024882053115854E-77
Corrected Zeroes: 2
HMMER Score: -46.735691382905166
(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.646391769425068E-70
Corrected Zeroes: 0
HMMER Score: -24.99631136880728
(Outputs\Playfair.out) All Letters
Markov Probability: 5.213910076344393E-69
Corrected Zeroes: 0
HMMER Score: -20.01132524791302
Initial letters
(..\Texts\English.txt) Initial Letters
Markov Probability: 5.755746003335865E-56
Corrected Zeroes: 0
HMMER Score: 23.316377212971148
(..\Texts\French.txt) Initial Letters
Markov Probability: 1.960919656262944E-61
Corrected Zeroes: 0
HMMER Score: 5.153264236029422
(..\Texts\German.txt) Initial Letters
Markov Probability: 7.441017498436695E-73
Corrected Zeroes: 1
HMMER Score: -32.78590341697
(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 3.204888821639082E-63
Corrected Zeroes: 0
HMMER Score: -0.7818480694202504
(..\Texts\Italian.txt) Initial Letters
Markov Probability: 6.089831612262369E-59
Corrected Zeroes: 0
HMMER Score: 13.431992337096512
(..\Texts\Portuguese.txt) Initial Letters
Markov Probability: 2.2658166834014361E-60
Corrected Zeroes: 0
HMMER Score: 8.683693049158332
(..\Texts\Dutch.txt) Initial Letters
Markov Probability: 6.9654365323502316E-68
Corrected Zeroes: 1
HMMER Score: -16.27154908302583
(..\Texts\Swedish.txt) Initial Letters
Markov Probability: 2.6806903224847996E-64
Corrected Zeroes: 2
HMMER Score: -4.361445908011734
Second order
All letters
(..\Texts\English.txt) All Letters
Markov Probability: 4.2148901763982914E-92
Corrected Zeroes: 9
HMMER Score: -96.72254209077907
(..\Texts\French.txt) All Letters
Markov Probability: 1.9814441750241465E-90
Corrected Zeroes: 9
HMMER Score: -91.16762862009702
(..\Texts\German.txt) All Letters
Markov Probability: 5.919358467581905E-105
Corrected Zeroes: 14
HMMER Score: -139.41766153806188
(..\Texts\Spanish.txt) All Letters
Markov Probability: 3.342953425806875E-98
Corrected Zeroes: 11
HMMER Score: -116.98848244535708
(..\Texts\Italian.txt) All Letters
Markov Probability: 6.083262400057097E-116
Corrected Zeroes: 21
HMMER Score: -175.9194661728711
(..\Texts\Portuguese.txt) All Letters
Markov Probability: 4.2738731323579313E-94
Corrected Zeroes: 13
HMMER Score: -103.34634923820269
(..\Texts\Dutch.txt) All Letters
Markov Probability: 5.327306011536052E-112
Corrected Zeroes: 17
HMMER Score: -162.82319287449383
(..\Texts\Swedish.txt) All Letters
Markov Probability: 1.1823873333660746E-91
Corrected Zeroes: 10
HMMER Score: -95.23440631711085
(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.669944098510842E-92
Corrected Zeroes: 8
HMMER Score: -98.05823732223358
(Outputs\Playfair.out) All Letters
Markov Probability: 7.389612924665265E-86
Corrected Zeroes: 6
HMMER Score: -75.98096976556741
Initial letters
(..\Texts\English.txt) Initial Letters
Markov Probability: 1.0496288884966237E-55
Corrected Zeroes: 0
HMMER Score: 24.18318171171002
(..\Texts\French.txt) Initial Letters
Markov Probability: 8.83846402382603E-70
Corrected Zeroes: 3
HMMER Score: -22.571823368605646
(..\Texts\German.txt) Initial Letters
Markov Probability: 1.6757717490935368E-89
Corrected Zeroes: 10
HMMER Score: -88.08742718870606
(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 8.869612541473453E-71
Corrected Zeroes: 3
HMMER Score: -25.888676055317255
(..\Texts\Italian.txt) Initial Letters
Markov Probability: 1.0949986922100798E-57
Corrected Zeroes: 0
HMMER Score: 17.600375336401736
(..\Texts\Portuguese.txt) Initial Letters
Markov Probability: 9.986807259493189E-70
Corrected Zeroes: 4
HMMER Score: -22.395595515749598
(..\Texts\Dutch.txt) Initial Letters
Markov Probability: 1.9749729570078271E-69
Corrected Zeroes: 2
HMMER Score: -21.41185805018986
(..\Texts\Swedish.txt) Initial Letters
Markov Probability: 9.55816081723446E-69
Corrected Zeroes: 4
HMMER Score: -19.13695790770939