Transition probabilities from selected texts

From Derek

Revision as of 01:53, 29 September 2009

The Somerton Man's code (without the extra line) is 44 characters long. If the text were purely random, with each of the 26 letters equally likely (probability 1/26) at every position, the probability of obtaining this particular 44-letter string would be (1/26)^44 = 5.51027E-63. This value serves as the baseline for the comparisons below.
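The baseline can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above. The constant names are illustrative, and the uniform 1/26 model is the one stated in the text.

```python
CODE_LENGTH = 44     # letters in the code, without the extra line
ALPHABET_SIZE = 26   # A-Z, each assumed equally likely at every position

# Probability of one specific 44-letter string under the uniform random model.
null_probability = (1 / ALPHABET_SIZE) ** CODE_LENGTH
print(f"{null_probability:.5E}")
```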

Transitions that never occur in the source text (p=0) have been replaced with p=0.0001 so that the Markov probability is non-zero. The number of such replacements is listed as "Corrected Zeroes".
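A minimal Python sketch of how a Markov probability with zero correction could be computed follows. The helper names (`letters_only`, `transition_probabilities`, `markov_probability`) are hypothetical, and this is not the program used to produce the figures on this page. Several details are assumptions that the page does not state:

- The source text is reduced to the ASCII letters A-Z.
- "Initial letters" means the first letter of each word.
- An order-k model conditions on the previous k letters.
- The first k letters of the code are not scored.

Only the 0.0001 replacement value comes from the text above.

```python
import re
from collections import Counter, defaultdict

# Value substituted for transitions never seen in the source text (from this page).
ZERO_CORRECTION = 0.0001


def letters_only(text, initial_letters=False):
    """Reduce text to upper-case A-Z.

    Assumption: only ASCII letters are kept, so accented letters act as word breaks.
    With initial_letters=True, keep only the first letter of each word
    (assumed meaning of "Initial letters").
    """
    words = re.findall(r"[A-Za-z]+", text)
    if initial_letters:
        return "".join(word[0] for word in words).upper()
    return "".join(words).upper()


def transition_probabilities(sequence, order=1):
    """Estimate P(next letter | previous `order` letters) from letter counts."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        counts[sequence[i - order:i]][sequence[i]] += 1
    probabilities = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probabilities[context] = {letter: n / total for letter, n in followers.items()}
    return probabilities


def markov_probability(code, probabilities, order=1):
    """Multiply transition probabilities along `code`, correcting zeroes.

    Assumption: the first `order` letters have no preceding context and are not scored.
    Returns (probability, number of corrected zeroes).
    """
    probability = 1.0
    corrected_zeroes = 0
    for i in range(order, len(code)):
        p = probabilities.get(code[i - order:i], {}).get(code[i], 0.0)
        if p == 0.0:
            p = ZERO_CORRECTION  # unseen transition: replace p=0 with 0.0001
            corrected_zeroes += 1
        probability *= p
    return probability, corrected_zeroes
```

For a first-order All Letters run, you would call `transition_probabilities(letters_only(text), order=1)` on a source text and then `markov_probability(code, probabilities, order=1)` on the 44-letter code. Use `order=2` for the second-order variant and `letters_only(text, initial_letters=True)` for the Initial Letters variant. Whether this reproduces the exact figures below depends on details the page does not give, such as the punctuation filter.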

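The HMMER score[1] is the base-2 logarithm of the ratio of the Markov probability to the null probability: log2(Markov probability / (1/26)^44). A positive score means the code is more probable under the text's transition model than under the uniform random model. A negative score means it is less probable.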
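The score definition can be checked with a short Python sketch. `hmmer_score` is a hypothetical helper, not part of HMMER. It assumes a 44-letter code and a uniform null model over 26 letters, as stated above. It also works in log space rather than dividing the two very small probabilities directly.

```python
import math

CODE_LENGTH = 44
ALPHABET_SIZE = 26


def hmmer_score(markov_probability, length=CODE_LENGTH):
    """Return log2(markov_probability / (1/26)**length).

    Uses log2(a / b) = log2(a) - log2(b), where log2((1/26)**length) = -length * log2(26).
    """
    return math.log2(markov_probability) + length * math.log2(ALPHABET_SIZE)
```

For example, `hmmer_score(4.196215910162246E-70)` gives approximately -23.6465. This agrees with the English first-order All Letters score listed below.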

First order

All letters


(..\Texts\English.txt) All Letters
Markov Probability: 4.196215910162246E-70
Corrected Zeroes: 0
HMMER Score: -23.646530132315654

(..\Texts\French.txt) All Letters
Markov Probability: 4.562440416695874E-74
Corrected Zeroes: 0
HMMER Score: -36.8135257053655

(..\Texts\German.txt) All Letters
Markov Probability: 1.6093650169064557E-82
Corrected Zeroes: 3
HMMER Score: -64.89226460456342

(..\Texts\Spanish.txt) All Letters
Markov Probability: 2.0705524720748866E-82
Corrected Zeroes: 1
HMMER Score: -64.52874041851341

(..\Texts\Italian.txt) All Letters
Markov Probability: 2.8938109466376915E-92
Corrected Zeroes: 9
HMMER Score: -97.26506645809364

(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.646391769425068E-70
Corrected Zeroes: 0
HMMER Score: -24.99631136880728

Initial letters


(..\Texts\English.txt) Initial Letters
Markov Probability: 5.755746003335865E-56
Corrected Zeroes: 0
HMMER Score: 23.316377212971148

(..\Texts\French.txt) Initial Letters
Markov Probability: 1.960919656262944E-61
Corrected Zeroes: 0
HMMER Score: 5.153264236029422

(..\Texts\German.txt) Initial Letters
Markov Probability: 7.441017498436695E-73
Corrected Zeroes: 1
HMMER Score: -32.78590341697

(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 8.969553991898917E-65
Corrected Zeroes: 0
HMMER Score: -5.940942320075476

(..\Texts\Italian.txt) Initial Letters
Markov Probability: 6.089831612262369E-59
Corrected Zeroes: 0
HMMER Score: 13.431992337096512

Second order

All letters


(..\Texts\English.txt) All Letters
Markov Probability: 4.2148901763982914E-92
Corrected Zeroes: 9
HMMER Score: -96.72254209077907

(..\Texts\French.txt) All Letters
Markov Probability: 1.9814441750241465E-90
Corrected Zeroes: 9
HMMER Score: -91.16762862009702

(..\Texts\German.txt) All Letters
Markov Probability: 5.919358467581905E-105
Corrected Zeroes: 14
HMMER Score: -139.41766153806188

(..\Texts\Spanish.txt) All Letters
Markov Probability: 2.4983480586257613E-101
Corrected Zeroes: 12
HMMER Score: -127.37441550467722

(..\Texts\Italian.txt) All Letters
Markov Probability: 6.083262400057097E-116
Corrected Zeroes: 21
HMMER Score: -175.9194661728711

(..\Texts\Vigenere - 1984.txt) All Letters
Markov Probability: 1.669944098510842E-92
Corrected Zeroes: 8
HMMER Score: -98.05823732223358

Initial letters


(..\Texts\English.txt) Initial Letters
Markov Probability: 1.0496288884966237E-55
Corrected Zeroes: 0
HMMER Score: 24.18318171171002

(..\Texts\French.txt) Initial Letters
Markov Probability: 8.83846402382603E-70
Corrected Zeroes: 3
HMMER Score: -22.571823368605646

(..\Texts\German.txt) Initial Letters
Markov Probability: 1.6757717490935368E-89
Corrected Zeroes: 10
HMMER Score: -88.08742718870606

(..\Texts\Spanish.txt) Initial Letters
Markov Probability: 3.8251249716620213E-81
Corrected Zeroes: 7
HMMER Score: -60.32132120442464

(..\Texts\Italian.txt) Initial Letters
Markov Probability: 1.0949986922100798E-57
Corrected Zeroes: 0
HMMER Score: 17.600375336401736


References

  1. HMMER User's Guide, page 43: ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf
