Markov models

In a first order Markov model, you compute the letter transition probabilities, <math>P(b|a)</math>, etc., from the frequencies. For example, for every pair of letters in abbcab the counts are

* <math>C(ab) = 2</math>
* <math>C(bb) = 1</math>
* <math>C(bc) = 1</math>
* <math>C(ca) = 1</math>
* <math>C(a) = 2</math>
* <math>C(b) = 3</math>
* <math>C(c) = 1</math>

Then compute the probabilities:

* <math>P(b|a) = 2 / (2+1+1+1)</math>
* <math>P(b|b) = 1 / (2+1+1+1)</math>
* <math>P(b|c) = 1 / (2+1+1+1)</math>
* <math>P(c|a) = 1 / (2+1+1+1)</math>
* <math>P(\textrm{any other combination}) = 0</math>
* <math>P(a) = 2/(2+3+1)</math>
* <math>P(b) = 3/(2+3+1)</math>
* <math>P(c) = 1/(2+3+1)</math>
* <math>P(d) = \cdots = P(z) = 0</math>
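
A minimal Java sketch of this counting step follows (the class and variable names are only for illustration, they are not part of the original notes): it tallies adjacent letter pairs and single letters in abbcab, then divides by the total number of pairs and by the total number of letters, exactly as in the lists above.

 import java.util.HashMap;
 import java.util.Map;
 
 public class FirstOrderCounts {
 
     public static void main(String[] args) {
         String sample = "abbcab";
 
         // Adjacent pair counts, e.g. C(ab) = 2, and single letter counts, e.g. C(a) = 2
         Map<String, Integer> pairCounts = new HashMap<>();
         Map<Character, Integer> letterCounts = new HashMap<>();
 
         for (int i = 0; i < sample.length(); i++) {
             letterCounts.merge(sample.charAt(i), 1, Integer::sum);
             if (i + 1 < sample.length()) {
                 pairCounts.merge(sample.substring(i, i + 2), 1, Integer::sum);
             }
         }
 
         int totalPairs = sample.length() - 1;   // 2+1+1+1 = 5 for abbcab
         int totalLetters = sample.length();     // 2+3+1 = 6 for abbcab
 
         // Transition probabilities, e.g. P(b|a) = C(ab) / total pairs
         for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
             System.out.printf("P(%c|%c) = %d/%d%n",
                     e.getKey().charAt(1), e.getKey().charAt(0), e.getValue(), totalPairs);
         }
 
         // Single letter probabilities, e.g. P(a) = C(a) / total letters
         for (Map.Entry<Character, Integer> e : letterCounts.entrySet()) {
             System.out.printf("P(%c) = %d/%d%n", e.getKey(), e.getValue(), totalLetters);
         }
     }
 }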

Then for some sequence of letters <math>x_1,\ldots ,x_n</math> (the code), the likelihood that the code came from the language (or the initial letters of the language) that our sequence abbcab came from is

<math>P(x_1)P(x_2 | x_1)\cdots P(x_n | x_{n-1}),</math>

a product of probabilities, assuming independence (since you are treating it as a first order Markov model). Note that you should first take logs (using Java doubles), compute the sum of the log probabilities, and then raise back as a power of the base of the logarithm; otherwise you'll get too many numerical errors. If a probability is zero, treat its log as -9999 (in the log-transformed version). The first term is why you need the single letter frequencies as well.
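
The following Java sketch shows that log-sum trick, assuming the probabilities estimated above are already stored in two maps (one keyed by single letters, one by letter pairs); the class and method names are hypothetical.

 import java.util.Map;
 
 public class MarkovLikelihood {
 
     // Log-probability with the convention above: a zero probability becomes -9999
     static double logP(double p) {
         return p > 0 ? Math.log(p) : -9999;
     }
 
     // Likelihood P(x1) P(x2|x1) ... P(xn|x(n-1)), computed by summing
     // log probabilities and only exponentiating at the end.
     static double likelihood(String code,
                              Map<Character, Double> letterProbs,
                              Map<String, Double> pairProbs) {
         double logSum = logP(letterProbs.getOrDefault(code.charAt(0), 0.0));
         for (int i = 1; i < code.length(); i++) {
             String pair = code.substring(i - 1, i + 1);
             logSum += logP(pairProbs.getOrDefault(pair, 0.0));
         }
         return Math.exp(logSum);   // raise back as a power of e, the log base
     }
 }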

Doing this for the case where you have fifty samples of a text gives you a confidence level in your final p-value, but it makes the estimates of the probabilities wrong; 50 samples may be too many for short texts.

You can extend this to higher order Markov models, e.g. for a 2nd order one you estimate <math>P(b|bc)</math> from counts of bcb, etc.
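
For the 2nd order case, here is a small Java sketch (again with illustrative names) that counts the three-letter windows from which <math>P(b|bc)</math> and the like would be estimated:

 import java.util.HashMap;
 import java.util.Map;
 
 public class SecondOrderCounts {
 
     // Counts every three-letter window, e.g. C(bcb), which can then be
     // normalised to estimate P(b|bc) for a 2nd order model.
     static Map<String, Integer> tripleCounts(String text) {
         Map<String, Integer> counts = new HashMap<>();
         for (int i = 0; i + 3 <= text.length(); i++) {
             counts.merge(text.substring(i, i + 3), 1, Integer::sum);
         }
         return counts;
     }
 }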

== See also ==

[http://en.wikipedia.org/wiki/Markov_chain Markov chain (Wikipedia)]