Markov models

In a first order Markov model, you compute the letter transition probabilities, P(a|b), etc., from the frequencies. For example, counting every pair of letters (and every single letter) in abbcab gives

  • C(ab)=2
  • C(bb)=1
  • C(bc)=1
  • C(ca)=1
  • C(a)=2
  • C(b)=3
  • C(c)=1

Then compute the probabilities (each pair count divided by the total number of pairs, each letter count divided by the total number of letters; a Java sketch of these two steps follows the list):

  • P(b|a)=2/(2+1+1+1)
  • P(b|b)=1/(2+1+1+1)
  • P(b|c)=1/(2+1+1+1)
  • P(c|a)=1/(2+1+1+1)
  • P(any other combination)=0
  • P(a)=2/(2+3+1)
  • P(b)=3/(2+3+1)
  • P(c)=1/(2+3+1)
  • P(d), …, P(z)=0
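
A minimal Java sketch of those two counting and normalisation steps, using the same training string abbcab and the convention above (pair counts divided by the total number of pairs, letter counts divided by the total number of letters); the class and variable names are just illustrative:

  import java.util.HashMap;
  import java.util.Map;

  // Estimates the first order frequencies from the training string abbcab.
  public class PairCounts {
      public static void main(String[] args) {
          String training = "abbcab";

          Map<String, Integer> pairCounts = new HashMap<>();
          Map<Character, Integer> letterCounts = new HashMap<>();

          // Count every adjacent pair of letters: ab, bb, bc, ca, ab.
          for (int i = 0; i + 1 < training.length(); i++) {
              pairCounts.merge(training.substring(i, i + 2), 1, Integer::sum);
          }
          // Count every single letter: a, b, c.
          for (char c : training.toCharArray()) {
              letterCounts.merge(c, 1, Integer::sum);
          }

          int totalPairs = training.length() - 1;   // 2+1+1+1 = 5
          int totalLetters = training.length();     // 2+3+1 = 6

          // P(b|a) = C(ab)/5 = 2/5 and P(a) = C(a)/6 = 2/6, as in the list above.
          double pBGivenA = pairCounts.getOrDefault("ab", 0) / (double) totalPairs;
          double pA = letterCounts.getOrDefault('a', 0) / (double) totalLetters;
          System.out.println("P(b|a) = " + pBGivenA + ", P(a) = " + pA);
      }
  }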

Then for some sequence of letters x1, …, xn (the code), the likelihood that that code came from the language (or initial letters of the language) that our sequence abbcab came from is P(x1)·P(x2|x1)⋯P(xn|xn-1), a product of probabilities, assuming independence (since you are treating it as a first order Markov model). Note that you should first take logs (using Java doubles), then compute the sum of the log(probabilities), then raise back as a power of the base of the logarithm; otherwise you'll get too many numerical errors. If a probability is zero, treat it as -9999 (in the log-transformed version). The first term is why you need the single letter frequencies as well.
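
A sketch of that scoring step, assuming maps letterProb and pairProb hold the frequencies computed as above and that -9999 stands in for log 0 as described; the class and method names are illustrative, not part of the original:

  import java.util.Map;

  // Scores a candidate code string against the first order model estimated above.
  public class MarkovScore {
      static final double LOG_ZERO = -9999.0;  // stand-in for log(0), as described in the text

      // Returns log P(x1) + log P(x2|x1) + ... + log P(xn|xn-1).
      static double logLikelihood(String code,
                                  Map<Character, Double> letterProb,
                                  Map<String, Double> pairProb) {
          // First term: log P(x1) -- this is why the single letter frequencies are needed.
          double sum = safeLog(letterProb.getOrDefault(code.charAt(0), 0.0));
          // Remaining terms: log P(xi|xi-1), looked up by the pair of letters xi-1, xi.
          for (int i = 1; i < code.length(); i++) {
              sum += safeLog(pairProb.getOrDefault(code.substring(i - 1, i + 1), 0.0));
          }
          return sum;  // raise e (the log base used here) to this power to recover the product
      }

      static double safeLog(double p) {
          return p == 0.0 ? LOG_ZERO : Math.log(p);
      }
  }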

Doing this for the case where you have fifty samples of a text gives you a confidence level in your final p-value, but it makes the estimates of the probabilities wrong. 50 may be too many for short texts.

You can extend this to higher order Markov models, e.g. for a 2nd order one you estimate P(b|bc) from counts of bcb, etc.
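
A minimal sketch of the counting change for a 2nd order model, again on abbcab; the class name is illustrative:

  import java.util.HashMap;
  import java.util.Map;

  // 2nd order counting: tally every three-letter window; P(b|bc) would then be
  // estimated from C(bcb) in the same way the pair counts gave P(b|a) above.
  public class TripleCounts {
      public static void main(String[] args) {
          String training = "abbcab";
          Map<String, Integer> tripleCounts = new HashMap<>();
          for (int i = 0; i + 2 < training.length(); i++) {
              tripleCounts.merge(training.substring(i, i + 3), 1, Integer::sum);
          }
          System.out.println(tripleCounts);  // abb=1, bbc=1, bca=1, cab=1 (in some order)
      }
  }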

See also

Markov chain (Wikipedia): http://en.wikipedia.org/wiki/Markov_chain