Transitions within words
Latest revision as of 15:56, 17 August 2009
The following table is a raw count of letter transitions within single words, taken from 1984 by George Orwell. For example, in the phrase "hi there" the pairs counted are 'HI', 'TH', 'HE', 'ER' and 'RE'; 'IT' is NOT counted because it spans two words.
A transition from 'O' to 'D' is counted in row 'O' and column 'D'.
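The counting rule above can be sketched in Python. This is only an illustration, not the script that produced the table: the function name `count_transitions` is hypothetical, and it assumes a "word" is a maximal run of ASCII letters. The page does not say how apostrophes, hyphens or digits were handled.

```python
import re
from collections import Counter


def count_transitions(text):
    """Count letter pairs that occur inside a single word.

    Assumption: a word is a maximal run of ASCII letters, and case is
    ignored. Pairs that span a word boundary are never counted.
    """
    counts = Counter()
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.upper()
        # zip(word, word[1:]) yields each adjacent (first, second) pair.
        for first, second in zip(word, word[1:]):
            counts[(first, second)] += 1
    return counts


counts = count_transitions("hi there")
for (first, second), n in sorted(counts.items()):
    print(f"row {first}, column {second}: {n}")
```

Each key `(first, second)` corresponds to the cell in row `first` and column `second` of the table.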
To find the probability of a transition to the letter 'D' given the letter 'T' (i.e. the pair 'TD' within a single word), take the value in row 'T', column 'D' (2 in this case) and divide it by the sum of the elements in row 'T' (1405 + 12 + 147 + and so on).
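As a worked illustration of this row-wise calculation, the sketch below copies row 'T' from the table and divides the 'D' entry by the row sum. The names `LETTERS`, `ROW_T` and `conditional_probability` are made up for this example, and returning 0.0 for an all-zero row is an assumption rather than anything the page specifies.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Row 'T' of the table, in column order A..Z.
ROW_T = [1405, 12, 147, 2, 3670, 20, 1, 14399, 2808, 0, 0, 520, 53,
         19, 4163, 11, 0, 1354, 901, 737, 694, 0, 348, 0, 917, 0]


def conditional_probability(row, second):
    """Return P(second | first), where `row` is the table row for `first`."""
    row_total = sum(row)
    if row_total == 0:
        # Assumption: a letter that never starts a pair gets probability 0.
        return 0.0
    return row[LETTERS.index(second)] / row_total


p_d_given_t = conditional_probability(ROW_T, "D")
print(f"P(D | T) = {ROW_T[LETTERS.index('D')]} / {sum(ROW_T)} = {p_d_given_t:.6f}")
```

Dividing by the row sum means the probabilities across any one row add up to 1.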
To find instead the probability of a particular pair appearing in the text (as opposed to any other pair), again take the value in row 'T', column 'D' (for the pair 'TD'), but divide it by the sum of all the elements in the table rather than by the sum of row 'T' alone.
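The difference between the two denominators can be seen on a tiny example. The sketch below uses the five pairs from the phrase "hi there" rather than the full table, and the function names are hypothetical. It assumes counts are stored as a dictionary keyed by `(first, second)`.

```python
# Within-word pair counts for the phrase "hi there".
counts = {
    ("H", "I"): 1,
    ("T", "H"): 1,
    ("H", "E"): 1,
    ("E", "R"): 1,
    ("R", "E"): 1,
}


def joint_probability(counts, first, second):
    """P(pair): the pair's count divided by the sum of the whole table."""
    total = sum(counts.values())
    return counts.get((first, second), 0) / total if total else 0.0


def conditional_probability(counts, first, second):
    """P(second | first): the pair's count divided by the sum of row `first`."""
    row_total = sum(n for (a, _), n in counts.items() if a == first)
    return counts.get((first, second), 0) / row_total if row_total else 0.0


print("P(HE)    =", joint_probability(counts, "H", "E"))
print("P(E | H) =", conditional_probability(counts, "H", "E"))
```

Here 'HE' is one of five pairs overall but one of only two pairs that start with 'H', so the two values differ.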
From/To | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 10 | 898 | 1516 | 2186 | 17 | 224 | 837 | 21 | 1538 | 11 | 551 | 3120 | 741 | 6006 | 9 | 867 | 3 | 3654 | 4416 | 4633 | 311 | 700 | 271 | 20 | 927 | 53 |
B | 463 | 97 | 0 | 3 | 2606 | 0 | 0 | 0 | 288 | 53 | 0 | 1051 | 12 | 1 | 831 | 0 | 0 | 635 | 95 | 33 | 850 | 26 | 0 | 0 | 471 | 0 |
C | 1349 | 0 | 190 | 2 | 2091 | 0 | 0 | 1733 | 554 | 0 | 728 | 381 | 0 | 0 | 2260 | 0 | 6 | 615 | 22 | 995 | 364 | 0 | 0 | 0 | 60 | 0 |
D | 470 | 7 | 5 | 226 | 2115 | 13 | 76 | 17 | 1366 | 14 | 0 | 212 | 23 | 62 | 990 | 1 | 0 | 321 | 596 | 17 | 250 | 25 | 13 | 0 | 256 | 0 |
E | 2881 | 66 | 1219 | 4903 | 2134 | 504 | 219 | 111 | 556 | 5 | 80 | 1955 | 1434 | 5171 | 221 | 532 | 77 | 7575 | 3640 | 1522 | 104 | 1341 | 529 | 626 | 896 | 18 |
F | 672 | 3 | 3 | 0 | 889 | 463 | 0 | 0 | 769 | 0 | 0 | 298 | 0 | 1 | 1438 | 0 | 0 | 726 | 19 | 295 | 331 | 0 | 0 | 0 | 25 | 0 |
G | 596 | 0 | 0 | 0 | 1178 | 0 | 111 | 1283 | 470 | 0 | 0 | 250 | 22 | 123 | 484 | 0 | 0 | 482 | 232 | 45 | 286 | 0 | 0 | 0 | 37 | 0 |
H | 5077 | 7 | 0 | 0 | 13695 | 17 | 0 | 0 | 4065 | 0 | 0 | 23 | 17 | 24 | 1879 | 2 | 1 | 319 | 49 | 866 | 286 | 0 | 6 | 0 | 177 | 0 |
I | 664 | 298 | 1830 | 1357 | 1112 | 714 | 913 | 8 | 8 | 0 | 289 | 1322 | 1664 | 9708 | 1398 | 242 | 18 | 995 | 3371 | 4648 | 22 | 627 | 0 | 82 | 0 | 179 |
J | 22 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 84 | 0 | 0 | 0 | 0 | 0 | 250 | 0 | 0 | 0 | 0 | 0 |
K | 40 | 15 | 2 | 1 | 1109 | 4 | 4 | 2 | 537 | 0 | 0 | 58 | 2 | 424 | 2 | 3 | 0 | 0 | 133 | 0 | 15 | 0 | 34 | 0 | 26 | 0 |
L | 1317 | 13 | 24 | 1524 | 3168 | 347 | 7 | 0 | 2261 | 0 | 118 | 2430 | 166 | 6 | 1322 | 64 | 1 | 74 | 423 | 372 | 321 | 91 | 154 | 0 | 1965 | 0 |
M | 1406 | 362 | 1 | 0 | 3192 | 14 | 0 | 0 | 1029 | 0 | 0 | 8 | 206 | 28 | 1237 | 638 | 0 | 69 | 283 | 5 | 363 | 0 | 1 | 0 | 155 | 0 |
N | 552 | 19 | 967 | 4295 | 2957 | 138 | 4333 | 19 | 993 | 12 | 312 | 354 | 34 | 220 | 2350 | 14 | 44 | 16 | 1770 | 3004 | 236 | 119 | 25 | 7 | 450 | 5 |
O | 144 | 430 | 505 | 587 | 76 | 3716 | 231 | 18 | 342 | 4 | 427 | 1180 | 1990 | 5162 | 1155 | 639 | 0 | 3831 | 965 | 2031 | 5134 | 626 | 1592 | 47 | 93 | 21 |
P | 1228 | 0 | 0 | 0 | 1682 | 2 | 0 | 177 | 461 | 0 | 0 | 785 | 8 | 17 | 991 | 588 | 0 | 957 | 211 | 340 | 309 | 0 | 3 | 0 | 48 | 0 |
Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | 0 | 0 | 0 | 0 | 0 |
R | 1596 | 45 | 181 | 736 | 6480 | 99 | 225 | 146 | 2109 | 0 | 286 | 347 | 397 | 421 | 2298 | 93 | 0 | 476 | 1316 | 1351 | 537 | 103 | 74 | 1 | 1067 | 0 |
S | 1131 | 8 | 469 | 8 | 2703 | 41 | 15 | 1332 | 1633 | 0 | 113 | 342 | 259 | 98 | 1380 | 657 | 38 | 3 | 1386 | 4134 | 750 | 0 | 150 | 0 | 148 | 0 |
T | 1405 | 12 | 147 | 2 | 3670 | 20 | 1 | 14399 | 2808 | 0 | 0 | 520 | 53 | 19 | 4163 | 11 | 0 | 1354 | 901 | 737 | 694 | 0 | 348 | 0 | 917 | 0 |
U | 415 | 267 | 455 | 196 | 390 | 65 | 815 | 0 | 261 | 0 | 0 | 1647 | 359 | 1290 | 44 | 600 | 0 | 1752 | 1389 | 2037 | 0 | 15 | 0 | 8 | 12 | 6 |
V | 299 | 0 | 0 | 0 | 3138 | 0 | 0 | 0 | 534 | 0 | 0 | 0 | 0 | 1 | 291 | 0 | 0 | 3 | 0 | 0 | 4 | 0 | 0 | 0 | 35 | 0 |
W | 3469 | 0 | 0 | 39 | 1650 | 2 | 0 | 1947 | 1995 | 0 | 6 | 57 | 0 | 451 | 1242 | 0 | 0 | 175 | 169 | 2 | 2 | 0 | 0 | 0 | 6 | 0 |
X | 81 | 0 | 106 | 0 | 55 | 1 | 0 | 10 | 147 | 0 | 0 | 1 | 0 | 0 | 1 | 113 | 1 | 0 | 0 | 136 | 22 | 0 | 3 | 0 | 16 | 0 |
Y | 28 | 20 | 8 | 14 | 536 | 3 | 1 | 5 | 165 | 1 | 0 | 33 | 72 | 10 | 1346 | 22 | 0 | 34 | 374 | 144 | 0 | 5 | 31 | 0 | 0 | 0 |
Z | 36 | 0 | 0 | 0 | 177 | 0 | 0 | 0 | 40 | 0 | 0 | 9 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 5 | 15 |