"Eugene Onegin" Example.

Now, if we move on from samples in hundreds to single samples, we ascertain that the number

differs strongly from

0.432 × 0.568 = 0.245376:

the coefficient of dispersion (we deviate here slightly from usual terminology, whereby we should have taken the square root of the number that we call the coefficient of dispersion) is

that is, approximately 1/5, which is explained well by the connectedness of our samples.
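The comparison above can be sketched in a few lines of Python. The function name `dispersion_coefficient` is a hypothetical helper, not something from the text, and it follows the convention used here (no square root). The observed variance passed in at the end is back-computed from the rounded value 1/5 purely for illustration; it is not the figure omitted above.

```python
def dispersion_coefficient(observed_variance: float, p: float) -> float:
    """Ratio of the observed per-letter variance to p * (1 - p),
    the variance expected if the letters were independent trials."""
    return observed_variance / (p * (1.0 - p))


p = 0.432                                  # share of vowels quoted in the text
independent_variance = p * (1.0 - p)       # 0.432 * 0.568
print(round(independent_variance, 6))      # 0.245376

# Illustration only: a variance five times smaller than the independent value
# gives a coefficient of about 1/5.
illustrative_variance = independent_variance / 5
print(round(dispersion_coefficient(illustrative_variance, p), 3))
```

A coefficient well below 1 means the counts fluctuate less than independent trials would, which is what the connectedness of the samples is invoked to explain.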
To clarify this connectedness, though not completely, it helps to calculate approximate values of the above-mentioned probabilities p₁ and p₀. We take the entire text of 20,000 letters, count the number of sequences

vowel, vowel,

and obtain the number 1104; after dividing it by the total number of vowels in the text, 8638, we get the following approximate quantity for p₁:

p₁ ≈ 1104/8638 ≈ 0.128.
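As a minimal sketch of this estimate, the Python below counts adjacent vowel, vowel pairs in a string and then applies the same division to the counts quoted in the text. The helper `count_pairs`, the Latin vowel set, and the toy string are assumptions made for illustration only; the original count was made on the Russian text, and its exact letter conventions are not reproduced here.

```python
def count_pairs(text: str, first: set, second: set) -> int:
    """Count adjacent letter pairs whose first letter is in `first`
    and whose second letter is in `second`; non-letters are skipped."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return sum(1 for a, b in zip(letters, letters[1:])
               if a in first and b in second)


# Toy illustration with a hypothetical Latin vowel set.
VOWELS = set("aeiou")
toy_text = "banana aioli"
toy_vv = count_pairs(toy_text, VOWELS, VOWELS)
print(toy_vv)  # 3  (the pairs a-a, a-i, i-o)

# The same ratio with the counts quoted in the text.
vowel_vowel_pairs = 1104
vowels_total = 8638
p1 = vowel_vowel_pairs / vowels_total
print(round(p1, 3))  # 0.128
```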

In the same manner, we could find an approximate value for q₀ by counting the number of sequences

consonant, consonant

and dividing it by 11,362; then p₀ = 1 - q₀. However, the tedious direct count can be replaced by the following reasoning. If we subtract 1104 from 8638, we obtain the number of consonants

7534,

which follow a vowel, and as all consonants apart from the first one must follow either a vowel or a consonant, the number of sequences

consonant, consonant

is determined by the difference

11,361 - 7534 = 3827.

Therefore, we get the following approximate quantity for p₀:

p₀ = 1 - q₀ ≈ 1 - 3827/11,362 ≈ 0.663.
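The shortcut can be checked with a short Python sketch that uses only the counts quoted in the text. The variable names are illustrative. The sketch relies on the same assumptions as the argument itself: every vowel is followed by another letter, and the first letter of the text is a consonant with no predecessor.

```python
total_letters = 20_000
vowels = 8_638
consonants = total_letters - vowels            # 11,362
vowel_vowel = 1_104                            # counted directly

# Every vowel is followed by a vowel or a consonant, so the remaining
# vowels are followed by consonants.
vowel_consonant = vowels - vowel_vowel         # 7,534

# Every consonant except the first follows a vowel or a consonant.
consonant_consonant = (consonants - 1) - vowel_consonant   # 3,827

q0 = consonant_consonant / consonants          # consonant after consonant
p0 = 1 - q0                                    # vowel after consonant
print(vowel_consonant, consonant_consonant)    # 7534 3827
print(round(q0, 3), round(p0, 3))              # 0.337 0.663
```

Only the vowel, vowel count has to be made by hand; the other pair counts follow by subtraction.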
