Main article.

An Example of Statistical Investigation of the
Text "Eugene Onegin" Concerning the Connection of Samples in Chains.



This study investigates a text excerpt of 20,000 Russian letters, excluding the hard and soft signs, taken from Pushkin's novel Eugene Onegin: the entire first chapter and sixteen stanzas of the second.

This sequence provides us with 20,000 connected trials, each of which yields either a vowel or a consonant. Accordingly, we assume the existence of an unknown constant probability p that the observed letter is a vowel. We determine the approximate value of p by observation, by counting all the vowels and consonants. Apart from p, we shall find - also through observation - the approximate values of two numbers p1 and p0, and four numbers p1,1, p1,0, p0,1, and p0,0. They represent the following probabilities: p1 - a vowel follows another vowel; p0 - a vowel follows a consonant; p1,1 - a vowel follows two vowels; p1,0 - a vowel follows a consonant that is preceded by a vowel; p0,1 - a vowel follows a vowel that is preceded by a consonant; and, finally, p0,0 - a vowel follows two consonants.
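The counting described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study, which was carried out by hand. The function name estimate_probabilities and the encoding of the text as a sequence of 0/1 flags (1 for a vowel, 0 for a consonant) are assumptions made for this example, and the short sample sequence is invented rather than taken from the novel.

```python
from collections import Counter


def estimate_probabilities(flags):
    """Estimate p, (p1, p0) and (p1,1, p1,0, p0,1, p0,0) by counting.

    `flags` is a sequence of 0/1 values, 1 = vowel, 0 = consonant
    (a hypothetical encoding chosen for this sketch).
    """
    flags = [int(bool(f)) for f in flags]
    if not flags:
        raise ValueError("the sequence must not be empty")

    # p: share of vowels among all letters.
    p = sum(flags) / len(flags)

    # p1, p0: share of vowels among letters that follow a vowel / a consonant.
    pair_total, pair_vowel = Counter(), Counter()
    for prev, cur in zip(flags, flags[1:]):
        pair_total[prev] += 1
        pair_vowel[prev] += cur

    # p_{a,b}: share of vowels among letters preceded by the pair (a, b),
    # where a is the earlier letter and b the immediately preceding one.
    triple_total, triple_vowel = Counter(), Counter()
    for a, b, cur in zip(flags, flags[1:], flags[2:]):
        triple_total[(a, b)] += 1
        triple_vowel[(a, b)] += cur

    after_one = {k: pair_vowel[k] / pair_total[k] for k in pair_total}
    after_two = {k: triple_vowel[k] / triple_total[k] for k in triple_total}
    return p, after_one, after_two


# Invented toy sequence, for illustration only.
sample = [1, 0, 1, 1, 0, 0, 1, 0]
p, after_one, after_two = estimate_probabilities(sample)
print("p =", p)
print("p1 =", after_one.get(1), " p0 =", after_one.get(0))
print("p1,0 =", after_two.get((1, 0)), " p0,1 =", after_two.get((0, 1)))
```

The keys of after_two follow the index order of the text: the first index refers to the earlier of the two preceding letters and the second to the letter immediately before the one observed. The complementary probabilities for consonants are obtained by subtracting each value from 1.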

The indices follow the same system that I introduced in my paper "On a Case of Samples Connected in Complex Chain" [Markov 1911b]; in the notation of my other paper, "Investigation of a Remarkable Case of Dependent Samples" [Markov 1907a], however, p0 = p2. We denote the complementary probabilities, those for consonants, by q with indices that follow the same pattern. To find the value of p, we first obtain 200 approximate values and then take their arithmetic mean. To be precise, we divide the entire sequence of 20,000 letters into 200 separate sequences of 100 letters each, and count how many vowels there are in each hundred: we obtain 200 numbers, which, when divided by 100, yield 200 approximate values of p.
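The division into hundreds can be sketched as follows. The function name block_estimates and the 0/1 encoding of the letters (1 for a vowel, 0 for a consonant) are assumptions made for this example. The toy data use blocks of 4 letters instead of 100 so that the result can be checked by eye.

```python
def block_estimates(flags, block_size=100):
    """Split a 0/1 vowel sequence into consecutive blocks and estimate p per block.

    Returns the vowel count of each block, the per-block estimates
    (count / block_size), and the arithmetic mean of those estimates.
    Letters left over after the last full block are ignored.
    """
    n_blocks = len(flags) // block_size
    if n_blocks == 0:
        raise ValueError("need at least one full block")

    counts = [
        sum(flags[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    estimates = [c / block_size for c in counts]
    mean = sum(estimates) / n_blocks
    return counts, estimates, mean


# Invented toy sequence: three blocks of four letters each.
toy = [1, 0, 0, 1,  1, 1, 0, 0,  0, 0, 0, 1]
counts, estimates, mean = block_estimates(toy, block_size=4)
print(counts, estimates, mean)
```

With 20,000 flags and the default block size of 100 this yields 200 counts and 200 approximate values of p. Because all blocks have the same length, the mean of the block estimates equals the overall share of vowels in the letters covered by the blocks.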
