In this post I explore the first dimension of the many-dimensional space produced by the 100 most frequent words in the New Year addresses. This first dimension divides the addresses into three groups: those presented before 1985, those presented between 1985 and 1990, and those given after 1990. These groups can be seen clearly in Figure 1.

Figure 1. New Year addresses mapped by year: Soviet period (red), Perestroika (purple), and post-Soviet period (blue).
These clusters are easy to interpret. The years in blue to the right are the years after the fall of the Soviet Union, i.e., the years when the addresses were given by the presidents of Russia and addressed to the citizens of Russia. The years in purple and red to the left are the years of the Soviet Union; more specifically, the years in purple are the years of Perestroika, with all its new lexicon of change.
One might think that these clusters were formed simply according to chronological year, with the earlier years to the left, the middle years in the center, and the more recent years to the right. However, I can show that the political system, and not the chronological year, is actually a better predictor for these clusters. In order to do so, I use two linear regressions. The first predicts the first dimension coordinate using the year of the address as a predictor, and the second uses the political era (Soviet, Perestroika, or post-Soviet) as a predictor. Both regressions are helpful in predicting the first dimension coordinates. However, the two models are significantly different and can be compared based on how well they predict the outcome. The first one explains 75% of the variance in the outcome, whereas the second one explains 96% (according to Adjusted R-squared). Thus, the political system is a clearly better predictor of the clusters that we see in Figure 1 than the chronological year.
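To make the comparison concrete, here is a minimal sketch in Python of how two such regressions can be compared by Adjusted R-squared. It is not the code or the data behind this post: the coordinates are synthetic, the year range is made up, and the names `adjusted_r2`, `coord`, and `era` are hypothetical. The only thing taken from the text is the split into three eras (before 1985, 1985 to 1990, after 1990).

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R-squared."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])       # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)                   # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    p = design.shape[1] - 1                         # predictors, excluding intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)

# Assumption: a made-up range of years, one address per year.
years = np.arange(1971, 2011)

# Era coding follows the three groups described in the text:
# 0 = Soviet (before 1985), 1 = Perestroika (1985-1990), 2 = post-Soviet (after 1990).
era = np.where(years < 1985, 0, np.where(years <= 1990, 1, 2))

# Assumption: synthetic Dimension 1 coordinates, one level per era plus noise.
levels = np.array([-1.0, -0.3, 1.0])
coord = levels[era] + rng.normal(scale=0.1, size=len(years))

# Model 1: year as a single numeric predictor (centered for numerical stability).
adj_year = adjusted_r2((years - years.mean()).astype(float), coord)

# Model 2: era as a categorical predictor, dummy-coded with Soviet as the baseline.
era_dummies = np.column_stack([era == 1, era == 2]).astype(float)
adj_era = adjusted_r2(era_dummies, coord)

print(f"year model: adjusted R-squared = {adj_year:.2f}")
print(f"era model:  adjusted R-squared = {adj_era:.2f}")
```

Because the synthetic coordinates jump between eras rather than drifting steadily with time, the era model fits them better than the year model, which is the same kind of pattern the two regressions above are meant to detect.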
Why does the political era have an impact on the most frequent words used in a New Year address? The answer is easy to understand if we look at Figure 2, which shows the words on the same map as the years in Figure 1. To the right, we can see words that have a negative correlation with Dimension 1. These are words that indicate addresses given by the presidents of Russia: president ‘president’, Rossija ‘Russia’, and graždanin ‘citizen’, which is used to address Russian listeners instead of the Soviet form of address, tovarišč ‘comrade’. These words never appeared in the Soviet New Year discourse and are clear indicators of post-Soviet times.

Figure 2. Words used in the New Year addresses: Dimensions 1 and 2.
In the middle of Figure 2, at the same location where we see the Perestroika years in Figure 1, we see words such as čelovek ‘human’, mir ‘peace’, and put’ ‘way’, which are clearly associated with disarmament and humanization, the new ideas introduced during Perestroika.
On the left side of Figure 2 is an illegible cloud of words, which is shown in detail in Figure 3. Here we see words that could have appeared only in the New Year addresses during the Soviet era: tovarišč ‘comrade’ – the Soviet way to address listeners; rabočij ‘worker’ – an important social class during socialist years; words pertaining to the political system, such as socialism ‘socialism’, socialističeskij ‘socialist’, kommunističeskij ‘communist’, leninskij ‘Lenin’s’; and words associated with Soviet institutions, such as central’nyj ‘central’, verxovnyj ‘supreme’, sovet ‘soviet’, KPSS ‘CPSU (Communist Party of the Soviet Union)’, SSSR ‘USSR’, partija ‘party’, and komitet ‘committee’.

Figure 3. Words associated with Soviet years: Dimensions 1 and 2.
The personal pronouns that appear among the most frequent words also deserve attention. Figure 4 is the same map as Figure 2, but with only a few words selected. Personal pronouns such as ja ‘I’, my ‘we’, and vy ‘you (plural)’ are highlighted in blue. Interestingly, all these pronouns are gathered on the right side of the map, which is the side associated with the post-Soviet years. By contrast, the Soviet years contain only one word that is comparable to personal pronouns in meaning: narod ‘people’, highlighted in red. This grouping of pronouns indicates a change from more collective thinking, which is characteristic of the Soviet era, to more personal interactions, which are characteristic of the post-Soviet era.

Figure 4. Personal pronouns and the word narod ‘people’: Dimensions 1 and 2.
Thus, we can easily deduce the political era in which a New Year address is given by using only the 100 most frequent words. Political era is the most important dimension in the matrix, and it explains most of the variation found within the 100 most frequent words. My next post will describe the second dimension, which is related to the economic situation.