Diagram of letter frequency in Russian. Letter frequency in the Russian language. How to get information about the use of individual word forms

Letter frequency in Russian

Did you know that some letters of the alphabet occur in words more often than others? Moreover, vowels occur in the language more often than consonants.

Which letters of the Russian alphabet are the most and least common in written text?

Discovering and studying general patterns is the business of statistics. With its help, one can answer the question above: take excerpts from the works of various authors and count how many times each letter of the Russian alphabet occurs in them. Anyone can do this on their own, out of curiosity or boredom. Here, I will refer to the results of a study that has already been carried out.
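A count like this takes only a few lines of code. The sketch below is a minimal Python illustration, not the method used in the study cited here, and the function name `letter_frequencies` is hypothetical. It lowercases the text, keeps only letters of the 33-letter Russian alphabet and returns each letter's share as a percentage, most frequent first.

```python
from collections import Counter

# The 33 letters of the modern Russian alphabet, including ё.
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percent) pairs sorted from most to least frequent."""
    # Only Russian letters are counted; spaces, digits, punctuation and Latin letters are skipped.
    counts = Counter(ch for ch in text.lower() if ch in RUSSIAN_ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]


if __name__ == "__main__":
    sample = "Обороноспособность"  # any excerpt will do; longer texts give steadier figures
    for letter, percent in letter_frequencies(sample):
        print(f"{letter} - {percent:.2f}%")
```

To approach figures like those in the table, the excerpt has to be long (many thousands of letters) and drawn from different authors and genres.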

The Russian alphabet is a Cyrillic alphabet. Over its history it has gone through several reforms, which produced the modern Russian alphabet of 33 letters. Their frequencies in written text are as follows (ё is not listed separately):

о - 9.28%
а - 8.66%
е - 8.10%
и - 7.45%
н - 6.35%
т - 6.30%
р - 5.53%
с - 5.45%
л - 4.32%
в - 4.19%
к - 3.47%
п - 3.35%
м - 3.29%
у - 2.90%
д - 2.56%
я - 2.22%
ы - 2.11%
ь - 1.90%
з - 1.81%
б - 1.51%
г - 1.41%
й - 1.31%
ч - 1.27%
ю - 1.03%
х - 0.92%
ж - 0.78%
ш - 0.77%
ц - 0.52%
щ - 0.49%
ф - 0.40%
э - 0.17%
ъ - 0.04%

As you may have guessed, the most frequently used letter in Russian is the vowel "о". There are telling examples such as обороноспособность ("defense capability"): seven "о"s in a single word, and a perfectly ordinary, familiar Russian word at that. The popularity of "о" is largely due to pleophony (полногласие), that is, "холод" ("cold") instead of "хлад" and "мороз" ("frost") instead of "мраз".

At the very beginning of words, the most common letter is the consonant "п", and its lead there is just as confident and unconditional. The likely explanation is the large number of prefixes containing "п": пере-, пре-, пред-, при-, про- and others.

Letter frequency also underlies frequency analysis, one of the basic techniques of classical cryptanalysis.
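As an illustration, here is a minimal Python sketch of frequency analysis against the simplest possible cipher, a Caesar shift over the 32-letter alphabet without ё (the table above does not list ё either). It assumes the percentages from the table as the reference distribution, decrypts with every possible key, and keeps the key whose output is closest to that distribution by the chi-squared statistic. The function names are hypothetical, and real ciphers are far harder to break than this.

```python
# Reference letter frequencies (percent) from the table above; ё is not listed there.
FREQ = {
    "о": 9.28, "а": 8.66, "е": 8.10, "и": 7.45, "н": 6.35, "т": 6.30,
    "р": 5.53, "с": 5.45, "л": 4.32, "в": 4.19, "к": 3.47, "п": 3.35,
    "м": 3.29, "у": 2.90, "д": 2.56, "я": 2.22, "ы": 2.11, "ь": 1.90,
    "з": 1.81, "б": 1.51, "г": 1.41, "й": 1.31, "ч": 1.27, "ю": 1.03,
    "х": 0.92, "ж": 0.78, "ш": 0.77, "ц": 0.52, "щ": 0.49, "ф": 0.40,
    "э": 0.17, "ъ": 0.04,
}
ALPHABET = "абвгдежзийклмнопрстуфхцчшщъыьэюя"  # 32 letters, ё excluded
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def shift(text: str, k: int) -> str:
    """Shift every Russian letter by k positions (mod 32); other characters are kept."""
    return "".join(
        ALPHABET[(INDEX[ch] + k) % len(ALPHABET)] if ch in INDEX else ch
        for ch in text.lower()
    )


def chi_squared(text: str) -> float:
    """Chi-squared distance between the text's letter counts and the counts FREQ predicts."""
    counts = {ch: 0 for ch in ALPHABET}
    for ch in text:
        if ch in counts:
            counts[ch] += 1
    n = sum(counts.values())
    if n == 0:
        return float("inf")
    score = 0.0
    for ch in ALPHABET:
        expected = n * FREQ[ch] / 100
        score += (counts[ch] - expected) ** 2 / expected
    return score


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """Try all 32 keys and return (key, plaintext) with the lowest chi-squared score."""
    best = min(range(len(ALPHABET)), key=lambda k: chi_squared(shift(ciphertext, -k)))
    return best, shift(ciphertext, -best)
```

Encrypting a long lowercase text with `shift(text, 5)` and passing the result to `crack_caesar` should give back the key 5 and the original text. Short ciphertexts can mislead the statistic; the longer the text, the more clearly the correct key stands out.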

Please note that the information in this article is somewhat outdated. I have deliberately left it unchanged so that I can later compare how SEO standards change over time. You can find up-to-date information on this topic in newer materials.

Hello, dear readers of the blog. Today's article is once again devoted to search engine optimization. We have already touched on many issues related to this topic.

Today I want to continue talking about on-page SEO, clarifying some points mentioned earlier and covering what we have not discussed yet. If you can write good, unique texts but do not pay due attention to how search engines perceive them, those texts will not make it to the top of the search results for queries related to the topics of your wonderful articles.

What affects a text's relevance to a search query

This is a pity, because you are not realizing the full potential of your project, which can be very impressive. You need to understand that search engines are, for the most part, dumb and literal-minded programs that cannot go beyond their capabilities and look at your project through human eyes.

They will not see much of what is good and useful on your project (what you have prepared for visitors). They can only analyze the text, taking many components into account, but they are still very far from human perception.

So we need to put ourselves in the shoes of search robots, at least for a while, and understand what they focus on when ranking texts for various search queries. This requires some background knowledge, which is covered in a separate article.

Keywords are usually placed in the page title and in some of the internal headings, and distributed evenly and as naturally as possible throughout the article. Highlighting keywords in the text can also be used, of course, but keep in mind that it may lead to over-optimization.

Keyword density also matters, but nowadays it is less a factor to maximize than a warning sign: you must not overdo it.

Keyword density in a document is easy to determine. It is simply the frequency of the keyword in the text: the number of times it occurs divided by the length of the document in words. In the past, a site's position in the search results depended directly on this value.
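The calculation itself is a one-liner. This Python sketch assumes a single-word keyword matched exactly after lowercasing, so unlike a real search engine it ignores word forms (morphology); the name `keyword_density` is hypothetical.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of `keyword` divided by the total number of words, as a percentage."""
    # In Python 3, \w+ matches runs of Unicode letters and digits, so Cyrillic words are found too.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)
```

For example, a 1,500-word article in which the keyword appears 30 times has a density of 2%.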

But, as you probably understand, you cannot compose an entire article out of keywords alone, because it would be unreadable. Fortunately, this is not necessary. Why? Because there is a limit to the frequency of a keyword in a text, beyond which the document's relevance for queries containing that keyword no longer increases.

That is, it is enough to reach a certain frequency, and the text is then optimized as much as possible. Go beyond it, and we overdo it and get caught by a filter.

It remains to answer two (or perhaps three) questions: what keyword density is the maximum, beyond which increasing it further becomes dangerous, and also …

Keywords highlighted with emphasis tags or placed in the TITLE tag carry more weight for search engines than the same keywords appearing in plain text. Recently, however, webmasters began exploiting this and spammed the factor so heavily that its importance has decreased, and abusing strong tags can even get the entire site banned.

Keywords in the TITLE still matter, but it is better not to repeat them there or cram too many into a single page title. If the keywords are in the TITLE, you can significantly reduce their number in the article itself (making it easier to read and better suited to people than to search engines) while achieving the same relevance, without the risk of falling under a filter.

I think this question is clear: the more keywords you put in emphasis and TITLE tags, the greater the chance of losing everything at once. But if you do not use them at all, you will not achieve anything either. The most important criterion is that keywords are introduced naturally. If they are present but the reader does not stumble over them, everything is generally fine.

Now it remains to figure out which keyword frequency in a document is optimal, making the page as relevant as possible without triggering sanctions. First, let's recall the formula that most (probably all) search engines use for ranking.

How to determine an acceptable keyword frequency

We have already discussed the mathematical model in the article mentioned above. For a given search query, its essence is expressed by a single simplified formula, TF*IDF, where TF (term frequency) is the frequency with which the query's words occur in the text of the document.

IDF (inverse document frequency) measures how rare the query is across all other documents on the Internet indexed by the search engine (the collection).

This formula determines how well a document matches (is relevant to) a search query. The higher the product TF*IDF, the more relevant the document and, all other things being equal, the higher it ranks.

In other words, a document's weight for a given query is greater the more often the query's keywords are used in its text, and the less often they occur in other documents on the Internet.
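The textbook form of this model can be sketched in a few lines of Python. Here TF is the share of the document's words that match the term, and IDF is log(N / df), where N is the number of documents in the collection and df is the number of documents containing the term. The formulas actually used by Yandex and Google are not public and are certainly more elaborate; the function names are illustrative.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (Unicode-aware, so Cyrillic works too)."""
    return re.findall(r"\w+", text.lower())


def tf_idf(term: str, doc: str, collection: list[str]) -> float:
    """Textbook TF*IDF: term frequency in `doc` times log(N / document frequency)."""
    term = term.lower()
    words = tokenize(doc)
    tf = words.count(term) / len(words) if words else 0.0
    # Document frequency: how many documents in the collection contain the term at all.
    df = sum(1 for d in collection if term in tokenize(d))
    if df == 0:
        return 0.0
    idf = math.log(len(collection) / df)
    return tf * idf
```

A term that occurs in every document of the collection gets IDF = log(1) = 0, so no amount of repetition raises its weight. This is why the choice of query matters as much as the text itself.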

Clearly, we cannot influence IDF, except by choosing a different query to optimize for. But we can and will influence TF, because we want to grab our share (and not a small one) of traffic from Yandex and Google search results for the user queries we need.

The catch is that search algorithms calculate TF with a rather tricky formula: TF grows with the keyword frequency only up to a certain limit, after which it practically stops growing no matter how much you increase the frequency. This acts as a kind of anti-spam filter.
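The real Yandex and Google formulas are not public, so the Python sketch below uses a stand-in: the term-frequency component of Okapi BM25, a well-known published ranking function, with the commonly used defaults k1 = 1.2 and b = 0.75. It only illustrates the saturation effect described here. It is not the formula of either search engine, and its numbers should not be read as the density thresholds discussed in this article.

```python
def bm25_tf(count: int, doc_len: int, avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 term-frequency component: grows with `count` but stays below k1 + 1."""
    # Documents longer than average are penalized through this length-normalization term.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return count * (k1 + 1) / (count + norm)


if __name__ == "__main__":
    # A 1000-word page (assumed to be of average length) with a growing number of keyword occurrences.
    for count in (5, 10, 20, 30, 50, 100):
        density = 100 * count / 1000
        print(f"{density:5.1f}% -> TF {bm25_tf(count, 1000, 1000):.3f}")
```

Each additional occurrence adds less than the one before, and the value never reaches k1 + 1 (2.2 with these defaults), which is the qualitative shape of the curve described in this article.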

Quite a long time ago (until about 2005), TF was calculated with a fairly simple formula and was essentially equal to keyword density. Search engines were not happy with the rankings this produced, because they played into the hands of spammers.

Then the TF formula became more complex, and the notion of page "nausea" appeared: TF began to depend not only on the keyword's frequency but also on the frequencies of the other words in the same text. The optimal TF value was reached when the keyword was the most frequently used word in the text.

TF could also be increased by making the text longer while keeping the same keyword percentage: the longer the article, with the same share of keywords, the higher it ranked.

Today the TF formula is even more complex, but we no longer need to push the density to the point where the text becomes unreadable and search engines ban the project for spam. Nor is there any need to write disproportionately long texts.

At the same ideal density (we will determine it below from the corresponding graph), increasing the length of an article in words improves its position in the search results only until it reaches a certain length. Beyond that ideal length, making it longer has no effect on relevance (more precisely, only a very, very small one).

All of this becomes clear if you plot this tricky TF (term frequency). If one axis of the graph shows TF and the other shows the keyword's frequency as a percentage of the text, the result is a curve shaped like a hyperbola.

The graph is, of course, approximate, because few people know the actual TF formula used by Yandex or Google. But it qualitatively shows the optimal range for keyword frequency: about 2-3 percent of the total number of words.

Keeping in mind that you will also put some keywords in emphasis tags and the TITLE, this is the limit beyond which further increasing the density risks a ban. Saturating and disfiguring the text with lots of keywords no longer pays off, because the downsides outweigh the benefits.

What text length is sufficient for promotion

Using the same assumed TF, one can plot its value against text length in words. In this case the keyword frequency can be held constant for every length and set, for example, to any value from the optimal range (2 to 3 percent).

Remarkably, the graph has exactly the same shape as the one above, only with text length in thousands of words along the horizontal axis. From it one can determine the optimal length range, at which TF is already close to its maximum.

It turns out that this range lies between 1,000 and 2,000 words. Beyond it, relevance barely grows, and below it, relevance drops rather sharply.

Thus, for your articles to rank high in the search results, keywords should make up about 2-3% of the text. This is the first and main conclusion. The second is that it is no longer necessary to write very long articles to get into the Top.

It is enough to exceed 1,000-2,000 words and make keywords 2-3% of the text. That is the recipe for the perfect text, one that can compete for a top position on low-frequency queries even without external optimization (buying links to the article with keyword anchors). That said, spending a little time on Miralinks, GGL, Rotapost or GetGoodLink will not hurt, as it will help your project.

Let me remind you once again that you can find out the length of your text and the frequency of particular keywords in it with specialized programs or online text-analysis services. One such service is ISTIO, which I have written about before.

None of the above is one hundred percent reliable, but it is very close to the truth; at least, my personal experience confirms this theory. However, the Yandex and Google algorithms change constantly, and few people know what tomorrow will bring, apart from those involved in their development.

Good luck! See you soon on the pages of the blog.

Frequency of use

noun, number of synonyms: 1

commonness (10)


  • - Vocabulary of limited use: vocabulary whose use is restricted for some extralinguistic reason. It includes dialectisms, terms and professionalisms, jargon, colloquial words and expressions, vulgarisms...

    Dictionary of sociolinguistic terms

  • General linguistics. Sociolinguistics: Dictionary-Reference

  • - Types of use: a translation of the German term Gebrauchstypen, introduced by Delbrück to denote established uses of grammatical forms. Types of use include, for example, different kinds of syntactic use...

    Brockhaus and Efron Encyclopedic Dictionary

  • - Vocabulary whose use is limited by extralinguistic factors: 1) dialectisms, limited territorially; 2) terms used in the scientific style...
    Dictionary of Linguistic Terms by T. V. Zherebilo

  • - Uses that disregard the differences between one object and another: Living organisms cannot exist without ...
  • - Uses referring to specific members of a class of objects: I need to see this person ...

    Terms and concepts of general morphology: Dictionary-reference book

  • - 1) Variants allowed by the punctuation rules for complex asyndetic sentences: when explaining or giving a reason, a dash can be used instead of a colon: Separation is illusory - we will be together soon ...

    Syntax: Dictionary

  • - adverb, number of synonyms: 1 under a bushel...

    Synonym dictionary

  • - adj., number of synonyms: 10: gone out of use, became obsolete, did not meet modern requirements, became obsolete, obsolete, retreated into the realm of legend ...

    Synonym dictionary

  • - See ...

    Synonym dictionary

  • - adj., number of synonyms: 19

    Synonym dictionary

  • - adj., number of synonyms: 2 unusable uncommon ...

    Synonym dictionary

  • - adj., number of synonyms: 3

    Synonym dictionary

  • - 1) Variants allowed by the punctuation rules for complex asyndetic sentences: when explaining or giving a reason, a dash can be used instead of a colon: Separation is illusory - we will be together soon 2) When separated ...

    Dictionary of Linguistic Terms by T. V. Zherebilo

"frequency of use" in books

Feeding frequency

From the book Breeding Dogs by Harmar Hillery

Feeding frequency. The number of times a puppy needs to be fed per day depends on the size of the breed. Most puppies do well when fed every three hours, day and night, but if they are born prematurely or weigh less than 85 g at birth, they are likely to die.

Frequency

From the book Real Estate: How to Advertise It by Alexander Nazaikin

14.2.3. Interaction frequency

by Nicola Dimitri

14.2.3. Frequency of interaction. The more often the same group of competitors interact, the more stable collusion becomes, because deviations are punished more promptly. If, for example, firms compete less often, their ability to sustain collusion is lower.

15.4.6. Auction Frequency

From the book Purchasing Guide by Nicola Dimitri

15.4.6. Frequency of auctions. As discussed above, some auction rings may transfer funds among themselves after the auction in which they colluded, or keep records of amounts owed, and only from time to time

8. The frequency of function words turns out to be an author invariant

From the book Book 2. Changing dates - everything changes. [New Chronology of Greece and the Bible. Mathematics reveals the deception of medieval chronologists] by Anatoly Timofeevich Fomenko

8. The frequency of function words turns out to be an author invariant. A remarkable exception is our parameter 3, the frequency of all function words: PREPOSITIONS, CONJUNCTIONS AND PARTICLES. The evolution of this parameter as the sample size grows is shown

Frequency

From the book Great Soviet Encyclopedia (NA) by TSB

Frequency

by Alexander Nazaikin

Frequency

From the book Media Planning for 100 by Alexander Nazaikin

Frequency. TV channels are broadcast in the VHF and UHF bands. The metre bands were the first to be used for television. In the 1990s, decimetre channels were actively launched in Moscow. Previously, frequency was of great importance, since receiving different channels

Frequency

From the book Media Planning for 100 by Alexander Nazaikin

Frequency. Signal quality depends on the transmission frequency and is highest in the VHF band (frequency modulation, FM). Listeners prefer good sound, so VHF stations have high audience ratings and are preferred

3.2. Frequency

by Dmitry Olegovich Ivanov

3.2. Frequency. When discussing the significance of any pathology in medicine, we believe it is important to consider not only the etiology, pathogenesis, clinical picture and severity of the injuries and complications that have arisen or may arise, but also how common the pathology is. To

4.2. Frequency

From the book Heat Balance Disorders in Newborns by Dmitry Olegovich Ivanov

4.2. Frequency. Hyperthermia in newborns is probably much less common than hypothermia, which is probably why the scientific literature contains very few studies of hyperthermia in infants. Maayan-Metzger A. et al. (2003) analyzed 42,313 case histories

Frequency

From the book Glucose Metabolism in Newborns by Dmitry Olegovich Ivanov

Frequency. Cornblath M., who defined hypoglycemia as a blood glucose concentration below 30 mg% (1.67 mmol/L) in the first 72 hours of life, found it in 4.4% of all live births. In 1971, Lubchenco L. O. and Bard H., using Cornblath's criteria, found hypoglycemia in newborns with greater

The dictionary includes the most frequently used words of modern Russian (from the second half of the 20th to the early 21st century), with information about their frequency, their statistical distribution across texts and genres, and the time the texts were created. It is based on texts from the Russian National Corpus totaling 100 million words. More information about the history of Russian frequency dictionaries and the methods used to create the "New Frequency Dictionary of Russian Vocabulary" can be found in …

The concept of the dictionary was developed and prepared for publication by O. N. Lyashevskaya and S. A. Sharov; the electronic version was prepared by A. V. Sannikov. The authors are grateful to V. A. Plungyan, A. Ya. Shaikevich, E. A. Grishina, B. P. Kobritsov, E. V. Rakhilina, S. O. Savchuk, D. V. Sichinava and others who took part in discussing the principles for creating the dictionary. We thank O. Uryupina, D. and G. Bronnikov, B. Kobritsov, and Yandex LLC employees A. Abroskin, N. Grigoriev and A. Sokirko for their help at various stages of collecting and computer processing of the material.

How to find a word in a dictionary?

The two main sections of the dictionary are a list of words sorted alphabetically and a list sorted by overall frequency of use in the corpus. All words are given in their base (dictionary) form: for nominals, the nominative case (for nouns, as a rule, the singular; for adjectives, the full masculine form); for verbs, the infinitive.

The alphabetical list contains the 60,000 most frequent word forms. To find information about a word, go to the alphabetical list, select the first letter of the word and find the word in the table. To find a word quickly, you can also use the search box, for example:

Word: bright

In this way, you can find information not only about a particular word but also about a group of words that begin or end in the same way. To do this, type an asterisk (*) in the search box after a sequence of letters ("all words starting with ...") or before it ("all words ending with ..."). For example, to find all words starting with re-, type:

Word: re*

If you want to find all words ending in -enko, type:

Word: *enko
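The asterisk convention amounts to a simple prefix or suffix match. The Python sketch below is a hypothetical illustration of the search rule described above, not the site's actual code; it assumes a single asterisk at either the start or the end of the query.

```python
def wildcard_search(words: list[str], query: str) -> list[str]:
    """Hypothetical rule: 're*' = starts with 're', '*enko' = ends with 'enko', else exact match."""
    query = query.lower()
    if query.endswith("*"):
        prefix = query[:-1]  # "all words starting with ..."
        return [w for w in words if w.lower().startswith(prefix)]
    if query.startswith("*"):
        suffix = query[1:]  # "all words ending with ..."
        return [w for w in words if w.lower().endswith(suffix)]
    return [w for w in words if w.lower() == query]
```

For example, `wildcard_search(["rewrite", "read", "write", "shevchenko"], "re*")` returns `["rewrite", "read"]`.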

In the frequency list of lemmas, the words are ordered by the general frequency of use in the corpus of modern Russian literary language. The frequency list includes 20,000 of the most common lemmas.

To find information about a word, go to the frequency list section and find the word in the table. To look up individual words, it is best to use the quick word search box.

Why can't I find the word in the dictionary even though I can find it in the corpus?

This may happen for several reasons. First, the word may have a low frequency (for example, only 3 occurrences in the corpus) or occur only in texts written before 1950. Second, a word can occur many times but in only one or two texts; such lemmas were deliberately excluded from the dictionary. Third, we cannot rule out an error in the automatic identification of the word's base form or part of speech, or the word may have been mistakenly classified as a proper name. The site presents a "test" version of the frequency dictionary, and we plan to continue refining its vocabulary.

What information about the use of the word can be obtained?

In the dictionary, you can get the following information about the use of the word in the corpus:

  • the total number of occurrences of the lemma (overall frequency, in ipm); see the frequency dictionaries of fiction and other functional styles and the frequency dictionaries of nouns, verbs and other parts of speech;
  • the frequency rank of the word (its position in the general frequency list); see the frequency dictionaries of nouns, verbs and other parts of speech;
  • the number of texts in which the word occurs (number of documents);
  • the coefficient of variation D; see the frequency dictionaries of nouns, verbs and other parts of speech;
  • the distribution of the word's use across texts created in different decades (1950s, 1960s, etc.);
  • the overall frequency of individual word forms; see the Alphabetical list of word forms section.

    The dictionaries of meaningful vocabulary also give the comparative frequency of a word in the general corpus and in the subcorpus of texts of a particular functional style (fiction, journalism, etc.), together with the log-likelihood indicator LL-score.

    In addition to the quantitative indicators, each word is tagged with its part of speech. This distinguishes words of different parts of speech that share the same base form (cf. печь, the noun "stove" and the verb "to bake").

    What is ipm?

    The overall frequency is the number of occurrences per million words of the corpus, or ipm (instances per million words). This unit is generally accepted internationally and makes it easy to compare a word's frequency across different frequency dictionaries and corpora, since the text samples on which frequency is measured can differ greatly in size. For example, if the word power occurs 55 times in a 400,000-word corpus, 364 times in a one-million-word corpus, 40,598 times in the (nominally) 100-million-word corpus of modern Russian and 55,673 times in the large 135-million-word Russian National Corpus, its frequency will be 137.5, 364.0, 372.06 and 412.39 ipm, respectively.
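The conversion is a simple rescaling to a corpus of one million words. The Python sketch below (the function name `ipm` is illustrative) reproduces three of the figures above. The fourth pair does not fit an exact 100 million words: 40,598 occurrences at 372.06 ipm corresponds to a corpus of roughly 109 million words, so the "100 million" figure appears to be a round number.

```python
def ipm(count: int, corpus_size: int) -> float:
    """Instances per million: a raw count rescaled to a corpus of one million words."""
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Counts of the word "power" and the corpus sizes (in words) from the paragraph above.
    for count, size in [(55, 400_000), (364, 1_000_000), (55_673, 135_000_000)]:
        print(f"{count} in {size:,} words -> {ipm(count, size):.2f} ipm")
```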

    The frequency dictionaries edited by L. N. Zasorina and by L. Lönngren were each built on a sample of one million tokens, so the absolute figures given there can also be regarded as ipm.

    What is the coefficient of variation D?

    The coefficient D, introduced by A. Juilland (Juilland et al. 1970), is used in many frequency dictionaries (L. Lönngren's Russian dictionary, the British National Corpus frequency dictionary, a dictionary of French business vocabulary). It shows how evenly a word is distributed across different texts.

    The coefficient ranges from 0 to 100. For example, the word и ("and") occurs in almost every text of the corpus, and its D value is close to 100. The word commissurotomy occurs 5 times in the corpus, but in only one text, so its D value is about 0.

    Giving D for each word makes it possible to assess how specific the word is to particular subject areas. For example, the words overripe and implant have roughly the same frequency (0.56 ipm), but D is 90 for overripe and 0 for implant. This means that the first word occurs evenly in texts on various topics and is relevant to many subject areas, while implant appears only in a few texts on "medicine and health".
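For reference, Juilland's D is usually defined as D = 1 - V / sqrt(n - 1), where the corpus is split into n equal-sized parts and V is the coefficient of variation (standard deviation divided by the mean) of the word's frequencies in those parts. The Python sketch below multiplies the result by 100 to match the 0-100 scale used here. How this dictionary actually partitions the corpus is not described above, so equal-sized parts are an assumption, and the function name is hypothetical.

```python
import statistics


def juilland_d(part_frequencies: list[float]) -> float:
    """Juilland's D on a 0-100 scale (100 = perfectly even spread, 0 = all in one part).

    `part_frequencies` holds the word's frequency in each of n equal-sized corpus parts.
    """
    n = len(part_frequencies)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    mean = statistics.fmean(part_frequencies)
    if mean == 0:
        raise ValueError("the word does not occur in any part")
    # Coefficient of variation: population standard deviation divided by the mean.
    v = statistics.pstdev(part_frequencies) / mean
    return 100 * (1 - v / (n - 1) ** 0.5)


if __name__ == "__main__":
    print(juilland_d([3, 3, 3, 3, 3]))   # spread evenly over five parts
    print(juilland_d([15, 0, 0, 0, 0]))  # concentrated in a single part
```

The two calls mirror the extremes described above: a word spread evenly over all parts gets 100, and a word confined to one part, like commissurotomy, gets 0.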

    What can you learn about the history of the use of the word in different periods?

    Information on how a word's frequency is distributed across the decades of the second half of the 20th century and the beginning of the 21st century is also available in the dictionary. For example, one can trace the fate of the word perestroika.

    The sharp surge in its use in the 1980s is fully explained by the socio-historical realities of the time; from a linguistic point of view, it can be interpreted as follows: the word perestroika acquired a new meaning, which became dominant in subsequent years.

    Why are proper names and abbreviations highlighted in a separate list?

    Proper names are separated from the main vocabulary because they form a much less statistically stable group: their frequency depends heavily on the choice of texts in the corpus and on their topics (in particular, on the place and time of the events described). Lönngren (1993) argues that including proper names in a frequency dictionary on an equal footing with other words inevitably makes it obsolete prematurely.

    The dictionary includes the core of this list, the 3,000 most frequent units. To look up first names, patronymics, surnames, nicknames, toponyms, names of organizations and abbreviations, go to the Alphabetical list of proper names and abbreviations section, select the first letter of the word and find it in the table. You can also use the quick word search box.

    How to get information about the use of individual forms of a word?

    In addition to information about the use of the lemma (that is, the word in all forms of inflection), in the dictionary you can find out how individual word forms are used. Go to the Alphabetical list of word forms section, select the letter with which the word form begins and find it in the table. You can also use the quick search box, for example:

    word form: fly

    To find all word forms that begin (or end) with a particular sequence of letters, use an asterisk (*) in the search box. For example, all word forms beginning with sleep can be found by typing:

    word form: sleep*

    All word forms ending in -com can be found by typing:

    word form: *com

    The alphabetical list of word forms includes all word forms of the corpus with a frequency above 0.1 ipm (about 15 thousand in total) and contains information about their total frequency. Homonymous word forms are marked with * in the table.

    How to find information about the "most common" words?

    With the help of our dictionary, you can find information about classes of words that differ in common statistical characteristics. These are, in particular:

  • the most frequent words in the total sample from the corpus, mid-frequency words for the total sample, etc.;
  • words most frequently found in the subcorpus of fiction (see section Frequency Dictionary of Fiction);
  • the words most frequently found in the journalism subcorpus (see the Frequency Dictionary of journalism section);
  • words most frequently found in the subcorpus of other non-fiction (see section Frequency Dictionary of Other Non-Fiction);
  • words that are most characteristic of oral speech (see the Frequency Dictionary of Live Speech section).
  • the most frequent nouns (see section Frequency list of nouns);
  • the most frequent verbs (see the Frequency list of verbs section);

    and other frequency lists by part of speech.

    Besides these classes, you can explore other groups of words yourself using the general table in the Alphabetical list of word forms section (for example, the most frequent verbs with the prefix re-, words found in more than 200 texts, and much more: how you group words depends on your goals and your imagination).

    How can I trace how a word's frequency is distributed across texts of different functional styles?

    L. N. Zasorina's frequency dictionary gives data on each word's use in four types of texts: (I) newspaper and magazine texts, (II) drama, (III) scientific and journalistic texts, (IV) fiction. In our dictionary, you can get similar information in the section “Distribution of lemmas by functional styles”.

    The frequency dictionaries of functional styles are compiled from the subcorpora of fiction, journalism, other non-fiction and live oral speech. Compared with L. N. Zasorina's dictionary, the set of categories has changed somewhat: drama is replaced by recordings of live oral speech and transcripts of film soundtracks, and scientific literature is placed in one category together with official business, religious and other non-fiction literature.

    Each list includes the 5000 most frequent lemmas of its subcorpus. For each lemma, it gives the part of speech, the frequency in the subcorpus and the coefficient D.
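    The coefficient D is not defined in this section. Assuming it is Juilland's dispersion coefficient, which frequency dictionaries commonly use (the dictionary's Introduction would confirm the exact variant), it can be sketched as follows. The function name and input format are hypothetical: the input is the lemma's frequency in each of n equally sized parts of a (sub)corpus.

```python
import math
from statistics import mean, pstdev


def juilland_d(subfrequencies: list[float]) -> float:
    """Juilland's dispersion coefficient D for one lemma (assumed definition of "D").

    `subfrequencies` holds the lemma's frequency in each of n equally sized parts
    of a (sub)corpus. D = 1 - V / sqrt(n - 1), where V = standard deviation / mean
    is the coefficient of variation. D is 1 for a perfectly even spread and 0 when
    every occurrence falls in a single part.
    """
    n = len(subfrequencies)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    mu = mean(subfrequencies)
    if mu == 0:
        raise ValueError("the lemma does not occur in any part")
    variation = pstdev(subfrequencies) / mu  # uses the population standard deviation
    return 1 - variation / math.sqrt(n - 1)


if __name__ == "__main__":
    print(juilland_d([10, 10, 10, 10]))  # evenly spread across four parts
    print(juilland_d([40, 0, 0, 0]))     # concentrated in one part
```

    A lemma with a high frequency but a low D is typically frequent only in a few texts, so D helps separate generally common words from words that one or two long texts happen to repeat.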

    What is meaningful vocabulary (of fiction, etc.)?

    Some words are used much more often in one functional style than in others. For live oral speech, for example, such words include "here", "in general" and "OK". It is hard to imagine these words being used as often in scientific and technical literature as in everyday speech.

    The list of the most typical lemmas for each functional type of text was compiled by comparing each lemma's frequency in that subcorpus with its frequency in the rest of the corpus. Each dictionary of meaningful vocabulary includes 500 lemmas.

    What do frq1, frq2 and LL-score mean in the meaningful-vocabulary lists?

    Here frq1 is the overall frequency of the lemma in the entire corpus (in ipm), frq2 is its frequency in the given subcorpus (fiction, journalism, other non-fiction or live speech, respectively), and LL-score is the log-likelihood score calculated from frq1 and frq2 using the formula proposed by P. Rayson and R. Garside (see the Introduction to the dictionary for details). The higher the LL-score, the more significant the word is for the given functional style.
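    Rayson and Garside's log-likelihood works on raw counts and corpus sizes rather than on ipm values. The sketch below therefore takes the word's count in the subcorpus and in the rest of the corpus, matching the comparison described above. Converting frq1 and frq2 back to counts would need the subcorpus sizes, which are not given here, and the function name is hypothetical. The formula itself is the standard two-corpus version: expected counts assume equal relative frequency, and LL = 2 · Σ O · ln(O / E).

```python
import math


def log_likelihood(freq_sub: int, freq_rest: int, size_sub: int, size_rest: int) -> float:
    """Log-likelihood (LL) of a word's counts in a subcorpus vs. the rest of the corpus.

    freq_sub, freq_rest: raw counts of the word; size_sub, size_rest: part sizes in words.
    Expected counts assume the word is equally frequent in both parts.
    """
    total_freq = freq_sub + freq_rest
    total_size = size_sub + size_rest
    expected_sub = size_sub * total_freq / total_size
    expected_rest = size_rest * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_sub, expected_sub), (freq_rest, expected_rest)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
            ll += observed * math.log(observed / expected)
    return 2 * ll


if __name__ == "__main__":
    # Same relative frequency in both parts: no evidence of a style preference.
    print(log_likelihood(10, 90, 1_000, 9_000))
    # Heavily over-represented in the subcorpus: a large score.
    print(log_likelihood(80, 20, 1_000, 9_000))
```

    LL measures how surprising the difference is, not its direction: a word that is unusually rare in the subcorpus also scores high. To select characteristic vocabulary, one would also check that the word's relative frequency in the subcorpus exceeds that in the rest of the corpus.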

    How can I get a list of the 100 most frequent verbs?

    In the "General vocabulary: Parts of speech" chapter, the frequency list of lemmas is divided into seven sublists: nouns, verbs, adjectives, adverbs and predicatives, pronouns, numerals, and function words. For each lemma, the sublist gives its overall frequency and its rank (serial number) in the general list. Each sublist contains the 1000 most frequent lemmas of its class.

    So you can get a list of the 100 most frequent verbs by going to the Frequency list of verbs subsection and taking the first 100 verbs from the top of the list. Similarly, you can find out which adjective is the most frequent (as the Frequency list of adjectives section shows, it is "new") and learn many other interesting facts about the composition of part-of-speech classes.
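    The relation between the general list and the part-of-speech sublists can be sketched as follows: rank all lemmas once by frequency, then filter by part of speech while keeping each lemma's rank in the general list. The tags, sample lemmas and frequencies are hypothetical, and ties in frequency simply keep their input order here.

```python
from typing import NamedTuple


class Lemma(NamedTuple):
    lemma: str
    pos: str    # hypothetical tags: "v" verb, "adj" adjective, "conj" conjunction, ...
    ipm: float  # overall frequency, instances per million words


def pos_sublist(general: list[Lemma], pos: str, top: int = 100) -> list[tuple[int, Lemma]]:
    """Return (rank in the general list, lemma) pairs for the `top` most frequent lemmas of one part of speech."""
    ranked = sorted(general, key=lambda entry: entry.ipm, reverse=True)
    return [
        (rank, entry)
        for rank, entry in enumerate(ranked, start=1)
        if entry.pos == pos
    ][:top]


if __name__ == "__main__":
    general = [
        Lemma("be", "v", 500.0),
        Lemma("and", "conj", 2000.0),
        Lemma("say", "v", 300.0),
        Lemma("new", "adj", 250.0),
    ]
    # The 100 most frequent verbs (here only two exist), with their general ranks.
    for rank, entry in pos_sublist(general, "v"):
        print(rank, entry.lemma)
```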

    How can I use the auxiliary tables?

    First, the auxiliary tables give data on the frequency of part-of-speech classes and of other grammatical categories. These data come from the NCRL subcorpus in which lexical and grammatical ambiguity was resolved manually (more than 6 million words). Since the statistics concern large classes of words, there is reason to believe that the proportions of parts of speech and other grammatical categories will be the same in the entire corpus.
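    Part-of-speech proportions are simple ratios, but only when every token carries exactly one tag, which is why a disambiguated subcorpus matters. A minimal sketch, assuming the subcorpus is available as (word, tag) pairs; the tag names and the function name are hypothetical.

```python
from collections import Counter


def pos_proportions(tagged_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Share of each part-of-speech tag among all tokens.

    Assumes each token carries exactly one tag, i.e. ambiguity has already been resolved.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    tokens = [("cat", "n"), ("sleeps", "v"), ("dog", "n"), ("runs", "v")]
    print(pos_proportions(tokens))
```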

    Second, this section gives information about text coverage by lexemes and about the average length of a word, a word form and a sentence.
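    The section does not define "coverage" here. Assuming it means the share of running text accounted for by the N most frequent lemmas, both kinds of figures can be sketched as below. The input lists and function names are hypothetical: tokens are assumed to be lemmatized already, and words and sentences already segmented.

```python
from collections import Counter
from collections.abc import Sequence, Sized


def coverage_by_top_lemmas(lemmas: Sequence[str], top_n: int) -> float:
    """Share of running text (lemmatized tokens) covered by the top_n most frequent lemmas."""
    counts = Counter(lemmas)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(lemmas)


def average_length(items: Sequence[Sized]) -> float:
    """Mean length: letters per word for strings, or words per sentence for lists of words."""
    return sum(len(item) for item in items) / len(items)


if __name__ == "__main__":
    tokens = ["be", "be", "be", "say", "say", "new"]
    print(coverage_by_top_lemmas(tokens, 1))  # share covered by the single most frequent lemma
    print(average_length(["ab", "abcd"]))     # average word length in letters
    print(average_length([["a", "b"], ["c", "d", "e", "f"]]))  # average sentence length in words
```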

    Third, it contains frequency lists of the letters of the Russian alphabet, of punctuation marks, and of two-letter and multi-letter combinations.
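    Tables of this kind can be reproduced on any Russian text with a few lines of code. The sketch lower-cases the text, counts only Cyrillic letters (keeping ё separate from е), and, as an assumption, counts two-letter combinations only inside words. The dictionary's own counting conventions are not stated here, so its figures may differ.

```python
import re
from collections import Counter

# Lower-case Cyrillic words: а-я covers 32 letters, ё is listed separately (33 in total).
WORD_RE = re.compile(r"[а-яё]+")


def letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each Russian letter in `text`, most frequent first."""
    letters = [ch for word in WORD_RE.findall(text.lower()) for ch in word]
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.most_common()}


def bigram_counts(text: str) -> Counter:
    """Counts of two-letter combinations inside words (pairs across word boundaries are skipped)."""
    counts: Counter = Counter()
    for word in WORD_RE.findall(text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


if __name__ == "__main__":
    sample = "Мама мыла раму."
    print(letter_frequencies(sample))
    print(bigram_counts(sample).most_common(3))
```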