Journal Ukrainian Language №3 (63) 2017
UDC 811.161.2’33

Vasyl Starko
Candidate of Philological Sciences, Associate Professor, Doctoral Student, Department of Applied Linguistics, Lesya Ukrainka Eastern European National University, Lutsk

Heading: Research
Language: Ukrainian

Abstract: The article discusses a series of non-commercial, open-source projects developed for Ukrainian by the r2u team. The dictionary website r2u.org.ua provides full-text search across a collection of dictionaries, mostly Russian-Ukrainian and Ukrainian-Russian. It brings back into circulation dictionaries that the Soviet government banned as part of its policy of Russianizing the Ukrainian language. The pearl of the collection is the academic Russian-Ukrainian Dictionary edited by Ahatanhel Krymsky and Serhii Yefremov (1924–1933). It was the last top-grade general dictionary published before the Russianization policy was launched, and it remains an unparalleled source of proper Ukrainian lexis today.

The dictionary website e2u.org.ua offers a series of modern English-Ukrainian and Ukrainian-English dictionaries. The most prominent are the terminological dictionaries of physics and related sciences (by Olha Kocherga and Eugen Meinarovich, over 280,000 headwords), mathematics and informatics (by Eugen Meinarovich and Myroslav Kratko, 43,000 terms), economics (by Anna Shymkiw, 20,000 headwords), and linguistics (by Lada Kolomiiets et al., 9,500 terms). A general English-Ukrainian dictionary is being compiled one entry at a time, in response to search queries. A large phraseological dictionary is also being added to the website piecemeal.

The Large Electronic Dictionary of Ukrainian (VESUM) is a machine-readable part-of-speech (POS) dictionary. With 316,000 lemmas and over four million generated word forms, it is the largest dictionary of its kind for Ukrainian, and it has been adopted for full-text search in the Ukrainian-language Wikipedia.
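To illustrate how a form-to-lemma dictionary of this kind can support full-text search, here is a minimal Python sketch. It is not VESUM's actual data format and not the Wikipedia search integration. It assumes a handful of hypothetical (word form, lemma, tag) entries with simplified placeholder tags, and the names `build_form_index`, `lemma_sets`, and `matches` are illustrative only. A query matches a text when every query word shares a lemma with some word of the text, so a search for словник also finds the inflected form словником.

```python
import re
from collections import defaultdict

# Hypothetical toy entries: (word form, lemma, tag). The tags are simplified
# placeholders, not VESUM's real tag set; the real dictionary has millions of forms.
ENTRIES = [
    ("словник", "словник", "noun:m:nom"),
    ("словника", "словник", "noun:m:gen"),
    ("словнику", "словник", "noun:m:dat"),
    ("словником", "словник", "noun:m:ins"),
    ("словники", "словник", "noun:p:nom"),
    ("словників", "словник", "noun:p:gen"),
    ("мова", "мова", "noun:f:nom"),
    ("мови", "мова", "noun:f:gen"),
    ("мовою", "мова", "noun:f:ins"),
]


def build_form_index(entries):
    """Map each lowercased word form to the set of lemmas it may belong to."""
    index = defaultdict(set)
    for form, lemma, _tag in entries:
        index[form.lower()].add(lemma)
    return index


def lemma_sets(text, index):
    """Return one set of candidate lemmas per token; unknown tokens map to themselves.

    Tokenization is deliberately naive (runs of word characters), so words
    containing an apostrophe are split in two.
    """
    return [index.get(tok, {tok}) for tok in re.findall(r"\w+", text.lower())]


def matches(query, text, index):
    """True if every query token shares at least one lemma with some token of the text."""
    text_lemmas = set().union(*lemma_sets(text, index))
    return all(q & text_lemmas for q in lemma_sets(query, index))


if __name__ == "__main__":
    index = build_form_index(ENTRIES)
    print(matches("словник", "Користуйтеся словником.", index))  # True
```

Keeping a set of lemmas per form, rather than a single lemma, leaves room for ambiguous forms that belong to more than one lemma.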

Another project is Pravopysnyk LanguageTool (http://languagetool.org/uk/), an advanced Ukrainian spellchecker that also checks grammar and style. Finally, the Brown Ukrainian Corpus (BrUK) is a project to build a one-million-word POS-tagged, lemmatized, and disambiguated corpus of modern Ukrainian that can be used, inter alia, to train a Ukrainian POS tagger. All r2u projects are available online, and the corresponding links are provided in the article.
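One common first use of a disambiguated, tagged corpus is training and evaluating a simple baseline tagger. The following Python sketch is not part of BrUK or any r2u tool. It assumes a hypothetical toy corpus of (token, tag) pairs with made-up coarse tags, and it implements the standard most-frequent-tag baseline: each known word receives the tag it carried most often in training, and unknown words receive the corpus-wide most frequent tag.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data as lists of (token, tag) pairs; BrUK's actual
# file format and tag set are not reproduced here.
TRAIN = [
    [("я", "pron"), ("читаю", "verb"), ("книгу", "noun")],
    [("вона", "pron"), ("читає", "verb"), ("книгу", "noun"), ("швидко", "adv")],
    [("мова", "noun"), ("важлива", "adj")],
]


def train_baseline(sentences):
    """Learn the most frequent tag per word form and the overall most frequent tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        for token, tag in sentence:
            per_word[token.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    default = overall.most_common(1)[0][0]
    return lexicon, default


def tag(tokens, lexicon, default):
    """Tag each token by lexicon lookup, falling back to the default tag."""
    return [(t, lexicon.get(t.lower(), default)) for t in tokens]


def accuracy(gold_sentences, lexicon, default):
    """Share of tokens whose predicted tag equals the gold tag."""
    total = correct = 0
    for sentence in gold_sentences:
        tokens = [tok for tok, _ in sentence]
        for (_, gold), (_, pred) in zip(sentence, tag(tokens, lexicon, default)):
            total += 1
            correct += gold == pred
    return correct / total if total else 0.0


if __name__ == "__main__":
    lexicon, default = train_baseline(TRAIN)
    print(tag(["Я", "читаю", "статтю"], lexicon, default))
```

Here the unseen word статтю falls back to `noun`, the most frequent tag in the toy data. Real taggers also use context, but a baseline like this gives a reference point for measuring what a trained model adds.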

Keywords: computational linguistics, natural language processing, computer lexicography, corpus linguistics, corpus, r2u, electronic dictionary, VESUM, Wikipedia, Pravopysnyk, LanguageTool, Brown Corpus, BrUK.


