Natural language processing is now widely used, for example in machine translation, in search engines, and in
identifying the topic of a text. Such applications require text preprocessing, because the way text is preprocessed can
affect the accuracy of the system. Text can be preprocessed in several ways. One approach is to identify the root of
each word. An advantage of this approach is that it saves computer memory, because a root shared by many word forms needs
to be stored only once. This paper describes stemming systems, which identify the roots of words. In the literature
review, the authors review stemming algorithms that identify the roots of Russian, Uzbek and Turkish words. The authors
then propose a stemming system that identifies the roots of Kazakh words, and the paper describes how this system works.
To test the system, words from various parts of speech were entered. The proposed system can identify the roots of
nouns, verbs, adjectives and numerals. The system's output is shown in Table 1. The figures show which kinds of suffixes
and endings can be attached to the root of a Kazakh word, although not all combinations are shown. The conclusion gives
advice on how to develop a stemming system.
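The abstract does not give the system's suffix inventory or stripping rules. As a rough illustration of suffix-stripping stemming for Kazakh, the sketch below repeatedly removes the longest matching ending from a small, hypothetical list of nominal suffixes (plural, locative, ablative and dative forms), while always keeping a root of at least three letters. The `SUFFIXES` list, the `MIN_ROOT` threshold and the `stem` function are assumptions made for illustration, not the authors' method; verbs, adjectives and numerals, which the proposed system also handles, are not covered.

```python
# Hypothetical, deliberately small subset of Kazakh nominal suffixes.
# This is an illustrative assumption, not the suffix inventory used in the paper.
SUFFIXES = [
    "лар", "лер", "дар", "дер", "тар", "тер",  # plural
    "дан", "ден", "тан", "тен", "нан", "нен",  # ablative case
    "да", "де", "та", "те",                    # locative case
    "ға", "ге", "қа", "ке",                    # dative case
]

# Assumed minimum root length, to avoid over-stemming short words such as "ата".
MIN_ROOT = 3

# Try longer suffixes first so that, e.g., "дан" is preferred over "да"... wait-free longest match.
_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Strip known suffixes from the end of `word` until none applies."""
    word = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in _BY_LENGTH:
            # Strip only if a root of at least MIN_ROOT letters would remain.
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_ROOT:
                word = word[: -len(suffix)]
                changed = True
                break  # restart from the longest suffix on the shortened word
    return word


if __name__ == "__main__":
    for w in ["кітаптарда", "балаларға", "мектептерден", "ата"]:
        print(w, "->", stem(w))
```

With these rules, "кітаптарда" (in the books) reduces to "кітап", "балаларға" (to the children) to "бала", and "мектептерден" (from the schools) to "мектеп". Storing `{stem(w) for w in words}` instead of every surface form keeps each root once, which is the memory saving the abstract refers to.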
STEMMING OF KAZAKH LANGUAGE
Published March 2021
Language: English
How to Cite
Bogdanchikov, A., Baimuratov, O. and Ayazbayev, D. 2021. STEMMING OF KAZAKH LANGUAGE. Bulletin of Abai KazNPU. Series of Physical and mathematical sciences. 73, 1 (Mar. 2021), 169–173. DOI: https://doi.org/10.51889/2021-1.1728-7901.24.