Bulletin of Abai KazNPU. Series of Physical and mathematical sciences

A TASK OF SYNTHETIC CORPORA GENERATION FOR THE LOW-RESOURCE LANGUAGE

Published December 2022

D. Rakhimova
al-Farabi Kazakh National University, Almaty
E. Adali
Istanbul Technical University, Istanbul
A. Shormakova
al-Farabi Kazakh National University, Almaty
A. Turarbek
al-Farabi Kazakh National University, Almaty
Y. Suleimenov
Institute of information and computational technologies, Almaty
Abstract

Recently, various areas of natural language processing, such as search engines, machine translation, and speech technologies, have been developing actively using machine learning and neural networks. Implementing and developing these areas first of all requires electronic linguistic resources such as corpora, dictionaries, and rule sets, and these resources must be large and of high quality. This article considers the shortage of corpora for low-resource languages, which include the Turkic languages. The problem is acute for Kazakh, for which very few corpora are available. The article presents an approach to creating synthetic corpora by identifying candidate words and replacing them with synonyms from a Kazakh synonym dictionary. In test experiments, this approach enlarged the corpus 3.37 times.
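The abstract describes enlarging a corpus by finding candidate words and replacing them with synonyms from a Kazakh synonym dictionary. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes a hypothetical toy dictionary `SYNONYMS` with three illustrative entries, matches candidate words by exact whitespace-split form (no morphological analysis of Kazakh suffixes), and produces one variant per single-word substitution; the helper names `augment_sentence` and `build_synthetic_corpus` are likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical toy synonym dictionary (assumption): the paper uses a Kazakh
# synonym dictionary whose format and size are not described on this page.
SYNONYMS: Dict[str, List[str]] = {
    "үлкен": ["зор", "ірі"],        # "big"
    "әдемі": ["сұлу", "көркем"],    # "beautiful"
    "жылдам": ["тез", "шапшаң"],    # "fast"
}


def augment_sentence(sentence: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return new sentences, each with exactly one candidate word replaced by a synonym."""
    tokens = sentence.split()
    variants: List[str] = []
    for i, token in enumerate(tokens):
        # A token is a candidate only if it appears verbatim as a dictionary key
        # (simplifying assumption: inflected forms are not lemmatized).
        for synonym in synonyms.get(token, []):
            new_tokens = tokens[:i] + [synonym] + tokens[i + 1:]
            variants.append(" ".join(new_tokens))
    return variants


def build_synthetic_corpus(corpus: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Keep every original sentence and append its synonym-substituted variants."""
    result: List[str] = []
    for sentence in corpus:
        result.append(sentence)
        result.extend(augment_sentence(sentence, synonyms))
    return result


if __name__ == "__main__":
    corpus = ["бұл үлкен және әдемі қала", "ол жылдам жүреді", "мен кітап оқимын"]
    synthetic = build_synthetic_corpus(corpus, SYNONYMS)
    for line in synthetic:
        print(line)
    print(f"enlargement factor: {len(synthetic) / len(corpus):.2f}")
```

On these three toy sentences the corpus grows from 3 to 9 sentences (factor 3.00); the third sentence contains no candidate word and is kept unchanged. This figure reflects only the toy data and is unrelated to the 3.37 reported in the paper. Replacing one word per variant keeps growth linear in the number of synonyms; combining several substitutions in one sentence would grow combinatorially and risks less fluent output.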

Language

English

How to Cite

Rakhimova, D., Adali, E., Shormakova, A., Turarbek, A. and Suleimenov, Y. 2022. A A TASK OF SYNTHETIC CORPORA GENERATION FOR THE LOW-RESOURCE LANGUAGE. Bulletin of Abai KazNPU. Series of Physical and mathematical sciences. 80, 4 (Dec. 2022), 169–179. DOI:https://doi.org/10.51889/2938.2022.14.84.020.