A TASK OF SYNTHETIC CORPORA GENERATION FOR THE LOW-RESOURCE LANGUAGE

D. Rakhimova; E. Adali; A. Shormakova; A. Turarbek; Y. Suleimenov

doi:10.51889/2938.2022.14.84.020

Vol. 80 No. 4 (2022)

Published December 2022

D. Rakhimova

Al-Farabi Kazakh National University, Almaty

E. Adali

Istanbul Technical University, Istanbul

A. Shormakova

Al-Farabi Kazakh National University, Almaty

A. Turarbek

Al-Farabi Kazakh National University, Almaty

Y. Suleimenov

Institute of Information and Computational Technologies, Almaty

Abstract

Recently, various areas of natural language processing, such as search engines, machine translation, and speech technologies, have been developing actively through the use of machine learning and neural networks. Implementing and advancing these areas depends first of all on electronic linguistic resources such as corpora, dictionaries, and rule sets, and these resources must be both very large and of good quality. This article considers the shortage of corpora for low-resource languages, which include the Turkic language group. The problem is acute for Kazakh, for which very few corpora are available. The article presents an approach to creating synthetic corpora by identifying a candidate word in a sentence and replacing it with an entry from a synonym dictionary of the Kazakh language. In test experiments, the given corpus was enlarged 3.37 times.
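The synonym-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `generate_variants` and `augment_corpus`, the whitespace tokenization, and the tiny English placeholder dictionary are all assumptions made for illustration. A real run would load a Kazakh synonym dictionary and would likely need morphological handling that is omitted here.

```python
from typing import Dict, List

# Hypothetical toy synonym dictionary with English placeholder entries.
# The article uses a synonym dictionary of the Kazakh language instead.
SYNONYMS: Dict[str, List[str]] = {
    "big": ["large", "huge"],
    "house": ["home"],
}


def generate_variants(sentence: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return new sentences, each made by replacing one candidate word."""
    tokens = sentence.split()  # assumption: simple whitespace tokenization
    variants = []
    for i, token in enumerate(tokens):
        # A token is a candidate if the dictionary lists synonyms for it.
        for synonym in synonyms.get(token.lower(), []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants


def augment_corpus(corpus: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the original sentences followed by their unique synthetic variants."""
    augmented = list(corpus)
    seen = set(corpus)
    for sentence in corpus:
        for variant in generate_variants(sentence, synonyms):
            if variant not in seen:  # skip duplicates
                seen.add(variant)
                augmented.append(variant)
    return augmented


corpus = ["the big house is old", "we saw a dog"]
augmented = augment_corpus(corpus, SYNONYMS)

# Enlargement factor: size of the augmented corpus relative to the original.
growth = len(augmented) / len(corpus)
print(len(augmented), growth)
```

With this toy input, only the first sentence contains candidate words, so the enlargement factor depends entirely on dictionary coverage. The 3.37 figure in the abstract comes from the authors' experiments and is not reproduced by this sketch.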

Keywords

corpora, Kazakh language, synonyms, linguistic resources

Language

English

How to Cite

[1]

Rakhimova, D., Adali, E., Shormakova, A., Turarbek, A. and Suleimenov, Y. 2022. A TASK OF SYNTHETIC CORPORA GENERATION FOR THE LOW-RESOURCE LANGUAGE. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 80, 4 (Dec. 2022), 169–179. DOI:https://doi.org/10.51889/2938.2022.14.84.020.
