Bulletin of Abai KazNPU. Series of Physical and mathematical sciences

IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY

Published December 2023

B. Omarov
Al-Farabi Kazakh National University, Almaty, Kazakhstan
A. Toktarova
Khoja Ahmet Yassawi International Kazakh-Turkish University, Turkestan, Kazakhstan
Zh. Azhibekova
Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan
G. Rakhimbayeva
Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan
G. Beissenova
M. Auezov South Kazakhstan University, Shymkent, Kazakhstan
Abstract

Profanity detection is a critical task in the current digital age, because it enables effective content moderation systems. However, the task is difficult in low-resource languages, where only small amounts of annotated data are available. This work addresses offensive language identification in one such low-resource language, Kazakh. We propose a new approach based on Bidirectional Long Short-Term Memory (BiLSTM) networks, which have shown strong performance in natural language processing tasks.

The bidirectional nature of the BiLSTM architecture lets the model capture both long-term and contextual dependencies, so it can identify offensive language in the input text more accurately. To offset the shortage of annotated data, our method also uses transfer learning. Extensive experiments on a Kazakh offensive language dataset demonstrate the effectiveness of the proposed method, with state-of-the-art results for offensive language identification in low-resource Kazakh.
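The abstract does not specify the network in detail. The following is therefore a minimal, untrained sketch in plain Python (no deep-learning library) of how a BiLSTM reads a sentence in both directions. Everything in it is an assumption for illustration: the class names `LSTMCell` and `BiLSTMClassifier`, the toy dimensions, the random weights, and the choice to summarize a sentence by concatenating the final forward and final backward hidden states before a logistic output. It is not the authors' model, and it omits tokenization, training, and the transfer-learning step.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) and candidate (g) gates."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate; random weights stand in for trained ones.
        self.params = {
            gate: (init(hidden_dim, input_dim), init(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in "ifog"
        }

    def step(self, x, h, c):
        pre = {
            gate: [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
            for gate, (W, U, b) in self.params.items()
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Return the hidden state after each input vector, in processing order."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class BiLSTMClassifier:
    """Hypothetical BiLSTM encoder plus a logistic 'offensive' probability head."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)  # reads tokens left to right
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)  # reads tokens right to left
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, xs):
        """Per-token states: [forward state ; backward state], length 2 * hidden_dim."""
        fwd_states = self.fwd.run(xs)
        bwd_states = self.bwd.run(xs[::-1])[::-1]  # re-align backward states to token order
        return [hf + hb for hf, hb in zip(fwd_states, bwd_states)]

    def predict_proba(self, xs):
        states = self.encode(xs)
        H = self.fwd.hidden_dim
        # Last forward state and first-position backward state have each seen the whole sentence.
        summary = states[-1][:H] + states[0][H:]
        return sigmoid(sum(w * s for w, s in zip(self.w_out, summary)) + self.b_out)


if __name__ == "__main__":
    model = BiLSTMClassifier(input_dim=4, hidden_dim=3)
    # Three toy token embeddings; real input would come from an embedding layer.
    sentence = [[0.1, 0.0, 0.3, -0.2], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.4, 0.1, 0.0]]
    states = model.encode(sentence)
    print(len(states), len(states[0]))  # 3 6
    print(model.predict_proba(sentence))  # untrained, so the value is not meaningful
```

Each forward state at position t has seen tokens 1..t and each backward state has seen tokens t..n, so every concatenated vector carries context from both sides of its token. This is the property the abstract attributes to the bidirectional architecture. A trained model would learn these weights, for example with a binary cross-entropy loss, instead of using random ones.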

We also examine how different model configurations and training methods affect the performance of our method. Our findings offer useful insight into how offensive language can be detected in low-resource languages and pave the way for more robust content moderation systems tailored to specific language contexts.

PDF (Kazakh)

Language: Kazakh

How to Cite

[1]
Omarov, B., Toktarova, A., Azhibekova, Zh., Rakhimbayeva, G. and Beissenova, G. 2023. IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY. Bulletin of Abai KazNPU. Series of Physical and mathematical sciences. 84, 4 (Dec. 2023), 173–182. DOI: https://doi.org/10.51889/2959-5894.2023.84.4.017.