

Bulletin of the Abai KazNPU, the series of "Physical and Mathematical Sciences"

IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY

Published December 2023
Al-Farabi Kazakh National University, Almaty, Kazakhstan
Khoja Ahmet Yassawi International Kazakh-Turkish University, Turkestan, Kazakhstan
Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan
Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan
M. Auezov South Kazakhstan University, Shymkent, Kazakhstan
Abstract

Detecting profanity is a critical task in today's digital environment, since it underpins effective content moderation systems. The task is harder for low-resource languages, where only small amounts of annotated data are available. This work addresses offensive language detection in one such language, Kazakh. We propose an approach based on Bidirectional Long Short-Term Memory (BiLSTM) networks, which perform well on natural language processing tasks.

Because a BiLSTM reads the input in both directions, it captures long-term dependencies as well as the context on either side of each word, which lets us identify offensive language in the input text more accurately. To compensate for the shortage of annotated data, our method also uses transfer learning. Extensive experiments on a Kazakh offensive language dataset demonstrate the effectiveness of the proposed method and show state-of-the-art results for offensive language identification in low-resource Kazakh.

In addition, we examine how different model configurations and training methods affect the performance of our method. Our research provides useful insight into how offensive language can be detected in low-resource languages, and it paves the way for more robust content moderation systems adapted to specific language contexts.
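
The bidirectional reading described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the weights are random and untrained, the sizes (`EMBED`, `HIDDEN`) and all function names are hypothetical, and the token vectors stand in for real word embeddings. It only shows how a forward pass and a backward pass over the same sequence are concatenated, so that each position sees context from both sides.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random (untrained) weights for the four LSTM gates: i, f, g, o."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        gate: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for gate in ("i", "f", "g", "o")
    }


def lstm_step(params, x, h, c):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = x + h  # concatenate the input vector and the previous hidden state

    def affine(gate):
        W, b = params[gate]["W"], params[gate]["b"]
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W, b)]

    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    o = [sigmoid(v) for v in affine("o")]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    """Run an LSTM over the sequence and collect the hidden state per step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd_params, bwd_params, sequence, hidden_size):
    """Concatenate left-to-right and right-to-left hidden states per token."""
    forward = run_lstm(fwd_params, sequence, hidden_size)
    # Read the sequence in reverse, then flip back to align with positions.
    backward = run_lstm(bwd_params, sequence[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
EMBED, HIDDEN = 4, 3  # hypothetical sizes, chosen only for illustration
fwd = make_lstm_params(EMBED, HIDDEN, rng)
bwd = make_lstm_params(EMBED, HIDDEN, rng)
# Five stand-in "token embeddings"; a real system would look these up.
tokens = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(5)]
states = bilstm(fwd, bwd, tokens, HIDDEN)
print(len(states), len(states[0]))  # prints: 5 6
```

Each of the five positions ends up with a vector of size 2 × `HIDDEN`: the first half summarizes the words up to that position, the second half the words from that position to the end. A classifier for offensive comments would typically be trained on top of these states.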

Language

Kazakh

How to Cite

[1]
Омаров, Б., Тоқтарова, А., Ажибекова, Ж., Рахимбаева, Г. and Бейсенова, Г. 2023. IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY. Bulletin of the Abai KazNPU, the series of "Physical and Mathematical Sciences". 84, 4 (Dec. 2023), 173–182. DOI:https://doi.org/10.51889/2959-5894.2023.84.4.017.