USING A BIDIRECTIONAL LONG SHORT-TERM MEMORY NETWORK TO DETECT OFFENSIVE COMMENTS

B. Omarov; A. Toktarova; Zh. Azhibekova; G. Rakhimbayeva; G. Beissenova

doi:10.51889/2959-5894.2023.84.4.017

Vol. 84 No. 4 (2023)

IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY

Published December 2023

B. Omarov

Al-Farabi Kazakh National University, Almaty, Kazakhstan

A. Toktarova

Khoja Ahmet Yassawi International Kazakh-Turkish University, Turkestan, Kazakhstan

Zh. Azhibekova

Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan

G. Rakhimbayeva

Asfendiyarov Kazakh National Medical University, Almaty, Kazakhstan

G. Beissenova

M. Auezov South Kazakhstan University, Shymkent, Kazakhstan

Abstract

Detecting profanity is a critical task in the digital age because it underpins effective content moderation systems. The task is especially difficult for low-resource languages, where little annotated data is available. This work addresses offensive language detection in one such language, Kazakh. We propose an approach based on Bidirectional Long Short-Term Memory (BiLSTM) networks, which have shown high performance in natural language processing tasks.

The bidirectional nature of the BiLSTM architecture captures both long-term dependencies and context, which lets us identify offensive language in the input text more accurately. To mitigate the shortage of annotated data, our method also uses transfer learning. Extensive experiments on a dataset of offensive language in Kazakh demonstrate the effectiveness of the proposed method and show state-of-the-art results in identifying offensive language in low-resource Kazakh.

In addition, we examine how different model configurations and training methods affect the performance of our method. Our findings provide useful information on how offensive language can be detected in low-resource languages and pave the way for more robust content moderation systems tailored to specific language contexts.
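
The bidirectional idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the weights are random and untrained, the sizes (`INPUT_SIZE`, `HIDDEN_SIZE`) and all function names are hypothetical, and the classifier and transfer learning steps are omitted. It only shows how a forward and a backward LSTM pass are concatenated so that every token position sees context from both sides.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_params(input_size, hidden_size, rng):
    """Random, untrained weights for one LSTM direction (illustrative only)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One weight matrix and bias per gate: input, forget, cell candidate, output.
    return {name: (matrix(hidden_size, input_size + hidden_size), [0.0] * hidden_size)
            for name in ("i", "f", "g", "o")}

def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the input vector with the previous hidden state
    def gate(name, activation):
        weights, biases = params[name]
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(weights, biases)]
    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    g = gate("g", math.tanh)
    o = gate("o", sigmoid)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, sequence, hidden_size):
    """Run one direction over the sequence and collect every hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def bilstm(fwd_params, bwd_params, sequence, hidden_size):
    """Concatenate left-to-right and right-to-left hidden states per position."""
    forward = run_lstm(fwd_params, sequence, hidden_size)
    # Process the reversed sequence, then flip the outputs back into token order.
    backward = run_lstm(bwd_params, sequence[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE = 4, 3  # hypothetical sizes, chosen for readability
fwd = make_lstm_params(INPUT_SIZE, HIDDEN_SIZE, rng)
bwd = make_lstm_params(INPUT_SIZE, HIDDEN_SIZE, rng)
# Stand-in for five token embeddings of one comment.
tokens = [[rng.uniform(-1, 1) for _ in range(INPUT_SIZE)] for _ in range(5)]
states = bilstm(fwd, bwd, tokens, HIDDEN_SIZE)
```

Each entry of `states` has `2 * HIDDEN_SIZE` values: the first half summarizes the tokens up to that position, the second half summarizes the tokens from that position to the end. A real system would feed these states to a trained classification layer.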

Keywords

obscene language; low-resource language; BiLSTM; machine learning algorithms

Language

Kazakh

How to Cite

[1]

Omarov B., Toktarova A., Azhibekova Zh., Rakhimbayeva G. and Beissenova G. 2023. IDENTIFICATION OFFENSIVE COMMENTS BY USING BIDIRECTIONAL LONG-SHORT-TERM MEMORY. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 84, 4 (Dec. 2023), 173–182. DOI: https://doi.org/10.51889/2959-5894.2023.84.4.017.
