Offensive language detection is a critical task in the digital age, underpinning effective content moderation systems. It remains challenging, however, in low-resource languages, where only small amounts of annotated data are available. This work addresses offensive language detection in Kazakh, a low-resource language. We propose an approach based on Bidirectional Long Short-Term Memory (BiLSTM) networks, which have shown strong performance across natural language processing tasks.
By exploiting the bidirectional nature of the BiLSTM architecture, the model captures both long-range and local contextual dependencies, enabling more accurate identification of offensive language in input text. To mitigate the scarcity of annotated data, our method also employs transfer learning. Extensive experiments on a Kazakh offensive-language dataset demonstrate the effectiveness of the proposed approach, which achieves state-of-the-art results for offensive language detection in low-resource Kazakh.
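To make the architecture concrete, the sketch below shows a minimal BiLSTM text classifier in PyTorch. It is illustrative only: the vocabulary size, embedding and hidden dimensions, and the single-layer configuration are placeholder assumptions, not the hyperparameters used in this work, and the transfer-learning component (e.g. initializing the embedding layer from pretrained vectors) is omitted.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Minimal BiLSTM sketch for binary (offensive / non-offensive) text
    classification. Hyperparameters are illustrative placeholders."""

    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True reads each sequence left-to-right and
        # right-to-left, so every token is encoded with context from
        # both sides -- the property the text attributes to BiLSTMs.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Final states from both directions are concatenated, hence 2x.
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)        # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(emb)           # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=1) # concat fwd + bwd final states
        return torch.sigmoid(self.fc(h)).squeeze(1)  # (batch,) scores in [0,1]

model = BiLSTMClassifier()
batch = torch.randint(0, 5000, (4, 20))  # 4 dummy sentences, 20 token ids each
scores = model(batch)                    # shape: (4,)
```

In a transfer-learning setup, the `nn.Embedding` weights would typically be initialized from vectors pretrained on a larger corpus and optionally frozen, so the scarce annotated Kazakh data is spent on the classifier rather than on learning word representations from scratch.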
In addition, we analyze how different model configurations and training strategies affect performance. Our findings offer insight into offensive language detection in low-resource languages and pave the way for more robust content moderation systems tailored to specific linguistic contexts.