Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences

SEMANTIC ROLE LABELING FOR KAZAKH: MODELS AND DATASETS

Published March 2026


A.K. Aitim+
International Information Technology University, Almaty, Kazakhstan
https://orcid.org/0000-0003-2982-214X
Abstract

Semantic role labeling (SRL), a fundamental component of natural language understanding, identifies the relationships between predicates and their arguments and thereby supports tasks such as information extraction, machine translation, and question answering. Although SRL has been studied extensively for high-resource languages, low-resource languages such as Kazakh remain relatively underexplored. This work addresses that gap by providing both new datasets and model architectures tailored specifically to Kazakh SRL. We first build annotated SRL datasets that reflect Kazakh's rich morphology, including agglutinative suffixes and case-marking patterns. Using these data, we develop and compare several SRL models, ranging from feature-based traditional machine learning techniques to neural architectures enhanced with morphological embeddings. Our findings show that exploiting Kazakh-specific linguistic features improves performance, and they draw attention to persistent challenges caused by data sparsity and complex morphology. We also discuss practical issues in dataset construction, annotation consistency, and generalization to other Turkic languages. The results point to the feasibility of high-quality SRL in low-resource settings and open new directions for Kazakh-language NLP research.
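The abstract does not specify the feature set or role inventory used. As a minimal illustration of how Kazakh case marking can be exposed to an SRL model, the sketch below assumes a PropBank/CoNLL-style BIO encoding of argument spans and a small hand-written table of Kazakh case suffixes. The suffix table, the functions `case_feature` and `spans_to_bio`, the role labels, and the example sentence are all hypothetical choices for illustration, not the authors' implementation; a real system would use a proper morphological analyzer instead of suffix matching.

```python
from typing import List, Tuple

# ASSUMPTION: an illustrative subset of Kazakh case suffixes. Real Kazakh has
# more allomorphs (e.g. possessive-conditioned forms such as dative -а/-е or
# accusative -н), so this is a heuristic stand-in for a morphological analyzer.
_CASE_SUFFIXES = {
    "ABL": ["нан", "нен", "дан", "ден", "тан", "тен"],
    "GEN": ["ның", "нің", "дың", "дің", "тың", "тің"],
    "INS": ["мен", "бен", "пен"],
    "LOC": ["нда", "нде", "да", "де", "та", "те"],
    "DAT": ["на", "не", "ға", "ге", "қа", "ке"],
    "ACC": ["ны", "ні", "ды", "ді", "ты", "ті"],
}

# Flatten to (suffix, case) pairs, longest suffix first, so that e.g. the
# ablative "-ден" is tried before the locative "-де".
_SUFFIX_TABLE = sorted(
    ((sfx, case) for case, sfxs in _CASE_SUFFIXES.items() for sfx in sfxs),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def case_feature(token: str, is_predicate: bool = False, min_stem: int = 2) -> str:
    """Return a coarse case tag for one token (hypothetical heuristic)."""
    if is_predicate:
        # Verbal endings such as past-tense -ды would otherwise look like ACC.
        return "PRED"
    word = token.lower()
    for sfx, case in _SUFFIX_TABLE:
        # Require a minimal stem so short words like "да" are not split.
        if word.endswith(sfx) and len(word) - len(sfx) >= min_stem:
            return case
    return "NOM"  # unmarked form (nominative subject or bare object)


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, role) spans to BIO tags for one predicate."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        tags[start] = f"B-{role}"
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"
    return tags


if __name__ == "__main__":
    # "Aigul read an interesting book in the library." (illustrative sentence)
    tokens = ["Айгүл", "кітапханада", "қызық", "кітап", "оқыды"]
    pred_idx = 4
    # ASSUMPTION: PropBank-style labels; the paper's actual role set may differ.
    spans = [(0, 1, "ARG0"), (1, 2, "ARGM-LOC"), (2, 4, "ARG1"), (4, 5, "V")]
    feats = [case_feature(t, i == pred_idx) for i, t in enumerate(tokens)]
    tags = spans_to_bio(len(tokens), spans)
    for tok, feat, tag in zip(tokens, feats, tags):
        print(f"{tok}\t{feat}\t{tag}")
```

Running it prints one token per line with its heuristic case tag and BIO role tag (for instance, `кітапханада` receives `LOC` and `B-ARGM-LOC`). Columns like these could feed a feature-based classifier or be embedded alongside word vectors in a neural tagger. The predicate is masked because its past-tense ending `-ды` would otherwise be read as accusative, which is one concrete example of the morphological ambiguity that makes naive suffix matching only a rough approximation.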

Language: English

How to Cite

Aitim, A. 2026. SEMANTIC ROLE LABELING FOR KAZAKH: MODELS AND DATASETS. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 93, 1 (Mar. 2026), 128–140. DOI: https://doi.org/10.51889/2959-5894.2026.93.1.011.