A fundamental component of natural language understanding, semantic role labeling (SRL) identifies the relationships between predicates and their arguments, enabling tasks such as information extraction, machine translation, and question answering. Although SRL has been studied extensively for high-resource languages, low-resource languages such as Kazakh remain relatively underexplored. This work addresses that gap by offering both new datasets and model architectures tailored specifically to Kazakh SRL. We first build annotated SRL datasets that reflect Kazakh's rich morphology, including agglutinative suffixes and case-marking patterns. Building on these datasets, we develop and compare several SRL models, ranging from feature-driven traditional machine learning techniques to neural architectures enhanced with morphological embeddings. Our findings show that exploiting Kazakh's language-specific traits improves performance, and they highlight persistent challenges caused by data sparsity and complex morphology. We also discuss practical issues in dataset creation, annotation consistency, and generalization to other Turkic languages. The results demonstrate the feasibility of high-quality SRL in low-resource settings and open new directions for Kazakh-language NLP research.
Language
English
How to Cite
Aitim, A. 2026. SEMANTIC ROLE LABELING FOR KAZAKH: MODELS AND DATASETS. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 93, 1 (Mar. 2026), 128–140. DOI:https://doi.org/10.51889/2959-5894.2026.93.1.011.
https://orcid.org/0000-0003-2982-214X