SEMANTIC ROLE LABELING FOR KAZAKH: MODELS AND DATASETS

A.K. Aitim

doi:10.51889/2959-5894.2026.93.1.011

Vol. 93 No. 1 (2026)

Published March 2026

International Information Technology University, Almaty, Kazakhstan

https://orcid.org/0000-0003-2982-214X

Abstract

Semantic role labeling (SRL), a fundamental component of natural language understanding, identifies the relations between predicates and their arguments and thereby supports tasks such as information extraction, machine translation, and question answering. Although SRL has been studied extensively for high-resource languages, low-resource languages such as Kazakh remain relatively underexplored. This work addresses that gap by presenting both new datasets and model architectures tailored specifically to Kazakh SRL. We first build annotated SRL datasets that reflect Kazakh's rich morphology, including agglutinative suffixes and case-marking patterns. On these data, we develop and compare a range of SRL models, from traditional feature-based machine learning methods to neural architectures enhanced with morphological embeddings. Our results show that exploiting Kazakh-specific linguistic features improves performance, and they reveal persistent challenges arising from data sparsity and complex morphology. We also discuss practical issues in dataset construction, annotation consistency, and generalization to other Turkic languages. Overall, the findings point to the feasibility of high-quality SRL in low-resource settings and open new directions for Kazakh NLP research.
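To make the idea of case-marking features concrete, here is a rough illustration only. It is not the feature set or any of the models evaluated in this work. The Python sketch guesses a token's grammatical case from its surface ending and feeds that into a naive rule baseline that emits PropBank-style labels. The suffix table, the case-to-role mapping, and the names guess_case, token_features and baseline_roles are hypothetical simplifications. Real Kazakh morphology calls for a proper morphological analyzer, because surface endings are ambiguous.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified table of Kazakh case suffixes with their main
# vowel-harmony / consonant-assimilation variants. Surface matching like this
# is ambiguous (e.g. a final -а/-е is not always dative); a real system would
# use a morphological analyzer instead.
CASE_SUFFIXES: Dict[str, List[str]] = {
    "GEN": ["ның", "нің", "дың", "дің", "тың", "тің"],
    "ACC": ["ны", "ні", "ды", "ді", "ты", "ті"],
    "DAT": ["ға", "ге", "қа", "ке", "на", "не", "а", "е"],
    "LOC": ["нда", "нде", "да", "де", "та", "те"],
    "ABL": ["нан", "нен", "дан", "ден", "тан", "тен"],
    "INS": ["мен", "бен", "пен"],
}

# Try longer suffixes first so that, e.g., LOC "да" wins over DAT "а".
ORDERED_SUFFIXES = sorted(
    ((suffix, case) for case, variants in CASE_SUFFIXES.items() for suffix in variants),
    key=lambda pair: len(pair[0]),
    reverse=True,
)

# Hypothetical case-to-role rules for a naive baseline (not learned, not from the paper).
CASE_TO_ROLE: Dict[str, str] = {
    "NOM": "ARG0",
    "ACC": "ARG1",
    "DAT": "ARG2",
    "LOC": "ARGM-LOC",
}


def guess_case(token: str) -> str:
    """Guess a nominal token's case from its ending; 'NOM' if nothing matches."""
    word = token.lower()
    for suffix, case in ORDERED_SUFFIXES:
        # Require a non-empty stem so a bare pronoun such as "мен" stays NOM.
        if word.endswith(suffix) and len(word) > len(suffix):
            return case
    return "NOM"


def token_features(tokens: List[str], pred_idx: int) -> List[Dict[str, object]]:
    """Build simple per-token features relative to a given predicate position."""
    features: List[Dict[str, object]] = []
    for i, token in enumerate(tokens):
        is_pred = i == pred_idx
        case: Optional[str] = None if is_pred else guess_case(token)
        features.append({
            "form": token,
            "is_predicate": is_pred,
            "case": case,
            "distance": i - pred_idx,  # signed offset from the predicate
        })
    return features


def baseline_roles(tokens: List[str], pred_idx: int) -> List[str]:
    """Label each token: 'V' for the predicate, else a role guessed from its case ('O' if unmapped)."""
    return [
        "V" if f["is_predicate"] else CASE_TO_ROLE.get(f["case"], "O")
        for f in token_features(tokens, pred_idx)
    ]


if __name__ == "__main__":
    # "Асқар кітапты кітапханада оқыды" = "Askar read the book in the library"
    sentence = ["Асқар", "кітапты", "кітапханада", "оқыды"]
    print(baseline_roles(sentence, pred_idx=3))  # ['ARG0', 'ARG1', 'ARGM-LOC', 'V']
```

The rule baseline is deliberately crude. Every unmarked noun becomes ARG0, and ambiguous endings get misread. A learned classifier would instead consume the token_features output (case, relative position) alongside lexical and contextual features and resolve such cases from data.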

Keywords

Kazakh language, semantic role labeling, low-resource languages, semantic role labeling models, data resources, natural language processing

Language

English

How to Cite

Aitim, A. 2026. SEMANTIC ROLE LABELING FOR KAZAKH: MODELS AND DATASETS. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 93, 1 (Mar. 2026), 128–140. DOI:https://doi.org/10.51889/2959-5894.2026.93.1.011.
