RECENT ADVANCEMENTS IN SKELETON-BASED SIGN LANGUAGE RECOGNITION
Published December 2025

Abstract
Sign language recognition is vital for bridging communication between deaf communities and the hearing world, yet automatically decoding both static and dynamic gestures remains challenging. Over the past decade, skeleton-based approaches, in which only 2D or 3D joint coordinates are extracted from video, have emerged to reduce computational demands and preserve signer privacy. In this review, we examine nineteen key studies from 2015 to 2024, selected following PRISMA 2020 guidelines via a Scopus search, that employ deep models (ST-GCN, GCN+RNN hybrids, Transformers) trained on skeletal data. We find that while isolated-word tasks achieve Top-1 accuracies of 98–99% (e.g., on AUTSL), performance on continuous sign streams falls to 80–90% (word error rates of 10–20%). Multi-stream architectures, which fuse joint positions, bone vectors, and motion cues through attention mechanisms or Transformer layers, offer the best gains, boosting dynamic sign recognition by 3–5%. Key limitations include the lack of large, richly annotated skeletal corpora, the under-representation of non-manual signals (facial expressions, torso motion), and limited model portability across different sign languages. We conclude that skeleton-based methods hold strong potential for privacy-preserving, noise-robust gesture recognition, yet realizing real-time continuous translation will hinge on expanding dataset diversity, tightly integrating non-manual articulators, and developing hybrid architectures capable of handling the full complexity of live sign language.
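To make the multi-stream construction concrete, below is a minimal sketch of how a pose sequence can be split into joint, bone, and motion streams, following common practice in skeleton-based recognition. The five-joint skeleton, the PARENTS table, and the build_streams helper are illustrative assumptions, not the pipeline of any specific study in this review; in the surveyed architectures, each stream would feed its own ST-GCN or Transformer branch before fusion.

```python
# Minimal sketch (assumptions noted above): derive bone-vector and motion
# streams from raw joint coordinates. The skeleton layout is hypothetical.
import numpy as np

# Hypothetical 5-joint upper-body skeleton: 0=torso, 1/2=shoulders, 3/4=wrists.
PARENTS = [0, 0, 0, 1, 2]  # parent index of each joint (joint 0 is the root)

def build_streams(joints: np.ndarray) -> dict[str, np.ndarray]:
    """Split a pose sequence of shape (T, J, C) into three streams.

    T = frames, J = joints, C = coordinate channels (2 or 3).
    """
    # Bone stream: vector from each joint's parent to the joint itself
    # (the root joint's bone is the zero vector).
    bones = joints - joints[:, PARENTS, :]
    # Motion stream: frame-to-frame displacement of every joint,
    # zero-padded at the last frame so all streams share the same length.
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return {"joint": joints, "bone": bones, "motion": motion}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose = rng.standard_normal((16, 5, 2))  # 16 frames, 5 joints, 2D coords
    streams = build_streams(pose)
    for name, arr in streams.items():
        print(name, arr.shape)  # each stream: (16, 5, 2)
```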
Language: English
How to Cite
[1] Sembayev, T. and Akbarov, D. 2025. RECENT ADVANCEMENTS IN SKELETON-BASED SIGN LANGUAGE RECOGNITION. Bulletin of Abai KazNPU. Series of Physical and Mathematical Sciences. 92, 4 (Dec. 2025). DOI: https://doi.org/10.51889/2959-5894.2025.92.4.018.
ORCID: https://orcid.org/0000-0003-2360-8767