Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences

MODELING AND OPTIMIZATION OF A HYBRID HADOOP–SPARK ARCHITECTURE TO IMPROVE BIG DATA PROCESSING EFFICIENCY

Published June 2026

A.B. Kassymova
JSC Kazakh National Research Technical University named after K.I. Satbayev, Almaty, Kazakhstan
R.K. Uskenbayeva
JSC Kazakh National Research Technical University named after K.I. Satbayev, Almaty, Kazakhstan
A. Razaque
Arkansas Tech University, United States, Russellville
S. Aliaskarov
International IT University, Almaty, Kazakhstan
V. Elle
JSC Kazakh National Research Technical University named after K.I. Satbayev, Almaty, Kazakhstan
Abstract

As data volumes, heterogeneity, and intensity grow rapidly, processing architectures must deliver not only high performance but also robust scalability, efficient use of computing resources, and fault tolerance. This article examines a hybrid big data processing architecture that combines the Hadoop Distributed File System (HDFS) with the in-memory processing mechanisms of Apache Spark. The goal of the study is to develop a formalized approach to evaluating and optimizing the efficiency of the hybrid environment compared with the standalone use of Hadoop or Spark.

The paper proposes a system of analytical models describing the processing speed, scalability, resource utilization, overhead, and overall efficiency of the hybrid architecture. Unlike studies that limit platform comparisons to general characteristics or isolated benchmarks, this article focuses on the relationship between data storage, inter-node communication, computing load, and cluster configuration parameters. The analysis shows that combining Hadoop's distributed storage mechanisms with Spark's in-memory processing reduces the impact of disk I/O, improves resilience under increasing load, and yields a more balanced use of memory and CPU resources.

These results confirm that the hybrid architecture is a promising solution for building scalable analytics platforms designed to process heterogeneous data under variable and intensive workloads. The practical significance of this study lies in the potential use of the proposed models in the design and configuration of regional and enterprise big data analytics systems.

Language

English

How to Cite

[1]
Kassymova, A., Uskenbayeva, R., Razaque, A., Aliaskarov, S. and Elle, V. 2026. MODELING AND OPTIMIZATION OF A HYBRID HADOOP–SPARK ARCHITECTURE TO IMPROVE BIG DATA PROCESSING EFFICIENCY. Bulletin of Abai KazNPU. Series of Physical and Mathematical sciences. 94, 2 (Jun. 2026). DOI:https://doi.org/10.51889/2959-5894.2026.94.2.019.