As data grow in volume, heterogeneity, and flow intensity, there is increasing demand for architectures that provide not only high performance but also robust scalability, efficient use of computing resources, and fault tolerance. This article examines a hybrid big data processing architecture that combines the Hadoop Distributed File System (HDFS) for storage with Apache Spark for fast in-memory processing. The goal of the study is to develop a formalized approach to evaluating and optimizing the efficiency of the hybrid environment compared with standalone Hadoop and standalone Spark deployments.
The paper proposes a system of analytical models that describe the processing speed, scalability, resource utilization, overhead, and overall efficiency of the hybrid architecture. Many studies limit platform comparisons to general characteristics or isolated benchmarks. This article instead focuses on the relationships among data storage, inter-node communication, computing load, and cluster configuration parameters. The analysis shows that combining Hadoop's distributed storage with Spark's in-memory processing reduces the impact of disk I/O, improves resilience to increasing load, and balances memory and CPU usage more evenly.
These results confirm that the hybrid architecture is a promising basis for scalable analytics platforms that must process heterogeneous data under variable and intensive workloads. In practice, the proposed models can be applied when designing and configuring regional and enterprise big data analytics systems.