As data volumes grow rapidly, class imbalance has become one of the key challenges in classification tasks, significantly reducing the accuracy and generalization ability of machine learning models. This study aims to improve classification accuracy on imbalanced datasets by developing and applying a hybrid model that combines data preprocessing techniques with ensemble learning methods. To this end, existing approaches to handling class imbalance were analyzed, including resampling techniques (oversampling and undersampling), cost-sensitive learning, and modern ensemble strategies.
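As a rough illustration of the resampling and cost-sensitive ideas mentioned above, the following minimal sketch balances a toy dataset by random oversampling or undersampling and derives inverse-frequency class weights. The helper names `random_resample` and `balanced_class_weights` are hypothetical, and the weighting rule (n_samples / (n_classes * class_count)) is one common heuristic assumed here, not necessarily the scheme used in the study.

```python
import random
from collections import Counter


def random_resample(X, y, mode="over", seed=0):
    """Balance classes by random over- or undersampling (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    sizes = [len(rows) for rows in by_class.values()]
    # Oversampling grows every class to the largest one;
    # undersampling shrinks every class to the smallest one.
    target = max(sizes) if mode == "over" else min(sizes)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            # Random subset without replacement (all rows if already at target)
            chosen = rng.sample(rows, target)
        else:
            # Keep all originals and add random duplicates
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    return {label: len(y) / (len(counts) * c) for label, c in counts.items()}
```

In a cost-sensitive setting, weights like these would typically be passed to the learner so that errors on the minority class are penalized more heavily than errors on the majority class.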
The methodology integrates synthetic data generation with gradient boosting and random forest algorithms. This approach increases sensitivity to the minority class while keeping the model robust against overfitting. The proposed hybrid model was evaluated on several open-source and applied datasets with varying degrees of class imbalance. Performance was assessed using metrics suited to imbalanced data, including F1-score and balanced accuracy, among others.
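To make the two ingredients above more concrete, here is a minimal sketch of synthetic minority-sample generation by interpolation together with the F1-score and balanced accuracy for the binary case. The interpolation is SMOTE-style, which is an assumption: the text does not name the specific generator used. All function names are hypothetical, and the ensemble learners themselves are omitted.

```python
import random


def interpolate_minority(minority, n_new, k=3, seed=0):
    """SMOTE-style sketch: place each new point on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of the recall on the positive and on the negative class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    recall_pos = tp / (tp + fn) if tp + fn else 0.0
    recall_neg = tn / (tn + fp) if tn + fp else 0.0
    return (recall_pos + recall_neg) / 2
```

These metrics show why plain accuracy is misleading here: on a dataset with eight majority and two minority samples, a classifier that always predicts the majority class is 80% accurate, yet its balanced accuracy is 0.5 and its F1-score for the minority class is 0.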
The results demonstrate a statistically significant improvement in classification performance over baseline models, especially in detecting the minority class. The scientific significance of the study lies in the development of a reproducible approach to improving classification effectiveness under class imbalance, which expands the applicability of machine learning methods in domains such as healthcare, finance, and risk analysis.