QUANTUM-ASSISTED FEATURE SELECTION FOR IMPROVING PREDICTION MODEL ACCURACY ON LARGE AND IMBALANCED DATASETS

Authors

  • Safii Safii, STIKOM Tunas Bangsa
  • Mochamad Wahyudi, Universitas Bina Sarana Informatika
  • Dedy Hartama, STIKOM Tunas Bangsa

DOI:

https://doi.org/10.33480/jitk.v11i2.7040

Keywords:

feature selection, prediction, quantum, quadratic unconstrained binary optimization (QUBO), random forest

Abstract

Selecting representative and relevant features from large, imbalanced datasets is one of the biggest obstacles to building accurate machine learning models. Class imbalance frequently leads to reduced accuracy and biased predictions, while an excessive number of features raises the risk of overfitting and computational cost. This study addresses these problems with a quantum-assisted feature selection method, formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solved with the Simulated Annealing technique. The formulation seeks to minimize both the number of selected features and inter-feature redundancy. The experimental dataset comprises 28 features, with 102,487 samples in the majority class and 11,239 in the minority class. The feature selection procedure identified nine optimal features (12, 14, 15, 22, 23, 24, 25, 27, and 28). A Random Forest classifier trained on an 80:20 split was assessed with ten-fold cross-validation. The proposed QUBO+SMOTE method achieved exceptional performance, with precision, recall, f1-score, and accuracy all reaching 1.00. By comparison, QUBO without SMOTE performed worse, with an accuracy of 0.95 and a minority-class f1-score of only 0.71, while a traditional Recursive Feature Elimination (RFE) approach obtained an accuracy of 0.97 and a minority-class f1-score of 0.94. These findings indicate that QUBO effectively reduces dimensionality, but addressing class imbalance requires its integration with SMOTE. This study demonstrates how quantum computing can enhance the effectiveness and efficiency of machine learning, especially on large-scale imbalanced datasets.
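The abstract describes the pipeline only at a high level; the exact QUBO coefficients are not given here. The sketch below shows one common formulation under stated assumptions: the diagonal of Q rewards feature-target relevance and the off-diagonal penalizes pairwise redundancy, both measured by absolute Pearson correlation and balanced by a weight alpha. The single-bit-flip simulated-annealing solver and the SMOTE plus Random Forest evaluation step (via scikit-learn and imbalanced-learn) are illustrative stand-ins for the authors' method, and the function names (build_qubo, anneal, evaluate) are hypothetical.

    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    def build_qubo(X, y, alpha=0.5):
        """QUBO for feature selection: minimize x^T Q x over x in {0,1}^n.

        Diagonal terms reward relevance to the target; off-diagonal terms
        penalize inter-feature redundancy (both via |Pearson correlation|).
        The weight alpha is an assumed hyperparameter, not from the paper.
        """
        n = X.shape[1]
        relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)])
        redundancy = np.abs(np.corrcoef(X, rowvar=False))
        Q = alpha * redundancy
        np.fill_diagonal(Q, -relevance)
        return Q

    def anneal(Q, n_steps=20_000, t0=1.0, t_min=1e-3, seed=0):
        """Single-bit-flip Metropolis simulated annealing, geometric cooling."""
        rng = np.random.default_rng(seed)
        n = Q.shape[0]
        x = rng.integers(0, 2, size=n)
        energy = x @ Q @ x
        best_x, best_e = x.copy(), energy
        for step in range(n_steps):
            t = t0 * (t_min / t0) ** (step / n_steps)
            j = rng.integers(n)
            x[j] ^= 1                                 # propose one bit flip
            e_new = x @ Q @ x
            if e_new <= energy or rng.random() < np.exp((energy - e_new) / t):
                energy = e_new                        # accept the move
                if energy < best_e:
                    best_x, best_e = x.copy(), energy
            else:
                x[j] ^= 1                             # reject: undo the flip
        return best_x

    def evaluate(X, y, mask):
        """SMOTE on the training split only, then a Random Forest classifier."""
        X_sel = X[:, mask.astype(bool)]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_sel, y, test_size=0.2, stratify=y, random_state=42)
        X_tr, y_tr = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
        clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
        print(classification_report(y_te, clf.predict(X_te)))

Calling anneal(build_qubo(X, y)) yields a binary mask over the 28 features; evaluate mirrors the abstract's 80:20 split and applies SMOTE to the training portion only, so that the reported minority-class scores are not inflated by synthetic test samples.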




Published

2025-11-27

How to Cite

[1] S. Safii, M. Wahyudi, and D. Hartama, “QUANTUM-ASSISTED FEATURE SELECTION FOR IMPROVING PREDICTION MODEL ACCURACY ON LARGE AND IMBALANCED DATASETS”, jitk, vol. 11, no. 2, pp. 520–527, Nov. 2025.
