HYBRID RESAMPLING METHOD AND HYPERPARAMETER OPTIMIZATION FOR HIV/AIDS PREDICTION: EVIDENCE FROM EIGHT MACHINE-LEARNING MODELS

Lydia Nur Sa'adah; Fatkhurokhman Fauzi; Prizka Rismawati Arum; M Al Haris; Yan Nazala Bisoumi

doi:10.33480/jitk.v11i4.7533

Authors

Lydia Nur Sa'adah, Universitas Muhammadiyah Semarang
Fatkhurokhman Fauzi, Universitas Muhammadiyah Semarang
Prizka Rismawati Arum, Universitas Muhammadiyah Semarang
M Al Haris, Universitas Muhammadiyah Semarang
Yan Nazala Bisoumi, Universitas Muhammadiyah Semarang

Keywords:

HIV/AIDS Prediction, Machine Learning Algorithm, SMOTE-ENN Method

Abstract

HIV/AIDS remains a global health challenge with continuously increasing infection rates, which makes accurate prediction models important for supporting prevention and early detection. However, the development of such models is often constrained by class imbalance and irrelevant features. This study aims to improve HIV/AIDS infection prediction by integrating feature selection, data balancing, and hyperparameter optimization with eight machine learning algorithms. The data come from the HIV/AIDS Infection Prediction Dataset on Kaggle, which contains 2,139 instances and 23 features with an imbalanced class distribution of 1,618 non-infected and 521 infected cases. Feature selection was performed using Mutual Information and Chi-Square to identify the most relevant features. The dataset was divided into 80% training data and 20% testing data, with resampling applied only to the training set to prevent data leakage. Three resampling scenarios were evaluated: no sampling, SMOTE, and SMOTE-ENN. Hyperparameter tuning was conducted using Bayesian Optimization combined with 5-fold Cross-Validation to improve model robustness and reliability. Eight machine learning algorithms were evaluated: Decision Tree, Random Forest, AdaBoost, Gradient Boosting, XGBoost, LightGBM, K-Nearest Neighbors, and Logistic Regression. The results show that combining SMOTE-ENN with hyperparameter optimization substantially improved model performance. The best model, Gradient Boosting + SMOTE-ENN, achieved 96.1% accuracy, 94.8% precision, 98.4% recall, and 96.5% F1-score. These findings indicate that the proposed integrated framework is highly effective for predicting HIV/AIDS infection and has strong potential to support early diagnosis and data-driven decision-making in healthcare.
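The abstract's main safeguard against data leakage is that SMOTE-ENN is applied only to the training split. The sketch below illustrates that ordering with a from-scratch NumPy version of SMOTE followed by Wilson's edited nearest neighbours (ENN) rule. It is not the authors' code. The toy data, the stratified 80/20 split, the function names (`smote`, `enn`, `smote_enn`), and the neighbour counts (k = 5 for SMOTE, k = 3 for ENN) are all assumptions for illustration. The ENN step here uses majority voting, whereas imbalanced-learn's `EditedNearestNeighbours` defaults to a stricter rule in which all neighbours must agree. In practice, placing `imblearn.combine.SMOTEENN` inside an `imblearn.pipeline.Pipeline` gives the same guarantee during 5-fold cross-validation, because that pipeline runs samplers only when fitting.

```python
import numpy as np


def _knn_indices(X, k):
    """Indices of each row's k nearest *other* rows of X (squared Euclidean distance)."""
    sq = (X * X).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def smote(X, y, minority=1, k=5, rng=None):
    """Interpolate synthetic minority rows until both classes have equal counts."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    nn = _knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)         # random minority seeds
    neigh = nn[base, rng.integers(0, k, size=n_new)]       # one of each seed's k neighbours
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    X_syn = X_min[base] + gap * (X_min[neigh] - X_min[base])
    y_syn = np.full(n_new, minority, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


def enn(X, y, k=3):
    """Wilson's ENN: drop rows whose label disagrees with the majority of their k neighbours."""
    nn = _knn_indices(X, k)
    agree = (y[nn] == y[:, None]).sum(axis=1)
    keep = 2 * agree > k
    return X[keep], y[keep]


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, rng=None):
    """SMOTE oversampling followed by ENN cleaning of both classes."""
    X_res, y_res = smote(X, y, minority=minority, k=k_smote, rng=rng)
    return enn(X_res, y_res, k=k_enn)


# Hypothetical toy data (NOT the Kaggle dataset): 400 negatives, 130 positives, 4 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (400, 4)), rng.normal(1.5, 1.0, (130, 4))])
y = np.array([0] * 400 + [1] * 130)

# Stratified 80/20 split BEFORE any resampling, so test rows never inform SMOTE or ENN.
test_idx = np.concatenate(
    [rng.permutation(np.flatnonzero(y == c))[: (y == c).sum() // 5] for c in (0, 1)]
)
is_test = np.zeros(len(y), dtype=bool)
is_test[test_idx] = True
X_train, y_train = X[~is_test], y[~is_test]
X_test, y_test = X[is_test], y[is_test]

# Resample the training split only; the test split keeps its natural imbalance.
X_bal, y_bal = smote_enn(X_train, y_train, rng=0)
print("train:", np.bincount(y_train), "-> after SMOTE-ENN:", np.bincount(y_bal))
print("test (untouched):", np.bincount(y_test))
```

Any of the eight classifiers would then be fitted on `X_bal, y_bal` and scored on the untouched `X_test, y_test`. Within each cross-validation fold, the same resampling should be re-run on that fold's training portion only, so that validation scores are not inflated by synthetic or cleaned rows.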
