PREDICTION OF HAJJ PILGRIMS' HEALTH RISK USING K-NN, DECISION TREE, CROSS VALIDATION, AND SMOTE

Authors

  • Widi Astuti
  • Fajar Sarasati Universitas Nusa Mandiri

DOI:

https://doi.org/10.33480/techno.v22i1.6367

Keywords:

hajj pilgrims, health risk prediction, SMOTE

Abstract

Predicting the health risk levels of hajj pilgrims is a significant challenge in improving healthcare services during the pilgrimage. This study systematically evaluates several machine learning techniques and applies the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset, in contrast to previous studies that relied on single-model classification approaches. The data comprise 5,000 pilgrim health records with attributes such as age, gender, medical history, and disease diagnosis, sourced from the Siskohat database of the Directorate General of Hajj and Umrah Management. After SMOTE was applied, Cross-Validation (Logistic Regression) achieved the highest accuracy (87.9%), outperforming Decision Tree (86.4%) and K-NN (83.1%). The findings indicate that SMOTE markedly improves recall, so that high-risk pilgrims are identified more reliably. These results support hajj health management by providing a predictive framework for earlier risk detection and better allocation of medical resources, and they demonstrate one approach to handling imbalanced healthcare datasets.
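The oversampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tool or parameters were used, and in practice a library such as imbalanced-learn would likely be preferred. The function name `smote`, its parameters, and the toy records (age, systolic blood pressure) are all hypothetical and are not taken from the Siskohat data. The sketch shows the core idea of SMOTE: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Assumes `minority` holds at least two distinct numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy records: (age, systolic blood pressure).
low_risk = [(45, 118), (50, 122), (38, 115), (55, 125), (42, 120), (60, 128)]
high_risk = [(68, 160), (72, 170), (75, 165)]

# Generate just enough synthetic high-risk records to match the majority class.
new_points = smote(high_risk, n_new=len(low_risk) - len(high_risk), k=2)
balanced_high_risk = high_risk + new_points
print(len(low_risk), len(balanced_high_risk))  # prints "6 6"
```

In a real evaluation, oversampling is usually applied only to the training portion of each cross-validation fold, so that synthetic records do not leak into the data used to measure accuracy and recall.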



Published

2025-03-14

How to Cite

Astuti, W., & Sarasati, F. (2025). PREDICTION OF HAJJ PILGRIMS’ HEALTH RISK USING K-NN, DECISION TREE, CROSS VALIDATION, AND SMOTE. Jurnal Techno Nusa Mandiri, 22(1), 35–43. https://doi.org/10.33480/techno.v22i1.6367
