EVALUATING LOGISTIC REGRESSION, SVM, KNN, AND ENSEMBLE MODELS FOR ACCURATE HEART DISEASE RISK PREDICTION

Amalia Shifa Aldila; Lawrence Supriyono

doi:10.33480/jitk.v11i3.6738

Authors

Amalia Shifa Aldila Universitas Jakarta Internasional
Lawrence Supriyono Universitas Jakarta Internasional

Keywords:

Classification Algorithms, Cardiovascular Disease, Prediction, Supervised Learning

Abstract

Cardiovascular disease remains the leading cause of death worldwide, which makes early and accurate risk assessment essential within preventive healthcare frameworks. With the rapid growth in available clinical data, machine learning approaches are increasingly used to support medical decision-making, particularly for interpreting complex, high-dimensional health information. This research investigates the ability of six supervised machine learning models to predict the likelihood of cardiovascular disease: Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosting. The study uses the Cleveland Heart Disease dataset from the UCI Machine Learning Repository, which contains 303 patient samples with 76 recorded attributes. From these, 14 clinically significant variables frequently reported in previous studies were selected for analysis. Given the relatively small dataset and the possibility of redundant or low-impact features, a feature selection approach was applied to improve model robustness, reduce overfitting, and enhance interpretability. Data preparation involved cleaning, normalization, feature selection, and splitting the data into training and testing sets. The models were evaluated using accuracy, precision, recall, and F1-score. Experimental results show that Random Forest and Logistic Regression achieved the highest predictive performance, followed by k-Nearest Neighbors and Support Vector Machine. These results indicate that supervised machine learning techniques, when supported by appropriate feature selection, are effective decision-support tools for the early detection of cardiovascular disease.
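The preparation and evaluation steps summarized in the abstract (normalization, a train/test split, and scoring with accuracy, precision, recall, and F1-score) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization method, split ratio, or hyperparameters, so min-max scaling, an 80/20 split, and a k-Nearest Neighbors classifier with k = 5 are illustrative choices, and every function name below is hypothetical. Feature selection and the other five models are omitted.

```python
import math
import random

# Illustrative sketch only: min-max scaling, an 80/20 split and k = 5 are
# assumptions; the abstract does not state the authors' actual settings.


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then split rows and labels into train/test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def apply_min_max(rows, mins, maxs):
    """Rescale features with training statistics (training rows land in [0, 1],
    test rows may fall slightly outside); constant features become 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def knn_predict(train_rows, train_labels, query, k=5):
    """Predict the majority label among the k training rows nearest to `query` (Euclidean)."""
    nearest = sorted(
        zip(train_rows, train_labels), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` (disease present) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Synthetic two-feature toy data (NOT the Cleveland dataset); label 1 = disease present.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)] + \
        [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(40)]
    y = [0] * 40 + [1] * 40
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    mins, maxs = fit_min_max(X_train)
    X_train_s = apply_min_max(X_train, mins, maxs)
    X_test_s = apply_min_max(X_test, mins, maxs)
    y_pred = [knn_predict(X_train_s, y_train, row) for row in X_test_s]
    print(classification_metrics(y_test, y_pred))
```

Fitting the min-max statistics on the training rows only, then reusing them for the test rows, keeps test-set information out of preprocessing; with only 303 samples, such leakage could noticeably inflate reported scores. In practice, a library such as scikit-learn provides equivalent, better-tested components.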
