IMPLEMENTATION MEAN IMPUTATION AND OUTLIER DETECTION FOR LOAN PREDICTION USING THE RANDOM FOREST ALGORITHM
DOI: https://doi.org/10.33480/jitk.v10i4.6437
Keywords: Accuracy, loan prediction, pre-processing, AdaBoost Classifier, Heart Failure, Machine Learning, Random Forest, Classification
Abstract
Loans and credit are among the most in-demand banking products, so accurate loan prediction systems are essential for minimizing banks' credit risk and increasing profitability. This study proposes a loan prediction model based on the Random Forest algorithm, using mean imputation and three outlier detection methods (Boxplot, Z-score, and Interquartile Range (IQR)) for data pre-processing. The model was built on Lending Club loan data from 2014-2021 (466,285 records, split 70/30 into training and testing sets) and evaluated with accuracy, recall, and F1 score. The proposed approach achieved 95% prediction accuracy, outperforming previous models, which reached 83%. The best results were obtained by combining mean imputation with IQR-based outlier detection. However, how the imputation mean is determined may be a limitation of this study. These results highlight the importance of thorough pre-processing for improving prediction accuracy. The study underscores the role of machine learning and financial technology (fintech) in informing credit decisions and supports incorporating imputation and outlier handling as standard steps in financial modeling pipelines.
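The pre-processing steps named in the abstract can be illustrated on a single numeric column. This is a minimal sketch using only the Python standard library, not the authors' implementation: the helper names (`mean_impute`, `zscore_outliers`, `iqr_outliers`), the income values, and the thresholds (|z| > 3, fences at 1.5 × IQR) are hypothetical choices based on common convention. A conventional (Tukey) boxplot marks points beyond the same 1.5 × IQR fences as outliers, so a boxplot check reduces to `iqr_outliers` here; the paper's exact boxplot procedure is not stated in this abstract.

```python
import statistics


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score (population std. dev.) exceeds threshold."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [False] * len(values)  # constant column: nothing can be an outlier
    return [abs(v - mu) / sigma > threshold for v in values]


def iqr_outliers(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    A conventional boxplot marks points beyond these same fences as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical annual-income column with two missing entries and one extreme value.
income = [42000, None, 51000, 47000, 39000, 45000, None, 250000]
filled = mean_impute(income)
print(filled)                                  # [42000, 79000.0, 51000, 47000, 39000, 45000, 79000.0, 250000]
print(iqr_outliers(filled))                    # only the last entry is True
print(zscore_outliers(filled, threshold=2.5))  # only the last entry is True
```

Two things show up even in this toy column. The single extreme income pulls the imputed value to 79,000, above every typical observation, which is one way imputation and outlier handling interact. And with only eight values no point can reach |z| > 3 (with the population standard deviation, |z| is bounded by √(n − 1) ≈ 2.65), so the Z-score rule needs a lower threshold or more data, while the IQR rule flags the extreme value directly.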
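The overall workflow in the abstract (70/30 split, mean imputation, IQR-based outlier handling, Random Forest, and evaluation by accuracy, recall, and F1 score) can be sketched with scikit-learn. This is an illustrative sketch under stated assumptions, not the authors' code: the data are synthetic rows from `make_classification` rather than Lending Club records, the helper `iqr_inlier_mask` is hypothetical, the hyperparameters (`n_estimators=200`, `k=1.5`, about 5% missing values) are arbitrary, and because the abstract does not say whether outliers were removed or capped, flagged rows are simply dropped from the training split here. The scores it prints say nothing about the 95% accuracy reported in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def iqr_inlier_mask(X, k=1.5):
    """Hypothetical helper: True for rows whose non-missing values all lie inside
    the per-column Tukey fences [Q1 - k*IQR, Q3 + k*IQR]; NaNs count as inliers."""
    q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    within = ((X >= lower) & (X <= upper)) | np.isnan(X)
    return within.all(axis=1)


# Synthetic stand-in for loan records (NOT Lending Club data):
# 2,000 rows, 8 numeric features, imbalanced binary target.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Blank out roughly 5% of entries to simulate missing values.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# 70/30 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Assumption: rows flagged by the IQR rule are dropped from the training split only.
mask = iqr_inlier_mask(X_train)

# The mean imputer is fitted inside the pipeline on the filtered training rows,
# so test-set gaps are filled with training-set column means.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train[mask], y_train[mask])
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"kept {mask.sum()} of {len(mask)} training rows")
print(f"accuracy={acc:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Placing `SimpleImputer` inside the pipeline and fitting it only after the split keeps test-set values out of the imputed means; the test rows are never filtered, since a deployed model cannot discard loan applications it is asked to score.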
License
Copyright (c) 2025 Nimatul Mamuriyah, Richard; Haeruddin

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.