MACHINE LEARNING TO IDENTIFY ELIGIBILITY OF STUDENTS RECEIVING SINGLE TUITION RELIEF

Authors

  • M. Ghofar Rohman, Universitas Islam Lamongan
  • Zubaile Abdullah, Universiti Tun Hussein Onn Malaysia
  • Shahreen Kasim, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
  • M Ulul Albab, Universitas Islam Lamongan

DOI:

https://doi.org/10.33480/jitk.v11i3.7294

Keywords:

Classification, Higher Education, Machine Learning, SHAP, Single Tuition

Abstract

The cost of higher education in Indonesia varies widely and often places a financial burden on students. Socio-economic factors such as parental income, occupation, number of dependents, vehicle ownership, and place of residence influence the determination of single tuition as regulated by Ministry of Education Regulation No. 55 of 2013. This study classifies freshmen's eligibility for single tuition relief using five machine learning models: Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB). The dataset contains 2,000 records with six socio-economic attributes, labeled into two classes: eligible and ineligible. The data were split into 80% training and 20% testing, and model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Without SMOTE, all models suffer from severe majority-class bias, yielding critically low recall for the minority class (SVM = 0.014; NB = 0.004). SMOTE significantly improves minority-class detection, with RF and SVM achieving the highest performance (F1-scores of 0.820 and 0.801, and ROC-AUC values of 0.966 and 0.990, respectively). SHAP analysis identifies Number of Dependents of Parents as the most influential predictor across all models, highlighting its central role in financial need assessment. These findings demonstrate that combining SMOTE with ensemble or margin-based models improves classification fairness and sensitivity in educational support systems. Future work should expand the feature set with behavioral, academic, and regional indicators, use multi-institutional data, and explore deep learning or advanced resampling methods to enhance generalizability and robustness.
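The workflow the abstract describes (an 80/20 split, SMOTE applied to the training data only, RF and SVM classifiers, the five reported metrics, and a SHAP feature ranking) can be sketched as follows. This is a minimal illustration using synthetic data in place of the study's socio-economic dataset; the parameter values, and the use of scikit-learn, imbalanced-learn, and shap, are assumptions for demonstration rather than the authors' exact setup.

```python
# Minimal sketch of the pipeline in the abstract: SMOTE on the training
# split only, RF and SVM classifiers, the five reported metrics, and a
# SHAP feature ranking. Synthetic data stands in for the real dataset;
# all settings here are illustrative, not the authors' configuration.
import numpy as np
import shap
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Imbalanced stand-in for the 2,000-record, six-attribute dataset.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=42)

# 80/20 split as in the study; stratify to preserve the class ratio.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Oversample the minority ("eligible") class on the training set only,
# so the test set keeps the original, imbalanced distribution.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

models = {
    "RF": RandomForestClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
}
for name, clf in models.items():
    clf.fit(X_res, y_res)
    pred = clf.predict(X_te)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} "
          f"rec={recall_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred):.3f} "
          f"auc={roc_auc_score(y_te, proba):.3f}")

# SHAP ranking for the random forest: mean |SHAP value| per feature.
sv = shap.TreeExplainer(models["RF"]).shap_values(X_te)
if isinstance(sv, list):   # older shap versions: one array per class
    sv = sv[1]
elif sv.ndim == 3:         # newer versions: (n_samples, n_features, n_classes)
    sv = sv[:, :, 1]
ranking = np.abs(sv).mean(axis=0)
print("features ranked by mean |SHAP|:", np.argsort(ranking)[::-1])
```

On the study's data, this kind of per-feature mean absolute SHAP ranking is what would surface Number of Dependents of Parents as the dominant predictor.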



Published

2026-02-10

How to Cite

[1] “MACHINE LEARNING TO IDENTIFY ELIGIBILITY OF STUDENTS RECEIVING SINGLE TUITION RELIEF”, jitk, vol. 11, no. 3, pp. 613–626, Feb. 2026, doi: 10.33480/jitk.v11i3.7294.