SCHOLAR AI: INNOVATION IN SCHOLARSHIP SELECTION CLUSTERING BASED ELIGIBILITY CLASSIFICATION BASED ON MACHINE LEARNING

Authors

  • Tutik Lestari, University of Darunnajah
  • Achmad Farouq Abdullah, University of Darunnajah
  • Oddy Virgantara Putra, University of Darussalam Gontor
  • Onno Widodo Purbo, Institute of Technology South Tangerang

DOI:

https://doi.org/10.33480/jitk.v11i4.7196

Keywords:

Clustering, Machine Learning, Pondok Pesantren Scholarships, Prediction, Scholarship Recipients

Abstract

Scholarship allocation provides financial support to students on the basis of academic performance, potential, and financial need. At Pondok Pesantren Darunnajah, however, the existing selection process has struggled to capture the holistic characteristics of applicants. This research proposes a machine learning model that integrates clustering and predictive techniques to improve scholarship selection. The dataset consists of 300 student samples with attributes such as academic scores, tahfidz (Qur'an memorization), family income, and extracurricular activities. These features are used to determine whether a student qualifies for one of three scholarship schemes (Beasiswa Tahfidz, Beasiswa Prestasi, or Beasiswa Ashabunnajah) or is deemed "not eligible." The model follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework and uses machine learning algorithms for classification. Its robustness is assessed with 5-fold cross-validation. The results show a mean validation accuracy of 90.61% and an F1-score of 0.9311, indicating strong generalization. These findings highlight the model's potential to improve scholarship allocation so that awards go to the most deserving students based on academic performance, leadership potential, and financial need. The study also acknowledges limitations, including potential biases in the dataset and the difficulty of capturing all relevant factors, which may reduce the model's effectiveness and leave room for improvement in handling the complexity of the selection process.
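
To make the evaluation protocol concrete, the following is a minimal sketch of how a mean validation accuracy and F1-score can be obtained from 5-fold cross-validation. It is not the authors' implementation. The abstract does not name the classifier or the F1 averaging scheme, so a nearest-centroid classifier and macro-averaged F1 are assumed here as stand-ins. The data are synthetic: the class centres, the four features scaled to [0, 1], and all function names are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and deal it into k disjoint, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return mean(scores)

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}

def predict(centroids, X):
    """Assign each row to the class with the closest centroid."""
    def nearest(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return [nearest(row) for row in X]

def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, validate on the held-out fold, average the scores."""
    folds = k_fold_indices(len(X), k, seed)
    accs, f1s = [], []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_centroids([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_val = [y[j] for j in val_idx]
        y_hat = predict(model, [X[j] for j in val_idx])
        accs.append(accuracy(y_val, y_hat))
        f1s.append(macro_f1(y_val, y_hat))
    return mean(accs), mean(f1s)

# Synthetic stand-in for the 300-sample dataset (hypothetical values).
# Feature order: academic score, tahfidz, family income, extracurricular.
CENTRES = {
    "tahfidz": (0.6, 0.9, 0.5, 0.4),
    "prestasi": (0.9, 0.3, 0.5, 0.6),
    "ashabunnajah": (0.7, 0.4, 0.1, 0.5),
    "not_eligible": (0.4, 0.2, 0.8, 0.2),
}
rng = random.Random(42)
X, y = [], []
for label, centre in CENTRES.items():
    for _ in range(75):
        X.append([rng.gauss(m, 0.1) for m in centre])
        y.append(label)

mean_acc, mean_f1 = cross_validate(X, y, k=5)
print(f"mean validation accuracy: {mean_acc:.4f}, mean macro F1: {mean_f1:.4f}")
```

Each sample is used for validation exactly once, so the averaged scores reflect performance on data the model did not see during training. The figures printed by this sketch come from the synthetic data and are unrelated to the 90.61% and 0.9311 reported in the abstract.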

Published

2026-05-13

How to Cite

[1]
“SCHOLAR AI: INNOVATION IN SCHOLARSHIP SELECTION CLUSTERING BASED ELIGIBILITY CLASSIFICATION BASED ON MACHINE LEARNING”, jitk, vol. 11, no. 4, pp. 1143–1151, May 2026, doi: 10.33480/jitk.v11i4.7196.