IMPLEMENTATION OF K-NEAREST NEIGHBOR AND GINI INDEX METHOD IN CLASSIFICATION OF STUDENT PERFORMANCE

Penerapan Metode K-Nearest Neighbor Dan Gini Index Pada Klasifikasi Kinerja Siswa

  • Tyas Setiyorini, STMIK Nusa Mandiri
  • Rizky Tri Asmono, Teknik Informatika, STMIK Swadarma
Keywords: K-Nearest Neighbor, Gini Index, Student Performance

Abstract

Predicting student academic performance is one of the important applications of data mining in education. However, existing work does not sufficiently identify which factors affect student performance. Information on academic grades or learning progress alone is not enough to predict student performance or to help students and educators improve learning and teaching. K-Nearest Neighbor is a simple method for classifying student performance, but it struggles when the feature dimension is high. To address this problem, Gini Index feature selection is used to reduce the high feature dimension. Several experiments were conducted to obtain an optimal architecture and produce accurate classifications. In 10 experiments on the student performance dataset, with k ranging from 1 to 10, the K-Nearest Neighbor method alone reached a highest average accuracy of 74.068, while K-Nearest Neighbor combined with the Gini Index reached a highest average accuracy of 76.516. These results indicate that the Gini Index can overcome the problem of high feature dimensions in K-Nearest Neighbor, so that combining K-Nearest Neighbor with the Gini Index classifies student performance more accurately than K-Nearest Neighbor alone.
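
As a rough illustration of the pipeline described in the abstract, the following Python sketch ranks features by a Gini-based score, keeps the best-ranked ones, and then classifies with K-Nearest Neighbor. It is not the authors' implementation. The CART-style Gini impurity, the Hamming distance for categorical attributes, the toy records, and all function names (gini_impurity, gini_score, select_features, project, knn_predict) are assumptions made for this example only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_c ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_score(column, labels):
    """Weighted Gini impurity after grouping records by a categorical feature.

    Lower scores mean the feature separates the classes better.
    """
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def select_features(rows, labels, keep):
    """Return the indices of the `keep` features with the lowest Gini score."""
    n_features = len(rows[0])
    scores = [gini_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j])
    return sorted(ranked[:keep])


def project(row, feature_indices):
    """Keep only the selected features of one record."""
    return tuple(row[j] for j in feature_indices)


def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k training records closest to `query`.

    Distance is the Hamming distance (number of differing attributes),
    an assumption suited to categorical data.
    """
    distances = sorted(
        (sum(a != b for a, b in zip(row, query)), i)
        for i, row in enumerate(train_rows)
    )
    votes = Counter(train_labels[i] for _, i in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (study_time, attribute_b, attribute_c).
rows = [
    ("high", "a", "x"),
    ("high", "b", "y"),
    ("high", "a", "y"),
    ("low", "b", "x"),
    ("low", "a", "x"),
    ("low", "b", "y"),
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

# Step 1: Gini-based feature selection (here keeping a single feature).
selected = select_features(rows, labels, keep=1)

# Step 2: K-Nearest Neighbor on the reduced feature set.
reduced_rows = [project(r, selected) for r in rows]
query = project(("high", "b", "x"), selected)
prediction = knn_predict(reduced_rows, labels, query, k=3)
print(selected, prediction)
```

In an experiment like the one summarized above, the same two steps would be repeated for each k from 1 to 10 and the resulting accuracies compared. How many features to keep is a tuning choice that the abstract does not specify.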

Published
2019-09-06
How to Cite
Setiyorini, T., & Asmono, R. (2019). IMPLEMENTATION OF K-NEAREST NEIGHBOR AND GINI INDEX METHOD IN CLASSIFICATION OF STUDENT PERFORMANCE. Jurnal Techno Nusa Mandiri, 16(2), 121-126. https://doi.org/10.33480/techno.v16i2.747