PENERAPAN METODE K-NEAREST NEIGHBOR DAN INFORMATION GAIN PADA KLASIFIKASI KINERJA SISWA
Abstract
Education is a very important problem in the development of a country. One way to reach the level of quality of education is to predict student academic performance. The method used is still using an ineffective way because evaluation is based solely on the educator's assessment of information on the progress of student learning. Information on the progress of student learning is not enough to form indicators in evaluating student performance and helping students and educators to make improvements in learning and teaching. K-Nearest Neighbor is an effective method for classifying student performance, but K-Nearest Neighbor has problems in terms of large vector dimensions. This study aims to predict the academic performance of students using the K-Nearest Neighbor algorithm with the Information Gain feature selection method to reduce vector dimensions. Several experiments were conducted to obtain an optimal architecture and produce accurate classifications. The results of 10 experiments with k values (1 to 10) in the student performance dataset with the K-Nearest Neighbor method showed the largest average accuracy of 74.068 while the K-Nearest Neighbor and Information Gain methods obtained the highest average accuracy of 76.553. From the results of these tests it can be concluded that Information Gain can reduce vector dimensions, so that the application of K-Nearest Neighbor and Information Gain can improve the accuracy of the classification of student performance better than using the K-Nearest Neighbor method.
Downloads
References
Adeniyi, D. A., Wei, Z., & Yongquan, Y. (2016). Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Applied Computing and Informatics, 12(1), 90–108. https://doi.org/10.1016/j.aci.2014.10.001
Aghbari, Z. Al. (2005). Array-index: A plug&search K nearest neighbors method for high-dimensional data. Data and Knowledge Engineering, 52(3), 333–352. https://doi.org/10.1016/j.datak.2004.06.015
Al-Shehri, H., Al-Qarni, A., Al-Saati, L., Batoaq, A., Badukhen, H., Alrashed, S., … Olatunji, S. O. (2017). Student performance prediction using Support Vector Machine and K-Nearest Neighbor. Canadian Conference on Electrical and Computer Engineering, 17–20. https://doi.org/10.1109/CCECE.2017.7946847
Alkhasawneh, R., & Hobson, R. (2011). Modeling student retention in science and engineering disciplines using neural networks. In 2011 IEEE Global Engineering Education Conference, EDUCON 2011 (pp. 660–663). https://doi.org/10.1109/EDUCON.2011.5773209
Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from LMS data: A comparison of 17 blended courses using moodle LMS. IEEE Transactions on Learning Technologies, 10(1), 17–29. https://doi.org/10.1109/TLT.2016.2616312
Cortez, P., & Silva, A. (2008). Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008), 5–12.
de Vries, A. P., Mamoulis, N., Nes, N., & Kersten, M. (2003). Efficient k-NN search on vertically decomposed data (p. 322). https://doi.org/10.1145/564728.564729
Gallager, R. G. (2001). Claude E. Shannon: A retrospective on his life, work, and impact. IEEE Transactions on Information Theory, 47(7), 2681–2695. https://doi.org/10.1109/18.959253
George Gorman. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289–1305.
Gou, J., Zhan, Y., Rao, Y., Shen, X., Wang, X., & He, W. (2014). Improved pseudo nearest neighbor classification. Knowledge-Based Systems, 70, 361–375. https://doi.org/10.1016/j.knosys.2014.07.020
Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student Academic Performance Prediction Model Using Decision Tree and Fuzzy Genetic Algorithm. Procedia Technology, 25, 326–332. https://doi.org/10.1016/j.protcy.2016.08.114
Han, J., Kamber, M., & Pei, J. (2012). Data Mining. In Data Mining (pp. 1–38). https://doi.org/10.1016/B978-0-12-381479-1.00001-0
Hand, D. J. (2007). Principles of data mining. Drug Safety, 30(7), 621–622. https://doi.org/10.2165/00002018-200730070-00010
Ibrahim, Z., & Rusli, D. (2007). Predicting Students’ Academic Performance: Comparing Artificial Neural Network, Decision tree And Linear Regression. Proceedings of the 21st Annual SAS Malaysia Forum, (September), 1–6. Retrieved from https://www.researchgate.net/profile/Daliela_Rusli/publication/228894873_Predicting_Students’_Academic_Performance_Comparing_Artificial_Neural_Network_Decision_Tree_and_Linear_Regression/links/0deec51bb04e76ed93000000.pdf
Koncz, P., & Paralic, J. (2011). An approach to feature selection for sentiment analysis. In INES 2011 - 15th International Conference on Intelligent Engineering Systems, Proceedings (pp. 357–362). https://doi.org/10.1109/INES.2011.5954773
Lin, Y., Li, J., Lin, M., & Chen, J. (2014). A new nearest neighbor classifier via fusing neighborhood information. Neurocomputing, 143, 164–169. https://doi.org/10.1016/j.neucom.2014.06.009
Lopez Guarin, C. E., Guzman, E. L., & Gonzalez, F. A. (2015). A Model to Predict Low Academic Performance at a Specific Enrollment Using Data Mining. Revista Iberoamericana de Tecnologias Del Aprendizaje, 10(3), 119–125. https://doi.org/10.1109/RITA.2015.2452632
Lu, L. R., & Fa, H. Y. (2004). A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification [J]. Journal of Computer Research and Development, 4, 003.
Pandey, M., & Taruna, S. (2016). Towards the integration of multiple classifier pertaining to the Student’s performance prediction. Perspectives in Science, 8, 364–366. https://doi.org/10.1016/j.pisc.2016.04.076
Setiyorini, T., & Asmono, R. T. (2017). Penerapan Gini Index dan K-Nearest Neighbor untuk Klasifikasi Tingkat Kognitif Soal pada Taksonomi Bloom. Jurnal Pilar Nusa Mandiri, 13(2), 209–216.
Setiyorini, T., & Asmono, R. T. (2019). Laporan Akhir Penelitian Mandiri.
Shahiri, A. M., Husain, W., & Rashid, N. A. (2015). A Review on Predicting Student’s Performance Using Data Mining Techniques. Procedia Computer Science, 72, 414–422. https://doi.org/10.1016/j.procs.2015.12.157
Vercellis, C. (2009). Data mining and optomization for decision making. Business Intelligence (Vol. 1). https://doi.org/10.1017/CBO9781107415324.004
Wang, S., Li, D., Song, X., Wei, Y., & Li, H. (2011). A feature selection method based on improved fisher’s discriminant ratio for text sentiment classification. Expert Systems with Applications, 38(7), 8696–8702. https://doi.org/10.1016/j.eswa.2011.01.077
Won Yoon, J., & Friel, N. (2015). Efficient model selection for probabilistic K nearest neighbour classification. Neurocomputing, 149(PB), 1098–1108. https://doi.org/10.1016/j.neucom.2014.07.023
Xu, T., Peng, Q., & Cheng, Y. (2012). Identifying the semantic orientation of terms using S-HAL for sentiment analysis. Knowledge-Based Systems, 35, 279–289. https://doi.org/10.1016/j.knosys.2012.04.011
Yang, F., & Li, F. W. B. (2018). Study on student performance estimation, student progress analysis, and student potential prediction based on data mining. Computers and Education, 123(October 2017), 97–108. https://doi.org/10.1016/j.compedu.2018.04.006
Zhang, J., & Tan, S. (2008). An empirical study of sentiment analysis for chinese documents. EXPERT SYSTEMS WITH APPLICATIONS, 34(4), 2622–2629. https://doi.org/10.1016/j.eswa.2007.05.028