PENERAPAN METODE K-NEAREST NEIGHBOR DAN INFORMATION GAIN PADA KLASIFIKASI KINERJA SISWA

  • Tyas Setiyorini, STMIK Nusa Mandiri
  • Rizky Tri Asmono, Teknik Informatika, STMIK Swadharma
Keywords: Student Performance Classification, K-Nearest Neighbor, Information Gain

Abstract

Education plays a very important role in the development of a country. One way to improve the quality of education is to predict students' academic performance. The approach commonly used is still ineffective, because evaluation is based solely on the educator's assessment of information about students' learning progress. Such information alone is not sufficient to form indicators for evaluating student performance and for helping students and educators improve learning and teaching. K-Nearest Neighbor is an effective method for classifying student performance, but it has problems with large vector dimensions. This study aims to predict students' academic performance using the K-Nearest Neighbor algorithm combined with the Information Gain feature selection method to reduce the vector dimensions. Several experiments were conducted to obtain an optimal architecture and produce accurate classifications. Across 10 experiments with k values from 1 to 10 on the student performance dataset, the K-Nearest Neighbor method achieved a highest average accuracy of 74.068%, while the combination of K-Nearest Neighbor and Information Gain achieved a highest average accuracy of 76.553%. From these results it can be concluded that Information Gain can reduce the vector dimensions, so that applying K-Nearest Neighbor together with Information Gain classifies student performance more accurately than using K-Nearest Neighbor alone.
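
The following is a minimal sketch of the kind of experiment described in the abstract, not the authors' exact implementation. It assumes the UCI student performance dataset (Cortez & Silva, 2008) is available locally as student-mat.csv, uses a pass/fail label derived from the final grade G3, and uses scikit-learn's mutual_info_classif as a surrogate for Information Gain; the file name, the pass threshold, and the number of selected features (10) are all assumptions for illustration.

```python
# Sketch: compare KNN alone vs. KNN with Information-Gain-style feature selection,
# for k = 1..10, using 10-fold cross-validated accuracy.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("student-mat.csv", sep=";")      # hypothetical local copy of the dataset
X = pd.get_dummies(data.drop(columns=["G3"]))       # one-hot encode categorical attributes
y = (data["G3"] >= 10).astype(int)                  # assumed pass/fail label from the final grade

for k in range(1, 11):                              # k values 1 to 10, as in the experiments
    knn_only = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn_ig = make_pipeline(
        StandardScaler(),
        SelectKBest(mutual_info_classif, k=10),     # keep the 10 most informative features
        KNeighborsClassifier(n_neighbors=k),
    )
    acc_knn = cross_val_score(knn_only, X, y, cv=10).mean()
    acc_ig = cross_val_score(knn_ig, X, y, cv=10).mean()
    print(f"k={k}: KNN={acc_knn:.3f}  KNN+IG={acc_ig:.3f}")
```

Averaging the per-k accuracies of each pipeline reproduces the kind of comparison reported above, where the feature-selected variant is expected to score higher because the neighbor distances are computed over fewer, more informative dimensions.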

References

Adeniyi, D. A., Wei, Z., & Yongquan, Y. (2016). Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Applied Computing and Informatics, 12(1), 90–108. https://doi.org/10.1016/j.aci.2014.10.001

Aghbari, Z. Al. (2005). Array-index: A plug&search K nearest neighbors method for high-dimensional data. Data and Knowledge Engineering, 52(3), 333–352. https://doi.org/10.1016/j.datak.2004.06.015

Al-Shehri, H., Al-Qarni, A., Al-Saati, L., Batoaq, A., Badukhen, H., Alrashed, S., … Olatunji, S. O. (2017). Student performance prediction using Support Vector Machine and K-Nearest Neighbor. Canadian Conference on Electrical and Computer Engineering, 17–20. https://doi.org/10.1109/CCECE.2017.7946847

Alkhasawneh, R., & Hobson, R. (2011). Modeling student retention in science and engineering disciplines using neural networks. In 2011 IEEE Global Engineering Education Conference, EDUCON 2011 (pp. 660–663). https://doi.org/10.1109/EDUCON.2011.5773209

Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from LMS data: A comparison of 17 blended courses using moodle LMS. IEEE Transactions on Learning Technologies, 10(1), 17–29. https://doi.org/10.1109/TLT.2016.2616312

Cortez, P., & Silva, A. (2008). Using data mining to predict secondary school student performance. In A. Brito & J. Teixeira (Eds.), Proceedings of the 5th Future Business Technology Conference (FUBUTEC 2008) (pp. 5–12).

de Vries, A. P., Mamoulis, N., Nes, N., & Kersten, M. (2003). Efficient k-NN search on vertically decomposed data (p. 322). https://doi.org/10.1145/564728.564729

Gallager, R. G. (2001). Claude E. Shannon: A retrospective on his life, work, and impact. IEEE Transactions on Information Theory, 47(7), 2681–2695. https://doi.org/10.1109/18.959253

Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289–1305.

Gou, J., Zhan, Y., Rao, Y., Shen, X., Wang, X., & He, W. (2014). Improved pseudo nearest neighbor classification. Knowledge-Based Systems, 70, 361–375. https://doi.org/10.1016/j.knosys.2014.07.020

Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student Academic Performance Prediction Model Using Decision Tree and Fuzzy Genetic Algorithm. Procedia Technology, 25, 326–332. https://doi.org/10.1016/j.protcy.2016.08.114

Han, J., Kamber, M., & Pei, J. (2012). Data Mining. In Data Mining (pp. 1–38). https://doi.org/10.1016/B978-0-12-381479-1.00001-0

Hand, D. J. (2007). Principles of data mining. Drug Safety, 30(7), 621–622. https://doi.org/10.2165/00002018-200730070-00010

Ibrahim, Z., & Rusli, D. (2007). Predicting students’ academic performance: Comparing artificial neural network, decision tree and linear regression. Proceedings of the 21st Annual SAS Malaysia Forum, (September), 1–6. Retrieved from https://www.researchgate.net/profile/Daliela_Rusli/publication/228894873_Predicting_Students’_Academic_Performance_Comparing_Artificial_Neural_Network_Decision_Tree_and_Linear_Regression/links/0deec51bb04e76ed93000000.pdf

Koncz, P., & Paralic, J. (2011). An approach to feature selection for sentiment analysis. In INES 2011 - 15th International Conference on Intelligent Engineering Systems, Proceedings (pp. 357–362). https://doi.org/10.1109/INES.2011.5954773

Lin, Y., Li, J., Lin, M., & Chen, J. (2014). A new nearest neighbor classifier via fusing neighborhood information. Neurocomputing, 143, 164–169. https://doi.org/10.1016/j.neucom.2014.06.009

Lopez Guarin, C. E., Guzman, E. L., & Gonzalez, F. A. (2015). A Model to Predict Low Academic Performance at a Specific Enrollment Using Data Mining. Revista Iberoamericana de Tecnologias Del Aprendizaje, 10(3), 119–125. https://doi.org/10.1109/RITA.2015.2452632

Lu, L. R., & Fa, H. Y. (2004). A density-based method for reducing the amount of training data in kNN text classification. Journal of Computer Research and Development, 4, 003.

Pandey, M., & Taruna, S. (2016). Towards the integration of multiple classifier pertaining to the Student’s performance prediction. Perspectives in Science, 8, 364–366. https://doi.org/10.1016/j.pisc.2016.04.076

Setiyorini, T., & Asmono, R. T. (2017). Penerapan Gini Index dan K-Nearest Neighbor untuk Klasifikasi Tingkat Kognitif Soal pada Taksonomi Bloom. Jurnal Pilar Nusa Mandiri, 13(2), 209–216.

Setiyorini, T., & Asmono, R. T. (2019). Laporan Akhir Penelitian Mandiri.

Shahiri, A. M., Husain, W., & Rashid, N. A. (2015). A Review on Predicting Student’s Performance Using Data Mining Techniques. Procedia Computer Science, 72, 414–422. https://doi.org/10.1016/j.procs.2015.12.157

Vercellis, C. (2009). Data mining and optimization for decision making. Business Intelligence (Vol. 1). https://doi.org/10.1017/CBO9781107415324.004

Wang, S., Li, D., Song, X., Wei, Y., & Li, H. (2011). A feature selection method based on improved fisher’s discriminant ratio for text sentiment classification. Expert Systems with Applications, 38(7), 8696–8702. https://doi.org/10.1016/j.eswa.2011.01.077

Won Yoon, J., & Friel, N. (2015). Efficient model selection for probabilistic K nearest neighbour classification. Neurocomputing, 149(PB), 1098–1108. https://doi.org/10.1016/j.neucom.2014.07.023

Xu, T., Peng, Q., & Cheng, Y. (2012). Identifying the semantic orientation of terms using S-HAL for sentiment analysis. Knowledge-Based Systems, 35, 279–289. https://doi.org/10.1016/j.knosys.2012.04.011

Yang, F., & Li, F. W. B. (2018). Study on student performance estimation, student progress analysis, and student potential prediction based on data mining. Computers and Education, 123(October 2017), 97–108. https://doi.org/10.1016/j.compedu.2018.04.006

Zhang, J., & Tan, S. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622–2629. https://doi.org/10.1016/j.eswa.2007.05.028

Published
2019-06-26
How to Cite
[1] T. Setiyorini and R. Asmono, “PENERAPAN METODE K-NEAREST NEIGHBOR DAN INFORMATION GAIN PADA KLASIFIKASI KINERJA SISWA”, jitk, vol. 5, no. 1, pp. 7–14, Jun. 2019.