IMPLEMENTATION OF GAIN RATIO AND K-NEAREST NEIGHBOR FOR CLASSIFICATION OF STUDENT PERFORMANCE

Penerapan Metode Gain Ratio Dan K-Nearest Neighbor Untuk Klasifikasi Kinerja Siswa

  • Tyas Setiyorini (1*) STMIK Nusa Mandiri
  • Rizky Tri Asmono (2) STMIK Swadharma, Jakarta Indonesia

  • (*) Corresponding Author
Keywords: K-Nearest Neighbor, Student Performance, Gain Ratio Method

Abstract

Predicting student performance helps educators identify weak students and support those who face difficulties. However, educators' efforts to identify the factors that affect student performance have not been effective enough. Academic scores are the main and most informative predictor, but they alone are not sufficient to predict student performance. Educators therefore use Educational Data Mining (EDM) to predict student performance. K-Nearest Neighbor is often used to classify student performance because of its simplicity, but it performs poorly on high-dimensional feature sets. To overcome this weakness, Gain Ratio is used to reduce the number of features. The experiment was carried out 10 times on the student performance dataset, with k ranging from 1 to 10. K-Nearest Neighbor alone achieved an average accuracy of 74.068%, while Gain Ratio combined with K-Nearest Neighbor achieved an average accuracy of 75.105%. These results show that Gain Ratio can reduce the high feature dimensionality that weakens K-Nearest Neighbor, so combining Gain Ratio with K-Nearest Neighbor increases the accuracy of student performance classification compared with K-Nearest Neighbor alone.
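
The pipeline summarized in the abstract (rank features by gain ratio, keep the best ones, then classify with K-Nearest Neighbor) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the feature meanings, the `keep` parameter, and the use of Hamming distance on categorical values are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(column, labels):
    """Gain ratio of one categorical feature: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info

def select_features(rows, labels, keep):
    """Indices of the `keep` features with the highest gain ratio."""
    scores = [gain_ratio([r[i] for r in rows], labels) for i in range(len(rows[0]))]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k nearest training rows (Hamming distance, assumed)."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    order = sorted(range(len(train_rows)), key=lambda i: distance(train_rows[i], query))
    return Counter(train_labels[i] for i in order[:k]).most_common(1)[0][0]

# Hypothetical toy records: (study time, school bus line, absences).
rows = [
    ("high", "a", "few"),
    ("high", "b", "few"),
    ("low", "a", "many"),
    ("low", "b", "many"),
    ("high", "a", "many"),
    ("low", "b", "few"),
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]

selected = select_features(rows, labels, keep=1)
reduced = [tuple(r[i] for i in selected) for r in rows]
query = tuple(("high", "b", "many")[i] for i in selected)

# Try several values of k, as the experiment does (here only 1 to 3).
for k in range(1, 4):
    print(k, knn_predict(reduced, labels, query, k))
```

In this toy data the first feature separates the classes perfectly, so it is the one retained. A real run on the student performance dataset would also need to handle numeric attributes (for example by discretizing them before scoring) and to estimate accuracy on held-out data.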

Published
2020-03-02
How to Cite
Setiyorini, T., & Asmono, R. (2020). IMPLEMENTATION OF GAIN RATIO AND K-NEAREST NEIGHBOR FOR CLASSIFICATION OF STUDENT PERFORMANCE. Jurnal Pilar Nusa Mandiri, 16(1), 19-24. https://doi.org/10.33480/pilar.v16i1.813