APPLYING THE GINI INDEX AND K-NEAREST NEIGHBOR TO CLASSIFY THE COGNITIVE LEVEL OF QUESTIONS IN BLOOM'S TAXONOMY

  • Tyas Setiyorini (1*), Teknik Informatika, STMIK Nusa Mandiri Jakarta
  • Rizky Tri Asmono (2), Teknik Informatika, STMIK Swadharma

  • (*) Corresponding Author
Keywords: K-Nearest Neighbor, Bloom's Taxonomy, Gini Index, Classification

Abstract

Bloom's Taxonomy has been widely applied as a guideline for designing sound examinations composed of questions at various cognitive levels. At present, educators still identify the cognitive level of questions in Bloom's Taxonomy manually. Only a few educators can identify cognitive levels correctly; most make mistakes when classifying questions. K-Nearest Neighbor (KNN) is an effective method for classifying the cognitive level of questions in Bloom's Taxonomy, but KNN has a weakness: the computational cost of its similarity calculations becomes large when the feature dimensionality of the data is high. To address this weakness, the Gini Index method is used to reduce the high feature dimensionality. Several experiments were conducted to obtain the best architecture and produce accurate classification. Across 10 experiments on the Question Bank dataset, KNN achieved a highest accuracy of 59.97% and a highest kappa of 0.496, while KNN+Gini Index achieved a highest accuracy of 66.18% and a highest kappa of 0.574. Based on these results, it can be concluded that the Gini Index is able to reduce high feature dimensionality, thereby improving KNN's performance and increasing the accuracy of classifying the cognitive level of questions in Bloom's Taxonomy.
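As a minimal illustration of the pipeline described above (not the authors' implementation; the toy term-presence vectors, the `top_m` parameter, and the Bloom-level labels "C1"/"C2" are hypothetical), the two stages can be sketched as: rank features by weighted Gini impurity, keep the most discriminative ones, then classify by majority vote among the k nearest neighbors.

```python
from collections import Counter
import math

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(feature_col, labels):
    # Weighted Gini impurity after splitting on a feature's values
    # (e.g. term presence/absence); lower means more discriminative.
    n = len(labels)
    total = 0.0
    for value in set(feature_col):
        subset = [y for x, y in zip(feature_col, labels) if x == value]
        total += len(subset) / n * gini(subset)
    return total

def select_features(X, y, top_m):
    # Rank features by Gini index and keep the top_m best (lowest) ones
    n_features = len(X[0])
    scores = [(gini_index([row[j] for row in X], y), j)
              for j in range(n_features)]
    keep = [j for _, j in sorted(scores)[:top_m]]
    return [[row[j] for j in keep] for row in X], keep

def knn_predict(X_train, y_train, x, k=3):
    # Classify x by majority vote among its k nearest training points
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy binary term-presence vectors with hypothetical Bloom levels
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y = ["C1", "C1", "C2", "C2"]

X_red, kept = select_features(X, y, top_m=2)
print(kept)                              # indices of retained features
print(knn_predict(X_red, y, [1, 0], k=3))
```

Reducing the feature set before the KNN step is what lowers the similarity-computation cost: each distance is computed over `top_m` dimensions instead of the full vocabulary.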


Published
2017-09-15
How to Cite
Setiyorini, T., & Asmono, R. (2017). PENERAPAN GINI INDEX DAN K-NEAREST NEIGHBOR UNTUK KLASIFIKASI TINGKAT KOGNITIF SOAL PADA TAKSONOMI BLOOM. Jurnal Pilar Nusa Mandiri, 13(2), 209-216. Retrieved from https://ejournal.nusamandiri.ac.id/index.php/pilar/article/view/239