K-MEANS-BASED TRAINING DATA PROCESSING FOR IMPROVING TOURISM RECOMMENDATION ACCURACY
DOI:
https://doi.org/10.33480/jitk.v11i4.7274
Keywords:
Classification, Clustering, K-Means, Optimization, Pseudo-Labelling
Abstract
This study investigates whether K-Means clustering can improve training data quality, and therefore model accuracy, in tourism destination recommendation systems. Rapid advances in information technology have increased the demand for personalized and accurate recommendations in the tourism industry, yet high prediction accuracy remains difficult to achieve. K-Means clustering is used to segment the training data into homogeneous clusters, with the aim of improving data representation and the predictive accuracy of recommendation models. The methodology comprises a literature review, data collection, preprocessing, clustering, and model testing with the K-Nearest Neighbors (KNN), Decision Tree, and Naive Bayes algorithms. After K-Means clustering was applied, KNN's accuracy increased by 2.27% and its kappa and precision values also improved, indicating more reliable predictions. Naive Bayes improved substantially, with a 9.09% increase in accuracy and marked gains in kappa and precision. The Decision Tree algorithm, by contrast, performed worse after clustering, which suggests that this clustering-based preprocessing is not suitable for Decision Tree models in this setting.
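The abstract does not spell out how the clusters feed into the classifiers, so the following is only a minimal sketch of one possible reading: K-Means groups the training rows, each row is relabelled with the majority label of its cluster (a simple form of pseudo-labelling), and a KNN classifier is then scored with accuracy and Cohen's kappa. The function names (`kmeans`, `pseudo_label`, `knn_predict`, `make_blobs`), the synthetic two-class data, the 20% label noise, and the values of `k` are all illustrative assumptions, not the authors' dataset or settings.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means. Returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def pseudo_label(X, y, k):
    """Assumed cleaning step: give each row its cluster's majority label."""
    _, assign = kmeans(X, k)
    majority = {}
    for c in set(assign):
        votes = Counter(lbl for lbl, a in zip(y, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [majority[a] for a in assign]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], x))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa is undefined when p_e == 1; this sketch returns 0.0 in that case.
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


def make_blobs(n_per_class, rng):
    """Hypothetical toy data: two Gaussian blobs, one per class."""
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    train_X, train_y = make_blobs(40, rng)
    test_X, test_y = make_blobs(20, rng)
    # Flip roughly 20% of the training labels to imitate low-quality data.
    noisy_y = [1 - lbl if rng.random() < 0.2 else lbl for lbl in train_y]
    cleaned_y = pseudo_label(train_X, noisy_y, k=2)
    for name, labels in [("noisy labels", noisy_y), ("cluster-cleaned labels", cleaned_y)]:
        preds = [knn_predict(train_X, labels, x, k=5) for x in test_X]
        print(name, round(accuracy(test_y, preds), 3), round(cohen_kappa(test_y, preds), 3))
```

Distance-based and probabilistic classifiers such as KNN and Naive Bayes are the ones the abstract reports as benefiting from clustering. The sketch uses KNN only to stay short, and it makes no claim about reproducing the reported figures.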
License
Copyright (c) 2026 Candra Agustina, Purwanto Purwanto, Farikhin Farikhin, Eka Rahmawati

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.





