SUPPORT VECTOR MACHINE TO CLASSIFY SENTIMENT REVIEWS ON GOOGLE PLAY STORE
DOI:
https://doi.org/10.33480/jitk.v11i3.7282
Keywords:
CRISP-DM, Google Play Store, KNN, Sentiment, SVM
Abstract
This research addresses the "rating-content discrepancy" on the Google Play Store, where numerical star ratings often conflict with the sentiment expressed in the accompanying review text. Following the CRISP-DM framework, the study evaluates how well machine learning resolves these inconsistencies by classifying Instagram user reviews as positive or negative. Two algorithms were compared on a dataset of 600 reviews. The Support Vector Machine (SVM) model achieved an accuracy of 0.84, whereas the K-Nearest Neighbors (KNN) model achieved only 0.48. This gap is consistent with SVM's ability to handle high-dimensional text data through optimal hyperplane separation. The study also used the Streamlit library to build an interactive web interface for real-time sentiment prediction and result visualization. The results indicate that a structured CRISP-DM approach combined with SVM is a robust solution for automated opinion mining and offers a reliable methodology for future data science applications in social media analysis.
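The comparison described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not state the feature representation or tooling, so TF-IDF features, the toy reviews, the hyperparameters, and every function name below are assumptions chosen for illustration. The SVM here is a linear model trained by sub-gradient descent on the hinge loss without a bias term, and KNN uses cosine similarity with majority voting.

```python
import math
from collections import Counter

# Toy labelled reviews (hypothetical; NOT the study's 600-review dataset).
# Label 1 = positive, -1 = negative.
TRAIN = [
    ("love this app great filters", 1),
    ("great update love the design", 1),
    ("good app works great", 1),
    ("bad update app keeps crashing", -1),
    ("hate the ads terrible lag", -1),
    ("terrible app crashing and bad lag", -1),
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def dot(a, b):
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def train_linear_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (no bias term)."""
    w = {}
    for _ in range(epochs):
        for x, y in samples:
            violated = y * dot(x, w) < 1.0  # inside the margin or misclassified
            for t in w:                     # regularisation shrinks all weights
                w[t] *= 1.0 - lr * lam
            if violated:                    # push the hyperplane away from x
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w

def svm_predict(text, w, idf):
    """Sign of the distance to the separating hyperplane."""
    return 1 if dot(tfidf(text, idf), w) > 0 else -1

def knn_predict(text, samples, idf, k=3):
    """Majority vote among the k most cosine-similar training vectors."""
    q = tfidf(text, idf)
    nearest = sorted(samples, key=lambda s: dot(q, s[0]), reverse=True)[:k]
    return 1 if sum(y for _, y in nearest) > 0 else -1

idf = fit_idf([text for text, _ in TRAIN])
samples = [(tfidf(text, idf), label) for text, label in TRAIN]
weights = train_linear_svm(samples)

for review in ("love great", "terrible lag crashing"):
    print(review, svm_predict(review, weights, idf), knn_predict(review, samples, idf))
```

On a toy set this small both classifiers agree, so the sketch shows only the mechanics of the two methods, not the accuracy gap reported in the abstract. Reproducing that gap would require the study's data and evaluation split.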
License
Copyright (c) 2026 Agus Nursikuwagus, Suherman, Heri Purwanto, Tono Hartono

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

















