COMPARISON OF DEEP LEARNING METHODS ON SENTIMENT ANALYSIS USING WORD EMBEDDING
Abstract
According to Indonesia Corruption Watch (ICW), corruption cases in Indonesia have increased over the last 5 years, and state losses from 2012 to 2022 reached Rp138.39 trillion. According to Transparency International, Indonesia's Corruption Perceptions Index (CPI) ranking fell from 110th in 2022 to 115th in 2023 out of 180 countries. These figures indicate that the response to corruption remains slow and continues to deteriorate due to a lack of support from stakeholders. This study tests and compares the performance of deep learning models (RNN, LSTM, GRU, Bi-GRU, and Bi-LSTM) on sentiment classification using word embedding, with the aim of obtaining a model architecture that can determine the polarity of sentences expressing public sentiment about corruption in Indonesia. Such a model can help governments, researchers, and practitioners design more effective anti-corruption strategies. The dataset consists of 1,793 tweets collected by crawling Twitter, divided into 3 classes: positive, negative, and neutral. The research proceeds through data collection, preprocessing, word embedding, splitting the dataset into 80% training data and 20% test data, deep learning model testing, model evaluation, and presentation of results. Word embedding uses word2vec with a dimension of 300. In the experiments, Bi-GRU outperformed the other architectures, with an accuracy of 88%, precision of 88.07%, recall of 86.97%, and F1-score of 87.51%. The dataset used in this research is relatively small, and future research is encouraged to address this limitation.
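The splitting and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say whether the 80/20 split was stratified or how precision, recall, and F1 were averaged over the three classes, so the stratified split and macro averaging below are assumptions, and the names `stratified_split` and `macro_scores` are hypothetical.

```python
import random
from collections import defaultdict

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ("positive", "negative", "neutral")


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx) with roughly equal class proportions.

    Assumption: stratification is used; the abstract only states 80%/20%.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_scores(y_true, y_pred, labels=LABELS):
    """Accuracy plus macro-averaged precision, recall and F1.

    Assumption: macro averaging (unweighted mean of per-class scores).
    """
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels standing in for the annotated tweets.
    toy = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10
    train_idx, test_idx = stratified_split(toy)
    print(len(train_idx), len(test_idx))
```

In a full pipeline, the test indices would select the held-out tweets, the trained classifier (for example the Bi-GRU) would produce `y_pred` for them, and `macro_scores` would yield the four figures reported per model.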
Copyright (c) 2024 Rizal Gibran Aldrin Pratama, Nuri Cahyono
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.