SOCIAL MEDIA COMMENTS FOR GOVERNMENT INSTITUTION VIDEO CLASSIFICATION USING MACHINE LEARNING
Abstract
YouTube is a widely used social media site for disseminating video-based information. With its large number of users, YouTube can serve as a communication medium for reaching audiences, including for government agencies. Users' responses in the comments reflect the nuances of the video presented. This research aims to determine the best algorithm for classifying video types based on user comments. Five machine learning algorithms are used for classification: Decision Tree, Random Forest, K-Nearest Neighbor, Support Vector Machine (SVM), and Logistic Regression. K-fold cross-validation was chosen to evaluate, based on accuracy, how well these algorithms classify YouTube videos from their comments. In the first experiment, on the ratio of training to test data, the highest accuracy for each algorithm was obtained at a ratio of 90:10: 78.99%, 86.21%, 84.01%, 72.72%, and 79.31%, respectively. In the second experiment, k-fold cross-validation using a ratio of 90:10, the highest accuracy for each algorithm was obtained at k = 10: 74.39%, 81.34%, 78.05%, 85.21%, and 72.15%, respectively. From these results, it can be concluded that the most suitable algorithms for classifying YouTube videos based on comments are Random Forest with a training-to-test ratio of 90:10 and SVM with 10-fold cross-validation. These results indicate that a larger portion of data for learning has a positive impact on algorithm performance.
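The two evaluation protocols named in the abstract, a 90:10 hold-out split and 10-fold cross-validation scored by accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy comments, the labels "education" and "news", and all function names are hypothetical, and a small K-Nearest Neighbor classifier over Jaccard token overlap stands in for the feature extraction and the five classifiers used in the study.

```python
import random
from collections import Counter

# Hypothetical toy data: (comment, video type). Not taken from the study.
DATA = [(f"thanks for the clear tutorial step {i}", "education") for i in range(10)]
DATA += [(f"the minister announced new policy number {i}", "news") for i in range(10)]

def tokenize(text):
    """Lowercase the comment and return its set of whitespace-separated tokens."""
    return set(text.lower().split())

def knn_predict(train, comment, k=3):
    """Majority vote among the k training comments most similar to `comment`."""
    tokens = tokenize(comment)

    def jaccard(other):
        other_tokens = tokenize(other)
        union = tokens | other_tokens
        return len(tokens & other_tokens) / len(union) if union else 0.0

    nearest = sorted(train, key=lambda pair: jaccard(pair[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test comments whose predicted label matches the true label."""
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)

def holdout_split(data, test_ratio=0.1, seed=0):
    """Shuffle once and hold out `test_ratio` of the data (0.1 gives 90:10)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(data, n_folds=10, seed=0):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        yield train, test

def cross_val_accuracy(data, n_folds=10, k=3, seed=0):
    """Mean accuracy over all folds."""
    scores = [accuracy(tr, te, k) for tr, te in k_fold_splits(data, n_folds, seed)]
    return sum(scores) / len(scores)

train_set, test_set = holdout_split(DATA, test_ratio=0.1)
print("hold-out accuracy:", accuracy(train_set, test_set))
print("10-fold accuracy:", cross_val_accuracy(DATA, n_folds=10))
```

With k = 10, each fold trains on 90% of the data and tests on the remaining 10%, so every comment is used for testing exactly once, whereas a single 90:10 hold-out split tests on only one such 10% portion.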
Copyright (c) 2024 M. Faris Al Hakim, Subhan Subhan, Prasetyo Listiaji
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.