UNVEILING GENDER FROM INDONESIAN NAMES USING RANDOM FOREST AND LOGISTIC REGRESSION ALGORITHMS
Abstract
Gender detection can be performed in several ways. Some approaches rely on image identification, for example recognizing faces or shapes in an image, while others identify and detect gender from text or other written data. Gender identification is useful in many aspects of daily life, for instance in choosing the correct salutation (such as "ladies" or "gentlemen"), so that the person addressed feels more appreciated because they are addressed accurately. The identification task can be framed as classification: predicting which of a set of predefined gender labels a given input belongs to. Names in different languages mark gender in different ways, and Indonesian names in particular are diverse and show a high degree of variation. The purpose of this study is to evaluate how well classification algorithms assign gender labels to Indonesian names. Two algorithms are applied: Random Forest and Logistic Regression. Both algorithms classify a set of 6 experimental samples with perfect accuracy. On a set of 526 experimental samples, they reach final accuracies of 0.94 (Logistic Regression) and 0.93 (Random Forest). Logistic Regression therefore performs better in this case, though only by a narrow margin.
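The abstract does not state how names are turned into features or how the models are configured, so the following is only a minimal sketch under explicit assumptions. Each name is represented as a set of lower-cased character 2- and 3-grams, which is an assumed feature scheme and not necessarily the authors'. Both classifiers are written from scratch with the Python standard library instead of a library such as scikit-learn, so the mechanics stay visible. The helper names (`name_features`, `train_logreg`, `train_forest`, and so on), the hyperparameters, and the labelled names in `TRAIN` are all hypothetical illustrations, not the paper's 6- or 526-sample datasets.

```python
import math
import random

# Hypothetical toy data (1 = female, 0 = male); NOT the paper's experimental dataset.
TRAIN = [
    ("Siti Aminah", 1), ("Dewi Anggraini", 1), ("Sri Wahyuni", 1), ("Rina Kurniawati", 1),
    ("Ratna Sari", 1), ("Putri Ayu", 1), ("Yuni Kartika", 1), ("Fitri Handayani", 1),
    ("Budi Santoso", 0), ("Agus Setiawan", 0), ("Joko Susilo", 0), ("Bambang Sutrisno", 0),
    ("Rizky Pratama", 0), ("Andi Kurniawan", 0), ("Hendra Gunawan", 0), ("Eko Prasetyo", 0),
]

def name_features(name, n_values=(2, 3)):
    """Assumed features: set of lower-cased character n-grams; '^'/'$' mark name boundaries."""
    padded = "^" + name.lower().strip() + "$"
    return {padded[i:i + n] for n in n_values for i in range(len(padded) - n + 1)}

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, epochs=200, lr=0.5):
    """Logistic regression fitted by stochastic gradient descent on the log-loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in zip(X, y):
            p = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            grad = p - label  # derivative of the log-loss w.r.t. the linear score z
            bias -= lr * grad
            for f in feats:
                weights[f] = weights.get(f, 0.0) - lr * grad
    return weights, bias

def predict_logreg(model, feats):
    weights, bias = model
    return 1 if sigmoid(bias + sum(weights.get(f, 0.0) for f in feats)) >= 0.5 else 0

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def build_tree(rows, vocab, depth, max_depth, rng):
    """Grow a tree whose splits test 'is this n-gram present?'; leaves are class labels."""
    labels = [lab for _, lab in rows]
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    best = None
    # Random-forest step: consider only a random subset of sqrt(|vocab|) features per split.
    for f in rng.sample(vocab, max(1, int(math.sqrt(len(vocab))))):
        left = [r for r in rows if f in r[0]]
        right = [r for r in rows if f not in r[0]]
        if not left or not right:
            continue
        score = (len(left) * gini([lab for _, lab in left])
                 + len(right) * gini([lab for _, lab in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority
    _, f, left, right = best
    return (f, build_tree(left, vocab, depth + 1, max_depth, rng),
            build_tree(right, vocab, depth + 1, max_depth, rng))

def predict_tree(node, feats):
    while not isinstance(node, int):  # internal nodes are (feature, present, absent) tuples
        f, present, absent = node
        node = present if f in feats else absent
    return node

def train_forest(X, y, n_trees=25, max_depth=6, seed=0):
    rng = random.Random(seed)
    vocab = sorted(set().union(*X))
    rows = list(zip(X, y))
    # Each tree is grown on a bootstrap sample (drawn with replacement) of the training rows.
    return [build_tree([rng.choice(rows) for _ in rows], vocab, 0, max_depth, rng)
            for _ in range(n_trees)]

def predict_forest(forest, feats):
    votes = sum(predict_tree(tree, feats) for tree in forest)
    return 1 if 2 * votes >= len(forest) else 0  # majority vote across trees

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    X = [name_features(n) for n, _ in TRAIN]
    y = [label for _, label in TRAIN]
    logreg, forest = train_logreg(X, y), train_forest(X, y)
    print("train accuracy LR:", accuracy(y, [predict_logreg(logreg, f) for f in X]))
    print("train accuracy RF:", accuracy(y, [predict_forest(forest, f) for f in X]))
    for n in ["Ani Susanti", "Dodi Hermawan"]:  # hypothetical held-out names
        f = name_features(n)
        print(n, "LR:", predict_logreg(logreg, f), "RF:", predict_forest(forest, f))
```

Accuracy in this sketch is the fraction of names whose predicted label matches the true label, the same metric the abstract reports. On a toy set this small, the printed values say nothing about the results reported in the study.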
Copyright (c) 2024 Musthofa Galih Pradana, Pujo Hari Saputro, Dyah Listianing Tyas
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The copyright of any article in the TECHNO Nusa Mandiri Journal is fully held by the author under the Creative Commons CC BY-NC license.
- The copyright in each article belongs to the author.
- Authors retain all their rights to published works, not limited to the rights set out on this page.
- The author acknowledges that Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) has the right of first publication of the work under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC).
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript published in this journal in other versions (for example, deposit in the repository of the author's affiliated institution or publication in a book), provided they acknowledge that the manuscript was first published in Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri).
- The author guarantees that the article is original, was written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, and is subject to copyright held exclusively by the author.
- If an article was prepared jointly by more than one author, the author submitting the manuscript warrants that they have been authorized by all co-authors to agree to the copyright and license notices (agreements) on their behalf, and agrees to inform the co-authors of the terms of this policy. Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) will not be held responsible for any consequences arising from internal disputes between the authors.