COMPARISON OF KNN, NAIVE BAYES, DECISION TREE, ENSEMBLE, REGRESSION METHODS FOR INCOME PREDICTION
Abstract
Using the income classification dataset, we applied data mining to extract interesting information from the available data. Data processing can currently be done with many tools; one of the tools we used is the Orange application. With this dataset we examined welfare levels through attributes such as marital status, schooling, and gender, together with all fields related to income, from sales to daily life, in order to determine the income earned by employees or workers from several countries, including the United States, Cambodia, the United Kingdom, Puerto-Rico, Canada, Germany, and Outer US (Guam-USVI-etc). The purpose of this analysis is to determine how the hours worked per week affect the income classification. Classification was carried out with several models: the K-Nearest Neighbor (KNN) algorithm, Naïve Bayes, Decision Tree, Ensemble Method, and Linear Regression. Based on the test results of these models, we conclude that the best model for measuring workers' income is the Naive Bayes Decision. The analysis shows that the Hours-per-Week and Capital-Gain variables affect Income Classification, which determines whether the income earned is more than 50 thousand (50K), and the analysis yields a prediction of a person's income level.
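As a rough illustration of the kind of comparison the abstract describes, the sketch below scores two of the listed models (K-Nearest Neighbor and a Gaussian Naïve Bayes) by hold-out accuracy. It is not the study's Orange workflow: the data are synthetic toy rows, the two features only borrow the names Hours-per-Week and Capital-Gain, and every number, function name, and parameter (the 30% share of high earners, k=5, the 300/100 split) is an assumption made for this example.

```python
import math
import random
from collections import Counter


def make_toy_data(n, seed=0):
    """Synthetic (hours_per_week, capital_gain) rows with a made-up >50K label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high = rng.random() < 0.3  # hypothetical share of >50K earners
        hours = rng.gauss(46 if high else 38, 8)
        gain = max(0.0, rng.gauss(4000 if high else 300, 1500))
        rows.append(((hours, gain), int(high)))
    return rows


def fit_scaler(X):
    """Per-feature mean and standard deviation, estimated on the training rows."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(X, means, stds):
    """Standardize rows so both features contribute comparably to distances."""
    return [tuple((v - m) / s for v, m, s in zip(x, means, stds)) for x in X]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(X, y):
    """Per-class prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def nb_predict(model, x):
    """Pick the class with the highest log prior plus Gaussian log-likelihood."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                      for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)


# Hold-out split: first 300 toy rows for training, last 100 for testing.
data = make_toy_data(400)
train, test = data[:300], data[300:]
train_X, train_y = [x for x, _ in train], [y for _, y in train]
test_X, test_y = [x for x, _ in test], [y for _, y in test]

means, stds = fit_scaler(train_X)
train_X, test_X = scale(train_X, means, stds), scale(test_X, means, stds)

nb_model = fit_gaussian_nb(train_X, train_y)
predictions = {
    "kNN (k=5)": [knn_predict(train_X, train_y, x) for x in test_X],
    "Gaussian Naive Bayes": [nb_predict(nb_model, x) for x in test_X],
}
accuracy = {name: sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
            for name, preds in predictions.items()}
for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Because the rows are generated rather than taken from the income dataset, the printed accuracies say nothing about which model performs best in the study; the sketch only shows how several classifiers can be scored on the same held-out rows.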
Copyright (c) 2023 Eri Mardiani, Nur Rahmansyah, Andy Setiawan, Zakila Cahya Ronika, Dini Fatihatul Hidayah, Atira Syakira
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The copyright of any article in the TECHNO Nusa Mandiri Journal is fully held by the author under the Creative Commons CC BY-NC license.
- The copyright in each article belongs to the author.
- Authors retain all their rights to published works, not limited to the rights set out on this page.
- The author acknowledges that Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) is the first to publish the article, under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC).
- Authors can enter into separate arrangements for the non-exclusive distribution of other versions of manuscripts published in this journal (for example: deposit in the author's affiliation repository, publication in books, etc.), provided they acknowledge that the manuscript was first published in Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri).
- The author guarantees that the article is original, was written by the stated author, has never been published before, does not contain any statements that violate the law, does not violate the rights of others, and is subject to copyright that is exclusively held by the author.
- If an article was prepared jointly by more than one author, each author submitting the manuscript warrants that he has been authorized by all co-authors to agree to copyright and license notices (agreements) on their behalf, and agrees to notify the co-authors of the terms of this policy. Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) will not be held responsible for anything that may have occurred due to the author's internal disputes.