COMPARISON OF KNN, NAIVE BAYES, DECISION TREE, ENSEMBLE, REGRESSION METHODS FOR INCOME PREDICTION

  • Eri Mardiani, Universitas Nasional
  • Nur Rahmansyah, Politeknik Negeri Media Kreatif
  • Andy Setiawan, UPN Veteran Jakarta
  • Zakila Cahya Ronika, UPN Veteran Jakarta
  • Dini Fatihatul Hidayah, UPN Veteran Jakarta
  • Atira Syakira, UPN Veteran Jakarta
Keywords: algorithm method comparison, data mining, income classification, Orange

Abstract

Using an income classification dataset, we applied data mining to extract useful information from the available data. Many tools are now available for data processing, and the one we used is the Orange application. From the dataset we examined welfare-related attributes such as marital status, education, and gender, together with income-related fields ranging from sales to daily life. The goal was to understand the income earned by employees and workers from several countries, including the United States, Cambodia, the United Kingdom, Puerto Rico, Canada, Germany, and Outer US (Guam, USVI, etc.). The purpose of this analysis is to determine how the number of hours worked per week affects the income classification. We compared several classification models: K-Nearest Neighbor (KNN), Naïve Bayes, Decision Tree, an Ensemble Method, and Linear Regression. Based on the test results of these models, we conclude that Naive Bayes is the best model for classifying workers' income. The variables Hours-per-Week and Capital-Gain affect the income classification, which determines whether a person earns more than 50K, and the analysis yields a prediction of a person's income level.
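The abstract does not say how the models were configured in Orange. As a rough illustration of what such a comparison involves, the sketch below trains two of the five model types, KNN and Naive Bayes, on a handful of hypothetical records. It uses only the two features the abstract names as influential (Hours-per-Week and Capital-Gain) and scores each model by test-set accuracy. It is plain Python rather than Orange. The toy data, the function names (`knn_predict`, `fit_gaussian_nb`, `nb_predict`, `accuracy`), the choice of k=3, the min-max scaling, and the Gaussian form of Naive Bayes are all assumptions made for illustration, not the authors' setup. Its numbers say nothing about the paper's results.

```python
import math
from collections import Counter

# Hypothetical toy records: (hours_per_week, capital_gain, label).
# label 1 stands for ">50K" and 0 for "<=50K"; these values are invented
# for illustration and are NOT taken from the paper's dataset.
TRAIN = [
    (40, 0, 0), (35, 0, 0), (20, 0, 0), (45, 0, 0), (38, 2000, 0),
    (50, 7000, 1), (60, 15000, 1), (55, 0, 1), (45, 10000, 1), (65, 5000, 1),
]
TEST = [(30, 0, 0), (42, 500, 0), (58, 12000, 1), (52, 8000, 1)]
N_FEATURES = 2


def min_max_scaler(train):
    """Return a function that rescales features to [0, 1] using training min/max."""
    lo = [min(r[j] for r in train) for j in range(N_FEATURES)]
    hi = [max(r[j] for r in train) for j in range(N_FEATURES)]
    return lambda x: tuple((x[j] - lo[j]) / (hi[j] - lo[j]) for j in range(N_FEATURES))


def knn_predict(train, x, scale, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance on scaled features)."""
    sx = scale(x)
    nearest = sorted(train, key=lambda r: math.dist(scale(r[:N_FEATURES]), sx))[:k]
    return Counter(r[-1] for r in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate class priors and per-class, per-feature mean and variance (assumed Gaussian)."""
    model = {}
    for label in sorted({r[-1] for r in train}):
        rows = [r[:N_FEATURES] for r in train if r[-1] == label]
        stats = []
        for j in range(N_FEATURES):
            vals = [row[j] for row in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model


def nb_predict(model, x):
    """Pick the class with the largest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for xj, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (xj - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(predict(r[:N_FEATURES]) == r[-1] for r in test) / len(test)


scale = min_max_scaler(TRAIN)
nb_model = fit_gaussian_nb(TRAIN)
results = {
    "KNN (k=3)": accuracy(lambda x: knn_predict(TRAIN, x, scale, k=3), TEST),
    "Gaussian Naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), TEST),
}
# Rank the models by accuracy, mirroring the "which model is best" comparison.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: accuracy = {acc:.2f}")
```

KNN is given min-max scaling because Capital-Gain spans thousands while Hours-per-Week spans tens, so without rescaling the distance would be dominated by Capital-Gain. Naive Bayes models each feature separately per class, so it does not need that step. A fuller comparison would add the Decision Tree, Ensemble, and Linear Regression models and use a larger, held-out or cross-validated test set.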

Published: 2023-09-30
How to Cite
Mardiani, E., Rahmansyah, N., Setiawan, A., Ronika, Z., Hidayah, D., & Syakira, A. (2023). COMPARISON OF KNN, NAIVE BAYES, DECISION TREE, ENSEMBLE, REGRESSION METHODS FOR INCOME PREDICTION. Jurnal Techno Nusa Mandiri, 20(2), 115-121. https://doi.org/10.33480/techno.v20i2.4613