COMPARISON OF ENSEMBLE METHODS FOR DECISION TREE MODELS IN CLASSIFYING E. COLI BACTERIA
Keywords:
classification performance, decision tree, ensemble methods, Escherichia coli classification, machine learning
Abstract
Certain strains of Escherichia coli (E. coli) can cause serious illness, so accurately identifying dangerous strains is a priority for public health and food safety. However, traditional machine learning methods such as Decision Trees are often not robust enough to handle the complexity of biological data. This research addresses the problem by systematically evaluating seven ensemble methods (AdaBoost, Gradient Boosting, XGBoost, LightGBM, Random Forest, Bagging, and Stacking) on a dataset of 336 E. coli samples with eight biological features. The models are evaluated on accuracy, precision, recall, and F1 score, with parameter optimization applied to each to obtain its best result. XGBoost performs best, with accuracy, recall, and F1 score of 88% and precision of 87%, outperforming the other methods. The strength of this research is that it compares the ensemble methods side by side under the same conditions and uses confusion matrix-based evaluation to verify the results. The ensemble approach also proved more effective at handling complex data patterns and reducing bias in bacterial strain classification. These findings contribute a practical framework for improving laboratory diagnostics and public health surveillance, with machine learning-based solutions that are faster, more reliable, and applicable in both industrial and clinical environments. The research expands understanding of the potential of ensemble methods for microbiological data classification and suggests new directions for modern diagnostic technology.
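As an illustration of how the four reported metrics can be derived from a multi-class confusion matrix, the following is a minimal sketch, not the authors' code. It assumes support-weighted averaging across classes, which the abstract does not specify; the function name `weighted_metrics` and the 3-class matrix are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Weighted averaging is an assumption here.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = cm[k][k]
        support = sum(cm[k])                                  # true members of class k
        predicted = sum(cm[i][k] for i in range(n_classes))   # samples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                              # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
example_cm = [
    [10, 2, 0],
    [1, 5, 2],
    [0, 1, 9],
]
scores = weighted_metrics(example_cm)
```

Under support-weighted averaging, recall always equals accuracy, because each class's recall is weighted by exactly the share of samples it contributes. If the reported figures were averaged this way, that would be consistent with accuracy and recall both being 88% while precision differs slightly.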
License
Copyright (c) 2025 Alvin Rahman Al Musyaffa, Yoga Pristyanto, Nia Mauliza
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.