PERFORMANCE OF ROBUST SUPPORT VECTOR MACHINE CLASSIFICATION MODEL ON BALANCED, IMBALANCED AND OUTLIERS DATASETS

  • Muhammad Ardiansyah Sembiring (1) Universitas Sumatera Utara
  • Herman Saputra (2) Universitas Sumatera Utara
  • Riki Andri Yusda (3) Universitas Sumatera Utara
  • Sutarman Sutarman (4*) Universitas Sumatera Utara
  • Erna Budhiarti Nababan (5) Universitas Sumatera Utara

  • (*) Corresponding Author
Keywords: Machine Learning, Robust SVM, Support Vector Machine

Abstract

Classification models are central to machine learning tasks that identify patterns and group data. Support Vector Machine (SVM) and Robust SVM are two commonly used models. SVM finds an optimal hyperplane that separates the data classes, while Robust SVM is designed to handle uncertainty and noise in the data, which makes it more resistant to outliers. Standard SVM, however, is limited when a dataset contains class imbalance or outliers: class imbalance biases the model toward predicting the majority class, and outliers can distort the fitted model. This research compares the performance of SVM and Robust SVM on balanced (normal), imbalanced, and outlier datasets. Both models are implemented and compared in Python with Scikit-learn. Key features of the software include automatic data preprocessing, model training, and evaluation with metrics such as accuracy, precision, recall, and F1 score. The results show that Robust SVM achieves higher accuracy than SVM on balanced datasets and is very effective at handling class imbalance, reaching a maximum accuracy of 100%. On datasets with outliers, Robust SVM maintains stable accuracy, demonstrating its robustness to outliers. This research contributes to correspondence management by providing more reliable classification models, improving data processing accuracy, and supporting more informed decision making in software development.
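The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the Robust SVM formulation or the datasets, so the synthetic data, the outlier injection, and the `robust_stand_in` pipeline (a `RobustScaler` followed by a class-weighted `SVC`) are all assumptions chosen only to show the evaluation workflow with accuracy, precision, recall, and F1 score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC


def make_datasets(seed=0):
    """Build three synthetic binary datasets (assumed, not the paper's data)."""
    X_bal, y_bal = make_classification(
        n_samples=600, n_features=8, n_informative=5, random_state=seed
    )
    # Roughly 90% / 10% class split to mimic class imbalance.
    X_imb, y_imb = make_classification(
        n_samples=600, n_features=8, n_informative=5,
        weights=[0.9, 0.1], random_state=seed,
    )
    # Perturb 30 random rows of the balanced data with large noise (outliers).
    rng = np.random.default_rng(seed)
    X_out = X_bal.copy()
    idx = rng.choice(len(X_out), size=30, replace=False)
    X_out[idx] += rng.normal(0.0, 15.0, size=(30, X_out.shape[1]))
    return {
        "balanced": (X_bal, y_bal),
        "imbalanced": (X_imb, y_imb),
        "outliers": (X_out, y_bal),
    }


def build_models():
    """Baseline SVM and a hypothetical stand-in for a robust variant."""
    return {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # Assumption: median/IQR scaling plus class weighting, NOT the paper's Robust SVM.
        "robust_stand_in": make_pipeline(
            RobustScaler(), SVC(kernel="rbf", class_weight="balanced")
        ),
    }


def evaluate(model, X, y, seed=0):
    """Fit on a stratified 70/30 split and return the four metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    return {
        "accuracy": accuracy_score(y_te, y_pred),
        "precision": precision_score(y_te, y_pred, zero_division=0),
        "recall": recall_score(y_te, y_pred, zero_division=0),
        "f1": f1_score(y_te, y_pred, zero_division=0),
    }


# Evaluate every model on every dataset; fresh models are built per dataset.
results = {
    (data_name, model_name): evaluate(model, X, y)
    for data_name, (X, y) in make_datasets().items()
    for model_name, model in build_models().items()
}

for (data_name, model_name), scores in results.items():
    summary = ", ".join(f"{k}={v:.3f}" for k, v in scores.items())
    print(f"{data_name:>10} | {model_name:<15} | {summary}")
```

Reporting precision, recall, and F1 alongside accuracy matters most on the imbalanced dataset, where a model that mostly predicts the majority class can still score a high accuracy.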

Published
2024-08-01
How to Cite
[1]
M. Sembiring, H. Saputra, R. Yusda, S. Sutarman, and E. Nababan, “PERFORMANCE OF ROBUST SUPPORT VECTOR MACHINE CLASSIFICATION MODEL ON BALANCED, IMBALANCED AND OUTLIERS DATASETS,” jitk, vol. 10, no. 1, pp. 208–215, Aug. 2024.