A LIGHTWEIGHT AND PRACTICAL PIPELINE FOR CROSS-PROJECT DEFECT PREDICTION USING METRIC-BASED LEARNING

Authors

  • Novia Heriyani Mahasiswa
  • Agus Subekti Universitas Nusa Mandiri

DOI:

https://doi.org/10.33480/techno.v20i2.6854

Keywords:

Cross-Project Prediction, Domain Adaptation, Machine Learning, Metric-Based Feature, SMOTEENN

Abstract

Cross-Project Defect Prediction (CPDP) addresses the scarcity of defect data in new software projects by transferring knowledge from existing ones. However, domain shift between projects remains a major challenge. This study introduces a lightweight and practical CPDP pipeline based on traditional metric features, integrating domain adaptation (CORAL, TCA, TCA+), feature selection, and resampling techniques. Across 120 configurations evaluated on multiple PROMISE datasets, combining TCA or TCA+ with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) consistently improved F1-Score and Recall on imbalanced datasets. LightGBM demonstrated the most stable performance across projects, while Logistic Regression yielded the highest MCC in specific cases. Principal Component Analysis (PCA) visualizations supported the effectiveness of domain alignment. The proposed pipeline offers a reproducible, cost-efficient alternative to deep learning models and provides actionable insights for defect prediction in resource-constrained environments.
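
As an illustration of the domain-alignment step named in the abstract, the following is a minimal sketch of CORAL (correlation alignment) as it is commonly defined: source-project features are whitened with the source covariance and then recolored with the target covariance. It is not the authors' implementation. The function names, the `reg` regularization parameter, and the synthetic arrays standing in for source and target project metrics are all assumptions made for this example.

```python
import numpy as np


def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative eigenvalues
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source, target, reg=1.0):
    """Transform source features so their covariance approximates the target's.

    `reg` scales the identity matrix added to each covariance for numerical
    stability; its default value here is an assumption, not taken from the paper.
    """
    n_features = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + reg * np.eye(n_features)
    cov_t = np.cov(target, rowvar=False) + reg * np.eye(n_features)
    whiten = sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = sym_matrix_power(cov_t, 0.5)   # imposes target correlations
    return source @ whiten @ recolor


# Synthetic stand-ins for metric features of a source and a target project.
rng = np.random.default_rng(0)
source_X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
target_X = rng.normal(size=(150, 4)) * np.array([1.0, 2.0, 3.0, 4.0])

aligned_X = coral_align(source_X, target_X, reg=1e-6)
gap_before = np.linalg.norm(np.cov(source_X, rowvar=False) - np.cov(target_X, rowvar=False))
gap_after = np.linalg.norm(np.cov(aligned_X, rowvar=False) - np.cov(target_X, rowvar=False))
print(f"covariance gap before: {gap_before:.4f}, after: {gap_after:.6f}")
```

In a pipeline of the kind the abstract describes, a classifier would then be trained on the aligned source features (after any resampling) and evaluated on the untouched target features. Those later stages are omitted here.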

Published

2025-09-25

How to Cite

Heriyani, N., & Subekti, A. (2025). A LIGHTWEIGHT AND PRACTICAL PIPELINE FOR CROSS-PROJECT DEFECT PREDICTION USING METRIC-BASED LEARNING. Jurnal Techno Nusa Mandiri, 20(2), 125–134. https://doi.org/10.33480/techno.v20i2.6854