A LIGHTWEIGHT AND PRACTICAL PIPELINE FOR CROSS-PROJECT DEFECT PREDICTION USING METRIC-BASED LEARNING
DOI:
https://doi.org/10.33480/techno.v20i2.6854
Keywords:
Cross-Project Prediction, Domain Adaptation, Machine Learning, Metric-Based Feature, SMOTEENN
Abstract
Cross-Project Defect Prediction (CPDP) addresses the scarcity of defect data in new software projects by transferring knowledge from existing ones. However, domain shift between source and target projects remains a major challenge. This study introduces a lightweight and practical CPDP pipeline based on traditional metric features that integrates domain adaptation (CORAL, TCA, TCA+), feature selection, and resampling techniques. Across 120 configurations evaluated on multiple PROMISE datasets, combining TCA or TCA+ with SMOTEENN (the Synthetic Minority Over-sampling Technique followed by Edited Nearest Neighbors cleaning) consistently improved F1-Score and Recall on imbalanced datasets. LightGBM showed the most stable performance across projects, while Logistic Regression achieved the highest Matthews correlation coefficient (MCC) in specific cases. Principal Component Analysis (PCA) visualizations supported the effectiveness of domain alignment. The proposed pipeline offers a reproducible, cost-efficient alternative to deep learning models and provides actionable insights for defect prediction in resource-constrained environments.
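The abstract reports results in terms of Recall, F1-Score, and MCC. As a minimal sketch of how these three metrics relate on an imbalanced defect dataset, the following computes them from confusion-matrix counts using only the Python standard library. The helper names (`confusion_counts`, `recall`, `f1_score`, `mcc`) and the label vectors are hypothetical and chosen for illustration; they are not taken from the study or its results.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 means 'defective'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def recall(tp, fn):
    """Fraction of truly defective modules that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; uses all four confusion cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative data (assumed): 100 modules, 10 of them defective.
y_true = [1] * 10 + [0] * 90

# A degenerate classifier that flags every module as defective.
y_all_defective = [1] * 100
tp, tn, fp, fn = confusion_counts(y_true, y_all_defective)
print("flag everything:", recall(tp, fn), f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))

# A more selective classifier: 8 of 10 defects found, 18 false alarms.
y_selective = [1] * 8 + [0] * 2 + [1] * 18 + [0] * 72
tp, tn, fp, fn = confusion_counts(y_true, y_selective)
print("selective:", recall(tp, fn), f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))
```

In this toy setting the classifier that flags every module reaches a Recall of 1.0 while its MCC is 0.0, because it never identifies a clean module correctly. This is one reason imbalanced defect data is often evaluated with MCC alongside Recall and F1-Score.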
License
Copyright (c) 2025 Novia Heriyani, Agus Subekti

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The copyright of any article in the TECHNO Nusa Mandiri Journal is held in full by the author under the Creative Commons CC BY-NC license. Authors retain all rights to their published works, not limited to the rights set out on this page. The author acknowledges that Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) is the first publisher of the work, under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC). Authors may enter into separate, non-exclusive distribution arrangements for the version of the manuscript published in this journal (for example, depositing it in an institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri). The author guarantees that the article is original, was written by the stated author, has not been published before, contains no statements that violate the law, does not infringe the rights of others, and is subject to copyright held exclusively by the author. If an article was prepared jointly by more than one author, the author submitting the manuscript warrants that they have been authorized by all co-authors to agree to the copyright and license notices (agreements) on their behalf, and agrees to notify the co-authors of the terms of this policy. Techno Nusa Mandiri: Journal of Computing and Information Technology (TECHNO Nusa Mandiri) will not be held responsible for anything that may occur as a result of internal disputes among the authors.