# COMPARATIVE ANALYSIS OF SOFTWARE EFFORT ESTIMATION USING DATA MINING TECHNIQUE AND FEATURE SELECTION

### Abstract

Software development involves several interrelated factors that influence development effort and productivity. Better estimation techniques give project managers more effective control over time and budget. Software effort estimation (also called software cost estimation) helps a software development company overcome the difficulty of estimating development effort. This study compares the machine learning methods Linear Regression (LR), Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Decision Tree Random Forest (DTRF) for estimating software cost/effort. The approaches are tested on 10 datasets of software development projects to determine which machine learning and non-machine learning methods estimate software effort most accurately, and whether attribute selection with Particle Swarm Optimization (PSO) improves accuracy compared with using no PSO. The most accurate algorithm was Linear Regression, with an average root mean squared error (RMSE) of 1603.024 across the 10 datasets. Applying PSO feature selection reduced the average RMSE to 1552.999. Compared with the original linear regression model, PSO feature selection therefore lowered the estimation error by 3.12%.
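The reported improvement can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code: the helper names `rmse` and `relative_reduction` are assumptions introduced here, and the only figures taken from the abstract are the two average RMSE values.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Average RMSE values reported in the abstract (10 datasets).
RMSE_LR = 1603.024      # Linear Regression without feature selection
RMSE_LR_PSO = 1552.999  # Linear Regression with PSO feature selection

reduction = relative_reduction(RMSE_LR, RMSE_LR_PSO)
print(f"RMSE reduction with PSO: {reduction:.2f}%")
```

Because RMSE is an error measure, a lower value means a more accurate estimate, so the 3.12% figure is best read as a relative reduction in average error.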

*jitk*, vol. 6, no. 2, pp. 167-174, Feb. 2021.

Copyright (c) 2021 Abdul Latif, Lady Agustin Fitriana, Muhammad Rifqi Firdaus

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.