SOFTWARE QUALITY PREDICTION USING MACHINE LEARNING TECHNIQUES AND SOURCE CODE METRICS: A REVIEW
Abstract
Software quality prediction is a Machine Learning (ML) based technique in which ML models are trained on historical data. Software experts can use the output of these quality models in the early phases of software development to improve software quality by controlling quality attributes such as maintainability, reliability, and security. This study presents a systematic review of studies published from 2005 to 2021 that use ML techniques and source code metrics for Software Quality Prediction (SQP). It assesses the ML techniques, source code metrics, datasets, feature selection techniques, and performance measures commonly used in SQP. In total, 53 primary studies are selected for review. The results indicate that Bayesian Learning (BL), Regression, Ensemble Learning (EL), Decision Tree (DT), and Support Vector Machine (SVM) are the most commonly used ML techniques for quality prediction, appearing in 58%, 52%, 41%, 32%, and 32% of the selected studies, respectively. NASA, PROMISE, Apache, Mozilla Firefox, and Eclipse are the most commonly used datasets for training and testing SQP models. LOC, CC, CBO, RFC, WMC, LCOM, DIT, and NOC are among the most commonly used source code metrics in SQP. Based on the results of the selected studies, it is concluded that ML techniques and source code metrics can help improve the overall quality of software.
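To make the workflow summarized in the abstract concrete, the following is a minimal sketch of an SQP model, assuming a Gaussian naive Bayes classifier as one simple form of Bayesian Learning. The metric values (LOC, CC, CBO), the defect labels, and the names `HISTORY`, `fit_gaussian_nb`, and `predict` are all invented for illustration; they are not taken from any of the reviewed studies or datasets.

```python
import math
from collections import defaultdict

# Hypothetical historical data: (LOC, CC, CBO) per module plus a defect label
# (0 = clean, 1 = defective). All values are invented for illustration.
HISTORY = [
    ((120, 4, 2), 0),
    ((95, 3, 1), 0),
    ((150, 5, 3), 0),
    ((80, 2, 2), 0),
    ((900, 25, 12), 1),
    ((750, 30, 10), 1),
    ((1100, 22, 15), 1),
    ((820, 28, 11), 1),
]


def fit_gaussian_nb(samples):
    """Estimate a prior and per-metric mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / n
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(samples), stats)
    return model


def predict(model, features):
    """Return the class label with the highest log-posterior."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = fit_gaussian_nb(HISTORY)
small_module = predict(model, (100, 3, 2))    # resembles the clean modules
large_module = predict(model, (880, 26, 12))  # resembles the defective modules
print(small_module, large_module)
```

In practice, a study of this kind would train on a public dataset such as those named above and report standard performance measures on held-out data; the toy data here only shows the shape of the pipeline.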