SOFTWARE QUALITY PREDICTION USING MACHINE LEARNING TECHNIQUES AND SOURCE CODE METRICS: A REVIEW

Santosh Saklani; Dr. Anshul Kalia; Dr. Sumesh Sood

doi:10.26483/ijarcs.v13i6.6918

Published: Dec 20, 2022

Keywords:

machine learning, software quality prediction, software vulnerabilities, source code metrics

Abstract

Software quality prediction (SQP) is a machine learning (ML) based technique in which models are trained on historical data. Software experts can use the output of these quality models early in development to improve software quality by controlling quality attributes such as maintainability, reliability, and security. This study presents a systematic review of work published from 2005 to 2021, covering studies that use ML techniques and source code metrics for SQP. It assesses the most commonly used ML techniques, source code metrics, datasets, feature selection techniques, and performance measures. In total, 53 primary studies were selected for review. The results indicate that Bayesian Learning (BL), Regression, Ensemble Learning (EL), Decision Tree (DT), and Support Vector Machine (SVM) are the most commonly used ML techniques for quality prediction, appearing in 58%, 52%, 41%, 32%, and 32% of the studies respectively. NASA, PROMISE, Apache, Mozilla Firefox, and Eclipse are the most commonly used datasets for training and testing SQP models. LOC, CC, CBO, RFC, WMC, LCOM, DIT, and NOC are among the most commonly used source code metrics. Based on the selected studies, the review concludes that ML techniques and source code metrics can help improve overall software quality.
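
To make the idea of training a quality model on historical metric data concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier, one form of the Bayesian Learning family named above. It is not taken from any of the reviewed studies. The metric values (LOC, CC, CBO, WMC per class), the defect labels, and the names `TRAIN`, `fit`, `predict`, and `precision_recall` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data, invented for illustration only.
# Each row is ((LOC, CC, CBO, WMC), label) for one class;
# label 1 means a defect was later reported, 0 means none.
TRAIN = [
    ((120, 4, 2, 6), 0),
    ((95, 3, 1, 5), 0),
    ((150, 5, 3, 8), 0),
    ((110, 4, 2, 7), 0),
    ((640, 21, 11, 30), 1),
    ((720, 25, 14, 35), 1),
    ((580, 18, 9, 27), 1),
    ((690, 23, 12, 33), 1),
]


def fit(rows):
    """Estimate a prior and per-metric mean/variance for each label."""
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append(features)
    model = {}
    for label, vectors in by_label.items():
        n = len(vectors)
        stats = []
        for column in zip(*vectors):
            mean = sum(column) / n
            # Small floor keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def predict(model, features):
    """Return the label with the highest Gaussian log-posterior."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(model, rows):
    """Precision and recall for the defect-prone class (label 1)."""
    tp = fp = fn = 0
    for features, label in rows:
        guess = predict(model, features)
        tp += guess == 1 and label == 1
        fp += guess == 1 and label == 0
        fn += guess == 0 and label == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


model = fit(TRAIN)
```

Calling `predict(model, (650, 22, 12, 31))` scores a new, unseen class against both label distributions. In practice the model would be trained on a real dataset and evaluated on held-out data rather than on the training rows.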

Issue

Vol. 13 No. 6 (2022): November-December 2022

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

Author Biographies

Santosh Saklani

Department of Computer Science

Himachal Pradesh University

Shimla, India

Dr. Anshul Kalia

Department of Computer Science

Himachal Pradesh University

Shimla, India

Dr. Sumesh Sood

Department of Computer Science

Himachal Pradesh University

Shimla, India
