EFFORT ESTIMATION OF OBJECT ORIENTED SYSTEM USING STOCHASTIC TREE BOOSTING TECHNIQUE

Abstract: Effort estimation is one of the most necessary and daunting tasks in software engineering. It means predicting the effort required to develop a software project. Predicting effort with high precision is a challenge that continues to draw the attention of researchers. To develop good products on schedule, proper effort estimation is a basic necessity. Many effort estimation techniques have already been developed, such as COCOMO (Constructive Cost Model), but these techniques have proved unsuitable for estimating object-oriented software because they were designed for procedural programming. Object-oriented concepts are now widely used in practice, and since the class is the basis of object-oriented design, using the Class Point Approach (CPA) to estimate effort serves the estimator much better. The prediction accuracy of the model obtained using CPA can be improved by applying the Stochastic Tree Boosting (STB) technique to a dataset of forty projects collected from different sources.


STOCHASTIC TREE BOOSTING TECHNIQUE
Stochastic Tree Boosting randomly selects values from the dataset and fits them to a tree in order to estimate the effort required to develop the software. Each tree is a binary tree of depth 3, and each node of the tree holds values from one project of the dataset. First, a random number of values is selected from the dataset and fitted to the first tree, and the values in its terminal nodes are processed. Then a new random selection of values is made and fitted to the second tree, and the process continues until the desired number of trees has been fitted. After the process has been repeated once per tree, the values of each node are processed to estimate the effort for each node.
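The procedure described above can be illustrated with a minimal sketch. It assumes that STB follows the usual stochastic gradient boosting scheme (each tree is fitted to the residuals of a random subset of rows and added with a small weight), which the text does not spell out. The single input feature, the function names `fit_tree`, `fit_stb` and `predict_stb`, the `learning_rate` parameter and the toy data are all hypothetical.

```python
import random

def fit_tree(xs, residuals, depth):
    """Fit a small one-feature regression tree; a leaf is a float, a node is a tuple."""
    mean = sum(residuals) / len(residuals)
    if depth == 0 or len(set(xs)) < 2:
        return mean  # terminal node: mean residual of the rows that reached it
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds (largest value excluded)
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    left = [(x, r) for x, r in zip(xs, residuals) if x <= t]
    right = [(x, r) for x, r in zip(xs, residuals) if x > t]
    return (t,
            fit_tree([x for x, _ in left], [r for _, r in left], depth - 1),
            fit_tree([x for x, _ in right], [r for _, r in right], depth - 1))

def predict_tree(tree, x):
    """Walk from the root to a terminal node and return its value."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def fit_stb(xs, ys, n_trees=100, stochastic_factor=0.5,
            learning_rate=0.1, depth=3, seed=0):
    """Boost depth-limited trees, each fitted on a random fraction of the rows."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)           # initial prediction: mean effort
    preds = [base] * len(ys)
    k = min(len(xs), max(1, int(stochastic_factor * len(xs))))
    trees = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), k)          # random subset of rows
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]      # current residuals
        tree = fit_tree(sub_x, sub_r, depth)
        trees.append(tree)
        preds = [p + learning_rate * predict_tree(tree, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, trees

def predict_stb(model, x):
    base, learning_rate, trees = model
    return base + learning_rate * sum(predict_tree(t, x) for t in trees)

# Hypothetical toy data: size (class points) against effort (person-hours).
sizes = [float(i) for i in range(1, 21)]
efforts = [10.0 * s for s in sizes]
model = fit_stb(sizes, efforts)
```

Fixing the seed keeps the random row selection reproducible, which makes it easier to compare runs with different numbers of trees or stochastic factors.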

ERROR AND ACCURACY ESTIMATION METRICS
The values obtained using the Stochastic Tree Boosting technique are evaluated with the metrics defined below [1][2][3][4].

The Magnitude of Relative Error (MRE) [1] for each observation i is obtained as:

$$MRE_i = \frac{|AE_i - PE_i|}{AE_i}$$

The Mean Magnitude of Relative Error (MMRE) [1] is obtained by averaging the MRE over the TP observations:

$$MMRE = \frac{1}{TP} \sum_{i=1}^{TP} \frac{|AE_i - PE_i|}{AE_i}$$

where AE_i is the original effort value collected from the dataset for the ith validation data, PE_i is the output (predicted effort) obtained using the developed model for the ith validation data, and TP is the total number of projects in the validation set.

The Root Mean Square Error (RMSE) [4] is calculated as the square root of the mean square error (MSE). The MSE is the mean of the squared difference between the actual and predicted effort values:

$$RMSE = \sqrt{\frac{1}{TP} \sum_{i=1}^{TP} (AE_i - PE_i)^2}$$

The Prediction Accuracy (PRED(y)) [1], in its standard form, is the fraction of projects whose MRE does not exceed y:

$$PRED(y) = \frac{1}{TP} \sum_{i=1}^{TP} \begin{cases} 1 & \text{if } MRE_i \le y \\ 0 & \text{otherwise} \end{cases}$$

For RMSE and PRED(y), AE_i is the original effort value collected from the dataset for the ith test data, PE_i is the output (predicted effort) obtained using the developed model for the ith test data, and TP is the total number of projects in the test set.
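The four metrics in this section can be computed with a few lines of code. The sketch below assumes the standard definitions of MRE, MMRE, RMSE and PRED(y); the function names, the threshold default of 0.25 and the three sample projects are hypothetical and are not taken from the dataset.

```python
import math

def mre(actual, predicted):
    """Magnitude of Relative Error for one project."""
    return abs(actual - predicted) / actual

def mmre(actuals, predictions):
    """Mean of the MRE values over all projects."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)

def rmse(actuals, predictions):
    """Square root of the mean squared difference between actual and predicted effort."""
    mse = sum((a - p) ** 2 for a, p in zip(actuals, predictions)) / len(actuals)
    return math.sqrt(mse)

def pred(actuals, predictions, y=0.25):
    """Fraction of projects whose MRE is at most y."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= y)
    return hits / len(actuals)

# Hypothetical actual and predicted effort values (person-hours).
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [110.0, 140.0, 400.0]

print("MMRE:", mmre(actual_effort, predicted_effort))
print("RMSE:", rmse(actual_effort, predicted_effort))
print("PRED(0.25):", pred(actual_effort, predicted_effort, 0.25))
```

Lower MMRE and RMSE and a higher PRED(y) indicate a better model, which is the selection rule applied during validation.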

LITERATURE SURVEY
The COnstructive COst MOdel (COCOMO), produced by Barry Boehm in 1981 [5], provides a great deal of material that explains exactly what costs the model is estimating and why it arrives at the estimates it does. R. T. Hughes [6] proposed a model based on expert judgment, in which a group of experts use their experience to estimate the proposed software. 'Expert judgment' is defined as the consultation of one or more experts. In general, this model assumes that an estimate is based on the experience of one or more people who are familiar with the development of software applications similar to the one being sized. The Delphi technique [7] can be used to provide communication and cooperation among experts. The work in [11] used the Stochastic Gradient Boosting (SGB) technique to predict the effort required to develop various software projects using both the class point and the use case point approaches. The SGB technique applies a series of functions iteratively and combines the output of each function with a weighting coefficient in order to minimize the total prediction error and increase accuracy. That work also compares the models obtained using the SGB technique with other machine learning techniques in order to highlight the performance achieved by each method.

PROPOSED WORK
The proposed work is based on data derived from forty student projects [9] developed in Java. An effort estimation model based on STB (Stochastic Tree Boosting), which estimates the effort required to develop software, has been developed using this forty-project dataset.

CLASS POINT APPROACH
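The CPA was proposed by Costagliola et al. [9]. Each value y of the dataset Y is normalized using min-max scaling:

$$y' = \frac{y - \min(Y)}{\max(Y) - \min(Y)}$$

where min(Y) represents the minimum value of the dataset Y and max(Y) represents the maximum value of the dataset Y.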
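A minimal sketch of the scaling step follows. It assumes that the min(Y) and max(Y) terms refer to min-max normalization into the range [0, 1]; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so that min(values) maps to 0 and max(values) maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical class point values.
normalized = min_max_normalize([10, 20, 30])
```

Guarding against a zero range avoids a division by zero when every project has the same value.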

Partitioning of Dataset:
The dataset is partitioned into three sets: a learning set, a validation set and a test set.

Performing STB Execution:
The values of parameters such as the number of trees and the stochastic factor are chosen, and the STB steps are then executed on the learning set, validation set and test set.

Performing Validation:
After completing STB execution, a five-stage validation is performed, which produces five prototype models. At each stage, the model that gives the minimum error (minimum RMSE and minimum MMRE) and the maximum accuracy (maximum PRED(y)) is selected as the best model.

EXPERIMENTAL DETAILS:
In the proposed research study, the dataset collected from Costagliola [9], shown in Table 1, is used. In this table, every row displays the details of one project developed in the Java language, including the values of CP1 and CP2 and the actual effort (denoted by EFH), expressed in person-hours, required to successfully complete the project. After the final class point values are estimated, the dataset is sorted by CP and then converted into proper decimal format. The dataset in proper decimal format is partitioned into three sets, as shown in Figure 1.

PARTITIONING PROCEDURE FOR DATASET
1. First, the dataset in proper decimal format is partitioned into a training set and a test set. The test set consists of every fifth tuple of the dataset, i.e. eight tuples, and the remaining thirty-two tuples form the training set. Parameter selection, validation and error estimation are done using the training set, and prediction accuracy is estimated using the test set.

2. After the dataset has been partitioned into a training set and a test set, the training set is further divided into a validation set and a learning set. The validation set consists of every fifth tuple of the training set, i.e. six tuples, and the remaining twenty-six tuples form the learning set. Parameter selection is done using the learning set, and validation and error estimation are done using the validation set.
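The two partitioning steps above can be sketched as follows. The sketch assumes that "every fifth tuple" means positions 5, 10, 15 and so on, counting from 1 in the sorted dataset; the function name `split_every_fifth` and the placeholder project identifiers are hypothetical.

```python
def split_every_fifth(rows):
    """Return (kept, held_out), where held_out contains every fifth row (1-based)."""
    held_out = [r for i, r in enumerate(rows, start=1) if i % 5 == 0]
    kept = [r for i, r in enumerate(rows, start=1) if i % 5 != 0]
    return kept, held_out

# Placeholder identifiers standing in for the forty sorted projects.
projects = list(range(1, 41))

training, test = split_every_fifth(projects)         # 32 training, 8 test
learning, validation = split_every_fifth(training)   # 26 learning, 6 validation

print(len(training), len(test), len(learning), len(validation))
```

Because the dataset is sorted by class points before the split, taking every fifth tuple spreads the held-out projects across the whole size range.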

STB EXECUTION
To estimate the effort required to develop the software using the STB technique, the following steps are used:
1. A random percentage of rows is selected using the stochastic factor. If the stochastic factor is 0.5, then 50% of the rows are selected.

IMPLEMENTATION
Proper estimation of effort is essential for improving the reliability of software development processes. Among the various estimation methods, the Class Point Approach is used here to estimate software effort. The parameters are optimized using the Stochastic Tree Boosting technique to achieve better accuracy. The proposed work is implemented using XML, Java with the NetBeans IDE, and MATLAB. Figure 2 shows the predicted effort after completing the effort estimation process. Table 2 shows that the 3rd fold is the best fold, as it provides the minimum MMRE, the minimum RMSE and the maximum prediction accuracy. Table 3 shows the final predicted effort for the forty-project dataset. Figure 3 depicts the relationship between software size (class points) and actual effort.

CONCLUSIONS
The work proposed in this paper is beneficial for software developers, system analysts and product experts. The Class Point Approach (CPA) is used for object-oriented software, and this work extends the approach by employing the Stochastic Tree Boosting technique to provide more precise estimation results. The results obtained using STB (error and prediction accuracy values) were observed, and comparisons were made using graphs. The results show that the STB-based effort estimation model achieves lower MMRE and RMSE and higher prediction accuracy. We can therefore conclude that effort estimation using the STB-based model provides results with better precision.

FUTURE SCOPE OF WORK
This research work can be extended by applying other machine learning techniques to software development effort estimation. Methods such as Decision Tree Forest, Random Forest and Support Vector Regression can be implemented and their results compared with those of the STB technique to measure their precision.