A REVIEW ON ROAD ACCIDENT DETECTION USING DATA MINING TECHNIQUES

Abstract: Transportation has evolved greatly over time. With modern technology, the automobile industry has reached new heights with respect to comfort, speed, efficiency and security. Despite these improvements in technology, the rate of accidents has increased. A large number of precious lives are lost to road traffic accidents every day. The most common cause of road accidents is driver error. An effective road accident detection mechanism is essential to save lives. Data mining techniques are widely used for road accident detection. The main focus of this survey is to provide an overview of the literature on road accident detection, covering the techniques and approaches implemented, together with their merits and demerits. A comparison based on performance parameters is also carried out to assess the efficiency of the various road accident detection techniques and approaches. The comparison identifies the best-performing road accident detection method.


I. INTRODUCTION
Nowadays a large number of lives are lost due to road accidents [1]. It has been estimated that every year over 300,000 persons die and 10 to 15 million people are injured in road accidents worldwide. Accidents are classified into the following types [2]:
• Fatality: an accident or incident resulting in a fatality.
• Major injury accidents: accidents which result in a significant injury, damage or loss.
• Minor accidents: accidents which result in an injury, damage or loss but do not cause significant harm to a person.
• Lost time accidents: accidents which result in an employee being absent from work for more than half a day.
• Near miss incidents: incidents which result in no apparent damage or injury.
• Dangerous occurrences: specific incidents as defined by the Reporting of Injuries, Diseases and Dangerous Occurrences Regulations.
Data mining [3] has proven to be a reliable technique for analyzing road accidents, and it provides productive results. Most road accident data analysis is done with data mining processes such as feature selection, clustering and classification, in order to identify the factors that affect the severity of an accident. In this paper, various data mining techniques for the detection of road accidents are analyzed based on their merits and demerits and compared in terms of accuracy, precision, recall and F-measure.

II. RESEARCH METHODOLOGY
Road accidents were detected using deep learning algorithms [4]. Paired tokens extracted from three million collected tweets captured association rules that improved the accuracy of traffic accident detection. Then, Deep Belief Network (DBN) and Long Short-Term Memory (LSTM) models were applied to the extracted tokens. The results of these learning algorithms were given as input to the classification algorithms supervised Latent Dirichlet Allocation (sLDA) and Support Vector Machine (SVM), which detected the traffic accidents.
A framework [5] was proposed to analyze road accidents using a rule mining approach. Raw data were collected from the Emergency Management Research Institute (EMRI), which serves and keeps track of every accident record on every type of road. The collected data were converted into a structured format by applying filtering techniques. The accident data were then clustered by applying hybrid clustering based on enhanced K-means clustering, which splits the input array into sub-arrays based on the distance between the elements in the clusters. Finally, association rule mining was applied to identify, for each cluster, the circumstances in which an accident may occur. The outcome of this technique was used to direct accident prevention efforts to the areas identified for different categories of accidents, in order to reduce the number of accidents.
Several classification techniques [6] were used to predict the severity of injuries sustained in traffic accidents. The classification algorithms Random Forest, AdaBoostM1, Naïve Bayes, J48 and PART were investigated, and their performance was compared based on injury severity. The data include labels for severity, road classification, district council district, rain, hit and run, type of collision, natural light, number of vehicles involved, degree of injury, number of casualties injured, pedestrian action, casualty sex, casualty age, vehicle class of driver or passenger casualty, year of manufacture, role of casualty, driver age, driver sex, vehicle class and severity of accident. The three classes of injury severity are casualty-based, accident-based and vehicle-based.
A preliminary real time autonomous accident detection system [7] was proposed. Data were collected from sensors and integrated with the event log to extract the most discriminative features. The extracted features were the average velocity difference between readings at times T and T+1, weekday or weekend, the average capacity usage difference between readings at times T and T+1, the average occupancy difference between readings at times T and T+1, and whether the accident or event occurred at rush hour. These features were fed into a regression tree, a nearest neighbor model and a feed forward neural network model, which predict the possibility of occurrence of an accident.
Data mining algorithms [8] were introduced for the classification of vehicle collision patterns in road accidents. The approach derived classification rules which can be used to predict vehicle collision patterns. Initially the training set was taken, and noisy, inconsistent and incomplete data were removed by applying a data cleaning process. The preprocessed data were converted into an appropriate form for mining. Then the attribute space of the feature set was reduced for the classification of vehicle collisions. This was achieved by applying feature selection algorithms such as Multi valued Oblivious Decision Tree (MODTree) filtering, feature ranking, Correlation based Feature Selection (CFS), Mutual Information Feature Selector (MIFS) and the Fast Correlation Based Filter (FCBF) algorithm. The selected features were used in different classification algorithms, namely Naïve Bayes, C4.5, Classification and Regression Trees (C&RT), RndTree, Decision List, rule induction and random tree.
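As an illustration of the feature ranking step, the sketch below scores each categorical attribute by its mutual information with the collision label and sorts the attributes by that score. It is a minimal example of plain mutual-information ranking, not the MIFS or FCBF algorithms themselves, and it is not taken from [8]. The function names, attribute names and toy records are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature, score) pairs sorted by decreasing mutual information."""
    names = list(rows[0].keys())
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: "light" determines the label, "road" carries no information.
rows = [
    {"light": "day", "road": "wet"},
    {"light": "day", "road": "dry"},
    {"light": "night", "road": "wet"},
    {"light": "night", "road": "dry"},
]
labels = ["rear_end", "rear_end", "head_on", "head_on"]

ranking = rank_features(rows, labels)
print(ranking)
```

A filter of this kind would keep only the top-ranked attributes before training the classifiers. How many to keep is a design choice that the sketch leaves open.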
A Multi-class Support Vector Machine [9] was introduced to predict the causes of road traffic accidents. Real-time data were collected from the police department in Dubai. Then, a typical data mining framework was applied to the collected data. The framework consists of three steps, namely preprocessing, pattern mining and post processing. In the preprocessing step the data undergo data cleaning, handling of unknown and missing values, and feature selection, and unbalanced data are also taken into account. The data were then converted into a format accepted by the SVM. Finally, in the post processing step, the Multi-class SVM was applied to predict the causes of road traffic accidents.
A new method [10] was proposed to detect road accidents based on temporal data mining. In this method, a time series model of ternary numbers was constructed, based on the cell transmission model, to reflect the state of the traffic flow. The computational cost and the linear drift between time series were handled by the Discrete Fourier Transform, which transformed the time domain data into frequency domain data. The Euclidean distance was then calculated for the transformed time series data, and accidents were detected based on this measure.
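The frequency-domain comparison described above can be sketched as follows. This is a minimal illustration, not the implementation from [10]. The function names, the number of retained Fourier coefficients (n_coeffs), the decision threshold and the ternary series are all assumptions made for the example.

```python
import cmath
import math


def dft(series):
    """Naive O(n^2) Discrete Fourier Transform of a real-valued sequence."""
    n = len(series)
    return [
        sum(series[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectral_distance(a, b, n_coeffs=4):
    """Euclidean distance between the first n_coeffs DFT coefficients of a and b."""
    fa = dft(a)[:n_coeffs]
    fb = dft(b)[:n_coeffs]
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(fa, fb)))


def is_accident(observed, reference, threshold):
    """Flag an accident when the observed series is far from the reference (assumed rule)."""
    return spectral_distance(observed, reference) > threshold


# Made-up ternary traffic states (0 = free flow, 1 = slow, 2 = congested).
reference = [0, 0, 1, 0, 0, 1, 0, 0]
observed = [0, 0, 1, 2, 2, 2, 2, 2]

print(spectral_distance(reference, reference))
print(is_accident(observed, reference, threshold=1.0))
```

Keeping only the first few coefficients is one common way to reduce the cost of comparing long series. Whether [10] truncates the spectrum in this way is not stated in the text above.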
A method [11] based on data mining techniques was proposed to analyze and predict the nature of road accidents. Here, the Random Forest, Naïve Bayes and J48 algorithms were chosen to analyze road accident data from the state of Maharashtra. Finally, the Apriori association rule mining algorithm was applied to determine the relationship between the independent variables and the nature of the accidents.
Artificial Neural Network and Decision Tree techniques [12] were employed for the analysis of traffic accidents. The data were collected from one of the busiest roads in Nigeria and arranged into categorical and continuous data. The categorical road accident data were analyzed using the Decision Tree technique, and the Artificial Neural Network was applied to the continuous accident data.
Road accident data were analyzed using a proposed data mining framework [13]. In this framework, K-modes clustering was used as a preliminary task to segment the road accident data. Association rule mining was then applied to identify the various circumstances associated with the occurrence of an accident, both for the entire dataset and for the clusters produced by the K-modes algorithm. The results of the cluster-based analysis and the whole-dataset analysis were compared, and the comparison showed that the combination of association rule mining and K-modes clustering produced crucial information effectively.
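To make the association rule step concrete, the sketch below computes support and confidence for rules between pairs of single items. It is a simplified illustration of the two measures, not the full Apriori procedure and not the framework from [13]. The function names, thresholds and accident records are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules lhs -> rhs between single items as (lhs, rhs, support, confidence)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pairs cannot yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical accident records, each described by a set of attribute values.
transactions = [
    {"night", "wet_road", "injury"},
    {"night", "wet_road", "injury"},
    {"day", "dry_road", "no_injury"},
    {"night", "dry_road", "injury"},
]

rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=1.0)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

In a cluster-based analysis of the kind described above, the same function would be run once on the whole dataset and once on each cluster, so that the rule sets can be compared.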
A data mining approach [14] was proposed to analyze road accidents in India. The intention of this approach was to create a model which resolves the heterogeneity of the data by grouping similar objects together, in order to find the accident-prone areas in the country with respect to different accident factors. The model was also used to determine the association between these factors and casualties. K-means clustering was employed to group the similar objects of the heterogeneous data. In K-means clustering, K data points are chosen randomly as the initial centroids. Then, the Euclidean distance between each data point and the centroids is calculated. Each data point is assigned to its nearest centroid, and the centroids are updated accordingly. This is repeated until there is no change in the centroids. Finally, decision tree classification was applied to analyze the road accidents.
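The K-means steps described above can be sketched in a few lines. This is a generic, minimal version of the algorithm under stated assumptions, not the code used in [14]. The function name, the fixed random seed, the iteration cap and the two-dimensional sample points are illustrative only.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of floats) into k groups using Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K randomly chosen points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # stop when the centroids no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the initial centroids are random, different seeds can lead to different results on less clearly separated data. Running the algorithm several times and keeping the best clustering is a common precaution.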
A novel approach [15] was proposed for automatic road accident detection. The approach was based on the detection of damaged vehicles in footage collected from surveillance cameras, from which the occurrence of a road accident was observed. A new supervised learning method with three stages was proposed for road accident detection. These three stages were combined serially into a single framework. The framework used five Support Vector Machines trained with Histogram of Oriented Gradients (HOG) and Gray Level Co-occurrence Matrix features. The supervised learning method worked as a binary classifier which labeled data containing a damaged car as class 1 and data not containing a damaged car as class 2.

A. Comparison of Research Methodologies
The road accident detection methods described in the above section are analyzed and compared based on the methods used, their merits and demerits, and the parameters used in the experimental results. The comparison is given in Table I. In Table I, the different methods for road accident detection are analyzed based on accuracy, precision, recall and F-measure. The preliminary real time autonomous accident detection system [7] has the best accuracy, at 99.79%. The Naïve Bayes, J48, Random Forest and Apriori association rule mining method [11] has the best precision (0.983), the best recall (98.3) and the best F-measure (98.3) among the compared methods.

III. PERFORMANCE EVALUATION
The performance of the efficient methodologies in the literature is analyzed and compared to determine their relative efficiency. The methods considered for analysis are Long Short-Term Memory, Deep Belief Network, supervised Latent Dirichlet Allocation, Support Vector Machine [4], Hybrid Clustering, association rule mining [5], Random Forest, AdaBoostM1, Naïve Bayes, J48, PART [6], Preliminary real time autonomous accident detection system [7], Naïve Bayes, C4.5, C&RT, RndTree, Decision List, rule induction, random tree [8], Multi-class Support Vector Machine [9], Naïve Bayes, J48, Random Forest algorithm, Apriori association rule mining [11], Decision Trees, Neural Networks [12], K-means clustering, Decision tree [14] and Support Vector Machines [15]. The comparison is based on the experimental results of the methods in terms of accuracy, precision, recall and F-measure.

A. Accuracy
Accuracy is described as the closeness of a measurement to the true value. It is given as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative predictions. Fig. 1 shows the comparison of accuracy between Long Short-Term Memory, Deep Belief Network, supervised Latent Dirichlet Allocation, Support Vector Machine [4], Hybrid Clustering, association rule mining [5], Random Forest, AdaBoostM1, Naïve Bayes, J48, PART [6], Preliminary real time autonomous accident detection system [7], Naïve Bayes, C4.5, C&RT, RndTree, Decision List, rule induction, random tree [8], Multi-class Support Vector Machine [9], Naïve Bayes, J48, Random Forest algorithm, Apriori association rule mining [11], Decision Trees, Neural Networks [12], K-means clustering, Decision tree [14] and Support Vector Machines [15]. The X axis denotes the methods and the Y axis denotes the accuracy in %. The graph shows that the preliminary real time autonomous accident detection system [7] has higher accuracy than the other methods.

B. Precision
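Precision is evaluated from the true positive and false positive road accident predictions. It is given as

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP and FP are the numbers of true positive and false positive predictions. Fig. 2 shows the comparison of precision between Long Short-Term Memory, Deep Belief Network, supervised Latent Dirichlet Allocation, Support Vector Machine [4], Multi-class Support Vector Machine [9], Naïve Bayes, J48, Random Forest algorithm, Apriori association rule mining [11], Decision Trees, Neural Networks [12], K-means clustering, Decision tree [14] and Support Vector Machines [15]. The X axis denotes the methods and the Y axis denotes the precision. The graph shows that the J48 [11.1] method has higher precision than the other methods.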

C. Recall
Recall is evaluated from the true positive and false negative predictions. It is given as

$$\text{Recall} = \frac{TP}{TP + FN}$$

where TP and FN are the numbers of true positive and false negative predictions. Fig. 3 shows the comparison of recall between Multi-class Support Vector Machine [9], Naïve Bayes, J48, Random Forest algorithm, Apriori association rule mining [11], Decision Trees, Neural Networks [12], K-means clustering, Decision tree [14] and Support Vector Machines [15]. The X axis denotes the methods and the Y axis denotes the recall. The graph shows that the J48 [11.1] method has higher recall than the other methods.

D. F-measure
F-measure is the harmonic mean of precision and recall. It is given as

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Fig. 4 shows the comparison of F-measure between Multi-class Support Vector Machine [9], Naïve Bayes, J48, Random Forest algorithm, Apriori association rule mining [11] and Decision Trees, Neural Networks [12]. The X axis denotes the methods and the Y axis denotes the F-measure. The graph shows that the J48 [11.1] method has a higher F-measure than the other methods.
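The four measures used in this comparison can all be computed from the counts of true and false positives and negatives. The sketch below uses the standard textbook definitions for a binary accident / no-accident labelling. The function names and the sample labels are hypothetical and are not taken from any of the surveyed papers.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, true negatives, false positives and false negatives."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision, recall and F-measure as fractions in [0, 1]."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


# Made-up labels: 1 = accident, 0 = no accident.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

scores = evaluate(actual, predicted)
print(scores)
```

The functions return fractions. Multiplying by 100 gives the percentage form in which accuracy is reported in this survey.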

IV. CONCLUSION
Road accident detection is an ever-growing area of work focused primarily on reducing deaths. This paper reviews recent developments in road accident detection techniques by analyzing the novel ideas behind them. The analysis of these methods provides a better understanding of the steps involved in each process, which in turn widens the scope for finding efficient techniques that achieve the most accurate performance. The efficient techniques are compared in terms of accuracy, precision, recall and F-measure. The survey concludes that the preliminary real time autonomous accident detection system [7] was the most efficient in terms of accuracy, and that the Naïve Bayes, J48, Random Forest and Apriori association rule mining method [11] was the most efficient in terms of precision, recall and F-measure. This survey also provides the motivation for our future research work.