TRAFFIC ANALYSIS AND PREDICTION SYSTEM BY THE USE OF MODIFIED ARIMA MODEL

Traffic prediction is critical because traffic volume is growing day by day, leading to worsening road conditions. Increased accidents and delays in time-critical applications cause serious problems for road users. To address this problem, a modified ARIMA model is applied to a traffic time series dataset obtained from the UCI website. In the AR phase, the modified ARIMA converts the non-stationary series into a stationary time series. The MI phase determines the number of previous values to be analyzed, and the MA phase is hybridized using KNN with Euclidean distance. The results are presented in terms of accuracy and mean square error, and they show significant improvement on both measures.


INTRODUCTION
Traffic is growing by leaps and bounds, and accurate traffic prediction is the need of the hour. Prediction is especially critical in time-sensitive situations such as ambulance movement or students who must reach an examination on time. The proposed work uses data mining techniques and the ARIMA model for time series analysis.
The data used for analyzing traffic at a specified time is traffic flow data, which helps in predicting future traffic. Proper action must be taken to reduce congestion, and traffic flow data makes this possible because future predictions can be made from it. Data mining has grown quickly with the emergence of Big Data [1]. Numerous organizations have begun to digitize their records and have moved from paper-based systems to electronic systems. This change brings several benefits to the organizations, among them time savings, better management, and closer monitoring that makes tasks easier. One direct result of this change is the frequent collection of significant data. While data holders initially worried about data storage, they came to understand the benefits they could draw from the data. The collected data can be regarded as unformatted, unstructured raw data that needs to be filtered. Processing the data yields better-quality data, which supports decisions in data selection [2]. Healthcare organizations have likewise adopted electronic systems, using different approaches, among them Electronic Health Record (EHR) or Electronic Medical Record (EMR) systems. Implementing EHR systems leads to an immense amount of data collected by hospitals, clinics, and other providers.
Most of these datasets are not well structured or suitable for analytical purposes [3]. In addition, traffic data are generally very complex and difficult to analyze. For instance, the US healthcare system alone had already reached 150 exabytes (1 exabyte = 8,388,608 terabits) five years ago. This trend is due to the fact that multiscale data generated from individuals is continuously increasing, particularly with new high-throughput sequencing platforms, real-time imaging, and point-of-care devices, as well as wearable computing and mobile traffic technologies. Accordingly, data mining has received a great deal of attention because of its strong ability to extract information from data, and it has become popular in the healthcare field by virtue of its efficient analytical methodology for identifying unknown and valuable information in traffic data [1], [4].
In traffic prediction, data mining offers several benefits, for example detection of fraud in traffic coverage, availability of medical solutions to patients at lower cost, detection of the causes of diseases, and identification of medical treatment methods. It also helps healthcare researchers make efficient healthcare policies, build drug recommendation systems, create traffic profiles of individuals, and so on [1], [2], [5], [6]. As an example, McKinsey estimates that big data analytics can enable more than 300 billion in savings per year in U.S. healthcare, two thirds of that through reductions of around 8 percent in national healthcare expenditures. Clinical operations and R&D (research and development) are two of the largest areas for potential savings, with 165 billion and 108 billion in waste respectively. Data mining technologies benefit healthcare organizations by grouping patients who have similar diseases or traffic problems so that the organizations can provide them with effective treatments [6]. Data mining is also useful for predicting patients' length of stay in hospital, for medical diagnosis, and for planning effective information system management. Recent technologies are used in the medical field to improve medical services in a cost-effective way. Data mining techniques are also used to analyze the various factors responsible for diseases, for example type of food, working conditions, education level, living conditions, availability of clean water, healthcare services, and cultural, environmental, and agricultural factors.
In this paper, we present the advantages of data mining for traffic and the reasons that make data mining important in traffic data analysis. A traffic dataset with missing values is first analyzed with a support vector machine and its accuracy is recorded; ARIMA with KNN and Euclidean distance is then used for rectification and analysis [6]-[10]. Accuracy is observed in both cases to demonstrate the worth of the study.

STUDY OF EXISTING LITERATURE
Data mining approaches form the basis of this work, and analysis of the existing literature provides the foundation for the proposed approach. [11] reviews various models and methods used within data mining; the development of data mining techniques from 2005 to 2015 is reviewed and an application to traffic is proposed. [1] suggests integrating traffic data with data mining strategies to form a traffic information system. A patient's traffic condition can be analyzed along with future predictions about the patient's health, and hidden possibilities can be extracted using data mining techniques to make accurate health forecasts. [12] proposes a multilayer perceptron to analyze big data corresponding to traffic. Because that work deals with traffic of patients, a high degree of accuracy is desired. To accomplish this goal, SVM and the multilayer perceptron are compared on a traffic dataset, and the classification results of SVM are better than those of the multilayer perceptron.
[3] suggests data mining techniques for the analysis of diabetes. Support Vector Machine (SVM) is used for this purpose, and a genetic approach is also analyzed on the diabetes dataset; the results of SVM are found to be better. [13] suggests five J48 classifiers to predict hypertension and eight other diseases. Prediction accuracy is compared against the naïve Bayes approach, and the results of J48 are found to be better. [7] suggests a hybrid approach for traffic to predict diseases using big data. Pruning-based KNN (PB-KNN) is used for this purpose, which integrates a density-based clustering method with the KNN approach. The local outlier factor of PB-KNN is better than that of KNN. [14] proposes SVM and neural network techniques for skin lesion detection in the human body, where segmentation and classification are performed to detect the diseases. [8] notes that heart diseases have been a primary cause of death among humans in the last decade, and uses data mining techniques to detect and predict heart diseases efficiently. [4] proposes a mechanism in which information about patients coming for a checkup at a hospital is stored and an algorithm is applied to make predictions. The data mining algorithm considered in this approach is naïve Bayes, and the prediction accuracy obtained is significant. [15] suggests an intelligent heart disease prediction system in which decision tree, naïve Bayes, and neural network techniques are used for accurate analysis and prediction of disease.
The approaches analyzed above improve performance on datasets that do not contain noisy or missing values. Handling missing values or noisy data while increasing prediction accuracy is the primary task of the proposed approach.

PROPOSED SYSTEM
The proposed system uses the ARIMA model for time series analysis. ARIMA is hybridized with KNN and a Euclidean distance mechanism to achieve better accuracy and reduce mean square error.

ARIMA
For precise forecasting, the autoregressive integrated moving average (ARIMA) model is used. In ARIMA, adjustments to the time series are made using a mathematical model that is based on adjusting the observed values. The objective is to bring the differences between the observed values and the values obtained from the model close to zero. This model can accurately distinguish between stationary and non-stationary series. Property 3 is also known as transitive dependency. If the distance is close to zero, the prediction is accurate; otherwise an error is recorded. An error metric is applied to determine the accuracy of the approach. Accuracy is given as Accuracy = 1 - Error_rate, where Error_rate is given as Error_rate = [...].
KNN is used in many distinct settings such as classification, interpolation, problem solving, and teaching and learning.
A major limitation of KNN is that its performance depends on the value of k. Accuracy is low, and further work is required to improve it.
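To illustrate how the choice of k affects KNN, the following minimal Python sketch compares validation error over a few candidate values of k. The function names (`knn_predict`, `validation_mse`, `best_k`) and the use of a held-out validation split are assumptions made for illustration; the source does not state how k was chosen.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def validation_mse(train_X, train_y, val_X, val_y, k):
    """Mean square error of KNN predictions on a held-out validation set."""
    errors = [(knn_predict(train_X, train_y, x, k) - y) ** 2
              for x, y in zip(val_X, val_y)]
    return sum(errors) / len(errors)


def best_k(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the lowest validation MSE (hypothetical helper)."""
    return min(candidates,
               key=lambda k: validation_mse(train_X, train_y, val_X, val_y, k))
```

A very small k tends to follow individual noisy points, while a very large k averages over unrelated neighbors, so the error can rise at both extremes. This is the sensitivity to k that the text describes.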

EUCLIDEAN DISTANCE
[18] The simplest method for prediction and grouping is Euclidean distance, where the distance is used to evaluate deviations. Distance can be defined in several ways. Let $x_i$ be the $i$-th component of point $x$ and $y_i$ the $i$-th component of point $y$. The Euclidean distance is defined as
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $i$ ranges from 1 to $n$. All components of the vectors are weighted equally and no correlation is evaluated in this case. The result of the Euclidean distance equation can be normalized, with averaging taken over all the vectors in the dataset, and a scaled distance is then obtained [equations missing]. The scaled distance is an adjusted value such that the obtained result lies within the specified range. The metric is used to evaluate errors.
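As a minimal sketch of the distance computation described above, the Python functions below compute the plain Euclidean distance and one possible scaled variant. The scaling used here, dividing by the largest distance so that values fall between 0 and 1, is an assumption chosen for illustration; it is not necessarily the normalization used in the source, whose equations are not available. The names `euclidean_distance` and `scaled_distances` are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared component differences of x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def scaled_distances(query, vectors):
    """Distances from `query` to each vector, divided by the largest distance.

    Assumed scaling: the result lies in [0, 1]; the source's own
    normalization may differ.
    """
    raw = [euclidean_distance(query, v) for v in vectors]
    largest = max(raw)
    if largest == 0:
        return [0.0 for _ in raw]
    return [d / largest for d in raw]
```

Every component contributes equally to the sum, which matches the statement that no correlation between components is taken into account.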
[19]-[21] For observing errors and accuracy, the root mean square error (RMSE) is used. Accuracy and error rate are inversely related to each other. RMSE is evaluated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - \hat{a}_i)^2}$$
where $a_i$ is the actual value, $\hat{a}_i$ is the predicted value, and $n$ is the number of predictions. The lower the RMSE, the more accurate the prediction. The advantage of this approach is a better convergence rate; the disadvantage is that it works only over a limited range of values. Only non-negative values are allowed, and hence the result always lies between 0 and 1.
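The two error measures used in this work can be computed as in the following short Python sketch. It uses the standard definitions of mean square error and root mean square error; the function names are chosen for illustration only.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean square error, in the same units as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))
```

Because RMSE is expressed in the units of the series itself, it is bounded by 1 only if the data or the distances have first been scaled to that range.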

Decision Implementation
Decision implementation is the final stage in the visualization and rerouting of traffic. The people flow information is extracted from the dataset, meaningful information is derived from it through visualization, and this information is used to divert traffic to different routes in case of heavy congestion. Future predictions are made in this phase.

METHODOLOGY
The methodology of the proposed system is described below.

Algorithm
The proposed work starts by extracting traffic-related datasets from UCI. Feature extraction is applied to the extracted data, and compression techniques are applied to the extracted features to reduce the size of the data. This is critical for using bandwidth efficiently. A classifier, KNN with Euclidean distance, is used to make future predictions about traffic. The complete approach is as follows:
1. Obtain the traffic dataset from the UCI website.
2. Apply a feature extraction mechanism to extract only the required attributes. K-Means clustering is used for this purpose.
3. Pass the obtained data through a classifier to obtain realistic future predictions.
4. Check for heavy traffic through prediction and apply the KNN + Euclidean distance algorithm to determine the closest neighbors and reduce the error rate, if any, in terms of MSE and RMSE.
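The prediction part of these steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: it skips the K-Means feature extraction and compression steps, differences the series once, and predicts the next change by averaging the changes that followed the k most similar past windows. The function names and the `width` and `k` parameters are hypothetical.

```python
import math


def difference(series):
    """First-order differences of a numeric sequence."""
    return [b - a for a, b in zip(series, series[1:])]


def lag_windows(series, width):
    """Pairs of (window of `width` consecutive values, value that follows it)."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]


def knn_forecast(series, width=3, k=2):
    """Forecast the next value of `series` with KNN on differenced lag windows."""
    diffs = difference(series)
    if len(diffs) <= width:
        raise ValueError("series is too short for the chosen window width")
    history = lag_windows(diffs, width)
    query = diffs[-width:]  # most recent window, which has no known successor
    # Rank past windows by Euclidean distance to the most recent window.
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    next_diff = sum(target for _, target in nearest) / len(nearest)
    # Undo the differencing to return to the original scale.
    return series[-1] + next_diff
```

For a series with a steady trend every window looks alike, so the forecast simply continues the trend. For a repeating pattern, the closest past window indicates where in the cycle the series currently is.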

Flowchart
The flowchart describing the working of the system is given below.

PERFORMANCE AND RESULTS
The simulation is conducted in MATLAB and the difference table is observed. The first-order and second-order differences are taken and the neighborhood is plotted; the performance obtained is better than that of ARIMA without KNN + Euclidean distance. Misclassification, which is the difference between the predicted value and the actual value, is then noted. In other words, the error rate is depicted through this plot.
Figure 9: Comparative analysis of proposed and existing technique
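The first-order and second-order differences mentioned in this section can be computed as in the following Python sketch. The layout of the difference table as a dictionary keyed by order is an assumption for illustration; the source's MATLAB table is not shown.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times to a numeric sequence."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


def difference_table(series, max_order=2):
    """Map each order from 0 to `max_order` to the series differenced that many times."""
    return {order: difference(series, order) for order in range(max_order + 1)}
```

Each round of differencing shortens the series by one value. A series whose differences become roughly constant at some order is usually treated as stationary at that order.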

CONCLUSION
Traffic prediction is critical because traffic is growing with the increase in on-road vehicles. The proposed work uses a modified ARIMA model to predict traffic accurately. The results are reported in terms of accuracy and mean square error. Accuracy is improved because Euclidean distance is used to determine the closest points within the dataset. The dataset is fetched from the online UCI source. The accuracy is assessed by subtracting the actual value from the obtained value. The lowest error rate and improved accuracy demonstrate the worth of the study. The results are compared against the existing approach of ARIMA without KNN and Euclidean distance.
In future, a genetic algorithm can be combined with ARIMA to further improve accuracy and reduce the error rate.