AN EFFECTIVE APPROACH TOWARDS PARALLELIZATION OF NETWORK TRAFFIC ANOMALY DETECTION SYSTEM

Abstract: Network traffic data is huge in volume and must be processed in real time to detect intrusions. The latest hardware, with multi-core processors and GPGPU computing, offers scope for processing this volume of network traffic data in near real time. This study examines the potential for parallelization of the Network Anomaly Detection Algorithm (NADA) presented by the authors in [9]. NADA was parallelized using parallel toolbox functions in Matlab. Other classification algorithms, namely Naive Bayes, SVM and Decision Trees, were also implemented using pre-defined Matlab functions, and their execution times were compared with that of NADA for various data sizes. This study uses the new version of Kyoto University's Intrusion Detection/Evaluation benchmark dataset for experimentation. The parallel performance measures, namely time taken, speedup and efficiency, are encouraging.


INTRODUCTION
Intrusion detection is the process of analyzing the stream of network traffic for possible intrusions. Intrusion Detection Systems (IDS) can be classified into two types based on the detection technique: misuse (signature) based and anomaly based. Signature-based systems compare network packets against existing malicious signatures to identify possible intrusions or attacks. Anomaly detection algorithms attempt to profile the normal behaviour of the network [6]. In the present digital age, with a huge volume of data in circulation, information security has become of utmost importance. The data growth rate, together with higher bandwidth and network speed, makes it very difficult to process the data in real time. With the advent of the latest many-core processor architectures and GPGPU computing, efficient intrusion detection becomes feasible if the data is processed in parallel. This study does not attempt to compare classifier performance metrics such as Detection Rate, Accuracy, False Alarm Rate and F-Score. Instead, it compares the execution time of various classifiers and examines the parallel performance of NADA, since NADA outperforms all the other classifiers in terms of serial execution time.
In this paper, the following contributions are made:
• The execution times of several classification algorithms are compared for different data sizes.
• The NADA algorithm proposed by Ashok Kumar et al. [9] is parallelized.
• The performance metrics such as time taken, speedup and efficiency are presented.
The rest of the paper is structured as follows. Section 2 presents the literature on classification algorithms for intrusion detection and discusses parallel program performance metrics. Section 3 describes the experimental setup, the dataset used and the generation of the test datasets for this study. Section 4 discusses the experimental results. Conclusions of the study are given in the final section.

BACKGROUND
Network speeds have increased faster than processor speeds, and centralized IDSs have not scaled accordingly [5]. Anomaly detection is an important area of research in information security, and various researchers have proposed different algorithms for it.

A. PERFORMANCE METRICS
There are several performance metrics associated with parallel programs. These metrics are used to determine the best parallel algorithm, evaluate hardware platforms and examine the benefits of parallelism. The basic measure required to evaluate a parallel program is the serial execution time, i.e. the time taken by the program when it is executed serially on one processor. As in serial computing, time and memory are important performance measures in parallel computing. The important goals of parallel programs are performance and scalability, and the main factors limiting performance are architectural and algorithmic limitations [4]. Performance is the measure of the capacity to reduce the time needed to solve a problem as computing resources are increased. In parallel computing, Amdahl's Law is used to predict the theoretical maximum speedup of a program using multiple processors. Speedup and efficiency are important measures for any parallel program.
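As an illustration of Amdahl's Law mentioned above, the following minimal Python sketch computes the theoretical speedup for a program in which a given fraction of the serial run time can be parallelized. The function names and the parallel fraction of 0.9 are assumptions chosen for illustration; they are not measurements from this study.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup predicted by Amdahl's Law.

    parallel_fraction: share of the serial run time that can be
        parallelized, between 0 and 1 (an assumed input, not measured here).
    processors: number of processors p, at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of processors grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    fraction = 0.9  # hypothetical parallel fraction, for illustration only
    for p in (1, 2, 4, 8, 16):
        print(p, round(amdahl_speedup(fraction, p), 2))
    print("limit", round(amdahl_limit(fraction), 2))
```

Under this assumed fraction, the serial part of the program caps the speedup at about 10 no matter how many processors are added, which is why the serial portion matters when judging scalability.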
Speedup: It is the ratio of the time taken by serial execution to the time taken by parallel execution, and is given by the formula below.

$$S(p) = \frac{T(1)}{T(p)}$$

where T(1) is the execution time with one processor and T(p) is the execution time with p processors.
Efficiency: It is a measure of the usage of computational resources. It is the ratio of the performance achieved to the resources used to achieve it, and is given below [4].

$$E(p) = \frac{S(p)}{p} \tag{4}$$

where S(p) is the speedup for p processors.

B. NADA
Ashok Kumar et al. [9] proposed the Network Anomaly Detection Algorithm (NADA) and claim that it outperforms popular classification algorithms such as Support Vector Machines, Naïve Bayes, ONE-R and Logistic Regression in terms of Detection Rate, False Alarm Rate and F-Score. However, the authors did not measure execution time or compare it with other schemes, and the number of test records was small, which makes it difficult to measure the execution time. The NADA algorithm is given in Fig. 1. In this study, the execution times of various classification schemes are computed and compared with that of NADA.

DATASET AND EXPERIMENTAL SETUP
... and an additional 10 features for effective investigation. The new version of the dataset has one more feature, 'protocol type'. Of the 15 features, 3 are categorical (flag, service and protocol type) and the remaining 12 are numerical. The added feature 'protocol type' in the new version is not used; only the 14 conventional features are used here. According to the authors of [7] and [8], a probability function for categorical data and mean range normalization for numerical data yield better results in terms of detection rate and model building time for intrusion detection classifiers. In this study, categorical data was normalized using the probability function (Equation 1) and numerical data was normalized using the mean range normalization technique (Equation 2).
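The two normalization steps described above can be sketched in Python as follows. Equations 1 and 2 are not reproduced in this text, so the forms used here are assumptions: the probability function is taken to be the relative frequency of each category, and mean range normalization is taken to be (x - mean) / (max - min). The function names are hypothetical.

```python
from collections import Counter


def probability_encode(values):
    """Replace each categorical value with its relative frequency.

    Assumed form of the probability function: count(value) / total records.
    """
    if not values:
        return []
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def mean_range_normalize(values):
    """Scale numerical values as (x - mean) / (max - min).

    Assumed form of mean range normalization. A constant column has a
    range of zero, so it is mapped to all zeros to avoid division by zero.
    """
    if not values:
        return []
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    if value_range == 0:
        return [0.0 for _ in values]
    return [(x - mean) / value_range for x in values]


if __name__ == "__main__":
    # Toy values for illustration only, not records from the Kyoto dataset.
    print(probability_encode(["http", "http", "smtp", "http"]))
    print(mean_range_normalize([0, 5, 10]))
```

Both functions map a feature column to values in a bounded interval, so categorical and numerical features end up on comparable scales before classification.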
The list of features used in this study is given below:
• duration: length (number of seconds) of the connection
• service: network service on the destination, e.g., http, telnet, etc.
• src_bytes: number of data bytes from source to destination
• dst_bytes: number of data bytes from destination to source
• count: number of connections to the same host as the current connection in the past two seconds
• same_srv_rate: % of connections in the count feature to the same service
• serror_rate: % of connections in the count feature that have "SYN" errors
• srv_serror_rate: % of connections whose service type is the same as that of the current connection in the past two seconds that have "SYN" errors
• dst_host_count: among the past 100 connections whose destination IP address is the same as that of the current connection, the number of connections whose source IP address is also the same as that of the current connection
• dst_host_srv_count: the number of connections in the dst_host_count feature whose service type is also the same as that of the current connection
• dst_host_same_src_port_rate: % of connections in the dst_host_count feature whose source port is the same as that of the current connection
Similarly, 50000 records were selected at random for testing.
The experiments were carried out on a system with two Intel Xeon E5-2650 2 GHz processors, each having 8 cores (processing elements, PEs), and 32 GB of memory, running the Windows 7 Professional 64-bit operating system. The test dataset was processed in parallel with 2, 4, 8 and 16 cores. The test dataset has only 50000 records, which is too small for measuring parallel performance. It was therefore replicated using the 'repmat' function in Matlab, and five test cases were generated with 1 million, 2 million, 5 million, 10 million and 20 million records.
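The replication of the test dataset and its division among workers can be sketched as below. This is a Python analogue written for illustration, not the Matlab code used in the study; the function names and the toy sizes are assumptions.

```python
def replicate(records, target_size):
    """Repeat `records` until exactly `target_size` rows are produced.

    Plays the role that tiling with 'repmat' plays in Matlab: whole copies
    are concatenated, plus a partial copy if target_size is not a multiple
    of len(records).
    """
    if not records:
        raise ValueError("records must not be empty")
    copies, remainder = divmod(target_size, len(records))
    return records * copies + records[:remainder]


def split_into_chunks(records, workers):
    """Split `records` into `workers` contiguous chunks of near-equal size."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(len(records), workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `extra` chunks receive one additional record.
        end = start + base + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Toy stand-in for the 50000-record test set (5 records here).
    test_set = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
    enlarged = replicate(test_set, 20)
    for workers in (2, 4, 8, 16):
        sizes = [len(c) for c in split_into_chunks(enlarged, workers)]
        print(workers, sizes)
```

Since each test record is handled separately in this sketch, contiguous chunks of near-equal size give every worker a comparable share of the load.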

EXPERIMENT AND RESULTS
The NADA algorithm was implemented in Matlab R2015a.
Similarly, other classification algorithms, namely Naïve Bayes, Support Vector Machine (SVM) and Decision Tree, were implemented in Matlab using the built-in functions. Experiments were carried out on each test dataset for the above-mentioned classification algorithms, and the results are given in Table 1.
From the table and figure, it is clear that the time taken by 2 cores/workers is almost 1.5 times higher than the time taken by a single core/worker. The times taken by 1 core and 4 cores are almost equal, with a slight improvement in the 4-core configuration. Clear performance improvements are seen from 8 cores onwards. The delay with 2 and 4 cores can be attributed to inter-processor/inter-process communication time.
Similarly, the other performance measures, speedup and efficiency, are calculated from the times taken, which are given in Table 2. Table 3 lists the speedup of the NADA algorithm. From that table, it can be observed that the speedup increases with larger datasets and a larger number of cores. The efficiency of NADA is given in Table 4. The scalability of the algorithm needs to be checked for more cores and processors. The parallel measures of parallel NADA are encouraging.
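The calculation of speedup and efficiency from measured execution times can be sketched as follows, using S(p) = T(1)/T(p) and E(p) = S(p)/p. The timings in the usage example are hypothetical placeholders and are not the values reported in the tables of this study.

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / processors


def performance_table(timings):
    """Return {p: (speedup, efficiency)} from {p: execution time}.

    `timings` must contain the single-processor time under key 1.
    """
    t1 = timings[1]
    return {
        p: (speedup(t1, t), efficiency(t1, t, p))
        for p, t in sorted(timings.items())
    }


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    example = {1: 100.0, 2: 150.0, 4: 95.0, 8: 40.0, 16: 25.0}
    for p, (s, e) in performance_table(example).items():
        print(f"p={p:2d}  speedup={s:.2f}  efficiency={e:.2f}")
```

A speedup below 1 for a given p, as in the 2-worker row of this made-up example, means the parallel run was slower than the serial run, which is the situation that communication overhead can produce at low core counts.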

CONCLUSIONS AND FUTURE WORK
The Network Anomaly Detection Algorithm was implemented and parallelized using Matlab parallel toolbox functions. The popular classification algorithms Naïve Bayes, SVM and Decision Trees were also implemented in Matlab, and the time taken by these methods was compared with that of NADA. NADA outperforms all of these algorithms with regard to execution time. The parallel performance measures were calculated and discussed in the previous section. The NADA algorithm is a potential candidate for GPGPU parallelization.
