A REVIEW ON KDD CUP99 AND NSL-KDD DATASET

Abstract: Continuous use of network services for information and resource sharing makes our work easier. But sometimes the extensive use of network services leads to many problems in the form of attacks or intrusions, which demolish not only the privacy but also the integrity and accessibility of data. Detection of attacks or intrusions on the network is a serious issue of concern for researchers. An Intrusion Detection System serves the purpose of detecting intrusions on the network. A huge amount of data is required to simulate a powerful Intrusion Detection System (IDS) model as well as to train and test the model. This paper presents a review of the datasets DARPA, KDDCup99 and NSL_KDD, which are most widely used by researchers to detect intrusions in computer networks.

Keywords: IDS, Dataset, DARPA, NSL_KDD, KDDCup99.


INTRODUCTION
Nowadays, the use of networks is growing due to the increased use of handheld devices, so network security is a major issue. Networks suffer from various types of attacks such as viruses, Trojan horses and worms. To identify and stop these attacks, a security management system is required. Confidentiality, integrity and availability of data are the major objectives of security. An Intrusion Detection System serves this purpose by automatically alerting the administrator when someone tries to violate the security policies. The role of an intrusion detection system is to assemble information from the network. Then, after supervising and investigating this information, it separates it into normal and malicious behaviour and reports the result to the system administrator [2].

Encryption, firewalls, virtual private networks etc. are the conventional approaches which were used in the early days, but they were not able to protect the network completely. Thus, to increase network security, the Intrusion Detection System was introduced. It is divided into two main categories: signature based and anomaly based. Another name for signature based IDS is misuse based IDS. It identifies only the familiar attacks, whereas anomaly based IDS can identify known as well as novel attacks.

To evaluate the performance of a Network Intrusion Detection System, various datasets are available. Researchers have suggested three most widely used datasets: DARPA 98/99, KDD99 and NSL-KDD. DARPA is the first dataset for the evaluation of intrusion detection systems and was prepared at MIT Lincoln Laboratory in 1998. KDD CUP99 is a subset of the DARPA98 dataset. It has 41 features. The NSL-KDD dataset is derived from KDDCup99 by removing the redundant and duplicate records from the training and testing datasets respectively, so it is a revised version of the original KDDCup99 dataset. Each dataset has its advantages and shortcomings, and selecting a suitable dataset is itself very challenging. Due to the increased use of networks, behaviour and patterns change, and dependency on a particular dataset is not trustworthy. So, there is a need to update the dataset periodically.

In this paper, a review of the DARPA, KDD Cup99 and NSL_KDD datasets is made using various attributes.

KDDCup99 dataset (Knowledge Discovery in Databases)
KDDCup99 is the most commonly used dataset for the identification of intrusions in computer networks. A US Air Force LAN was simulated, and different types of attacks were induced, in order to obtain this subset of the DARPA 1998 dataset. Nine weeks of TCP dump data collected at MIT Lincoln Laboratory were used for this purpose. The KDDCup dataset contains about 4,900,000 single instances, each described by 41 features [1]. The features are grouped into classes, one of which is Host: this class consists of features which assess attacks lasting for more than two seconds.

All the attacks fall into one of the following categories [3]:
• Denial of Service Attack (DoS): In this attack, system resources are deliberately occupied by unnecessary or unwanted processes in order to make the server too busy to handle other important requests, which results in the rejection of legitimate requests.
• User to Root Attack (U2R): In this type of attack, the intruder gains access to an authorized user account on the system and then exploits some vulnerability to gain super user privileges.
• Remote to Local Attack (R2L): A person who does not have an account on a machine sends packets to that machine over the network and impersonates a legitimate user in order to gain local access to the machine.
• Probing Attack: These attacks gather information about the network and its activity with the objective of bypassing its security controls.
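The four categories above are usually applied by mapping each raw connection label to its category before training a classifier. The following is a minimal Python sketch of that step. The table `ATTACK_CATEGORY` and the function `categorize` are hypothetical names introduced here, and the table lists only a few well-known labels for illustration, not the complete label set of the dataset.

```python
# Illustrative, partial mapping from raw connection labels to the four
# attack categories. Only a few labels are listed; this table is an
# assumption made for demonstration, not the full label set.
ATTACK_CATEGORY = {
    "smurf": "DoS",
    "neptune": "DoS",
    "buffer_overflow": "U2R",
    "rootkit": "U2R",
    "guess_passwd": "R2L",
    "ftp_write": "R2L",
    "portsweep": "Probe",
    "ipsweep": "Probe",
}


def categorize(label: str) -> str:
    """Return the coarse category for a raw connection label."""
    # Raw KDD Cup 99 labels end with a period (e.g. "smurf."), so strip it.
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "Normal"
    # Labels missing from the partial table are reported as "Unknown".
    return ATTACK_CATEGORY.get(name, "Unknown")


if __name__ == "__main__":
    for raw in ["normal.", "smurf.", "rootkit.", "some_new_attack."]:
        print(raw, "->", categorize(raw))
```

Returning "Unknown" for unlisted labels keeps the sketch safe to run on data containing attack names that the partial table does not cover.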

Advantages of KDDCup99:
The KDDCup99 dataset has some improvements over the DARPA 1998 dataset:
• Conversion of the network traffic from TCP dump files into a relational structure is not required.
• The dataset contains direct and derived features which are readily available.
• Less memory and processing power is required.
• Many machine learning algorithms have been used to optimize accuracy and detection rate on the KDD99 dataset.

The algorithms most frequently used with KDD99 are decision tree derivatives and support vector machines.
Problems of KDD99 Data Set:
• The synthesized data does not match real network traffic.
• The training and test sets are too large, which makes experiments very complex.
• Detection accuracy is very low.
• Dropped packets cannot be detected.

Advantages of NSL-KDD:
• Removal of redundant records helps the classifier to produce unbiased results.
• Since no duplicate records are present in the proposed test set, the learner's performance is not biased by methods having better detection rates on the frequent records.
• Detection rate is high as compared to KDD Cup.
• The record counts in the train and test sets are reduced. Therefore, random selection of a chunk of data is not required and all experiments can be done on the entire set. This also makes the results of different research works consistent with each other.
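The effect of removing redundant records can be illustrated with a small Python sketch. It only demonstrates the idea of duplicate removal and how it changes the label distribution; it is not the actual procedure used to build NSL-KDD. The function `remove_duplicates` and the toy records are assumptions introduced for this example.

```python
from collections import Counter


def remove_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec)  # records must be hashable to be compared
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Toy records of the form (protocol, service, flag, label).
# The values are invented for illustration only.
records = [
    ("icmp", "ecr_i", "SF", "smurf"),
    ("icmp", "ecr_i", "SF", "smurf"),
    ("icmp", "ecr_i", "SF", "smurf"),
    ("tcp", "http", "SF", "normal"),
    ("tcp", "telnet", "SF", "guess_passwd"),
]

unique = remove_duplicates(records)

# Label counts before and after duplicate removal.
before = Counter(rec[-1] for rec in records)
after = Counter(rec[-1] for rec in unique)

if __name__ == "__main__":
    print("before:", dict(before))
    print("after: ", dict(after))
```

In the toy data the repeated record dominates the label counts before removal, which is the kind of bias towards frequent records that the list above refers to.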
There are a total of 21 different types of attacks present in the training dataset, while the test dataset contains 16 additional attacks. The major attacks are categorised as Probe, DoS, U2R and R2L [7].
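The attacks that appear only in the test dataset can be found with a simple set difference between the label columns of the two sets. The Python sketch below assumes the labels have already been read into lists; the function name `novel_attacks` and the label values are placeholders chosen for illustration.

```python
def novel_attacks(train_labels, test_labels):
    """Return attack types that occur in the test set but not in training."""
    # Normal traffic is not an attack type, so exclude it from both sets.
    train_types = set(train_labels) - {"normal"}
    test_types = set(test_labels) - {"normal"}
    return sorted(test_types - train_types)


# Toy label columns, invented for illustration only.
train_labels = ["normal", "smurf", "neptune", "smurf", "ipsweep"]
test_labels = ["normal", "smurf", "mscan", "apache2", "mscan"]

if __name__ == "__main__":
    print("attack types in training:", len(set(train_labels) - {"normal"}))
    print("attacks only in test:", novel_attacks(train_labels, test_labels))
```

Applied to the real label columns, the same check would list the additional attacks of the test dataset, which a classifier never sees during training.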

CONCLUSION
In the research area of IDS, and for the development of new tools, KDD Cup99 is the best known dataset for the protection of computer networks against malicious activities. This dataset also has many limitations, such as redundant and duplicate records, imbalance between normal traffic and the number of attacks, and the others listed above. The solution to this is NSL-KDD, which has removed the redundant and duplicate records in both the training and test sets. Continuous use of computer networks and information systems has become a vital source of a large number of attacks. Nowadays, all over the world, many researchers are developing new datasets with the help of the KDD Cup, NSL-KDD and DARPA datasets, depending upon the problem being solved and the purpose of the IDS.