Kurian M J, Dr. Gladston Raj S


The basic idea of classification-based outlier detection is to train a model that separates outliers from normal data. A medical cancer dataset is used to evaluate classification-based anomaly detection. A comparison of the K-Nearest Neighbor (KNN), C4.5 and Decision Tree classification algorithms shows that KNN is more suitable for identifying outliers in terms of F-score, error rate and accuracy. KNN also takes less time to identify outliers than C4.5 and Decision Tree. In this work, classification performance for outlier identification is also measured after applying the dimensionality reduction algorithms Principal Component Analysis (PCA), Kernel PCA (KPCA) and Locality Preserving Projections (LPP), and the results show that dimensionality reduction significantly improves classification performance on the cancer dataset.


Outlier detection, Classification, Accuracy, KNN, C4.5, Decision tree
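The abstract does not include the experimental code. The following is a minimal sketch, in pure Python, of the kind of pipeline it describes: a k-NN classifier trained on points labelled normal (0) or outlier (1), scored by accuracy, error rate and F-score, with an optional PCA projection applied before classification. The toy two-feature data, the function names, k = 3 and the power-iteration PCA are illustrative assumptions, not the authors' setup or the cancer dataset; KPCA, LPP, C4.5 and the decision tree are not shown.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the paper's cancer dataset): label 0 = normal, 1 = outlier.
TRAIN_X = [[0.0, 0.1], [0.2, 0.0], [-0.1, 0.2], [0.1, -0.2], [0.0, -0.1],
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
TRAIN_Y = [0, 0, 0, 0, 0, 1, 1, 1]
TEST_X = [[0.1, 0.1], [5.1, 5.0], [-0.2, 0.0], [4.9, 5.2]]
TEST_Y = [0, 1, 0, 1]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, error rate and F-score, treating `positive` (outlier) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "error_rate": 1 - accuracy, "f_score": f_score}


def pca_fit(x, n_components=1, iters=200):
    """Plain PCA: top eigenvectors of the sample covariance via power iteration with deflation."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in x]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(val * val for val in w))
            if norm == 0.0:  # start vector lies in the null space; keep it as is
                break
            v = [val / norm for val in w]
        # Rayleigh quotient gives the eigenvalue; deflate so the next pass finds the next component
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mean, components


def pca_transform(x, mean, components):
    """Project rows of x onto the fitted principal components."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for row in x]


def run(reduce_dims=False, k=3):
    """Classify TEST_X with k-NN, optionally after a 1-D PCA projection, and score it."""
    train_x, test_x = TRAIN_X, TEST_X
    if reduce_dims:
        # Fit PCA on the training split only, then reuse it for the test split
        mean, comps = pca_fit(train_x, n_components=1)
        train_x = pca_transform(train_x, mean, comps)
        test_x = pca_transform(test_x, mean, comps)
    preds = [knn_predict(train_x, TRAIN_Y, x, k) for x in test_x]
    return preds, evaluate(TEST_Y, preds)


if __name__ == "__main__":
    for flag in (False, True):
        preds, scores = run(reduce_dims=flag)
        print("PCA" if flag else "raw", preds, scores)
```

The PCA mean and components are fitted on the training split only and reused for the test split, so no test information leaks into the projection before the k-NN vote.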

