IMPROVING THE PERFORMANCE OF A CLASSIFICATION BASED OUTLIER DETECTION SYSTEM USING DIMENSIONALITY REDUCTION TECHNIQUES


Kurian M. J.
Dr. Gladston Raj S

Abstract

The basic idea of classification-based outlier detection is to train a model that separates outliers from normal data. A medical cancer dataset is used to apply classification-based anomaly detection. A comparison of the K-Nearest Neighbour (KNN) algorithm with the C4.5 and Decision Tree classification algorithms shows that KNN is more suitable for identifying outliers in terms of F-score, error rate and accuracy. The time taken to identify outliers using KNN is also less than that of C4.5 and Decision Tree. In this work, the classification performance for outlier identification is further measured after applying the dimensionality reduction algorithms PCA (Principal Component Analysis), KPCA (Kernel PCA) and LPP (Locality Preserving Projections), and the results show that dimensionality reduction on the cancer dataset enhances classification performance to a significant level.
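The pipeline described in the abstract (optionally reduce the dimensionality of the data, label each record as normal or outlier with KNN, then score the result by accuracy, error rate and F-score) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: label 1 is assumed to mark an outlier and 0 a normal record, distance is Euclidean, the F-score is taken to be the usual F1 (harmonic mean of precision and recall), and only PCA is shown, computed here by power iteration with deflation; KPCA and LPP are not sketched. The toy data and all function names (`pca_components`, `project`, `knn_predict`, `metrics`) are hypothetical.

```python
"""Classification-based outlier detection sketch: optional PCA, then k-NN.

Assumptions (not taken from the paper): label 1 = outlier, 0 = normal;
Euclidean distance; simple majority vote; PCA via power iteration.
"""
import math
import random
from collections import Counter


def pca_components(rows, k, iters=200, seed=0):
    """Return the top-k principal axes and the column means of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):  # power iteration: v <- Cv / ||Cv||
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return comps, means


def project(rows, comps, means):
    """Map each row onto the principal axes (the reduced representation)."""
    return [[sum((r[j] - m) * c[j] for j, m in enumerate(means)) for c in comps]
            for r in rows]


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def metrics(y_true, y_pred, positive=1):
    """Accuracy, error rate and F1 score, treating `positive` as the outlier class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "error_rate": 1 - accuracy, "f_score": f_score}


if __name__ == "__main__":
    # Hypothetical toy data: a normal cluster near 0 and outliers near 5.
    train_x = [[0.1, 0.2, 0.0], [0.2, 0.1, 0.1], [-0.1, 0.0, 0.1],
               [0.0, -0.2, 0.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
    train_y = [0, 0, 0, 0, 1, 1]
    test_x = [[0.05, 0.1, 0.05], [4.9, 5.1, 5.0]]
    test_y = [0, 1]
    comps, means = pca_components(train_x, k=1)
    for name, tr, te in [("raw", train_x, test_x),
                         ("pca", project(train_x, comps, means),
                          project(test_x, comps, means))]:
        preds = [knn_predict(tr, train_y, x, k=3) for x in te]
        print(name, metrics(test_y, preds))
```

Running the script prints the three metrics for the raw features and for a one-component PCA projection of the same toy data. Only the standard library is used so that the sketch has no dependencies; a library implementation of PCA and KNN would be the more usual choice on a real dataset.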

