EMPIRICAL EVALUATION OF MACHINE LEARNING ALGORITHMS FOR AUTOMATIC DOCUMENT CLASSIFICATION

Chakravarthy T, Kumaravelan G, P.V. Arivoli

Abstract


Automatic document classification is an important area of research in Text Mining (TM), the process of discovering interesting patterns or knowledge in large amounts of data. Document classification is used in many domains; here, it is applied to SMS spam classification. A benchmark dataset is used, and the same dataset is processed with several machine learning (ML) algorithms: Naïve Bayes, Support Vector Machine, Decision Tree and Logistic Regression. This paper evaluates the results of these machine learning algorithms for automatic document classification in SMS spam classification.
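The comparison described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the six training messages and two test messages are invented placeholders rather than the benchmark SMS dataset, only two of the four algorithms (multinomial Naïve Bayes with add-one smoothing, and logistic regression fitted by batch gradient descent) are implemented, and the `lr` and `epochs` values are arbitrary assumptions. Each model is trained on the same data and scored with the same accuracy function, which is the essence of the evaluation the abstract describes.

```python
import math
import re
from collections import Counter

# Hypothetical toy messages (NOT the benchmark corpus); 1 = spam, 0 = ham.
TRAIN = [
    ("WINNER! Claim your free prize now, call 0800 today", 1),
    ("Free entry to win cash, text WIN to claim", 1),
    ("URGENT: your mobile has won a cash award, call now", 1),
    ("Are we still meeting for lunch tomorrow?", 0),
    ("I will call you when I get home tonight", 0),
    ("Can you send me the notes from class?", 0),
]
TEST = [
    ("Claim your free cash prize now", 1),
    ("See you at lunch tomorrow", 0),
]


def tokenize(text):
    """Lower-case a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.vocab = {t for d in docs for t in tokenize(d)}
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            counts = Counter(t for d in class_docs for t in tokenize(d))
            total = sum(counts.values())
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # P(token | class) with add-one smoothing over the training vocabulary.
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(self.vocab)))
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_lik[c][t] for t in tokenize(doc) if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


class LogisticRegression:
    """Binary logistic regression on binary bag-of-words features (batch gradient descent)."""

    def __init__(self, lr=0.5, epochs=200):  # assumed, untuned hyperparameters
        self.lr, self.epochs = lr, epochs

    def _prob(self, tokens):
        z = self.bias + sum(self.w.get(t, 0.0) for t in tokens)
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, docs, labels):
        self.w, self.bias = {}, 0.0
        feats = [set(tokenize(d)) for d in docs]
        n = len(docs)
        for _ in range(self.epochs):
            grad_w, grad_b = Counter(), 0.0
            for tokens, y in zip(feats, labels):
                err = self._prob(tokens) - y  # gradient of the log-loss w.r.t. z
                grad_b += err
                for t in tokens:
                    grad_w[t] += err
            self.bias -= self.lr * grad_b / n
            for t, g in grad_w.items():
                self.w[t] = self.w.get(t, 0.0) - self.lr * g / n
        return self

    def predict(self, doc):
        return int(self._prob(set(tokenize(doc))) >= 0.5)


def accuracy(model, data):
    """Fraction of (message, label) pairs the model labels correctly."""
    return sum(model.predict(d) == y for d, y in data) / len(data)


docs, labels = zip(*TRAIN)
for model in (MultinomialNB(), LogisticRegression()):
    model.fit(list(docs), list(labels))
    print(type(model).__name__, accuracy(model, TEST))
```

The remaining two algorithms would plug into the same train-then-score loop; with scikit-learn, for instance, `LinearSVC` and `DecisionTreeClassifier` could be compared in the same way, although they expect a vectorized bag-of-words matrix rather than raw strings.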

Keywords


Text Mining, Machine Learning, Document Classification and Information Retrieval.


References


A.Kousar Nikhath, K.Subrahmanyam, R.Vasavi, "Building a K-Nearest Neighbor Classifier for Text Categorization", International Journal of Computer Science and Information Technologies, Vol. 7 No.1, pp. 254-256, 2016.

Andrew McCallum and Kamal Nigam, “A Comparison of Event Models for Naïve Bayes Text Classification”, AAAI-98 workshop on learning for text categorization, Vol. 752, 1998.

Arivoli P.V. and Chakravarthy T., “Document Classification Using Machine Learning Algorithms – A Review”, International Journal of Scientific Engineering and Research, Vol. 5, Issue 2, pp. 48-55, February 2017.

Bang, S. L., Yang, J. D., and Yang, H. J., “Hierarchical document categorization with k-NN and concept-based thesauri”, Elsevier, Information Processing and Management, Vol. 42, No. 2, pp. 397-406, 2006.

Duoqian Miao, Qiguo Duan, Hongyun Zhang and Na Jiao, “Rough set based hybrid algorithm for text classification”, Elsevier, Expert Systems with Applications, Vol. 36, Issue 5, pp. 9168-9174, July 2009.

El Kourdi, M., Bensaid, A., & Rachidi, T. E., “Automatic Arabic document categorization based on the Naïve Bayes algorithm”, In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, Association for Computational Linguistics, pp. 51-58, August 2004.

Ethem Alpaydin, “Introduction to Machine Learning (Adaptive Computation and Machine Learning)”, The MIT Press, 2004.

Eui-Hong (Sam) Han, George Karypis and Vipin Kumar, “Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification”, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, pp. 53-65, 2001.

Genkin, A., Lewis, D. D., & Madigan, D., “Large-scale Bayesian logistic regression for text categorization”, Technometrics, American Statistical Association and the American Society for Quality, Vol. 49, No. 3, pp. 291-304, 2007. DOI: 10.1198/004017007000000245.

Hwee-Tou Ng, Wei-Boon Goh and Kok-Leong Low, “Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization”, In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 67-73, 1997.

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nawaf A. Abdulla, Abdalrahman A. Almodawar, Raddad Abooraig, Nizar A. Mahyoub, "Automatic Arabic text categorization: A comprehensive comparative study", Journal of Information Science,

Kim J., Lee B., Shaw M., Chang H. and Nelson W., “Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts”, International Journal of Electronic Commerce, Vol. 5, No. 3, pp. 45-62, 2001.

Moromi Gogoi and Shikhar Kumar Sarma, "Document Classification of Assamese Text Using Naïve Bayes Approach", International Journal of Computer Trends and Technology (IJCTT), Vol. 30, No. 4, December 2015.

Russell Greiner and Jonathan Schaffer, “AIxploratorium – Decision Trees”, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2H1, Canada, 2001. URL: http://www.cs.ualberta.ca/~aixplore/learning/DecisionTrees

S.G. Lade and Nikhil Vyawahare, "Document Classification Using KNN on GPU", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Vol. 3 Issue 8, August 2014.

Said Bahassine, Abdellah Madani, Mohamed Kissi, "Arabic Text Classification Using New Stemmer For Feature Selection And Decision Trees", Journal of Engineering Science and Technology, Vol. 12, No. 6, pp. 1475-1487, 2017.

Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng, "Some Effective Techniques for Naïve Bayes Text Classification", IEEE Transactions On Knowledge And Data Engineering, Vol. 18, No. 11, November 2006.

Saurav Sahay, “Support Vector Machines and Document Classification”, URL: http://www.static.cc.gatech.edu/~ssahay/sauravsahay7001-2.pdf, 2011.

Thorsten Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”, ECML-98, 10th European Conference on Machine Learning, pp. 137-142, 1998.

Vishwanath Bijalwan, Pinki Kumari, Jordan Pascual and Vijay Bhaskar Semwal, "Machine learning approach for text and document mining", https://arxiv.org/ftp/arxiv/papers/1406/1406.1580.pdf, 2014.

Vishwanath Bijalwan, Vinay Kumar, Pinki Kumari and Jordan Pascual, "KNN based Machine Learning Approach for Text and Document Mining", International Journal of Database Theory and Application, Vol.7, No.1, pp.61-70, 2014.

Vladimir N. Vapnik, “The Nature of Statistical Learning Theory” , Springer science & business media, 2013.

Yiming Yang and Xin Liu, "A re-examination of text categorization methods", In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42-49, 1999. DOI: 10.1145/312624.312647.




DOI: https://doi.org/10.26483/ijarcs.v8i8.4699





Copyright (c) 2017 International Journal of Advanced Research in Computer Science