N-GRAMS SOLUTION FOR ERROR DETECTION AND CORRECTION IN HINDI LANGUAGE


Shailza Kanwar
Manoj Kumar Sachan
Gurpreet Singh

Abstract

Hindi is one of the official languages of India, yet research and development of natural language processing (NLP) applications for it are still at an early stage compared with languages such as English and Chinese. NLP is a field of artificial intelligence that includes tasks such as information retrieval, word segmentation, speech recognition, parsing, part-of-speech tagging, text classification and automatic text summarization. Spelling error detection and correction is an important NLP task that has not yet received sufficient attention for Hindi, and it is considered a difficult task for Indian languages. Hindi differs considerably from English in its phonetic properties and grammatical rules, so the techniques and methods used to check spelling errors in English cannot be used directly for Hindi. Spelling errors are of two main types: non-word errors, where the misspelled string is not a valid word, and real-word errors, where it is a valid word used in the wrong context. Non-word error detection for Hindi has been addressed, but no work has yet been done on real-word errors. This paper focuses on real-word spelling error detection and correction in Hindi text using an n-gram model and the Levenshtein edit distance algorithm.
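The abstract names the two ingredients (an n-gram model and Levenshtein edit distance) but not how they are combined. The Python sketch below shows one plausible combination, assuming a word-bigram model; it is not the authors' implementation. The three-sentence corpus, the detection rule (flag a dictionary word when either of its bigrams is unseen) and the function names `levenshtein`, `train_bigrams`, `context_score` and `correct_real_words` are all hypothetical. The distance is computed over Unicode code points, so a Devanagari vowel sign (matra) or nasal mark counts as a separate symbol rather than being grouped with its consonant into one akshara.

```python
from collections import Counter

# Hypothetical toy corpus for illustration only (not the paper's data).
toy_corpus = [
    "मैं घर जा रहा हूँ",
    "वह घर जा रही है",
    "उसने हाथ धर दिया",
]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute, over code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def train_bigrams(sentences):
    """Count unigrams and bigrams; <s> and </s> mark sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def context_score(prev_word, word, next_word, bigrams):
    """Raw count of the word with its left neighbour plus with its right neighbour."""
    return bigrams[(prev_word, word)] + bigrams[(word, next_word)]


def correct_real_words(sentence, unigrams, bigrams, max_dist=1):
    """Assumed rule: a known word with an unseen left or right bigram is suspect;
    replace it with the vocabulary word within max_dist edits that fits the context best."""
    vocab = [w for w in unigrams if w not in ("<s>", "</s>")]
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for i in range(1, len(tokens) - 1):
        word, prev_word, next_word = tokens[i], tokens[i - 1], tokens[i + 1]
        if word not in unigrams:
            continue  # non-word error: outside the scope of this sketch
        if bigrams[(prev_word, word)] and bigrams[(word, next_word)]:
            continue  # both bigrams attested: treat the word as correct
        candidates = [w for w in vocab if w != word and levenshtein(w, word) <= max_dist]
        if not candidates:
            continue
        best = max(candidates, key=lambda w: context_score(prev_word, w, next_word, bigrams))
        if context_score(prev_word, best, next_word, bigrams) > context_score(prev_word, word, next_word, bigrams):
            tokens[i] = best  # later positions see the corrected left neighbour
    return " ".join(tokens[1:-1])


if __name__ == "__main__":
    unigrams, bigrams = train_bigrams(toy_corpus)
    print(correct_real_words("मैं धर जा रहा हूँ", unigrams, bigrams))  # मैं घर जा रहा हूँ
```

In this example धर is a valid word that occurs elsewhere in the toy corpus, so a dictionary lookup alone would accept it. The unseen bigram (मैं, धर) flags it, and घर, one substitution away, is chosen because it is attested with both neighbours. A practical system would likely need smoothed probabilities from a large Hindi corpus, trigram context and a tuned edit-distance threshold.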


Author Biographies

Shailza Kanwar, SLIET Longowal, Sangrur

M.Tech Scholar, CSE

Manoj Kumar Sachan, SLIET Longowal, Sangrur

Associate Professor, CSE

Gurpreet Singh, Sant Longowal Institute of Engineering and Technology, Sangrur, Central University

Research Scholar, CSE Department

