REVIEW ON STEMMING TECHNIQUES
Abstract
Stemming is the process of deriving the root form of an inflected word. The process is often called conflation and is carried out by stemmers, which implement stemming algorithms. A stemming algorithm reduces all words that share the same base to a common form, and it is the basic building block of a stemmer. Developing a stemmer is language dependent: it requires knowledge of the target language and spell checking for that language. This paper presents an overview of the stemming techniques and algorithms that researchers have used for stemming in different languages.
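To make the idea of conflation concrete, the following is a minimal sketch of a suffix-stripping stemmer. It is an illustration only and is not taken from any algorithm surveyed in the paper: the suffix list, the minimum stem length, and the names `toy_stem` and `conflate` are all assumptions chosen for the example, and the rules cover only a few English inflections.

```python
# Toy suffix-stripping stemmer. Illustrative assumption only:
# this is NOT Porter's, Lovins', or any other published algorithm.

# Hypothetical rule list for a few English inflections, longest suffix first.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied; the word is returned unchanged


def conflate(words):
    """Group words that share the same stem (conflation)."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connects"]))
```

With these assumed rules, the four variants of "connect" fall into a single group, while a short word such as "sing" is left untouched because stripping "ing" would leave too short a stem. A real stemmer needs language-specific rules or statistics, which is why stemmer development depends on knowledge of the target language.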