REVIEW ON STEMMING TECHNIQUES

Prabhjot Kaur; Preetpal Kaur Buttar

doi:10.26483/ijarcs.v9i5.6308


Published: Oct 20, 2018


Keywords:

Stemming, Stemming techniques, Survey


Abstract

Stemming is a method of deriving the root word from an inflected word. The stemming process is often called conflation and is performed by stemmers, that is, by stemming algorithms. A stemming algorithm reduces all words that share the same base to a common form, and it is the basic building block of a stemmer. Developing a stemmer is language-dependent: it requires specific knowledge of the language and spell checking for that language. This paper presents an overview of the different stemming techniques and algorithms that researchers have used for stemming in different languages.
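To make the idea of conflation concrete, the following is a minimal sketch of a suffix-stripping stemmer for English. It is not one of the algorithms surveyed in the paper: the suffix list, the minimum stem length, and the names `toy_stem` and `conflate` are all assumptions chosen for illustration.

```python
# Toy suffix-stripping stemmer, for illustration only.
# SUFFIXES and MIN_STEM_LEN are hypothetical choices, not taken from the paper;
# real stemmers apply staged rules with conditions on the remaining stem.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LEN = 3


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group word forms under their shared stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


groups = conflate(["connect", "connected", "connecting", "connects"])
print(groups)
```

Here the four inflected forms are grouped under the single stem `connect`. The fixed English suffix list also illustrates why stemmer development is language-dependent: a different language would need its own rules.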


Issue

Vol. 9 No. 5 (2018): September-October 2018

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

