A Rule-Based Stemmer for Punjabi Adjectives

Preetpal Kaur Buttar
Harmanjeet Kaur

Abstract

This work presents a rule-based stemmer for adjectives in the Punjabi language. Stemming is the process of deriving the root word from an inflected word. The proposed Punjabi Adjective Stemmer (PAS) uses a rule-based approach to convert inflected Punjabi adjectives to their root forms. A database of 1,762 valid Punjabi root adjectives has been created. When an adjective is given to PAS as input, PAS first looks it up in the root database to determine whether it is a root adjective or an inflected one. If it is a root adjective, no stemming is required and the input is returned unchanged. Otherwise, the inflected adjective is passed to a suffix-stripping algorithm, which applies a set of predefined rules to obtain the corresponding root adjective. India is a linguistically rich country with 22 officially recognized languages, but computational resources for these languages remain scarce. Most stemmers developed for Punjabi so far have concentrated on nouns and proper names. To the best of our knowledge, PAS is the only stemmer developed so far that specifically addresses the stemming of Punjabi adjectives. PAS achieves an overall accuracy of 88.76%.
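The two-stage flow described in the abstract (root lookup first, suffix stripping otherwise) can be sketched roughly as follows. This is a minimal illustration, not the PAS implementation: the root set, the suffix rules, and the function name are hypothetical placeholders, romanized forms stand in for Gurmukhi text, and checking each stripped candidate against the root database is an assumption the abstract does not confirm.

```python
# Hypothetical stand-in for the root database (PAS stores 1,762 roots).
# Romanized forms are used here purely for readability.
ROOT_ADJECTIVES = {"changa", "sohna", "lal"}

# Hypothetical suffix rules as (suffix, replacement) pairs.
# Longer suffixes come first so they are tried before shorter ones.
SUFFIX_RULES = [
    ("iaan", "a"),
    ("i", "a"),
    ("e", "a"),
]


def stem_adjective(word):
    """Return the root form of an adjective, or the word itself if no rule fits."""
    # Stage 1: a word already in the root database needs no stemming.
    if word in ROOT_ADJECTIVES:
        return word

    # Stage 2: suffix stripping with predefined rules.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Assumption: accept a candidate only if it is a known root.
            if candidate in ROOT_ADJECTIVES:
                return candidate

    # No rule produced a known root; return the input unchanged.
    return word


if __name__ == "__main__":
    for form in ["changa", "changi", "changiaan", "sohne", "lal"]:
        print(form, "->", stem_adjective(form))
```

Ordering the rules from longest to shortest suffix keeps a short rule from firing on a word that a longer rule should handle; a real rule set for Gurmukhi script would operate on vowel signs rather than Latin letters.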
