A Rule-Based Stemmer for Punjabi Adjectives

Preetpal Kaur Buttar, Harmanjeet Kaur


This work presents a rule-based stemmer for adjectives in the Punjabi language. Stemming is the process of reducing an inflected word to its root form. The proposed Punjabi Adjective Stemmer (PAS) uses a rule-based approach to convert inflected Punjabi adjectives to their root forms. A database of 1,762 valid Punjabi root adjectives has been created. When an adjective is given to PAS as input, PAS first looks it up in the root database to determine whether it is a root adjective or an inflected one. If it is a root adjective, no stemming is required and the input is returned as the output. Otherwise, the inflected adjective is passed to a suffix-stripping algorithm, which applies a set of predefined rules to obtain the corresponding root adjective. India is a linguistically rich country with 22 officially recognized languages, but the computational resources developed for these languages are very scarce. Most of the stemmers developed for Punjabi so far have concentrated on nouns and proper names. PAS is the only stemmer developed so far that specifically addresses the stemming of Punjabi adjectives. PAS achieves an overall accuracy of 88.76%.
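The two-stage flow described above (root lookup first, suffix stripping only for inflected forms) can be sketched as follows. This is a minimal illustration, not the PAS implementation: the three-word root set stands in for the 1,762-entry database, and the suffix rules, the names `ROOTS`, `SUFFIX_RULES` and `stem`, the check of each candidate against the root set, and the fallback of returning an unmatched word unchanged are all assumptions made for the example.

```python
# Hypothetical stand-in for the root database (PAS stores 1,762 root adjectives).
ROOTS = {"ਚੰਗਾ", "ਵੱਡਾ", "ਲਾਲ"}

# Illustrative (suffix, replacement) rules, longest suffix first.
# These are assumptions for the sketch, not the rule set used by PAS.
SUFFIX_RULES = [
    ("ੀਆਂ", "ਾ"),  # e.g. feminine plural ending
    ("ਿਆਂ", "ਾ"),  # e.g. masculine oblique plural ending
    ("ੀ", "ਾ"),    # e.g. feminine singular ending
    ("ੇ", "ਾ"),    # e.g. masculine plural / oblique ending
]


def stem(word: str) -> str:
    """Return the root adjective for `word` using lookup, then suffix stripping."""
    # Stage 1: a word already present in the root database needs no stemming.
    if word in ROOTS:
        return word

    # Stage 2: try each rule in order and accept the first candidate
    # that is a known root (an assumed validation step).
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in ROOTS:
                return candidate

    # Assumed fallback: no rule produced a known root, so return the input as-is.
    return word


if __name__ == "__main__":
    for w in ["ਚੰਗਾ", "ਚੰਗੀ", "ਚੰਗੇ", "ਚੰਗੀਆਂ", "ਵੱਡੇ", "ਲਾਲ"]:
        print(w, "->", stem(w))
```

Ordering the rules from longest to shortest suffix keeps a short suffix from matching before a longer one that contains it gets a chance to apply; whether PAS orders or validates its rules this way is not stated in the abstract.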


Stemming, Stemmer, Suffix-Stripping, Adjectives, Punjabi

