SEGMENTATION-FREE RECOGNITION OF URDU SCRIPT USING HMM

Prabjot Singh, Kuljeet Singh, Jyoti Mahajan

Abstract


Much of the Urdu literature exists only as manuscripts and typewritten books, and there is a need to convert these physical libraries into electronic ones. OCR systems have been developed for many languages and are widely used. Building a complete Urdu OCR is difficult because Urdu is a highly cursive script in which ligatures overlap and stylistic variation challenges the recognition system.
We describe a technique for automatic recognition of offline printed Urdu text using Hidden Markov Models (HMMs). The method does not segment text into characters. Instead, it treats each shape of an Urdu character as a separate class, giving 196 classes in total (compared with 38 Urdu letters). This paper presents a novel feature extraction method based on a sliding window that computes only 16 statistical features per window, which eliminates the need to segment Urdu text. We report how the recognition rate depends on the number of HMM states, the size of the hierarchical window, and the font. HTK (Hidden Markov Model Toolkit) is used for training, recognition, and result analysis.
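The abstract does not list the 16 features or the window dimensions, so the following is only an illustrative sketch under stated assumptions. It shows how a sliding window turns a text-line image into a sequence of 16-dimensional vectors that an HMM can consume without character segmentation. Assumptions: the input is a binarized text-line image (1 = ink). The function names `zone_features` and `sliding_window_features`, the window width and shift, the split into four horizontal zones, and the four per-zone statistics (ink density, vertical and horizontal ink centroid, mean vertical transition count) are hypothetical choices, not the authors' features.

```python
import numpy as np


def zone_features(zone: np.ndarray) -> list[float]:
    """Four illustrative statistics for one horizontal zone of a window.

    `zone` is a 2-D int array of 0/1 pixels (1 = ink). These statistics
    are hypothetical stand-ins, NOT the paper's actual 16 features.
    """
    ink = int(zone.sum())
    if ink == 0:
        # Blank (or empty) zone: every statistic is defined as zero.
        return [0.0, 0.0, 0.0, 0.0]
    h, w = zone.shape
    density = ink / zone.size                  # fraction of ink pixels
    rows, cols = np.nonzero(zone)
    cy = rows.mean() / max(h - 1, 1)           # vertical ink centroid in [0, 1]
    cx = cols.mean() / max(w - 1, 1)           # horizontal ink centroid in [0, 1]
    # Ink/background changes down each column, averaged over columns.
    transitions = np.abs(np.diff(zone, axis=0)).sum() / w
    return [float(density), float(cy), float(cx), float(transitions)]


def sliding_window_features(image, width: int = 8, shift: int = 4,
                            zones: int = 4) -> np.ndarray:
    """Turn a binarized text-line image into a (T, 4 * zones) frame sequence.

    Hypothetical parameters: windows of `width` columns move by `shift`
    columns from the right edge to the left, following Urdu's
    right-to-left writing direction.
    """
    img = (np.asarray(image) > 0).astype(np.int32)   # signed ints so diff works
    height, total_w = img.shape
    bounds = np.linspace(0, height, zones + 1).astype(int)  # zone row limits
    frames = []
    right = total_w
    while right - width >= 0:
        window = img[:, right - width:right]
        vec = []
        for top, bottom in zip(bounds[:-1], bounds[1:]):
            vec.extend(zone_features(window[top:bottom]))
        frames.append(vec)
        right -= shift
    return np.array(frames, dtype=float).reshape(-1, 4 * zones)
```

With the defaults, 4 zones times 4 statistics give 16 values per frame, and a 20-column line yields (20 - 8) // 4 + 1 = 4 frames. Because frames are taken at fixed column steps rather than at character boundaries, the resulting observation sequence can be passed to an HMM toolkit such as HTK without first segmenting the text. Changing `width` and `shift` corresponds loosely to the window-size experiments the abstract mentions.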

Keywords


Naskh, OCR, HMM, HTK


DOI: https://doi.org/10.26483/ijarcs.v9i1.5500

Copyright (c) 2018 International Journal of Advanced Research in Computer Science