Detection and Recognition of Hindi Text from Natural Scenes and its Transliteration to English


Preetpal Kaur Buttar


India is a country of many cultures, and travelling from one region to another often means encountering an entirely different culture and language. As a result, signboards, shop names, and other common text written in local languages can be difficult to read, not only for travellers from other countries but also for people who relocate between regions within India. In most regions, however, signboards and shop names are written in English or Hindi. We propose a complete text detection, recognition, and transliteration system that helps travellers read Hindi text on signboards and shops by transliterating the detected text into English. The proposed system detects Hindi text in natural scenes using the Progressive Scale Expansion algorithm and can handle challenging scenarios, including curved text in natural images. After localizing a text region, the system extracts the text using the PyTesseract OCR engine and transliterates it into English with a sequence-to-sequence multi-layer RNN (LSTM) model, producing accurate transliterations that preserve the pronunciation of the original Hindi words. We use a synthetic dataset of approximately 100,000 Hindi text images for text detection and the FIRE 2013 dataset for transliteration. The overall system is evaluated using the BLEU score.
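To illustrate the evaluation step mentioned above, the sketch below computes a simplified sentence-level BLEU score in pure Python. This is a minimal, assumed implementation for illustration only (uniform n-gram weights, a small smoothing constant, whitespace tokenization); the paper's actual evaluation may use an established corpus-level BLEU implementation instead.

```python
import math
from collections import Counter


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of smoothed
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped n-gram matches: each reference n-gram can be matched at most
        # as many times as it occurs in the reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        # Tiny additive smoothing so a single missing n-gram order
        # does not zero out the whole score.
        precisions.append((overlap + 1e-9) / total)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A transliteration hypothesis identical to the reference scores close to 1.0, while substitutions or omissions lower the score, which is why BLEU is a reasonable proxy for transliteration quality at the corpus level.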





