ENGLISH TO HINDI TRANSLITERATION SYSTEM USING COMBINATION-BASED APPROACH

Baljeet Kaur Dhindsa, Dharam Veer Sharma

Abstract


Transliteration plays a significant role in machine translation, which has many applications, such as cross-lingual information retrieval, communication, and question answering. The main objective of this paper is to provide a method for transliterating named entities from English to Hindi. The proposed method consists of two modules, both of which apply a phoneme-based approach to transliterate named entities. Module-I uses the CMU Pronouncing Dictionary, a collection of 133,270 words along with their pronunciations. If the word to be transliterated is not found in the CMU Pronouncing Dictionary, Module-II is used. Module-II is based on a 5-gram model, in which a maximum of five letters (the target letter, two to its left, and two to its right) are used to generate the transliteration of the target letter. The system was tested on a database of 2,408 North Indian names. Google Input Tool for Windows was used as the baseline for comparison. The word accuracy of the proposed system was 70.22%, against 58.73% for Google Input Tool.
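The two-module flow and the word-accuracy measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the padding symbol, the function names, and the two callables `from_phonemes` and `from_windows` (stand-ins for the actual phoneme-to-Hindi and 5-gram-to-Hindi mappings, which the abstract does not specify) are all assumptions made for illustration. Word accuracy is taken here to be the percentage of exact matches against the reference transliterations.

```python
from typing import Callable, Dict, List

PAD = "_"  # hypothetical padding symbol for positions beyond the word edges


def five_gram_windows(word: str) -> List[str]:
    """Return one window per letter: two left letters, the target, two right."""
    padded = PAD * 2 + word.lower() + PAD * 2
    return [padded[i:i + 5] for i in range(len(word))]


def transliterate(
    word: str,
    pron_dict: Dict[str, List[str]],
    from_phonemes: Callable[[List[str]], str],
    from_windows: Callable[[List[str]], str],
) -> str:
    """Use Module-I when the dictionary has a pronunciation, else Module-II."""
    phonemes = pron_dict.get(word.upper())
    if phonemes is not None:
        # Module-I: dictionary pronunciation mapped to Hindi (mapping is assumed).
        return from_phonemes(phonemes)
    # Module-II: per-letter 5-gram contexts mapped to Hindi (mapping is assumed).
    return from_windows(five_gram_windows(word))


def word_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Percentage of words whose output exactly matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)
```

Padding the word edges keeps every window five characters long, so a letter near the start or end of a name contributes fewer than five real letters, consistent with the "maximum of five letters" wording.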

Keywords


Transliteration; English-to-Hindi Transliteration; Combination-based Transliteration.






DOI: https://doi.org/10.26483/ijarcs.v8i8.4801





Copyright (c) 2017 International Journal of Advanced Research in Computer Science