Main Article Content

Đồng Văn Phạm


This paper proposes a set of high-quality linguistic resources for Vietnamese speech processing that can be used for Vietnamese text-to-speech (TTS) and automatic speech recognition (ASR). The proposed resources comprise (1) a pronunciation dictionary adapted from X-SAMPA and (2) a rule-based grapheme-to-phoneme converter for Vietnamese. To test and evaluate them, we built a Vietnamese TTS system on the Merlin engine using these resources and assessed both the quality of the synthesized speech and the accuracy of its pronunciation. The results indicate that the resources are well suited to further research and development in Vietnamese speech processing.



Article Details

Author Biography

Đồng Văn Phạm, HUMG

Hanoi


P. Taylor, “Text-To-Speech Synthesis,” Camb. Univ. Press, 2009.

A.-G. Haudricourt, “La place du vietnamien dans les langues austroasiatiques,” Bull. Société Linguist. Paris, vol. 49, no. 1, pp. 122–128, 1953.

Hoàng Thị Châu, “Phương Ngữ Học Tiếng Việt,” NXB Đại Học Quốc Gia, 2009. Scribd. (accessed Dec. 14, 2022).

Q. C. Nguyen, “Reconnaissance de la parole en langue Vietnamienne,” PhD Thesis, Grenoble INPG, 2002.

J. C. Wells, “Computer-coding the IPA: a proposed extension of SAMPA,” Revis. Draft, vol. 4, no. 28, p. 1995, 1995.

N. T. T. Trang, C. d’Alessandro, A. Rilliard, and T. Do Dat, “HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation,” in 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 2013, pp. 2311–2315.

J. Kirby, “vPhon: a Vietnamese phonetizer.” Nov. 15, 2016. Accessed: Nov. 21, 2019. [Online]. Available:

T. T. T. Nguyen, “HMM-based Vietnamese Text-To-Speech: Prosodic Phrasing Modeling, Corpus Design, System Design, and Evaluation,” Paris 11, 2015. Accessed: May 27, 2017. [Online]. Available:

Z. Wu, O. Watts, and S. King, “Merlin: An Open Source Neural Network Speech Synthesis System,” in SSW, 2016, pp. 202–207.

Z. Malisz, H. Berthelsen, J. Beskow, and J. Gustafson, “Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis,” in INTERSPEECH, 2017, pp. 1079–1083.