
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Authors :
Dai, Dongyang
Wu, Zhiyong
Kang, Shiyin
Wu, Xixin
Jia, Jia
Su, Dan
Yu, Dong
Meng, Helen
Source :
Proc. Interspeech 2019, pp. 2090-2094
Publication Year :
2025

Abstract

Grapheme-to-phoneme (G2P) conversion is an essential component of a Chinese Mandarin text-to-speech (TTS) system, and polyphone disambiguation is its core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of a polyphonic character, which accepts a sentence containing the polyphonic character as input in the form of a raw Chinese character sequence, without the need for any preprocessing. The proposed method consists of a pre-trained bidirectional encoder representations from Transformers (BERT) model and a neural network (NN) based classifier. The pre-trained BERT model extracts semantic features from the raw Chinese character sequence, and the NN based classifier predicts the polyphonic character's pronunciation from the BERT output. In our experiments, we implemented three classifiers: a fully-connected network based classifier, a long short-term memory (LSTM) network based classifier, and a Transformer block based classifier. Compared with the LSTM-based baseline approach, the experimental results demonstrate that the pre-trained model extracts effective semantic features, which greatly enhance the performance of polyphone disambiguation. In addition, we also explore the impact of contextual information on polyphone disambiguation.

Comment: Accepted at INTERSPEECH 2019
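As a rough illustration of the pipeline described in the abstract, the sketch below uses the Hugging Face transformers library to feed a raw Chinese character sequence through a pre-trained Chinese BERT and classify the pronunciation of a polyphonic character from the hidden state at its position. This is not the authors' implementation: the label set PRONUNCIATIONS, the example sentence, the fully-connected classifier size, and the assumption of one token per character are all illustrative choices.

```python
# Minimal sketch (not the authors' code): pre-trained BERT as a semantic feature
# extractor plus a small fully-connected classifier over the polyphonic
# character's hidden state. Label set and classifier are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

# Hypothetical pronunciation candidates for the polyphone 乐 (yue4 / le4).
PRONUNCIATIONS = ["yue4", "le4"]

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")


class PolyphoneClassifier(nn.Module):
    """Fully-connected classifier over the BERT hidden state of one character."""

    def __init__(self, hidden_size=768, num_prons=len(PRONUNCIATIONS)):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_prons)

    def forward(self, char_hidden):      # (batch, hidden_size)
        return self.fc(char_hidden)      # (batch, num_prons) logits


classifier = PolyphoneClassifier()

sentence = "我爱音乐"                     # "I love music"; 乐 is the polyphone
char_index = sentence.index("乐")

# The raw character sequence goes straight into BERT; no extra preprocessing.
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)

# +1 skips the [CLS] token prepended by the tokenizer (one token per character
# is assumed here, which holds for bert-base-chinese on plain Chinese text).
char_hidden = hidden[:, char_index + 1, :]
logits = classifier(char_hidden)
pred = PRONUNCIATIONS[logits.argmax(dim=-1).item()]
print(pred)  # the classifier is untrained, so this is arbitrary until fine-tuned
```

In this sketch the classifier corresponds to the fully-connected variant mentioned in the abstract; the LSTM and Transformer-block variants would replace the single linear layer with the respective module applied to the full BERT output sequence.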

Details

Database :
arXiv
Journal :
Proc. Interspeech 2019, pp. 2090-2094
Publication Type :
Report
Accession number :
edsarx.2501.01102
Document Type :
Working Paper