Title

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

Author
Corresponding Author: Ko,Tom
DOI
Publication Years
2022
Conference Name
Interspeech Conference
ISSN
2308-457X
EISSN
1990-9772
Source Title
Volume
2022-September
Pages
501-505
Conference Date
SEP 18-22, 2022
Conference Place
Incheon, South Korea
Publication Place
C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
Publisher
Abstract
In human speech, a speaker's attitude cannot be fully expressed by textual content alone; it must be conveyed together with intonation. Declarative questions are commonly used in daily Cantonese conversations, and they are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences due to the loss of semantic information. Though it has become more common to complement the systems with extra language models, their performance in modeling rising intonation is not well studied. In this paper, we propose to complement the Cantonese TTS model with a BERT-based statement/question classifier. We design different training strategies and compare their performance. We conduct our experiments on a Cantonese corpus named CanTTS. Empirical results show that the separate training approach obtains the best generalization performance and feasibility.
Keywords
SUSTech Authorship
First
Language
English
URL[Source Record]
Indexed By
WOS Research Area
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science ; Engineering
WOS Subject
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession No
WOS:000900724500102
Scopus EID
2-s2.0-85140073777
Data Source
Scopus
Citation statistics
Cited Times [WOS]:0
Document Type: Conference paper
Identifier: http://kc.sustech.edu.cn/handle/2SGJ60CL/406915
Department: Department of Computer Science and Engineering
Affiliation
1.Department of Computer Science and Engineering,Southern University of Science and Technology,China
2.ByteDance AI Lab
First Author Affiliation: Department of Computer Science and Engineering
First Author's First Affiliation: Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Bai,Qibing,Ko,Tom,Zhang,Yu. A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE:ISCA-INT SPEECH COMMUNICATION ASSOC,2022:501-505.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.