Title

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Author
Dong, Qianqian; Yue, Fengpeng; Ko, Tom; et al.
DOI
Publication Years
2022
Conference Name
Interspeech Conference
ISSN
2308-457X
EISSN
1990-9772
Source Title
Volume
2022-September
Pages
1781-1785
Conference Date
SEP 18-22, 2022
Conference Place
Incheon, South Korea
Publication Place
C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
Publisher
ISCA-INT SPEECH COMMUNICATION ASSOC
Abstract
Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is challenging due to data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Indeed, we exploit the pseudo-labeled data with a combination of popular techniques that are not trivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
Keywords
SUSTech Authorship
Others
Language
English
URL[Source Record]
Indexed By
WOS Research Area
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science ; Engineering
WOS Subject
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession No
WOS:000900724501193
Scopus EID
2-s2.0-85140060672
Data Source
Scopus
Citation statistics
Cited Times [WOS]: 0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/406916
Department
Department of Computer Science and Engineering
Affiliation
1. ByteDance AI Lab
2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
3. Peng Cheng Laboratory, Shenzhen, China
Recommended Citation
GB/T 7714
Dong, Qianqian, Yue, Fengpeng, Ko, Tom, et al. Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE: ISCA-INT SPEECH COMMUNICATION ASSOC, 2022: 1781-1785.
