Title | Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation |
Author | |
DOI | |
Publication Years | 2022
|
Conference Name | Interspeech Conference
|
ISSN | 2308-457X
|
EISSN | 1990-9772
|
Source Title | |
Volume | 2022-September
|
Pages | 1781-1785
|
Conference Date | SEP 18-22, 2022
|
Conference Place | Incheon, South Korea
|
Publication Place | C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
|
Publisher | |
Abstract | Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo data with a combination of popular techniques that are nontrivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation. |
Keywords | |
SUSTech Authorship | Others
|
Language | English
|
URL | [Source Record] |
Indexed By | |
WOS Research Area | Acoustics; Audiology & Speech-Language Pathology; Computer Science; Engineering |
WOS Subject | Acoustics; Audiology & Speech-Language Pathology; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic |
WOS Accession No | WOS:000900724501193
|
Scopus EID | 2-s2.0-85140060672
|
Data Source | Scopus
|
Citation statistics | Cited Times [WOS]: 0 |
Document Type | Conference paper |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/406916 |
Department | Department of Computer Science and Engineering |
Affiliation | 1. ByteDance AI Lab; 2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; 3. Peng Cheng Laboratory, Shenzhen, China |
Recommended Citation GB/T 7714 |
Dong,Qianqian,Yue,Fengpeng,Ko,Tom,et al. Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE:ISCA-INT SPEECH COMMUNICATION ASSOC,2022:1781-1785.
|