Title | MTI-Net: A Multi-Target Speech Intelligibility Prediction Model |
Author | |
DOI | |
Publication Years | 2022
|
Conference Name | Interspeech Conference
|
ISSN | 2308-457X
|
EISSN | 1990-9772
|
Source Title | |
Volume | 2022-September
|
Pages | 5463-5467
|
Conference Date | SEP 18-22, 2022
|
Conference Place | Incheon, SOUTH KOREA
|
Publication Place | C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
|
Publisher | |
Abstract | Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, the intelligibility and WER scores predicted by MTI-Net are confirmed to be highly correlated with the ground-truth scores. |
Keywords | |
SUSTech Authorship | Others
|
Language | English
|
Indexed By | |
WOS Research Area | Acoustics
; Audiology & Speech-Language Pathology
; Computer Science
; Engineering
|
WOS Subject | Acoustics
; Audiology & Speech-Language Pathology
; Computer Science, Artificial Intelligence
; Engineering, Electrical & Electronic
|
WOS Accession No | WOS:000900724505130
|
Scopus EID | 2-s2.0-85140047138
|
Data Source | Scopus
|
Citation statistics |
Cited Times [WOS]:0
|
Document Type | Conference paper |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/406918 |
Department | Southern University of Science and Technology |
Affiliation | 1. National Taiwan University, Taiwan; 2. Academia Sinica; 3. Microsoft Corporation; 4. Southern University of Science and Technology of China, China |
Recommended Citation GB/T 7714 |
Zezario,Ryandhimas E.,Fu,Szu Wei,Chen,Fei,et al. MTI-Net: A Multi-Target Speech Intelligibility Prediction Model[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE:ISCA-INT SPEECH COMMUNICATION ASSOC,2022:5463-5467.
|
Files in This Item: | There are no files associated with this item. |
|