Title

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Author
DOI
Publication Years
2022
Conference Name
Interspeech Conference
ISSN
2308-457X
EISSN
1990-9772
Source Title
Volume
2022-September
Pages
5463-5467
Conference Date
SEP 18-22, 2022
Conference Place
Incheon, South Korea
Publication Place
C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE
Publisher
Abstract
Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, the intelligibility and WER scores predicted by MTI-Net are shown to be highly correlated with the ground-truth scores.
Keywords
SUSTech Authorship
Others
Language
English
WOS Research Area
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science ; Engineering
WOS Subject
Acoustics ; Audiology & Speech-Language Pathology ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession No
WOS:000900724505130
Scopus EID
2-s2.0-85140047138
Data Source
Scopus
Citation statistics
Cited Times [WOS]:0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/406918
Department
Southern University of Science and Technology
Affiliation
1. National Taiwan University, Taiwan
2. Academia Sinica
3. Microsoft Corporation
4. Southern University of Science and Technology of China, China
Recommended Citation
GB/T 7714
Zezario,Ryandhimas E.,Fu,Szu Wei,Chen,Fei,et al. MTI-Net: A Multi-Target Speech Intelligibility Prediction Model[C]. C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE:ISCA-INT SPEECH COMMUNICATION ASSOC,2022:5463-5467.
Files in This Item:
There are no files associated with this item.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.