Title | Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features |
Author | |
Publication Years | 2022
|
DOI | |
Source Title | |
ISSN | 2329-9304
|
EISSN | 2329-9304
|
Volume | PP, Issue: 99, Pages: 1-17 |
Abstract | This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, and a multiplicative attention layer and a fully connected layer for each assessment metric prediction. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs to combine rich acoustic information to obtain more accurate assessments. Experimental results show that in both seen and unseen noise environments, MOSA-Net can improve the linear correlation coefficient (LCC) scores in perceptual evaluation of speech quality (PESQ) prediction, compared to Quality-Net, an existing single-task model for PESQ prediction, and improve LCC scores in short-time objective intelligibility (STOI) prediction, compared to STOI-Net, an existing single-task model for STOI prediction. Moreover, MOSA-Net can be used as a pre-trained model to be effectively adapted to an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. Experimental results show that MOSA-Net can improve LCC scores in mean opinion score (MOS) predictions, compared to MOS-SSL, a strong single-task model for MOS prediction. We further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE achieves higher PESQ scores than a baseline SE model in both seen and unseen noise environments. |
Keywords | |
URL | [Source Record] |
Indexed By | |
Language | English
|
SUSTech Authorship | Others
|
Funding Project | Shenzhen Sustainable Support Program for High-level University[20200925154002001]
; National Science and Technology Council[111-2221-E-001-016-MY3]
; Academia Sinica[AS-GC-111-M01]
|
WOS Research Area | Acoustics
; Engineering
|
WOS Subject | Acoustics
; Engineering, Electrical & Electronic
|
WOS Accession No | WOS:000923960000005
|
Publisher | |
Data Source | IEEE
|
PDF url | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9905733 |
Citation statistics |
Cited Times [WOS]:0
|
Document Type | Journal Article |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/406132 |
Department | Department of Electrical and Electronic Engineering |
Affiliation | 1. Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan; 2. Microsoft, Vancouver, BC, Canada; 3. Department of Electrical and Electronic Engineering, Southern University of Science and Technology of China, Shenzhen, China; 4. Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; 5. Institute of Information Science, Academia Sinica, Taipei, Taiwan |
Recommended Citation GB/T 7714 |
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, et al. Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, PP(99): 1-17.
|
APA |
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, & Yu Tsao. (2022). Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features. IEEE/ACM Transactions on Audio, Speech, and Language Processing, PP(99), 1-17.
|
MLA |
Ryandhimas E. Zezario, et al. "Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features." IEEE/ACM Transactions on Audio, Speech, and Language Processing PP.99 (2022): 1-17.
|
Files in This Item: | There are no files associated with this item. |
|