Title

Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features

Author
Ryandhimas E. Zezario; Szu-Wei Fu; Fei Chen; Chiou-Shann Fuh; Hsin-Min Wang; Yu Tsao
Publication Years
2022
DOI
Source Title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
ISSN
2329-9304
EISSN
2329-9304
Volume: PP, Issue: 99, Pages: 1-17
Abstract
This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, and a multiplicative attention layer and a fully connected layer for each assessment metric prediction. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs to combine rich acoustic information and obtain more accurate assessments. Experimental results show that, in both seen and unseen noise environments, MOSA-Net improves the linear correlation coefficient (LCC) scores in perceptual evaluation of speech quality (PESQ) prediction compared to Quality-Net, an existing single-task model for PESQ prediction, and improves LCC scores in short-time objective intelligibility (STOI) prediction compared to STOI-Net, an existing single-task model for STOI prediction. Moreover, MOSA-Net can be used as a pre-trained model and effectively adapted into an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. Experimental results show that MOSA-Net improves LCC scores in mean opinion score (MOS) prediction compared to MOS-SSL, a strong single-task model for MOS prediction. We further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE achieves higher PESQ scores than the baseline SE system in both seen and unseen noise environments.
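The abstract reports its results as linear correlation coefficient (LCC) scores between predicted and reference assessment values. As a minimal sketch of how such a score is commonly computed, the snippet below implements the standard Pearson correlation in plain Python. The function name `lcc` and the sample scores are hypothetical and chosen for illustration only; they are not taken from the paper or its evaluation code.

```python
import math


def lcc(predicted, reference):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    # Sums of squared deviations (unnormalised variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("LCC is undefined when either sequence is constant")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical scores, made up for illustration (not results from the paper).
true_pesq = [1.2, 2.0, 2.8, 3.5, 4.1]
pred_pesq = [1.4, 2.1, 2.6, 3.6, 3.9]
print(f"LCC = {lcc(pred_pesq, true_pesq):.3f}")
```

An LCC close to 1 indicates that the predicted scores track the reference scores almost linearly; it does not by itself show that the predictions are unbiased, since a constant offset or scale leaves the coefficient unchanged.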
Language
English
SUSTech Authorship
Others
Funding Project
Shenzhen Sustainable Support Program for High-level University[20200925154002001] ; National Science and Technology Council[111-2221-E-001-016-MY3] ; Academia Sinica[AS-GC-111-M01]
WOS Research Area
Acoustics ; Engineering
WOS Subject
Acoustics ; Engineering, Electrical & Electronic
WOS Accession No
WOS:000923960000005
Publisher
Data Source
IEEE
PDF URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9905733
Citation statistics
Cited Times [WOS]:0
Document Type
Journal Article
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/406132
Department
Department of Electrical and Electronic Engineering
Affiliation
1.Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
2.Microsoft, Vancouver, BC, Canada
3.Department of Electrical and Electronic Engineering, Southern University of Science and Technology of China, Shenzhen, China
4.Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
5.Institute of Information Science, Academia Sinica, Taipei, Taiwan
Recommended Citation
GB/T 7714
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, et al. Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, PP(99): 1-17.
APA
Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, & Yu Tsao. (2022). Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features. IEEE/ACM Transactions on Audio, Speech, and Language Processing, PP(99), 1-17.
MLA
Ryandhimas E. Zezario, et al. "Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features." IEEE/ACM Transactions on Audio, Speech, and Language Processing PP.99 (2022): 1-17.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.