Title
Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation

Author
Feng, Zicheng; Tsao, Yu; Chen, Fei
DOI
Publication Years
2022
Conference Name
30th European Signal Processing Conference (EUSIPCO)
ISSN
2219-5491
ISBN
978-1-6654-6799-5
Source Title
Volume
2022-August
Pages
155-159
Conference Date
29 Aug.-2 Sept. 2022
Conference Place
Belgrade, Serbia
Publication Place
New York, NY, USA
Publisher
IEEE
Abstract
Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, few studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues only indirectly, by improving signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework was previously proposed that directly restores the spatial cues of separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, this study proposes a new RTF estimator based on a recurrent neural network, which estimates the RTF directly from the separated speech and the noisy mixture. The upgraded framework was evaluated on the spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that the interaural time difference and interaural level difference errors of the separated speech were significantly reduced after RTF correction, without sacrificing SNR. The new RTF estimator further improved system performance, with a model about five times smaller than the previous one. As the proposed framework does not rely on any specific model structure, it can be combined with both multi-channel and single-channel speech separation models.
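The abstract only names the key quantities (RTF, ITD, ILD). For orientation, the sketch below shows their classical signal-processing counterparts: a cross-spectral RTF estimate, RTF-based correction of a separated reference channel, and broadband ITD/ILD measurement. This is a minimal illustration under stated assumptions, not the paper's RNN-based estimator; the function names, the 16 kHz sample rate, and the STFT settings are assumptions.

# Illustrative sketch only: classical RTF estimation/correction and
# ITD/ILD measurement. NOT the paper's RNN-based RTF estimator.
import numpy as np
from scipy.signal import stft, istft, correlate

FS = 16000  # sample rate in Hz (assumption)

def estimate_rtf(ref, other, nfft=512):
    """Classical RTF estimate: cross-PSD(other, ref) / auto-PSD(ref).
    The RTF maps the reference channel to the other channel and
    encodes the source's interaural cues."""
    _, _, X = stft(ref, FS, nperseg=nfft)
    _, _, Y = stft(other, FS, nperseg=nfft)
    cross = np.mean(Y * np.conj(X), axis=1)  # cross-PSD per frequency bin
    auto = np.mean(np.abs(X) ** 2, axis=1)   # reference auto-PSD
    return cross / np.maximum(auto, 1e-12)

def apply_rtf(ref, rtf, nfft=512):
    """RTF correction: filter the reference channel with the estimated
    RTF to synthesize the contralateral channel, re-imposing spatial cues."""
    _, _, X = stft(ref, FS, nperseg=nfft)
    _, y = istft(X * rtf[:, None], FS, nperseg=nfft)
    return y[: len(ref)]

def itd_ild(left, right):
    """Broadband ITD (cross-correlation peak lag, in seconds) and
    ILD (RMS level ratio, in dB) between two channels."""
    xc = correlate(left, right, mode="full")
    itd = (np.argmax(np.abs(xc)) - (len(right) - 1)) / FS
    rms = lambda x: np.sqrt(np.mean(x ** 2)) + 1e-12
    return itd, 20 * np.log10(rms(left) / rms(right))

Usage under the same assumptions: estimate the RTF from the two channels of a (hypothetical) binaural signal, synthesize the contralateral channel of a separated reference signal with apply_rtf, and compare itd_ild before and after correction; smaller ITD/ILD errors relative to the clean target indicate better-preserved spatial cues.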
Keywords
SUSTech Authorship
First
Language
English
URL[Source Record]
Indexed By
Funding Project
Guangdong Provincial Key Laboratory of Robotics and Intelligent Systems[ZDSYS20200810171800001];
WOS Research Area
Acoustics ; Computer Science ; Engineering ; Imaging Science & Photographic Technology ; Telecommunications
WOS Subject
Acoustics ; Computer Science, Software Engineering ; Engineering, Electrical & Electronic ; Imaging Science & Photographic Technology ; Telecommunications
WOS Accession No
WOS:000918827600032
Scopus EID
2-s2.0-85141011446
Data Source
Scopus
PDF URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9909636
Citation statistics
Cited Times [WOS]: 0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/411946
Department
Southern University of Science and Technology
Affiliation
1. Shenzhen Key Laboratory of Robotics Perception and Intelligence, Southern University of Science and Technology, Shenzhen, China
2. Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
First Author Affiliation
Southern University of Science and Technology
First Author's First Affiliation
Southern University of Science and Technology
Recommended Citation
GB/T 7714
Feng Zicheng, Tsao Yu, Chen Fei. Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation[C]. New York, NY, USA: IEEE, 2022: 155-159.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.