Title | Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation |
Author | |
DOI | |
Publication Years | 2022 |
Conference Name | 30th European Signal Processing Conference (EUSIPCO) |
ISSN | 2219-5491 |
ISBN | 978-1-6654-6799-5 |
Source Title | |
Volume | 2022-August |
Pages | 155-159 |
Conference Date | 29 Aug.-2 Sept. 2022 |
Conference Place | Belgrade, Serbia |
Publication Place | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Publisher | IEEE |
Abstract | Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, few studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues only indirectly, by enhancing signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework was previously proposed to directly restore the spatial cues of the separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, this study proposes a new RTF estimator based on a recurrent neural network, which estimates the RTF directly from the separated speech and the noisy mixture. The upgraded framework was evaluated on the spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that the interaural time difference and interaural level difference errors of the separated speech were significantly reduced after RTF correction, without sacrificing SNR. The new RTF estimator further improved the performance of the system with a model about five times smaller than the previous one. As the proposed framework does not rely on any specific model structure, it can be incorporated with both multi-channel and single-channel speech separation models. |
Keywords | |
SUSTech Authorship | First |
Language | English |
URL | [Source Record] |
Indexed By | |
Funding Project | Guangdong Provincial Key Laboratory of Robotics and Intelligent Systems [ZDSYS20200810171800001] |
WOS Research Area | Acoustics; Computer Science; Engineering; Imaging Science & Photographic Technology; Telecommunications |
WOS Subject | Acoustics; Computer Science, Software Engineering; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology; Telecommunications |
WOS Accession No | WOS:000918827600032 |
Scopus EID | 2-s2.0-85141011446 |
Data Source | Scopus |
PDF url | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9909636 |
Citation statistics | Cited Times [WOS]: 0 |
Document Type | Conference paper |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/411946 |
Department | Southern University of Science and Technology |
Affiliation | 1. Shenzhen Key Laboratory of Robotics Perception and Intelligence, Southern University of Science and Technology, Shenzhen, China 2. Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan |
First Author Affiliation | Southern University of Science and Technology |
First Author's First Affiliation | Southern University of Science and Technology |
Recommended Citation GB/T 7714 | Feng, Zicheng, Tsao, Yu, Chen, Fei. Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation[C]. New York: IEEE, 2022: 155-159. |
Files in This Item: | There are no files associated with this item. |
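The RTF estimation-and-correction idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's RNN-based estimator; it is a classical covariance-based RTF estimate in the STFT domain, with all function names and shapes hypothetical:

```python
import numpy as np

def estimate_rtf(ref_stft, tgt_stft, eps=1e-8):
    """Per-frequency-bin least-squares RTF estimate.

    ref_stft, tgt_stft: complex arrays of shape (freq_bins, frames)
    for the reference (e.g. left-ear) and target (e.g. right-ear) channels.
    """
    # Cross-power spectrum between target and reference, averaged over frames
    cross = np.mean(tgt_stft * np.conj(ref_stft), axis=1)
    # Auto-power spectrum of the reference channel
    power = np.mean(np.abs(ref_stft) ** 2, axis=1)
    return cross / (power + eps)

def apply_rtf(ref_stft, rtf):
    """Correction step: re-synthesize the target channel by applying the
    estimated RTF to the reference channel, restoring interaural time and
    level differences encoded in the RTF's phase and magnitude."""
    return ref_stft * rtf[:, None]
```

Because the interaural time and level differences are carried by the phase and magnitude of the RTF, applying an accurately estimated RTF to one channel restores the binaural spatial cues regardless of which separation model produced the signal, which is why the framework is model-agnostic.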