Title

Multi-View Self-Attention Based Transformer for Speaker Recognition

Author
Rui Wang; Junyi Ao; Long Zhou; et al.
DOI
Publication Years
2022
Conference Name
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISSN
1520-6149
ISBN
978-1-6654-0541-6
Source Title
Volume
2022-May
Pages
6732-6736
Conference Date
23-27 May 2022
Conference Place
Singapore, Singapore
Publication Place
345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher
IEEE
Abstract
Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, owing to its powerful sequence modeling capabilities. However, the conventional self-attention mechanism was originally designed for modeling textual sequences and does not consider the characteristics of speech and speaker modeling. In addition, different Transformer variants for speaker recognition have not been well studied. In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants, with and without the proposed attention mechanism, for speaker recognition. Specifically, to balance the ability to capture global dependencies with the ability to model locality, we propose a multi-view self-attention mechanism for the speaker Transformer, in which different attention heads attend to different ranges of the receptive field. Furthermore, we introduce and compare five Transformer variants with different network architectures, embedding locations, and pooling methods for learning speaker embeddings. Experimental results on the VoxCeleb1 and VoxCeleb2 datasets show that the proposed multi-view self-attention mechanism improves speaker recognition performance, and that the proposed speaker Transformer network attains excellent results compared with state-of-the-art models.
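Note: the record does not include the authors' code. The snippet below is a minimal PyTorch sketch of one plausible reading of "multi-view self-attention", in which each attention head is restricted to a different window of the receptive field (None denoting an unrestricted, global head). The class name MultiViewSelfAttention, the window sizes, and the distance-masking scheme are illustrative assumptions, not the paper's implementation.

    # Minimal sketch (assumption, not the paper's code): give each attention
    # head a different receptive-field range by masking the attention matrix
    # to a per-head local window; a window of None means a global head.
    import torch
    import torch.nn.functional as F

    class MultiViewSelfAttention(torch.nn.Module):
        def __init__(self, d_model=256, head_windows=(None, 64, 16, 4)):
            super().__init__()
            self.h = len(head_windows)       # one "view" (window size) per head
            self.d_k = d_model // self.h
            self.head_windows = head_windows
            self.qkv = torch.nn.Linear(d_model, 3 * d_model)
            self.out = torch.nn.Linear(d_model, d_model)

        def forward(self, x):                # x: (batch, time, d_model)
            b, t, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # reshape to (batch, heads, time, d_k)
            q, k, v = (z.view(b, t, self.h, self.d_k).transpose(1, 2)
                       for z in (q, k, v))
            scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5   # (b, h, t, t)
            # per-head mask: head i may only attend within its window
            pos = torch.arange(t, device=x.device)
            dist = (pos[:, None] - pos[None, :]).abs()           # (t, t)
            masks = [torch.zeros(t, t, device=x.device) if w is None
                     else dist.gt(w).float() * -1e9
                     for w in self.head_windows]
            scores = scores + torch.stack(masks)                 # broadcast over batch
            attn = F.softmax(scores, dim=-1)
            ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
            return self.out(ctx)

    # usage: 2 utterances, 200 frames, 256-dim features
    y = MultiViewSelfAttention()(torch.randn(2, 200, 256))
    print(y.shape)  # torch.Size([2, 200, 256])

A block of this form could stand in for standard multi-head self-attention in a Transformer encoder layer, so that some heads model local patterns in the speech frames while others retain global context, matching the locality/global-dependency trade-off described in the abstract.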
Keywords
SUSTech Authorship
Others
Language
English
URL[Source Record]
Indexed By
Funding Project
National Natural Science Foundation of China [61976160, 62076182, 61906137]; Technology Research Plan Project of the Ministry of Public Security [2020JSYJD01]; Shanghai Science and Technology Plan Project [21DZ1204800]
WOS Research Area
Acoustics ; Computer Science ; Engineering
WOS Subject
Acoustics ; Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS Accession No
WOS:000864187907007
EI Accession Number
20222312199281
Data Source
IEEE
PDF URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9746639
Citation statistics
Cited Times [WOS]: 1
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/347982
Department
Department of Computer Science and Engineering
Affiliation
1.Tongji University,Department of Computer Science and Technology
2.Southern University of Science and Technology,Department of Computer Science and Engineering
3.Microsoft Research Asia
4.The Hong Kong Polytechnic University,Department of Computing
Recommended Citation
GB/T 7714
Rui Wang, Junyi Ao, Long Zhou, et al. Multi-View Self-Attention Based Transformer for Speaker Recognition[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2022: 6732-6736.
Files in This Item:
There are no files associated with this item.
