Title | Spatial-Temporal Pyramid Graph Reasoning for Action Recognition |
Author | Tiantian Geng; Feng Zheng; Xiaorong Hou; Ke Lu; Guo-Jun Qi; Ling Shao |
Publication Years | 2022
|
DOI | |
Source Title | IEEE Transactions on Image Processing |
ISSN | 1057-7149
|
EISSN | 1941-0042
|
Volume | PP; Issue: 99; Pages: 1-1 |
Abstract | Spatial-temporal relation reasoning is a significant yet challenging problem for video action recognition. Previous works typically apply local operations like 2D or 3D CNNs to conduct space-time interactions in video sequences, or simply capture space-time long-range relations of a single fixed scale. However, this is inadequate for obtaining a comprehensive action representation. Besides, most models treat all input frames equally for the final classification, without selecting key frames and motion-sensitive regions. This introduces irrelevant video content and hurts the performance of models. In this paper, we propose a generic Spatial-Temporal Pyramid Graph Network (STPG-Net) to adaptively capture long-range spatial-temporal relations in video sequences at multiple scales. Specifically, we design a temporal attention (TA) module and a spatial-temporal attention (STA) module to learn the contribution of each frame and each space-time region to an action at a feature level, respectively. We then apply the selected key information to build spatial-temporal pyramid graphs for long-range relation reasoning and more comprehensive action representation learning. STPG-Net can be flexibly integrated into 2D and 3D backbone networks in a plug-and-play manner. Extensive experiments show that it brings consistent improvements over many challenging baselines on several standard action recognition benchmarks (i.e., Something-Something V1 & V2, and FineGym), demonstrating the effectiveness of our approach. |
Keywords | |
URL | [Source Record] |
Indexed By | |
Language | English
|
SUSTech Authorship | Others
|
Funding Project | National Natural Science Foundation of China (61972188, 62122035) |
WOS Research Area | Computer Science; Engineering |
WOS Subject | Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic |
WOS Accession No | WOS:000844128200001
|
Publisher | |
EI Accession Number | 20223412602333
|
EI Keywords | Graphic methods; Three dimensional displays; Video recording |
ESI Classification Code | Television Systems and Equipment:716.4; Computer Peripheral Equipment:722.2 |
ESI Research Field | ENGINEERING
|
Data Source | Web of Science
|
PDF url | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9852978 |
Citation statistics |
Cited Times [WOS]:2
|
Document Type | Journal Article |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/375591 |
Department | Research Institute of Trustworthy Autonomous Systems; College of Engineering, Department of Computer Science and Engineering |
Affiliation | 1. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China; 2. Department of Computer Science and Engineering and the Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen, China; 3. Peng Cheng Laboratory, Shenzhen, China; 4. Futurewei Technologies, Seattle, WA, USA; 5. Terminus Group, China |
Recommended Citation GB/T 7714 |
Tiantian Geng, Feng Zheng, Xiaorong Hou, et al. Spatial-Temporal Pyramid Graph Reasoning for Action Recognition[J]. IEEE Transactions on Image Processing, 2022, PP(99): 1-1.
|
APA |
Tiantian Geng, Feng Zheng, Xiaorong Hou, Ke Lu, Guo-Jun Qi, & Ling Shao. (2022). Spatial-Temporal Pyramid Graph Reasoning for Action Recognition. IEEE Transactions on Image Processing, PP(99), 1-1.
|
MLA |
Tiantian Geng, et al. "Spatial-Temporal Pyramid Graph Reasoning for Action Recognition." IEEE Transactions on Image Processing PP.99 (2022): 1-1.
|
Files in This Item: | There are no files associated with this item. |
|