
MMNet: A Model-based Multimodal Network for Human Action Recognition in RGB-D Videos

Authors :
Bruce X. B. Yu
Yan Liu
Xiang Zhang
Sheng-Hua Zhong
Keith C. C. Chan
Source :
IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based and RGB-video-based) have achieved substantial improvements with increasingly larger datasets, but multimodal methods, particularly those with model-level fusion, have seldom been investigated. In this paper, we propose a model-based multimodal network (MMNet) that fuses the skeleton and RGB modalities via a model-based approach. The objective of our method is to improve ensemble recognition accuracy by effectively exploiting the mutually complementary information in different data modalities. In the model-based fusion scheme, a spatiotemporal graph convolutional network for the skeleton modality learns attention weights that are transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method outperforms state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested MMNet on the RGB video dataset Kinetics 400, which contains more outdoor actions; the results are consistent with those on the RGB-D video datasets.
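The fusion scheme described in the abstract, where attention weights learned by the skeleton branch reweight RGB features before the two branches' scores are ensembled, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all tensors, dimensions, and the linear classifiers are hypothetical stand-ins (random values in place of real GCN and CNN features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 25 skeleton joints, 8-dim features, 60 classes
# (matching the NTU RGB+D 60 class count, but otherwise arbitrary).
num_joints, feat_dim, num_classes = 25, 8, 60

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-in for features produced by a spatiotemporal GCN on the skeleton.
skel_feat = rng.standard_normal((num_joints, feat_dim))

# Attention weights over joints, learned by the skeleton branch
# (here derived from a simple per-joint pooling of the stand-in features).
attn = softmax(skel_feat.mean(axis=1))            # shape: (num_joints,)

# Stand-in for RGB features pooled around each joint's image region.
rgb_feat = rng.standard_normal((num_joints, feat_dim))

# Model-level fusion: transfer skeleton attention to reweight RGB features.
fused_rgb = (attn[:, None] * rgb_feat).sum(axis=0)  # shape: (feat_dim,)

# Each branch yields class scores; the final prediction ensembles them.
W_skel = rng.standard_normal((feat_dim, num_classes))
W_rgb = rng.standard_normal((feat_dim, num_classes))
score = skel_feat.mean(axis=0) @ W_skel + fused_rgb @ W_rgb
pred = int(np.argmax(score))
```

The key design point this sketch captures is that the attention is computed once, from the skeleton modality, and then applied to the other modality's features, rather than each branch attending independently.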

Details

ISSN :
1939-3539 (electronic) and 0162-8828 (print)
Database :
OpenAIRE
Journal :
IEEE Transactions on Pattern Analysis and Machine Intelligence
Accession number :
edsair.doi.dedup.....7f97be16dc310888a30060fb924eaedc
Full Text :
https://doi.org/10.1109/tpami.2022.3177813