Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Authors :: Mingyue Niu
Zheng Lian
Jian Huang
Jianhua Tao
Bin Liu
Source :: IEEE Transactions on Affective Computing. 14:294-307
Publication Year :: 2023
Publisher :: Institute of Electrical and Electronics Engineers (IEEE), 2023.
Abstract: Physiological studies have shown that speech and facial activity differ between depressed and healthy individuals. Based on this finding, we propose a novel Spatio-Temporal Attention (STA) network and a Multimodal Attention Feature Fusion (MAFF) strategy to obtain a multimodal representation of depression cues for predicting an individual's depression level. Specifically, we first divide the speech amplitude spectrum/video into fixed-length segments and input these segments into the STA network, which not only integrates spatial and temporal information through an attention mechanism but also emphasizes the audio/video frames relevant to depression detection. The audio/video segment-level feature is obtained from the output of the last fully connected layer of the STA network. Second, this paper employs the eigen evolution pooling method to summarize how each dimension of the audio/video segment-level features changes over time, aggregating them into an audio/video-level feature. Third, MAFF generates a multimodal representation that carries complementary information across modalities, and this representation is fed into a support vector regression predictor to estimate depression severity. Experimental results on the AVEC2013 and AVEC2014 depression databases illustrate the effectiveness of our method.
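The segment-then-pool step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes frame-level features have already been extracted, uses a per-segment mean as a hypothetical stand-in for the STA network's segment-level output, and implements eigen evolution pooling in a generic form (project each feature dimension's time series onto the top eigenvectors of their uncentered scatter matrix). The function names `segment` and `eigen_evolution_pool`, the segment length, and `k` are illustrative assumptions, not the authors' code or settings.

```python
import numpy as np


def segment(frames: np.ndarray, seg_len: int) -> np.ndarray:
    """Split a (num_frames, dim) array into non-overlapping fixed-length segments.

    Trailing frames that do not fill a whole segment are dropped (an assumption).
    Returns an array of shape (num_segments, seg_len, dim).
    """
    n_seg = frames.shape[0] // seg_len
    return frames[: n_seg * seg_len].reshape(n_seg, seg_len, frames.shape[1])


def eigen_evolution_pool(seg_feats: np.ndarray, k: int) -> np.ndarray:
    """Pool (T, D) segment-level features into a single (k * D,) vector.

    Column d of X is the length-T evolution of feature dimension d.
    The top-k eigenvectors ("eigen evolutions") of the T x T matrix X X^T
    capture the dominant temporal patterns shared across dimensions; each
    dimension's series is then projected onto them.
    """
    X = seg_feats                           # shape (T, D)
    scatter = X @ X.T                       # (T, T), uncentered (an assumption)
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    top = eigvecs[:, order]                 # (T, k), orthonormal columns
    return (top.T @ X).reshape(-1)          # (k, D) projections, flattened


# Usage on synthetic data (hypothetical: 100 frames of 8-dim features).
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 8))
segs = segment(frames, seg_len=10)          # (10, 10, 8)
seg_feats = segs.mean(axis=1)               # placeholder for STA output: (10, 8)
video_feat = eigen_evolution_pool(seg_feats, k=3)
print(video_feat.shape)                     # (24,)
```

Because eigenvectors are defined only up to sign, a practical implementation would need a sign convention (for example, making the largest-magnitude entry of each eigenvector positive) so that pooled features are comparable across recordings.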

Subjects :: 0209 industrial biotechnology
Computer science
Speech recognition
Feature extraction
Pooling
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
02 engineering and technology
Image segmentation
Human-Computer Interaction
Support vector machine
020901 industrial engineering & automation
Modal
Dimension (vector space)
Feature (computer vision)
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Representation (mathematics)
Software

Details

ISSN :: 23719850
Volume :: 14
Database :: OpenAIRE
Journal :: IEEE Transactions on Affective Computing
Accession number :: edsair.doi...........9ee9b331d6b440a937f16a38e71bc713
Full Text :: https://doi.org/10.1109/taffc.2020.3031345
