
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals

Authors :
Cai, Ruichu
Jiang, Zhifang
Li, Zijian
Chen, Weilin
Chen, Xuexin
Hao, Zhifeng
Shen, Yifan
Chen, Guangyi
Zhang, Kun
Publication Year :
2024

Abstract

Existing methods for multi-modal time series representation learning aim to disentangle the modality-shared and modality-specific latent variables. Although these methods achieve notable performance on downstream tasks, they usually assume an orthogonal latent space. However, in real-world scenarios the modality-shared and modality-specific latent variables may be dependent. Therefore, we propose a general generation process in which the modality-shared and modality-specific latent variables are dependent, and further develop a Multi-modAl TEmporal Disentanglement (MATE) model. Specifically, our MATE model is built on a temporally variational inference architecture with modality-shared and modality-specific prior networks for the disentanglement of latent variables. Furthermore, we establish identifiability results to show that the extracted representation is disentangled. More specifically, we first achieve subspace identifiability for the modality-shared and modality-specific latent variables by leveraging the pairing of multi-modal data. We then establish component-wise identifiability of the modality-specific latent variables by employing sufficient changes in the historical latent variables. Extensive experimental studies on multi-modal sensor, human activity recognition, and healthcare datasets show a general improvement across different downstream tasks, highlighting the effectiveness of our method in real-world scenarios.
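
The abstract describes a temporally variational inference architecture that splits each modality's latent representation into a modality-shared block and a modality-specific block. Since this record does not include the paper's implementation, the PyTorch sketch below only illustrates that high-level split for two sensing modalities; the names (ModalityEncoder, shared_head, specific_head), the GRU backbone, and all dimensions are illustrative assumptions, and the actual MATE model additionally relies on modality-shared/specific prior networks and identifiability constraints not shown here.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Encodes one modality's time-series window into modality-shared and
    modality-specific Gaussian posteriors (mean and log-variance per block)."""

    def __init__(self, input_dim: int, shared_dim: int, specific_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.shared_head = nn.Linear(hidden_dim, 2 * shared_dim)      # modality-shared latent block
        self.specific_head = nn.Linear(hidden_dim, 2 * specific_dim)  # modality-specific latent block

    def forward(self, x):
        h, _ = self.rnn(x)  # h: (batch, time, hidden_dim)
        zs_mu, zs_logvar = self.shared_head(h).chunk(2, dim=-1)
        zp_mu, zp_logvar = self.specific_head(h).chunk(2, dim=-1)
        return (zs_mu, zs_logvar), (zp_mu, zp_logvar)


def reparameterize(mu, logvar):
    """Standard reparameterization trick for a diagonal Gaussian posterior."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


# Toy usage: two sensing modalities (e.g., accelerometer and gyroscope windows).
enc_a = ModalityEncoder(input_dim=6, shared_dim=4, specific_dim=4)
enc_b = ModalityEncoder(input_dim=3, shared_dim=4, specific_dim=4)

x_a = torch.randn(8, 50, 6)  # (batch, time, channels) for modality A
x_b = torch.randn(8, 50, 3)  # (batch, time, channels) for modality B

(zs_mu_a, zs_lv_a), (zp_mu_a, zp_lv_a) = enc_a(x_a)
(zs_mu_b, zs_lv_b), (zp_mu_b, zp_lv_b) = enc_b(x_b)

z_shared_a = reparameterize(zs_mu_a, zs_lv_a)    # expected to align with the shared block of modality B
z_specific_a = reparameterize(zp_mu_a, zp_lv_a)  # captures modality-specific dynamics
```

In this sketch the pairing of the two modalities would be exploited by encouraging the shared blocks of paired windows to agree, while the specific blocks remain free to vary; how MATE enforces this (and the dependency between the two blocks) is detailed in the paper rather than here.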

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2405.16083
Document Type :
Working Paper