9 results on '"Sijie Mai"'
Search Results
2. Multi-Fusion Residual Memory Network for Multimodal Human Sentiment Comprehension
- Author
Sijie Mai, Haifeng Hu, Songlong Xing, and Jia Xu
- Subjects
Focus (computing), Sequence, Modalities, Forgetting, Dependency (UML), Computer science, Process (engineering), Machine learning, Human-Computer Interaction, Comprehension, State (computer science), Artificial intelligence, Software
- Abstract
Multimodal human sentiment comprehension refers to recognizing human affect from multiple modalities. Two key issues arise in this problem. First, it is difficult to explore time-dependent interactions between modalities and to focus on the important time steps. Second, processing the long fused sequence is susceptible to the forgetting problem due to long-term temporal dependency. In this paper, we introduce a hierarchical learning architecture to classify utterance-level sentiment. To address the first issue, we perform time-step-level fusion to generate fused features for each time step, which explicitly models time-restricted interactions by incorporating information across modalities at the same time step. Furthermore, based on the assumption that acoustic features directly reflect emotional intensity, we propose an emotion intensity attention that focuses on the time steps where emotion changes or intense emotions occur. To handle the second issue, we propose the Residual Memory Network (RMN) to process the fused sequence. Among other techniques, RMN passes the previous state directly into the next time step, which helps retain information from many time steps earlier. We show that our method achieves state-of-the-art performance on multiple datasets. Results also suggest that RMN yields competitive performance on sequence modeling tasks.
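The residual state passing that RMN relies on can be illustrated with a minimal sketch. This is an assumed PyTorch formulation of the general idea only, not the authors' implementation; the module names (ResidualMemoryCell, ResidualMemoryNetwork), the single tanh update, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch: residual state passing in a recurrent cell, illustrating
# the idea described in the abstract above (not the authors' exact RMN).
import torch
import torch.nn as nn

class ResidualMemoryCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.update = nn.Linear(input_dim + hidden_dim, hidden_dim)

    def forward(self, x_t, h_prev):
        # Candidate update computed from the fused input and the previous state.
        delta = torch.tanh(self.update(torch.cat([x_t, h_prev], dim=-1)))
        # Residual connection: the previous state is passed directly forward,
        # which helps retain information from many time steps ago.
        return h_prev + delta

class ResidualMemoryNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.cell = ResidualMemoryCell(input_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, fused_seq):                  # (batch, time, input_dim)
        b, t, _ = fused_seq.shape
        h = fused_seq.new_zeros(b, self.hidden_dim)
        for step in range(t):
            h = self.cell(fused_seq[:, step], h)
        return h                                   # final utterance-level state

# Usage: fused features for 20 time steps of a batch of utterances.
rmn = ResidualMemoryNetwork(input_dim=64, hidden_dim=32)
out = rmn(torch.randn(8, 20, 64))                  # -> (8, 32)
```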
- Published
- 2022
- Full Text
- View/download PDF
3. Learning to Balance the Learning Rates Between Various Modalities via Adaptive Tracking Factor
- Author
Sijie Mai, Haifeng Hu, and Ya Sun
- Subjects
Modality (human–computer interaction), Modalities, Computer science, Applied Mathematics, Sentiment analysis, Equalization (audio), Overfitting, Machine learning, Signal Processing, Convergence (routing), Task analysis, Artificial intelligence, Electrical and Electronic Engineering, Representation (mathematics)
- Abstract
Multimodal networks, which draw on richer information, should in principle outperform their unimodal counterparts. In our experiments, however, we observe that this is not always the case. Prior work on multimodal tasks tends to apply a single, uniform optimization scheme to all modalities, and consequently obtains only a sub-optimal multimodal representation built from under-optimized unimodal representations; heterogeneity among modalities can thus still cause a performance drop in multimodal networks. In this work, to remove this performance degradation on multimodal tasks, we decouple the learning procedures of the unimodal and multimodal networks by dynamically balancing the learning rates of the various modalities, so that a modality-specific optimization schedule is obtained for each modality. Specifically, an adaptive tracking factor (ATF) is introduced to adjust the learning rate of each modality in real time. Furthermore, adaptive convergent equalization (ACE) and bilevel directional optimization (BDO) are proposed to equalize and update the ATF, avoiding sub-optimal unimodal representations caused by overfitting or underfitting. Extensive experiments on multimodal sentiment analysis demonstrate that our method achieves superior performance.
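A minimal sketch of the per-modality learning-rate idea, assuming PyTorch parameter groups. The ATF/ACE/BDO update rules are not described in enough detail in the abstract to reproduce, so the tracking factors are treated here as externally supplied values, and the function names (make_optimizer, rescale_modality_lrs) are hypothetical.

```python
# Hypothetical sketch of per-modality learning-rate scaling with a tracking
# factor; the paper's ACE/BDO update rules are not reproduced.
import torch

def make_optimizer(encoders, base_lr=1e-3):
    # One parameter group per modality so each can get its own learning rate.
    groups = [{"params": enc.parameters(), "lr": base_lr, "name": name}
              for name, enc in encoders.items()]
    return torch.optim.SGD(groups, lr=base_lr)

def rescale_modality_lrs(optimizer, tracking_factors, base_lr=1e-3):
    # tracking_factors: dict modality name -> scalar factor, assumed to be
    # maintained elsewhere (e.g. from each modality's recent loss trend).
    for group in optimizer.param_groups:
        group["lr"] = base_lr * tracking_factors[group["name"]]

# Usage with toy unimodal encoders.
encoders = {"text": torch.nn.Linear(300, 64),
            "audio": torch.nn.Linear(74, 64),
            "video": torch.nn.Linear(35, 64)}
opt = make_optimizer(encoders)
rescale_modality_lrs(opt, {"text": 0.5, "audio": 1.2, "video": 1.0})
```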
- Published
- 2021
- Full Text
- View/download PDF
4. A Unimodal Reinforced Transformer With Time Squeeze Fusion for Multimodal Sentiment Analysis
- Author
Jiaxuan He, Sijie Mai, and Haifeng Hu
- Subjects
Computer science, Applied Mathematics, Sentiment analysis, Pattern recognition, Visualization, Kernel (linear algebra), Discriminative model, Multiple time dimensions, Signal Processing, Embedding, Artificial intelligence, Electrical and Electronic Engineering, F1 score, Transformer (machine learning model)
- Abstract
Multimodal sentiment analysis refers to inferring sentiment from language, acoustic, and visual sequences. Previous studies focus on analyzing aligned sequences, whereas analysis of unaligned sequences is more practical in real-world scenarios. Because unaligned multimodal sequences contain long-range temporal dependencies and no time-alignment information is provided, exploring time-dependent interactions within them is more challenging. To this end, we introduce time squeeze fusion, which automatically explores time-dependent interactions by modeling the unimodal and multimodal sequences from the perspective of compressing the time dimension. Moreover, prior methods tend to fuse unimodal features into a multimodal embedding from which sentiment is inferred. However, we argue that unimodal information may be lost, or the generated multimodal embedding may be redundant. To address this issue, we propose a unimodal reinforced Transformer that progressively attends to and distills unimodal information from the multimodal embedding, enabling the multimodal embedding to highlight discriminative unimodal information. Extensive experiments suggest that our model reaches state-of-the-art performance in terms of accuracy and F1 score on the MOSEI dataset.
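A minimal sketch of squeezing the time dimension of an unaligned sequence via attention pooling, which is one plausible reading of "compressing the time dimension"; the class name TimeSqueeze, the pooling mechanism, and the dimensions are assumptions, and the unimodal reinforced Transformer itself is not shown.

```python
# Hypothetical sketch: squeezing the time dimension of an unaligned modality
# sequence with attention pooling (not the paper's exact time squeeze fusion).
import torch
import torch.nn as nn

class TimeSqueeze(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, seq):                    # (batch, time, dim), any length
        weights = torch.softmax(self.score(seq), dim=1)
        return (weights * seq).sum(dim=1)      # (batch, dim): time dimension squeezed

# Usage: unaligned modalities with different sequence lengths map to fixed-size vectors.
text, audio, video = torch.randn(8, 50, 64), torch.randn(8, 375, 64), torch.randn(8, 150, 64)
squeeze = TimeSqueeze(64)
fused = torch.cat([squeeze(text), squeeze(audio), squeeze(video)], dim=-1)   # (8, 192)
```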
- Published
- 2021
- Full Text
- View/download PDF
5. Dynamic graph dropout for subgraph-based relation prediction
- Author
Sijie Mai, Shuangjia Zheng, Ya Sun, Ying Zeng, Yuedong Yang, and Haifeng Hu
- Subjects
Information Systems and Management, Artificial Intelligence, Software, Management Information Systems
- Published
- 2022
- Full Text
- View/download PDF
6. Graph Capsule Aggregation for Unaligned Multimodal Sequences
- Author
Sijie Mai, Haifeng Hu, and Jianfeng Wu
- Subjects
Structure (mathematical logic), FOS: Computer and information sciences, Computation and Language (cs.CL), Modalities, Interpretation (logic), Dependency (UML), Computer science, Sentiment analysis, Pattern recognition, Recurrent neural network, Benchmark (computing), Graph (abstract data type), Artificial intelligence
- Abstract
Humans express their opinions and emotions through multiple modalities, mainly the textual, acoustic, and visual modalities. Prior work on multimodal sentiment analysis mostly applies Recurrent Neural Networks (RNNs) to model aligned multimodal sequences. However, aligning multimodal sequences is impractical because different modalities have different sampling rates. Moreover, RNNs are prone to vanishing or exploding gradients and have limited capacity for learning long-range dependencies, which is the major obstacle to modeling unaligned multimodal sequences. In this paper, we introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with a graph-based neural model and a Capsule Network. By converting sequence data into a graph, the aforementioned problems of RNNs are avoided. In addition, the aggregation capability of the Capsule Network and the graph-based structure make our model interpretable and better able to handle long-range dependencies. Experimental results suggest that GraphCAGE achieves state-of-the-art performance on two benchmark datasets, with representations refined by the Capsule Network and interpretability provided.
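A minimal sketch of the sequence-to-graph step, assuming a fully connected graph over time-step nodes with similarity-based edge weights and one round of message passing; the capsule aggregation stage is omitted, and the class name SequenceToGraph and all dimensions are illustrative.

```python
# Hypothetical sketch: converting a sequence into a fully connected graph and
# doing one round of message passing; capsule aggregation is not shown.
import torch
import torch.nn as nn

class SequenceToGraph(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)

    def forward(self, seq):                                 # (batch, nodes, dim)
        # Edge weights from pairwise similarity between time-step nodes, so
        # distant steps are connected directly (no recurrence involved).
        adj = torch.softmax(seq @ seq.transpose(1, 2), dim=-1)
        return torch.relu(adj @ self.message(seq))          # updated node features

# Usage: each time step of an unaligned sequence becomes a graph node.
nodes = SequenceToGraph(64)(torch.randn(8, 30, 64))         # -> (8, 30, 64)
```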
- Published
- 2021
- Full Text
- View/download PDF
7. Divide, Conquer and Combine: Hierarchical Feature Fusion Network with Local and Global Perspectives for Multimodal Affective Computing
- Author
Sijie Mai, Haifeng Hu, and Songlong Xing
- Subjects
Feature fusion, Basis (linear algebra), Computer science, Feature vector, Machine learning, Artificial intelligence, Affective computing
- Abstract
We propose a general strategy named 'divide, conquer and combine' for multimodal fusion. Instead of directly fusing features at the holistic level, we conduct fusion hierarchically so that both local and global interactions are considered for a comprehensive interpretation of multimodal embeddings. In the 'divide' and 'conquer' stages, we conduct local fusion by exploring the interactions of the portions of the aligned feature vectors across modalities that lie within a sliding window, which ensures that each part of the multimodal embeddings is explored sufficiently. On this basis, global fusion is conducted in the 'combine' stage to explore the interconnections across local interactions, via an Attentive Bi-directional Skip-connected LSTM that directly connects distant local interactions and integrates two levels of attention mechanism. In this way, local interactions can exchange information sufficiently and thus obtain an overall view of the multimodal information. Our method achieves state-of-the-art performance on multimodal affective computing with higher efficiency.
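A minimal sketch of the 'divide' and 'conquer' stages, assuming aligned sequences, a shared fusion MLP per window, and illustrative window and stride sizes; the global 'combine' stage with the Attentive Bi-directional Skip-connected LSTM is not reproduced.

```python
# Hypothetical sketch of sliding-window local fusion across aligned modalities.
import torch
import torch.nn as nn

class LocalFusion(nn.Module):
    def __init__(self, dim, window=4, stride=2):
        super().__init__()
        self.window, self.stride = window, stride
        self.fuse = nn.Sequential(nn.Linear(3 * window * dim, dim), nn.ReLU())

    def forward(self, text, audio, video):                  # each (batch, time, dim)
        locals_ = []
        for start in range(0, text.size(1) - self.window + 1, self.stride):
            sl = slice(start, start + self.window)
            # 'Divide': take the aligned window from every modality.
            chunk = torch.cat([text[:, sl], audio[:, sl], video[:, sl]], dim=-1)
            # 'Conquer': fuse the local chunk into one interaction vector.
            locals_.append(self.fuse(chunk.flatten(start_dim=1)))
        return torch.stack(locals_, dim=1)                   # (batch, windows, dim)

# Usage: aligned 20-step sequences -> 9 local interaction vectors per sample.
out = LocalFusion(dim=32)(torch.randn(8, 20, 32), torch.randn(8, 20, 32), torch.randn(8, 20, 32))
```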
- Published
- 2019
- Full Text
- View/download PDF
8. Attentive matching network for few-shot learning
- Author
Sijie Mai, Haifeng Hu, and Jia Xu
- Subjects
Matching (graph theory), Computer science, Feature extraction, Pattern recognition, Function (mathematics), Class (biology), Similarity (network science), Discriminative model, Signal Processing, Metric (mathematics), Embedding, Computer Vision and Pattern Recognition, Artificial intelligence, Software
- Abstract
Few-shot learning has attracted increasing attention recently due to its broad applications, yet it remains challenging because of the difficulty of modeling from only a few samples. In this paper, we present an effective framework named Attentive Matching Network (AMN) to address the few-shot learning problem. Based on metric learning, AMN first learns robust representations via an elaborately designed embedding network using only a few samples. Distances between the representations of support samples and target samples are then computed with a similarity function to form a score vector, according to which classification is conducted. Unlike existing algorithms, we propose a feature-level attention mechanism that helps the similarity function place more emphasis on the features that better reflect inter-class differences and helps the embedding network learn better feature extraction. Furthermore, to learn a discriminative embedding space that maximizes inter-class distance and minimizes intra-class distance, we introduce a novel Complementary Cosine Loss, which consists of two parts: a modified Cosine Distance Loss, which measures the distance between the predicted category similarity and the true one and directly uses all support samples to compute gradients, and a Hardest-category Discernment Loss, which handles the similarity of the hardest incorrect class. Results demonstrate that AMN achieves competitive performance on the Omniglot and miniImageNet datasets. In addition, we conduct extensive experiments to discuss the influence of the embedding network, the attention mechanism, and the loss function.
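A minimal sketch of feature-level attention applied before cosine matching against class prototypes; the embedding network, episode construction, and the Complementary Cosine Loss are omitted, and the class name AttentiveMatcher and the softmax-normalized per-dimension weights are assumptions.

```python
# Hypothetical sketch: per-feature attention weighting before cosine matching.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveMatcher(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # One learned weight per feature dimension, emphasising features that
        # better reflect inter-class differences.
        self.feature_attention = nn.Parameter(torch.ones(dim))

    def forward(self, query_emb, support_embs):
        # query_emb: (dim,); support_embs: (n_classes, dim) class prototypes.
        w = torch.softmax(self.feature_attention, dim=0)
        return F.cosine_similarity(w * query_emb, w * support_embs, dim=-1)

# Usage in a 5-way task with 64-d embeddings produced by some backbone network.
matcher = AttentiveMatcher(64)
probs = torch.softmax(matcher(torch.randn(64), torch.randn(5, 64)), dim=0)
```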
- Published
- 2019
- Full Text
- View/download PDF
9. Modality to modality translation: An adversarial representation learning and graph fusion network for multimodal fusion
- Author
Sijie Mai, Haifeng Hu, and Songlong Xing
- Subjects
FOS: Computer and information sciences, Machine Learning (cs.LG), Computer Vision and Pattern Recognition (cs.CV), Multimedia (cs.MM), Computer science, Machine learning, Adversarial system, Discriminative model, Visualization, Embedding, Graph (abstract data type), Artificial intelligence, Encoder, Feature learning
- Abstract
Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of the various modalities differ in nature, to reduce the modality gap we translate the distributions of the source modalities into that of the target modality via their respective encoders using adversarial training. Furthermore, we impose additional constraints on the embedding space by introducing a reconstruction loss and a classification loss. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal, and trimodal interactions in multiple stages. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative. Accepted by AAAI-2020; code is available at https://github.com/TmacMai/ARGF_multimodal_fusion.
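A minimal sketch of the adversarial alignment step, under the assumption that text is the target modality and audio is a source modality; the encoder and discriminator architectures, dimensions, and loss arrangement are illustrative, and the decoder, classifier, and hierarchical graph fusion are omitted. The authors' actual code is at the repository linked above.

```python
# Hypothetical sketch: a discriminator tries to tell encoded audio from encoded
# text (the assumed target modality), and the audio encoder is trained to fool it.
import torch
import torch.nn as nn

enc_audio = nn.Sequential(nn.Linear(74, 64), nn.ReLU(), nn.Linear(64, 32))
enc_text = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 32))
disc = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

audio, text = torch.randn(16, 74), torch.randn(16, 300)
z_audio, z_text = enc_audio(audio), enc_text(text)

# Discriminator loss: real = target-modality embedding, fake = translated source.
d_loss = bce(disc(z_text), torch.ones(16, 1)) + bce(disc(z_audio.detach()), torch.zeros(16, 1))
# Encoder (generator) loss: make the translated audio embedding look like text.
g_loss = bce(disc(z_audio), torch.ones(16, 1))
# In training, d_loss and g_loss would be minimized alternately by their optimizers.
```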