The new AI framework for emotion analysis introduced by the Chinese research team consists of two main components: feature extraction and a two-stage information fusion process. The framework uses stacked transformers, comprising bidirectional cross-modal transformers and a transformer encoder, to capture interactions between modalities and enhance emotion prediction. An attention weight accumulation mechanism extracts deeper shared information during fusion.
This two-stage approach improves emotion analysis by fusing information from the different modalities at two levels: in the first stage, the bidirectional cross-modal transformers capture pairwise interactions between modalities; in the second stage, the transformer encoder performs a more nuanced fusion of the resulting representations.
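To make the general pattern concrete, the following NumPy sketch illustrates one plausible reading of the description above: a first stage of bidirectional cross-modal attention between two modalities, attention weights accumulated across stacked layers, and a second stage that fuses the attended streams. All names, dimensions, and the exact accumulation rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """One direction of a bidirectional cross-modal block:
    tokens of one modality attend to tokens of another."""
    scores = query_feats @ key_feats.T / np.sqrt(query_feats.shape[-1])
    weights = softmax(scores, axis=-1)  # per-row attention weights
    return weights @ key_feats, weights

# Hypothetical extracted features: 4 text tokens, 6 audio frames, dim 8
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
audio = rng.standard_normal((6, 8))

# Stage 1: bidirectional cross-modal interaction
t2a, w_t2a = cross_modal_attention(text, audio)   # text attends to audio
a2t, w_a2t = cross_modal_attention(audio, text)   # audio attends to text

# Attention weight accumulation (assumed form): sum the weights across
# stacked layers so repeatedly attended positions gain emphasis
accumulated = w_t2a.copy()
for _ in range(2):  # two extra stacked layers, purely illustrative
    t2a, w = cross_modal_attention(t2a + text, audio)
    accumulated += w

# Stage 2: pool and concatenate the fused streams for a sentiment head
fused = np.concatenate([t2a.mean(axis=0), a2t.mean(axis=0)])
print(fused.shape)  # (16,)
```

In this sketch the second stage is reduced to pooling and concatenation; in the framework described above it is a full transformer encoder over the cross-modally attended features.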
The study, which introduces this two-stage stacked-transformer framework for multimodal sentiment analysis, was published in Intelligent Computing on May 24, 2024.