
Decoupled Multimodal Distilling for Emotion Recognition, CVPR 2023.

Highlight paper (10% of accepted papers, 2.5% of submissions)

We propose a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation. The key ingredients are the decoupled feature spaces and the graph distillation (GD) units, HomoGD and HeteroGD, illustrated below.

In general, the proposed GD paradigm provides a flexible knowledge transfer mechanism in which the distillation weights are learned automatically, enabling diverse crossmodal knowledge transfer patterns.
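To make the idea of automatically learned distillation weights concrete, below is a minimal, self-contained PyTorch sketch. It is not the authors' implementation: the `GraphDistillation` module, the per-edge networks, and the KL-based distillation loss are illustrative assumptions that only mirror the general idea of weighting each directed crossmodal edge by a learned, input-dependent weight.

```python
# Illustrative sketch (not the official DMD code) of graph distillation with
# learnable, input-dependent edge weights between three modalities (L, A, V).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphDistillation(nn.Module):
    """For each directed edge src -> dst, a small edge network predicts a
    non-negative weight from the pair of features; that weight scales a
    standard KL-based distillation loss from the source to the target."""

    def __init__(self, dim, modalities=("L", "A", "V")):
        super().__init__()
        self.modalities = modalities
        # One tiny edge network per directed modality pair.
        self.edge_nets = nn.ModuleDict({
            f"{s}->{d}": nn.Linear(2 * dim, 1)
            for s in modalities for d in modalities if s != d
        })

    def forward(self, feats, logits):
        """feats / logits: dicts mapping modality name -> (batch, dim) / (batch, classes)."""
        losses = {}
        for s in self.modalities:
            for d in self.modalities:
                if s == d:
                    continue
                pair = torch.cat([feats[s], feats[d]], dim=-1)
                # Per-sample, non-negative distillation weight for this edge.
                w = F.softplus(self.edge_nets[f"{s}->{d}"](pair)).squeeze(-1)
                # KL divergence from the (detached) source predictions to the
                # target predictions, one value per sample.
                kl = F.kl_div(
                    F.log_softmax(logits[d], dim=-1),
                    F.softmax(logits[s].detach(), dim=-1),
                    reduction="none",
                ).sum(-1)
                losses[f"{s}->{d}"] = (w * kl).mean()
        return sum(losses.values()), losses


if __name__ == "__main__":
    # Toy usage with random tensors in place of real modality features/logits.
    B, D, C = 8, 32, 7
    feats = {m: torch.randn(B, D) for m in ("L", "A", "V")}
    logits = {m: torch.randn(B, C) for m in ("L", "A", "V")}
    gd = GraphDistillation(dim=D)
    total, per_edge = gd(feats, logits)
    total.backward()
    print({edge: float(loss) for edge, loss in per_edge.items()})
```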

The motivation.

<div align="center"><img src="figure_1.png" width="50%"/></div>

Motivation and main idea:

The Framework.

The framework of DMD. Please refer to the paper for details.

The learned graph edges.

Illustration of the graph edges in HomoGD and HeteroGD. In (a), $L \to A$ and $L \to V$ dominate because the homogeneous language features contribute the most while the other modalities perform poorly. In (b), $L \to A$, $L \to V$, and $V \to A$ dominate. $V \to A$ emerges because the visual modality enhances its feature discriminability via the multimodal transformer mechanism in HeteroGD.

Usage

Prerequisites

Datasets

Data files (containing the processed MOSI and MOSEI datasets) can be downloaded from here. Put the downloaded datasets into the ./dataset directory. Please note that the meta information and the raw data are not available due to the privacy of YouTube content creators. For more details, please follow the official websites of these datasets.
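As a quick sanity check after downloading, you can inspect the structure of a data file before training. The sketch below assumes the processed data is stored as a Python pickle; the file name and the dictionary layout are assumptions, so adjust them to the files you actually downloaded.

```python
# Sanity-check sketch: inspect the structure of a downloaded data file.
# The file name and dictionary layout below are assumptions, not guaranteed
# by the repository; adjust them to whatever you actually downloaded.
import pickle

with open("./dataset/mosi_processed.pkl", "rb") as f:  # hypothetical file name
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    for split, content in data.items():  # e.g. 'train' / 'valid' / 'test'
        print(split, type(content))
```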

Run the Codes

First, set the necessary parameters in ./config/config.json. Then, select the training dataset in train.py. Train the model as below:

python train.py

By default, the trained model will be saved in the ./pt directory. You can change this in train.py.
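Before running train.py, it can also help to double-check the configuration. A minimal sketch of loading and inspecting ./config/config.json, assuming the file is plain JSON (the keys it contains are repository-specific):

```python
# Minimal sketch: load and print the training configuration before running train.py.
# Assumes ./config/config.json is plain JSON; the keys it contains are repo-specific.
import json

with open("./config/config.json", "r") as f:
    config = json.load(f)

for key, value in config.items():
    print(f"{key}: {value}")
```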

Test the trained model as below:

python test.py

Please set the path of the trained model in run.py (line 174). We also provide some pretrained models for testing (Google Drive).

Citation

If you find the code helpful in your research or work, please cite the following paper.

@InProceedings{Li_2023_CVPR,
    author    = {Li, Yong and Wang, Yuanzhi and Cui, Zhen},
    title     = {Decoupled Multimodal Distilling for Emotion Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {6631-6640}
}