MutualSL

Main Method

We propose a novel cross-modal mutual knowledge transfer span localization (MutualSL) method to reduce cross-modal knowledge deviation. Specifically, MutualSL uses both a visual predictor and a textual predictor; the two predictors have different prediction targets, so each is stronger at perceiving information from its own modality. We expect these two predictors to enhance the perception of information in their respective modalities. During training, each predictor must predict the other predictor's output value in addition to the target answer. We further design a one-way dynamic loss (ODL) function that dynamically adjusts the knowledge transfer, alleviating the imbalance of cross-modal knowledge transfer during training.
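Below is a minimal PyTorch sketch of this idea, not the exact formulation from the paper: it assumes each predictor exposes its own span logits (vis_out, txt_out) plus an auxiliary head (vis_aux, txt_aux) that estimates the other predictor's output, and the dynamic weights are only an illustrative stand-in for the ODL schedule.

import torch
import torch.nn.functional as F

def mutual_transfer_loss(vis_out, vis_aux, txt_out, txt_aux, vis_target, txt_target):
    # Supervised losses on the ground-truth answer spans for each modality
    loss_v = F.cross_entropy(vis_out, vis_target)
    loss_t = F.cross_entropy(txt_out, txt_target)

    # Cross-modal transfer: each auxiliary head regresses toward the other
    # predictor's detached output, so gradients only flow one way per term
    transfer_v = F.smooth_l1_loss(vis_aux, txt_out.detach())
    transfer_t = F.smooth_l1_loss(txt_aux, vis_out.detach())

    # Assumed dynamic weighting (stand-in for ODL): the predictor with the
    # larger supervised loss receives more guidance from the other modality
    w_v = loss_v.detach() / (loss_v.detach() + loss_t.detach())
    w_t = 1.0 - w_v

    return loss_v + loss_t + w_v * transfer_v + w_t * transfer_t

In this sketch, detaching the mimicked outputs keeps each transfer term one-directional, and the weaker predictor receives proportionally more guidance from the stronger one.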

Prerequisites

Installing the GPU Driver and CUDA

# install the build tools required by the CUDA runfile installer
sudo apt-get install gcc
sudo apt-get install make
# download and run the CUDA 11.5.1 installer (bundles driver 495.29.05)
wget https://developer.download.nvidia.com/compute/cuda/11.5.1/local_installers/cuda_11.5.1_495.29.05_linux.run
sudo sh cuda_11.5.1_495.29.05_linux.run

Installing Conda and Python

# download and install Miniconda
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# create and activate the project environment
conda create -n mutualsl python=3.7
conda activate mutualsl

Installing Python Libraries

# install PyTorch built for CUDA 11.3
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
# install the remaining Python dependencies
pip install tqdm transformers scikit-learn pandas numpy accelerate sentencepiece pkuseg
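As a quick, optional sanity check (a minimal snippet assuming the steps above completed), confirm that the installed PyTorch build can see the GPU:

# confirm that the CUDA-enabled PyTorch build detects the GPU
import torch
print(torch.__version__)           # expected: 1.10.0+cu113
print(torch.cuda.is_available())   # expected: True if the driver install succeeded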

Data

Download the MedVidQA dataset from [here](https://pan.baidu.com/s/1lcBUb8JYWUVaZcRZq3RT5Q?pwd=8888) (copyright belongs to NIH).

Download the TutorialVQA dataset from here (Researchers should get in touch with us and sign a copyright agreement)

Download the VehicleVQA dataset from here (Researchers should get in touch with us and sign a copyright agreement)

Place them in the ./data directory.

Quick Start

Reproducing the Best Results

python main.py

All our hyperparameters are saved in the run.sh file, so you can easily reproduce our best results.

Files

-- data
-- paperlog

-- main.py
-- model.py


Contributions

Cite

@article{weng2022visual,
  title={Visual Answer Localization with Cross-modal Mutual Knowledge Transfer},
  author={Weng, Yixuan and Li, Bin},
  journal={arXiv preprint arXiv:2210.14823},
  year={2022}
}