Home

Awesome

Framework

This repo contains the official implementation of the ACM MM 2023 paper:

<div align="center"> <h1> <b> CATR: Combinatorial-Dependence Audio-Queried Transformer<br> for Audio-Visual Video Segmentation </b> </h1> <h4> <b> Kexin Li, Zongxin Yang∗, Lei Chen, Yi Yang, Jun Xiao </b> </h4> </div>

Motivation

motivation

Environment Installation

The code was tested on a Conda environment with CUDA Version as 11.7. Install Conda and then create an environment as follows:

conda create -n catr python=3.8.17 pip -y

conda activate catr

conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia

Note that you might have to change the cudatoolkit version above according to your system's CUDA version.

pip install transformers==4.11.3

pip install h5py wandb opencv-python protobuf av einops ruamel.yaml timm joblib

conda install -c conda-forge pandas matplotlib cython scipy cupy

Running Configuration

Parameter Settings are divided into fixed parameters and adjustable parameters.

The following table lists the parameters which can be configured directly from the command line.

The rest of the fixed parameters for each dataset can be configured in configs/DATASET_NAME.yaml.

CommandDescription
-visual_backboneresnet50 or pvt
-log_dirthe path to save train logs
-config_paththe path for fixed parameters
-train_batch_sizetraining batch size per GPU
-val_batch_sizeeval batch size per GPU
-max_epochesthe max number of epoches to run

Data Preparation

MS(Fully-supervised Multiple-sound Source Segmentation): https://forms.gle/GKzkU2pEkh8aQVHN6 <br> S4(Semi-supervised Single-sound Source Segmentation): https://forms.gle/GKzkU2pEkh8aQVHN6 <br> AVSS( Fully-supervised Audio-Visual Semantic Segmentation): https://forms.gle/15GZeDbVMe2z6YUGA <br>

Pretrained Backbones

The pretrained backbones can be downloaded from here and placed to the directory pretrained_backbones.

Config File

The Config File can be downloaded from here and the extraction code is 'tybr'.

Framework

framework