

Paper Link

Please find our paper at https://aclanthology.org/2024.findings-acl.663/

Installation

  1. Clone the repository:
git clone https://github.com/justaguyalways/ToxVidLLM_ACL_2024.git
cd ToxVidLLM_ACL_2024
  2. Create a conda environment and activate it:
conda create --name your-env-name python=3.8
conda activate your-env-name
  3. Install the required packages:
pip install -r requirements.txt
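
Before going further, it can help to confirm that the activated environment really provides Python 3.8. The snippet below is a minimal sketch, not part of the repository: `version_matches` is a hypothetical helper, and it assumes that `python` on your PATH is the interpreter from the conda environment.

```shell
# Hypothetical helper (not part of the repository): succeeds when a full
# version string such as "3.8.19" starts with the expected major.minor "3.8".
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the active interpreter for its version; use "none" if python is missing.
active_version=$(python -c 'import platform; print(platform.python_version())' 2>/dev/null || echo none)

if version_matches "$active_version" "3.8"; then
    echo "Python $active_version matches the 3.8 environment."
else
    echo "Python $active_version found; expected 3.8.x. Is the conda environment active?"
fi
```

If the second message appears, re-run conda activate your-env-name and try again.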

Dataset

  1. Download the dataset from the following link: ToxCMM Dataset Link
  2. Unzip the downloaded file:
unzip dataset.zip
  3. Move the unzipped folder to the final_data directory within the repository:
mv path_to_unzipped_folder final_data
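
A quick check that the move succeeded can save a failed training run later. The following is a minimal sketch run from the repository root; `dataset_ready` is a hypothetical helper, and it only tests that final_data exists and is non-empty. It does not validate the internal layout of the dataset, which is not described here.

```shell
# Hypothetical helper (not part of the repository): succeeds when the given
# directory exists and contains at least one entry.
dataset_ready() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if dataset_ready final_data; then
    echo "final_data is in place."
else
    echo "final_data is missing or empty; repeat the unzip and mv steps."
fi
```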


Training

To train the model, run train.py. To select a GPU, set the CUDA_VISIBLE_DEVICES environment variable, replacing xxxx with the GPU ID (e.g., 0 for the first GPU):

CUDA_VISIBLE_DEVICES=xxxx python train.py

For example, to train on the first GPU:

CUDA_VISIBLE_DEVICES=0 python train.py
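
If you are unsure which GPU IDs exist, or how the VAR=value prefix behaves, the sketch below may help. It assumes nvidia-smi is installed alongside the NVIDIA driver (the listing is skipped otherwise), and `child_value` is a name introduced only for illustration. The prefix form sets the variable for that single command and leaves the current shell untouched.

```shell
# List the GPUs known to the driver with their IDs, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi

# The VAR=value prefix applies to the one command that follows it.
child_value=$(CUDA_VISIBLE_DEVICES=0 sh -c 'echo "$CUDA_VISIBLE_DEVICES"')
echo "child process sees: $child_value"

# The current shell keeps whatever value it had before (often unset).
echo "current shell has: ${CUDA_VISIBLE_DEVICES:-unset}"
```

To keep the setting for the whole session instead, export CUDA_VISIBLE_DEVICES once and then run python train.py without the prefix.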

Testing

To test the model, run test.py. As with training, set CUDA_VISIBLE_DEVICES to select a GPU, replacing xxxx with the GPU ID:

CUDA_VISIBLE_DEVICES=xxxx python test.py

For example, to test on the first GPU:

CUDA_VISIBLE_DEVICES=0 python test.py

Citation

If you use our work or find it useful, please cite:

    title = "{T}ox{V}id{LM}: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos",
    author = "Maity, Krishanu  and
      Sangeetha, Poornash  and
      Saha, Sriparna  and
      Bhattacharyya, Pushpak",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.663",
    pages = "11130--11142",