[COLING'25] HGCLIP

👀Introduction

This repository contains the code for our paper HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding. [arXiv]

Created by Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, Zongyuan Ge.

💡Requirements

Environment

  1. Python 3.8.*
  2. CUDA 12.2
  3. PyTorch
  4. TorchVision

Install

Create a virtual environment and activate it.

conda create -n hgclip python=3.8
conda activate hgclip

The code has been tested with PyTorch 1.13 and CUDA 12.2. Install the remaining dependencies with:

pip install -r requirements.txt
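
After installation, a quick sanity check (not part of the repository; a minimal sketch using only standard PyTorch calls) can confirm that the environment sees the GPU:

import torch

print(torch.__version__)                   # expect 1.13.x
print(torch.cuda.is_available())           # should be True on a CUDA-capable machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # the device that --gpu_id 0 will select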

⏳Dataset

Please download the required datasets first. Follow prepare_datasets.md to prepare them.
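
As an illustration only, and assuming the 'air' dataset used below refers to FGVC-Aircraft, the data can also be fetched through torchvision; prepare_datasets.md remains the authoritative guide, and the ./data root below is an assumption:

from torchvision.datasets import FGVCAircraft

root = './data'  # hypothetical dataset root; follow prepare_datasets.md for the real layout
for split in ('train', 'val', 'test'):
    FGVCAircraft(root=root, split=split, download=True)  # downloads once, then reuses the files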

📦Usage

Training & Evaluation

To train or evaluate our HGCLIP, you first need to generate and save the prototypes.

python generate_prototypes.py \
--dataset 'air' \
--batch_size 64 \
--gpu_id 0 
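
For intuition, one common way to build class prototypes with CLIP is to encode a prompt for every class name and cache the normalized text features. The sketch below (using the official clip package) only illustrates that idea; the backbone, class list, prompt template, and output path are assumptions, not what generate_prototypes.py actually does.

import torch, clip

device = 'cuda:0'
model, _ = clip.load('ViT-B/16', device=device)  # backbone choice is an assumption

class_names = ['Boeing 707', 'Boeing 727']       # hypothetical classes for one hierarchy level
tokens = clip.tokenize([f'a photo of a {c}, a type of aircraft.' for c in class_names]).to(device)

with torch.no_grad():
    prototypes = model.encode_text(tokens)
    prototypes = prototypes / prototypes.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(prototypes.cpu(), 'prototypes_air.pth')  # hypothetical output path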

Then run

cd hgclip
python main.py \
--config configs/air/train_gnn.py
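
The configuration is a Python file. As a purely hypothetical sketch (none of these field names are taken from the repository), such a config might define fields like:

# Hypothetical training config; the real schema is defined by configs/air/train_gnn.py.
dataset = 'air'          # dataset identifier, matching the prototype step above
backbone = 'ViT-B/16'    # CLIP visual backbone (assumption)
batch_size = 64
lr = 1e-3
epochs = 20
gpu_id = 0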

Zero-Shot Evaluation

To evaluate the performance of zero-shot CLIP, run

cd zsclip
python zero_shot.py \
--config configs/air/zero_shot_clip.py
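
For reference, zero-shot CLIP classification reduces to comparing an image embedding with prompt embeddings of the class names. The sketch below uses the official clip package and is independent of the repository's zero_shot.py; the class names, prompt template, and image path are placeholders:

import torch, clip
from PIL import Image

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/16', device=device)

class_names = ['Boeing 707', 'Airbus A320']                            # placeholder classes
text = clip.tokenize([f'a photo of a {c}.' for c in class_names]).to(device)
image = preprocess(Image.open('example.jpg')).unsqueeze(0).to(device)  # placeholder image

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(class_names[probs.argmax().item()])  # predicted class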

Quick Start

The main script for training and evaluating model performance is hgclip/main.py. It is driven by the configuration file passed through --config, as shown in the commands above.

🙏Acknowledgements

We use code from MaPLe, CoCoOp/CoOp, and CLIP. We thank the authors for releasing their code.

📧Contact

If you have any questions, please create an issue on this repository or contact us at richard.peng.xia@gmail.com.

📝Citing

If you find this code useful, please consider citing our work:

@article{xia2023hgclip,
 title={HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding},
 author={Xia, Peng and Yu, Xingtong and Hu, Ming and Ju, Lie and Wang, Zhiyong and Duan, Peibo and Ge, Zongyuan},
 journal={arXiv preprint arXiv:2311.14064},
 year={2023}
}