Structure-CLIP

This paper introduces Structure-CLIP, an end-to-end framework that integrates scene graph knowledge to enhance multi-modal structured representations.

🔔 News

🌈 Model Architecture

[Figure: model architecture]

📚 Dataset Download

Training datasets are available here.

📕 Code Path

Code Structures

There are four parts in the code.

🔬 Dependencies

🚀 Train & Eval

The training script:

bash script/run.sh

Parameters

[--train_path TRAIN_PATH] [--test_path TEST_PATH] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
[--lr LEARNING-RATE] [--weight_decay WEIGHT_DECAY] [--knowledge_weight KNOWLEDGE_WEIGHT] [--transformer_layer_num NUMBER] [--model_name MODEL_NAME] [--neg_loss_weight NEG_LOSS_WEIGHT] 
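
The flags above could be declared with argparse roughly as follows. This is only a minimal sketch: the flag names are taken from the usage string, but the argument types, the lack of defaults, the `build_parser` helper, and the example values are assumptions made for illustration, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed above (types are assumed)."""
    parser = argparse.ArgumentParser(description="Structure-CLIP train/eval (sketch)")
    # Data locations
    parser.add_argument("--train_path", type=str)
    parser.add_argument("--test_path", type=str)
    # Optimisation settings
    parser.add_argument("--nepoch", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--manualSeed", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--weight_decay", type=float)
    # Model and loss settings
    parser.add_argument("--knowledge_weight", type=float)
    parser.add_argument("--transformer_layer_num", type=int)
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--neg_loss_weight", type=float)
    return parser


# Illustrative values only; they are not recommended settings.
example_args = build_parser().parse_args(
    ["--lr", "1e-5", "--batch_size", "8", "--manualSeed", "42"]
)
print(vars(example_args))
```

Flags that are not passed come back as `None` in this sketch, which makes it easy to see which settings were left to the script's own defaults.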

Note:

🤝 Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@inproceedings{DBLP:conf/aaai/StructureCLIP,
  author       = {Yufeng Huang and
                  Jiji Tang and
                  Zhuo Chen and
                  Rongsheng Zhang and
                  Xinfeng Zhang and
                  Weijie Chen and
                  Zeng Zhao and
                  Zhou Zhao and
                  Tangjie Lv and
                  Zhipeng Hu and
                  Wen Zhang},
  title        = {Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations},
  booktitle    = {{AAAI}},
  publisher    = {{AAAI} Press},
  year         = {2024}
}