# DUET
In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-modal objectives.
- Due to the page and format restrictions set by AAAI publications, we have omitted some details and appendix content. For the complete version of the paper, including the selection of prompts and experiment details, please refer to our arXiv version.
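As a quick illustration of the attribute-level contrastive objective in (2) above, the sketch below shows a generic InfoNCE-style loss over attribute embeddings. It is only a rough sketch: the tensor names, shapes, and temperature are our assumptions for illustration, not the exact implementation in `model/model_proto.py`.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_attr_emb, text_attr_emb, pos_index, temperature=0.07):
    """InfoNCE-style loss over attributes: pull an image's grounded attribute
    embedding towards the matching textual attribute embedding and push it
    away from all other attribute embeddings.

    img_attr_emb:  (batch, dim)     image-side embeddings for the grounded attribute
    text_attr_emb: (num_attr, dim)  PLM embeddings of every attribute prompt
    pos_index:     (batch,)         index of the attribute present in each image
    """
    img = F.normalize(img_attr_emb, dim=-1)
    txt = F.normalize(text_attr_emb, dim=-1)
    logits = img @ txt.t() / temperature   # (batch, num_attr) similarity scores
    return F.cross_entropy(logits, pos_index)
```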
## News
- `2024-02` We released the preprint of our survey *Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey* [Repo].
- `2023-12` Our paper *Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations* was accepted by AAAI 2024.
## Model Architecture
## Dataset Download
- The cache data for the three datasets (CUB, AWA2, SUN) are available here (Baidu Cloud, 19.89 GB, code: `s07d`).
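After unpacking the archive into `./cache/`, the two per-dataset cache files can be inspected with the standard `json`/`pickle` modules. A minimal sketch (the exact key/value structure is inferred from the file names and may differ):

```python
import json
import pickle

dataset = "AWA2"  # or "CUB", "SUN"

# attribute index -> natural-language prompt used for semantic grounding
with open(f"cache/{dataset}/attributeindex2prompt.json") as f:
    attr2prompt = json.load(f)

# image id -> pre-extracted pixel data
with open(f"cache/{dataset}/id2imagepixel.pkl", "rb") as f:
    id2pixel = pickle.load(f)

print(len(attr2prompt), "attribute prompts;", len(id2pixel), "cached images")
```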
## Code Path
### Code Structures
There are four parts in the code.
- model: It contains the main files of the DUET network.
- data: It contains the data splits for different datasets.
- cache: It contains some cache files.
- script: It contains the training scripts for DUET.

    DUET
    ├── cache
    │   ├── AWA2
    │   │   ├── attributeindex2prompt.json
    │   │   └── id2imagepixel.pkl
    │   ├── CUB
    │   │   ├── attributeindex2prompt.json
    │   │   ├── id2imagepixel.pkl
    │   │   └── mapping.json
    │   └── SUN
    │       ├── attributeindex2prompt.json
    │       ├── id2imagepixel.pkl
    │       └── mapping.json
    ├── data
    │   ├── AWA2
    │   │   ├── APN.mat
    │   │   ├── TransE_65000.mat
    │   │   ├── att_splits.mat
    │   │   ├── attri_groups_9.json
    │   │   ├── kge_CH_AH_CA_60000.mat
    │   │   └── res101.mat
    │   ├── CUB
    │   │   ├── APN.mat
    │   │   ├── att_splits.mat
    │   │   ├── attri_groups_8.json
    │   │   ├── attri_groups_8_layer.json
    │   │   └── res101.mat
    │   └── SUN
    │       ├── APN.mat
    │       ├── att_splits.mat
    │       ├── attri_groups_4.json
    │       └── res101.mat
    ├── log
    │   ├── AWA2
    │   ├── CUB
    │   └── SUN
    ├── model
    │   ├── log.py
    │   ├── main.py
    │   ├── main_utils.py
    │   ├── model_proto.py
    │   ├── modeling_lxmert.py
    │   ├── opt.py
    │   ├── swin_modeling_bert.py
    │   ├── util.py
    │   └── visual_utils.py
    ├── out
    │   ├── AWA2
    │   ├── CUB
    │   └── SUN
    └── script
        ├── AWA2
        │   └── AWA2_GZSL.sh
        ├── CUB
        │   └── CUB_GZSL.sh
        └── SUN
            └── SUN_GZSL.sh
## Dependencies
- Python 3
- PyTorch >= 1.8.0
- Transformers >= 4.11.3
- NumPy
- All experiments are performed with one NVIDIA RTX 3090Ti GPU.
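A quick way to confirm that your environment matches these requirements (a plain sanity-check snippet, not part of the repo):

```python
import torch
import transformers
import numpy

# Print the installed versions; compare against the list above.
print("PyTorch:", torch.__version__)              # expect >= 1.8.0
print("Transformers:", transformers.__version__)  # expect >= 4.11.3
print("NumPy:", numpy.__version__)
print("CUDA available:", torch.cuda.is_available())
```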
## Prerequisites
- Dataset: please download the datasets, i.e., CUB, AWA2, and SUN, and change `opt.image_root` to the dataset root path on your machine.
  - NOTE: For other required feature files like `APN.mat` and `id2imagepixel.pkl`, please refer to here.
- Data split: please download the data folder and place it in `./data/`. `attributeindex2prompt.json` should be generated and placed in `./cache/dataset/` (a sketch of the expected format follows this list).
- Download a pretrained vision Transformer as the vision encoder.
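The expected content of `attributeindex2prompt.json` is a mapping from attribute index to the textual prompt fed to the PLM. The sketch below shows one plausible way to build such a file; the attribute names and the prompt template here are placeholders, since the actual prompt wording used in the paper is described in the arXiv appendix.

```python
import json

# Hypothetical attribute names; replace with the real attribute
# vocabulary of CUB / AWA2 / SUN.
attribute_names = ["black wing", "striped tail", "hooked bill"]

# Placeholder template -- the actual prompts follow the paper's appendix.
index2prompt = {str(i): f"the photo contains a bird with {name}"
                for i, name in enumerate(attribute_names)}

with open("cache/CUB/attributeindex2prompt.json", "w") as f:
    json.dump(index2prompt, f, indent=2)
```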
## Train & Eval
The training script for AWA2_GZSL:

    bash script/AWA2/AWA2_GZSL.sh
### Parameter

    [--dataset {AWA2, SUN, CUB}] [--calibrated_stacking CALIBRATED_STACKING] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
    [--classifier_lr LEARNING-RATE] [--xe XE] [--attri ATTRI] [--gzsl] [--patient PATIENT] [--model_name MODEL_NAME] [--mask_pro MASK-PRO]
    [--mask_loss_xishu MASK_LOSS_XISHU] [--xlayer_num XLAYER_NUM] [--construct_loss_weight CONSTRUCT_LOSS_WEIGHT] [--sc_loss SC_LOSS] [--mask_way MASK_WAY]
    [--attribute_miss ATTRIBUTE_MISS]
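Among these flags, `--calibrated_stacking` corresponds to the standard calibrated-stacking trick for GZSL: a constant is subtracted from the seen-class scores before taking the argmax so that unseen classes are not systematically suppressed. A minimal sketch of that idea (the variable names and the way seen classes are indexed are our assumptions, not the repo's exact code):

```python
import torch

def calibrated_prediction(logits, seen_class_idx, calibration=0.7):
    """Subtract a calibration constant from seen-class scores so that
    unseen classes compete fairly during GZSL evaluation."""
    adjusted = logits.clone()
    adjusted[:, seen_class_idx] -= calibration  # penalize seen-class columns
    return adjusted.argmax(dim=-1)
```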
### Note
- You can open the `.sh` file for <a href="#parameter">parameter</a> modification.
- If you have any questions, feel free to let us know via Adding Issues.
## Cite
Please consider citing this paper if you use the code or data from our work. Thanks a lot :)
@inproceedings{DBLP:conf/aaai/ChenHCGZFPC23,
author = {Zhuo Chen and
Yufeng Huang and
Jiaoyan Chen and
Yuxia Geng and
Wen Zhang and
Yin Fang and
Jeff Z. Pan and
Huajun Chen},
title = {{DUET:} Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning},
booktitle = {{AAAI}},
pages = {405--413},
publisher = {{AAAI} Press},
year = {2023}
}
<a href="https://info.flagcounter.com/VOlE"><img src="https://s11.flagcounter.com/count2/VOlE/bg_FFFFFF/txt_000000/border_F7F7F7/columns_6/maxflags_12/viewers_3/labels_0/pageviews_0/flags_0/percent_0/" alt="Flag Counter" border="0"></a>