DUET

In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) propose a multi-task learning policy that jointly considers multi-modal objectives.
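
As a rough illustration of component (2), below is a minimal PyTorch sketch of an attribute-level contrastive (InfoNCE-style) objective. The function name and tensor shapes are hypothetical assumptions for exposition and do not mirror the actual implementation under model/.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull an image's embedding for one attribute toward
    a positive (same attribute, different image) and push it away from
    negatives (e.g., co-occurring but different attributes).

    anchor:    (B, D)    attribute-level visual embeddings
    positive:  (B, D)    embeddings sharing the anchor's attribute
    negatives: (B, K, D) embeddings of K contrasting attributes
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logit = (anchor * positive).sum(-1, keepdim=True)       # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives)  # (B, K)

    # the positive sits at index 0 of the concatenated logits
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```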

πŸ”” News

πŸ€– Model Architecture

[Figure: DUET model architecture]

πŸ“š Dataset Download

πŸ“• Code Path

Code Structures

There are four main parts in the code (cache, data, model, and script); the log and out directories collect training logs and outputs.

DUET
β”œβ”€β”€ cache
β”‚   β”œβ”€β”€ AWA2
β”‚   β”‚   β”œβ”€β”€ attributeindex2prompt.json
β”‚   β”‚   └── id2imagepixel.pkl
β”‚   β”œβ”€β”€ CUB
β”‚   β”‚   β”œβ”€β”€ attributeindex2prompt.json
β”‚   β”‚   β”œβ”€β”€ id2imagepixel.pkl
β”‚   β”‚   └── mapping.json
β”‚   └── SUN
β”‚       β”œβ”€β”€ attributeindex2prompt.json
β”‚       β”œβ”€β”€ id2imagepixel.pkl
β”‚       └── mapping.json
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ AWA2
β”‚   β”‚   β”œβ”€β”€ APN.mat
β”‚   β”‚   β”œβ”€β”€ TransE_65000.mat
β”‚   β”‚   β”œβ”€β”€ att_splits.mat
β”‚   β”‚   β”œβ”€β”€ attri_groups_9.json
β”‚   β”‚   β”œβ”€β”€ kge_CH_AH_CA_60000.mat
β”‚   β”‚   └── res101.mat
β”‚   β”œβ”€β”€ CUB
β”‚   β”‚   β”œβ”€β”€ APN.mat
β”‚   β”‚   β”œβ”€β”€ att_splits.mat
β”‚   β”‚   β”œβ”€β”€ attri_groups_8.json
β”‚   β”‚   β”œβ”€β”€ attri_groups_8_layer.json
β”‚   β”‚   └── res101.mat
β”‚   └── SUN
β”‚       β”œβ”€β”€ APN.mat
β”‚       β”œβ”€β”€ att_splits.mat
β”‚       β”œβ”€β”€ attri_groups_4.json
β”‚       └── res101.mat
β”œβ”€β”€ log
β”‚   β”œβ”€β”€ AWA2
β”‚   β”œβ”€β”€ CUB
β”‚   └── SUN
β”œβ”€β”€ model
β”‚   β”œβ”€β”€ log.py
β”‚   β”œβ”€β”€ main.py
β”‚   β”œβ”€β”€ main_utils.py
β”‚   β”œβ”€β”€ model_proto.py
β”‚   β”œβ”€β”€ modeling_lxmert.py
β”‚   β”œβ”€β”€ opt.py
β”‚   β”œβ”€β”€ swin_modeling_bert.py
β”‚   β”œβ”€β”€ util.py
β”‚   └── visual_utils.py
β”œβ”€β”€ out
β”‚   β”œβ”€β”€ AWA2
β”‚   β”œβ”€β”€ CUB
β”‚   └── SUN
└── script
    β”œβ”€β”€ AWA2
    β”‚   └── AWA2_GZSL.sh
    β”œβ”€β”€ CUB
    β”‚   └── CUB_GZSL.sh
    └── SUN
        └── SUN_GZSL.sh
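
For orientation, here is a minimal sketch of how the per-dataset cache files might be loaded. The key and value types are assumptions inferred from the file names (attribute index β†’ prompt text, image id β†’ pixel tensor), not a documented API of this repository.

```python
import json
import pickle

def load_cache(dataset="CUB", cache_root="cache"):
    """Hypothetical loader for the per-dataset cache files."""
    # attribute index -> natural-language prompt fed to the PLM (assumed)
    with open(f"{cache_root}/{dataset}/attributeindex2prompt.json") as f:
        attr2prompt = json.load(f)
    # image id -> pre-extracted pixel data (assumed)
    with open(f"{cache_root}/{dataset}/id2imagepixel.pkl", "rb") as f:
        id2pixel = pickle.load(f)
    return attr2prompt, id2pixel
```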

πŸ”¬ Dependencies

🎯 Prerequisites

πŸš€ Train & Eval

The training script for AWA2_GZSL:

bash script/AWA2/AWA2_GZSL.sh

Parameters

[--dataset {AWA2, SUN, CUB}] [--calibrated_stacking CALIBRATED_STACKING] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
[--classifier_lr CLASSIFIER_LR] [--xe XE] [--attri ATTRI] [--gzsl] [--patient PATIENT] [--model_name MODEL_NAME] [--mask_pro MASK_PRO]
[--mask_loss_xishu MASK_LOSS_XISHU] [--xlayer_num XLAYER_NUM] [--construct_loss_weight CONSTRUCT_LOSS_WEIGHT] [--sc_loss SC_LOSS] [--mask_way MASK_WAY]
[--attribute_miss ATTRIBUTE_MISS]
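
Among these, --calibrated_stacking follows the standard calibrated-stacking idea for GZSL inference: a constant is subtracted from seen-class scores so unseen classes are not systematically dominated. A minimal, hypothetical sketch of that step; the function and variable names are illustrative and not taken from main.py.

```python
import torch

def calibrated_stacking(logits, seen_class_mask, gamma):
    """Apply calibrated stacking before the final argmax.

    logits:          (B, C) class scores over seen + unseen classes
    seen_class_mask: (C,)   boolean, True for seen classes
    gamma:           calibration constant (the --calibrated_stacking value)
    """
    calibrated = logits.clone()
    # penalize seen-class scores by a constant margin
    calibrated[:, seen_class_mask] -= gamma
    return calibrated.argmax(dim=1)
```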

πŸ“Œ Note:

🀝 Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@inproceedings{DBLP:conf/aaai/ChenHCGZFPC23,
  author       = {Zhuo Chen and
                  Yufeng Huang and
                  Jiaoyan Chen and
                  Yuxia Geng and
                  Wen Zhang and
                  Yin Fang and
                  Jeff Z. Pan and
                  Huajun Chen},
  title        = {{DUET:} Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning},
  booktitle    = {{AAAI}},
  pages        = {405--413},
  publisher    = {{AAAI} Press},
  year         = {2023}
}
