
<div align="center"> <img src="graphics/TV_GZSL.png" alt="TV-GZSL Framework" width="67%"/> </div> <h2 align="center">A <b>T</b>oolkit for large scale analysis of <b>V</b>isual features and<br> Generalized Zero-Shot Learning (<b>GZSL</b>) methods</h2> <hr>


Requirements

Data Setup

  1. Clone repository and create a new data directory
    ~$ git clone https://github.com/uvavision/TV-GZSL.git
    ~$ cd TV-GZSL
    ~$ mkdir data
    
  2. Download all files located in this folder.
    • You can also download only the features from the backbones you want to use in your experiments.
    • If you want to use the traditional RN101 and RN101 fine-tuned features for each dataset you can download this folder only. Just make sure it is inside the data directory you just created.
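After downloading, a quick sanity check can confirm the feature files landed where the toolkit expects them. The sketch below only assumes the data directory from step 1; whatever subfolders or files it lists depend on which backbone features you chose to download.

```python
import os

def check_data_dir(root="data"):
    """Return the sorted entries found under the data directory,
    or raise if the directory is missing entirely."""
    if not os.path.isdir(root):
        raise FileNotFoundError(
            f"'{root}' not found -- create it inside the repository root first")
    entries = sorted(os.listdir(root))
    if not entries:
        print(f"'{root}' exists but is empty -- download the feature files into it")
    return entries

if __name__ == "__main__":
    print(check_data_dir())
```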

Features and Methods

| Datasets | Backbone Types | GZSL Families         |
|----------|----------------|-----------------------|
| CUB      | CNN            | Embedding-based       |
| SUN      | ViT            | Generative-based      |
| AWA2     | MLP-Mixer      | Disentanglement-based |

FAQ

CUDA_VISIBLE_DEVICES=0 python main.py --method CADA --dataset CUB --feature_backbone resnet101
CUDA_VISIBLE_DEVICES=0 python main.py --method CADA --dataset CUB --feature_backbone resnet101 --finetuned_features
CUDA_VISIBLE_DEVICES=1 python main.py --method SDGZSL --dataset CUB --feature_backbone vit_b32_clip
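The commands above differ only in method, dataset, and backbone, so larger experiment sweeps are easy to script. A minimal sketch is shown below; the flag names come from the examples above, while the method/dataset/backbone subsets and the GPU-alternation scheme are just illustrations.

```python
import itertools

# Illustrative subsets -- extend with any supported values
METHODS = ["CADA", "SDGZSL"]
DATASETS = ["CUB", "SUN", "AWA2"]
BACKBONES = ["resnet101", "vit_b32_clip"]

def build_command(method, dataset, backbone, finetuned=False):
    """Assemble one main.py invocation matching the example commands."""
    cmd = ["python", "main.py",
           "--method", method,
           "--dataset", dataset,
           "--feature_backbone", backbone]
    if finetuned:
        cmd.append("--finetuned_features")
    return cmd

if __name__ == "__main__":
    for i, (m, d, b) in enumerate(itertools.product(METHODS, DATASETS, BACKBONES)):
        gpu = i % 2  # alternate runs between two GPUs
        print(f"CUDA_VISIBLE_DEVICES={gpu}", " ".join(build_command(m, d, b)))
        # or launch directly with subprocess.run(...), passing the env variable
```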

Available Parameters

Everything you need to run is in main.py. The Wrapper class contains the main functions to create the model, prepare the dataset, and train it; the arguments you pass are handled by the Wrapper.<br> Please pay special attention to the --feature_backbone parameter to select the pre-computed features you want!

usage: main.py [-h] [--dataset DATASET]
               [--feature_backbone {resnet101,resnet152,resnet50,resnet50_moco,googlenet,vgg16,alexnet,shufflenet,vit,vit_large,adv_inception_v3,inception_v3,resnet50_clip,resnet101_clip,resnet50x4_clip,resnet50x16_clip,resnet50x64_clip,vit_b32_clip,vit_b16_clip,vit_l14_clip,virtex,virtex2,mlp_mixer,mlp_mixer_l16,vit_base_21k,vit_large_21k,vit_huge,deit_base,dino_vitb16,dino_resnet50,biggan_138k_128size,biggan_100k_224size,vq_vae_fromScratch,soho,combinedv1,combinedv2,vit_l14_clip_finetune_v2,vit_l14_clip_finetune_classAndAtt,vit_l14_clip_finetune_class200Epochs,vit_l14_clip_finetune_trainsetAndgenerated_100Epochs,vit_l14_clip_finetune_trainsetAndgenerated_200Epochs,vit_l14_clip_finetuned_classAndAtt_200Epochs,vit_l14_clip_finetuned_setAndgenerated_classAndAtt_100Epochs,vit_l14_clip_finetuned_setAndgenerated_classAndAtt_200Epochs,clip_l14_finetune_classes_200epochs,clip_l14_finetun_atts_200epochs,clip_l14_finetun_atts_200epochs,clip_l14_finetune_classes_200epochs_frozenAllExc1Layer,clip_l14_finetun_atts_200epochs_frozenAllExc1Layer,clip_l14_finetune_classAndAtt_200epochs_frozenAllExc1Layer,clip_l14_finetune_classes_200epochs_frozenTextE,clip_l14_finetun_atts_200epochs_frozenTextE,clip_l14_finetune_classAndAtt_200epochs_frozenTextE,clip_l14_finetun_atts_fromMAT_200epochs,clip_l14_finetun_classAndatts_fromMAT_200epochs,clip_l14_finetun_class_fromMAT_200epochs,vit_large_finetune_classes_200epochs}]
               [--methods {DEVISE,ESZSL,ALE,CADA,tfVAEGAN,CE,SDGZSL,FREE,UPPER_BOUND}]
               [--finetuned_features] [--data_path DATA_PATH]
               [--workers WORKERS] [--dropout DO] [--optimizer OPTIMIZER]
               [--epochs N] [--start_epoch N] [-b N] [--lr LR]
               [--initial_lr LR] [--lr_rampup EPOCHS]
               [--lr_rampdown_epochs EPOCHS] [--momentum M] [--nesterov]
               [--weight-decay W] [--doParallel] [--print_freq N]
               [--root_dir ROOT_DIR] [--add_name ADD_NAME] [--exp_dir EXP_DIR]
               [--load_from_epoch LOAD_FROM_EPOCH] [--seed SEED]
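For orientation, the argument handling can be approximated with argparse as below. This is a sketch reconstructed from the usage text, not the actual main.py, and the backbone choices list is abbreviated; defaults are hypothetical.

```python
import argparse

def build_parser():
    """Minimal parser mirroring the usage text (choices abbreviated)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="CUB")
    parser.add_argument("--feature_backbone", default="resnet101",
                        choices=["resnet101", "vit_b32_clip", "vit_l14_clip"])  # abbreviated
    parser.add_argument("--methods", default="CADA",
                        choices=["DEVISE", "ESZSL", "ALE", "CADA", "tfVAEGAN",
                                 "CE", "SDGZSL", "FREE", "UPPER_BOUND"])
    parser.add_argument("--finetuned_features", action="store_true")
    parser.add_argument("--data_path", default="data")
    parser.add_argument("--epochs", type=int, default=100, metavar="N")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--seed", type=int, default=0)
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```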

Original Method Repositories: please cite all of them accordingly!

Finetuning

  1. Download the dataset images and annotations:
    1. CUB: http://www.vision.caltech.edu/datasets/cub_200_2011/
    2. SUN: https://cs.brown.edu/~gmpatter/sunattributes.html
    3. AWA2: https://cvml.ist.ac.at/AwA2/
  2. Extract them into a data folder inside the finetune folder:
    ~$ cd TV-GZSL/finetune/
    ~$ mkdir data
    ~$ cd data
    ~$ tar -xvf [filename]
    
  3. Finetune:
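The extraction in step 2 can also be scripted for all three dataset archives at once. A sketch using Python's tarfile module; the list of archive paths is a placeholder for whatever you downloaded.

```python
import os
import tarfile

def extract_all(archives, dest="data"):
    """Extract each tar archive into the destination data folder
    and return the resulting directory listing."""
    os.makedirs(dest, exist_ok=True)
    for path in archives:
        with tarfile.open(path) as tf:
            tf.extractall(dest)
    return sorted(os.listdir(dest))

if __name__ == "__main__":
    # Placeholder archive names -- substitute the real downloaded files
    print(extract_all(["CUB_200_2011.tgz", "SUNAttributeDB.tar.gz", "AwA2-data.zip"]))
```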

How to: Adding New Methods

You can add a new method under the methods folder. Then, you should only modify the utils/general_config.py and wrapper.py files to reference your new method:

  1. Add your method's name to the all_methods array in utils/general_config.py; this array populates the choices of the methods argument.
  2. In wrapper.py, handle the new method name when initializing the Wrapper class.
  3. To support all available features in your custom method, use the shared dataloader: from utils.cada_dataloader import DATA_LOADER
  4. To reuse the final classifier of the Generative-based and Disentanglement-based methods, use the LINEAR_LOGSOFTMAX class inside wrapper.py.
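The final classifier mentioned in step 4 is, in most GZSL codebases, a linear layer followed by log-softmax. The numpy sketch below illustrates that computation at inference time; the real LINEAR_LOGSOFTMAX in wrapper.py is presumably a trainable PyTorch module, so treat this only as a functional reference, with the feature and class dimensions chosen for illustration.

```python
import numpy as np

def linear_logsoftmax(features, weight, bias):
    """Linear projection followed by a numerically stable log-softmax
    over the class dimension -- the computation a LINEAR_LOGSOFTMAX-style
    classifier head performs at inference time."""
    logits = features @ weight.T + bias                  # (batch, num_classes)
    shifted = logits - logits.max(axis=1, keepdims=True) # subtract row max for stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 2048))       # e.g. ResNet-101 features
    w = rng.normal(size=(200, 2048)) * 0.01  # e.g. 200 CUB classes
    b = np.zeros(200)
    log_probs = linear_logsoftmax(feats, w, b)
    print(np.exp(log_probs).sum(axis=1))     # each row sums to 1
```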

Updates

<br><br>

<hr>

On the Transferability of Visual Features in Generalized Zero-Shot Learning (GZSL) :: TV-GZSL

About

Our work provides a comprehensive benchmark for Generalized Zero-Shot Learning (GZSL). We extensively benchmark the utility of different GZSL methods, which we characterize as embedding-based, generative-based, and based on semantic disentanglement. In particular, we investigate how these earlier GZSL methods fare against CLIP, a more recent large-scale pretrained model that claims zero-shot performance by virtue of being trained on internet-scale multimodal data.

Our findings indicate that, through prompt engineering over an off-the-shelf CLIP model, it is possible to surpass all previous methods on the standard GZSL benchmarks: CUB (birds), SUN (scenes), and AWA2 (animals). While it is possible that CLIP has actually seen many of the unseen categories in these benchmarks, we also show that GZSL methods combined with feature backbones obtained through CLIP contrastive pretraining (e.g. ViT-L/14) still provide advantages on standard GZSL benchmarks over off-the-shelf CLIP with prompt engineering. In summary, some GZSL methods designed to transfer information from seen to unseen categories still provide valuable gains when paired with a comparable feature backbone such as the one in CLIP. Surprisingly, we find that generative-based GZSL methods provide more advantages than more recent methods based on semantic disentanglement. We release a well-documented codebase which both replicates our findings and provides a modular framework for analyzing representation-learning issues in GZSL.