APL

This repo is the official implementation of the paper "APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension".

Project structure

The directory structure of the project looks like this:

├── README.md            <- The top-level README for developers using this project.
│
├── config               <- configuration 
│
├── data
│   ├── anns            <- note: cat_name.json is for prompt template usage
│
├── datasets               <- dataloader file
│
│
├── models  <- source code of the model components used in this project
│   │
│   ├── language_encoder.py             <- encoder for the images' text descriptions 
│   ├── network_blocks.py               <- essential model building blocks 
│   ├── tag_encoder.py                  <- encoder for extracting prompt embeddings 
│   ├── visual_encoder.py               <- visual backbone; also includes the prompt template encoder
│   │
│   │
│   ├── APL           <- core files of the APL model implementation
│   │   ├── __init__.py
│   │   ├── head.py   <- anchor-prompt contrastive loss (a rough sketch follows the tree)
│   │   ├── net.py    <- main code for the APL model
│   │   ├── sup_head.py <- visual alignment loss
│   │
│   │
├── utils  <- helper functions
├── requirements.txt     <- The requirements file for reproducing the analysis environment
├── train.py   <- script for training the model
├── test.py    <- script for evaluating a trained model
│
└── LICENSE              <- Open-source license if one is chosen
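
As a rough illustration of the anchor-prompt contrastive objective implemented in models/APL/head.py, here is a minimal PyTorch sketch of an InfoNCE-style loss between anchor features and prompt embeddings. The function name, tensor shapes, and temperature value are assumptions made for illustration and do not reflect the repo's actual code:

import torch
import torch.nn.functional as F

def anchor_prompt_contrastive_loss(anchor_feats, prompt_embeds, labels, temperature=0.07):
    # anchor_feats:  (N, D) features of the candidate anchors (hypothetical shapes)
    # prompt_embeds: (C, D) prompt embeddings, one per category
    # labels:        (N,)  category index assigned to each anchor
    anchor_feats = F.normalize(anchor_feats, dim=-1)
    prompt_embeds = F.normalize(prompt_embeds, dim=-1)
    # cosine similarity between every anchor and every prompt, scaled by temperature
    logits = anchor_feats @ prompt_embeds.t() / temperature  # (N, C)
    return F.cross_entropy(logits, labels)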

Installation

Instructions on how to clone and set up the repository:

Clone this repo:

git clone https://github.com/Yaxin9Luo/APL.git
cd APL

Create a conda virtual environment and activate it:

conda create -n apl python=3.7 -y
conda activate apl

Install the required dependencies:

(We run all our experiments on PyTorch 1.11.0 with CUDA 11.3.)
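
For reference, one way to set up a matching PyTorch build is the conda command below (taken from the PyTorch previous-versions instructions; adjust the CUDA toolkit version to your machine):

conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch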

Install NVIDIA apex (the commands below are copied from its official repo):

git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... 
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Compile the DCN layer:

cd utils/DCN
./make.sh

Install the remaining dependencies:

pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz

Data Preparation

(We also provide the prepared annotations in the data/anns folder to save you time.)

(We recommend placing the data folder under the APL root directory; otherwise, please modify the paths in the config files.)

├── data
│   ├── anns            
│       ├── refcoco.json            
│       ├── refcoco+.json              
│       ├── refcocog.json                 
│       ├── refclef.json
│       ├── cat_name.json       
│   ├── images 
│       ├── train2014
│           ├── COCO_train2014_000000515716.jpg              
│           ├── ...
│       ├── refclef
│           ├── 99.jpg              
│           ├── ...

... the remaining directories    
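
If you do not already have the MS-COCO train2014 images locally, they can be downloaded from the official COCO server with the standard download link below (the refclef images come from a different source and are not included in this zip):

wget http://images.cocodataset.org/zips/train2014.zip
unzip train2014.zip -d data/images/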

Training

python train.py --config ./config/[DATASET_NAME].yaml
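
For example, to train on RefCOCO (assuming the config file is named after the annotation file, e.g. refcoco.yaml):

python train.py --config ./config/refcoco.yaml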

Evaluation

python test.py --config ./config/[DATASET_NAME].yaml --eval-weights [PATH_TO_CHECKPOINT_FILE]
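
For example (the config name and checkpoint path below are placeholders, not files shipped with the repo):

python test.py --config ./config/refcoco.yaml --eval-weights ./checkpoints/refcoco_best.pth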

Model Zoo

Weakly REC

| Method | RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg val-g |
| ------ | ----------- | ------------- | ------------- | ------------ | -------------- | -------------- | -------------- |
| APL    | 64.51       | 61.91         | 63.57         | 42.70        | 42.84          | 39.80          | 50.22          |

Weakly RES

| Method | RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg val-g |
| ------ | ----------- | ------------- | ------------- | ------------ | -------------- | -------------- | -------------- |
| APL    | 55.92       | 54.84         | 55.64         | 34.92        | 34.87          | 35.61          | 40.13          |

Pseudo Labels for Training Other Models (Weakly Supervised Training Scheme)

| Method      | RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg val-g |
| ----------- | ----------- | ------------- | ------------- | ------------ | -------------- | -------------- | -------------- |
| APL_SimREC  | 63.94       | 64.72         | 61.21         | 42.11        | 44.85          | 38.31          | 48.35          |
| APL_TransVG | 64.86       | 64.89         | 63.87         | 39.28        | 41.08          | 36.45          | 46.11          |

Visualization of Prediction Results (blue box is the ground truth)

Image Description: "No cut piece but 7am of cut piece"

Image Description: "Green apple on the left"

Image Description: "Purple book"

Image Description: "Yellow round fruit with blemish"

Image Description: "From bottom right second up"