AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

πŸ† The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

[Figure: framework]

We maintain two codebases. The final submission is a feature ensemble that combines features extracted by both.

Part One is here: https://github.com/ShuaiBai623/AIC2021-T5-CLV

Part Two is here: https://github.com/layumi/NLP-AICity2021
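
The README does not spell out how the features from the two codebases are combined. As a minimal sketch of one common way to build a feature ensemble, the snippet below L2-normalizes each model's features and concatenates them. The function name `ensemble_features` and the use of concatenation are assumptions made for illustration, not necessarily what the submission code does.

```python
import numpy as np


def ensemble_features(per_model_feats):
    """Concatenate L2-normalized features from several models.

    per_model_feats: list of arrays, one per model, each shaped
    (n_items, dim_m). All arrays must describe the same items in the
    same order. NOTE: concatenation is an assumed ensembling strategy.
    """
    normed = [
        f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
        for f in per_model_feats
    ]
    return np.concatenate(normed, axis=1)
```

With this choice, the cosine similarity between two concatenated vectors equals the mean of the per-model cosine similarities, so every model contributes equally regardless of its embedding dimension.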

Prepare

scripts/extract_vdo_frms.py extracts frames from the videos.

scripts/get_motion_maps.py generates the motion maps.

scripts/deal_nlpaug.py applies NLP augmentation to the text descriptions.

The data and checkpoints directories are organized as follows:

.
├── checkpoints
│   ├── motion_effb2_1CLS_nlpaug_288.pth
│   ├── motion_effb3_NOCLS_nlpaug_320.pth
│   ├── motion_SE_3CLS_nonlpaug_288.pth
│   ├── motion_SE_NOCLS_nlpaug_288.pth
│   └── motion_SE_NOCLS_nonlpaug_288.pth
└── data
    ├── AIC21_Track5_NL_Retrieval
    │   ├── train
    │   └── validation
    ├── motion_map
    ├── test-queries.json
    ├── test-queries_nlpaug.json    ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
    ├── test-tracks.json
    ├── train.json
    ├── train_nlpaug.json
    ├── train-tracks.json
    ├── train-tracks_nlpaug.json    ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
    ├── val.json
    └── val_nlpaug.json             ## NLP augmentation (Refer to scripts/deal_nlpaug.py)
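
Before training or testing, it can help to confirm that the layout above is in place. The following is a small sketch that lists any expected entries that are missing. The helper name `missing_entries` is hypothetical and is not part of either codebase; the paths are copied from the tree above.

```python
from pathlib import Path

# Expected entries, copied from the directory tree in this README.
EXPECTED = [
    "checkpoints/motion_effb2_1CLS_nlpaug_288.pth",
    "checkpoints/motion_effb3_NOCLS_nlpaug_320.pth",
    "checkpoints/motion_SE_3CLS_nonlpaug_288.pth",
    "checkpoints/motion_SE_NOCLS_nlpaug_288.pth",
    "checkpoints/motion_SE_NOCLS_nonlpaug_288.pth",
    "data/AIC21_Track5_NL_Retrieval/train",
    "data/AIC21_Track5_NL_Retrieval/validation",
    "data/motion_map",
    "data/test-queries.json",
    "data/test-queries_nlpaug.json",
    "data/test-tracks.json",
    "data/train.json",
    "data/train_nlpaug.json",
    "data/train-tracks.json",
    "data/train-tracks_nlpaug.json",
    "data/val.json",
    "data/val_nlpaug.json",
]


def missing_entries(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for path in missing_entries():
        print(f"missing: {path}")
```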

Part One

Train

The configuration files are in configs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -u main.py --name your_experiment_name --config your_config_file | tee log

Test

Set RESTORE_FROM in your configuration file to the checkpoint you want to test.

python -u test.py --config your_config_file

Extract the visual and text embeddings. The extracted embeddings can be found here.

python -u test.py --config configs/motion_effb2_1CLS_nlpaug_288.yaml
python -u test.py --config configs/motion_SE_NOCLS_nlpaug_288.yaml
python -u test.py --config configs/motion_effb3_NOCLS_nlpaug_320.yaml
python -u test.py --config configs/motion_SE_3CLS_nonlpaug_288.yaml
python -u test.py --config configs/motion_SE_NOCLS_nonlpaug_288.yaml

Part Two

See https://github.com/layumi/NLP-AICity2021

Submission

During inference, we average the frame features of the target in each track to obtain the track feature, and likewise average the embeddings of the text descriptions to obtain the query feature. Tracks are then ranked by cosine distance to the query feature to produce the final result.

python scripts/get_submit.py
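
The averaging and ranking steps described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the contents of scripts/get_submit.py: the function names, the array shapes, and the in-memory inputs are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def track_feature(frame_feats):
    """Average per-frame features (n_frames, dim) into one track vector."""
    return np.mean(frame_feats, axis=0)


def query_feature(text_embs):
    """Average per-description embeddings (n_texts, dim) into one query vector."""
    return np.mean(text_embs, axis=0)


def rank_tracks(query_vec, track_vecs, track_ids):
    """Return track ids ordered from closest to farthest in cosine distance."""
    q = l2_normalize(query_vec)
    t = l2_normalize(track_vecs)
    sims = t @ q  # cosine similarity of every track to the query
    # Descending similarity is the same order as ascending cosine distance.
    order = np.argsort(-sims, kind="stable")
    return [track_ids[i] for i in order]
```

A usage example with toy two-dimensional features:

```python
tracks = {
    "a": np.array([[1.0, 0.0], [1.0, 0.0]]),
    "b": np.array([[0.0, 1.0], [0.0, 1.0]]),
}
ids = list(tracks)
track_vecs = np.stack([track_feature(tracks[i]) for i in ids])
query = query_feature(np.array([[1.0, 0.0], [0.9, 0.1]]))
ranking = rank_tracks(query, track_vecs, ids)
```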
