[ECCV2024 oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Project Page | Paper
<div align="center"> <table style="border-collapse: collapse;"> <tr> <td style="text-align: center; padding: 10px;"> <img src="samples/open_door.gif" width="120" /> <br /> <i> <font color="black"><strong>Seen:</strong></font> <font color="red">Open</font> <font color="blue">a door</font> </i> </td> <td style="text-align: center; padding: 10px;"> <img src="samples/close_book.gif" width="120" /> <br /> <i> <font color="black"><strong>Seen:</strong></font> <font color="red">Close</font> <font color="blue">a book</font> </i> </td> <td style="height: 120px; width: 1px; border-left: 2px dashed gray; text-align: center; padding: 10px;"></td> <td style="text-align: center; padding: 10px;"> <img src="samples/close_door.gif" width="120" /> <br /> <i> <font color="black"><strong>Unseen:</strong></font> <font color="red">Close</font> <font color="blue">a door</font> </i> </td> </tr> </table> <div style="margin-top: 1px;"> <strong>Zero-Shot Compositional Action Recognition (ZS-CAR)</strong> </div> </div>Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiaojun Wu†, Muhammad Awais, Sara Atito, Josef Kittler
ECCV, 2024
🛠️ Prepare Something-composition (Sth-com)
<p align="middle" style="margin-bottom: 0.5px;"> <img src="samples/bend_spoon.gif" height="80" /> <img src="samples/bend_book.gif" height="80" /> <img src="samples/close_door.gif" height="80" /> <img src="samples/close_book.gif" height="80" /> <img src="samples/twist_obj.gif" height="80" /> </p> <p align="middle" style="margin-bottom: 0.5px;margin-top: 0.5px;"> <img src="samples/squeeze_bottle.gif" height="80" /> <img src="samples/squeeze_pillow.gif" height="80" /> <img src="samples/tear_card.gif" height="80" /> <img src="samples/tear_leaf.gif" height="80" /> <img src="samples/open_wallet.gif" height="80" /> </p> <p align="center" style="margin-top: 0.5px;"> <strong>Some samples in Something-composition</strong> </p>- Download Something-Something V2 (Sth-v2). Our proposed Something-composition (Sth-com) is based on Sth-V2. We refer to the official website to download the videos to the path video_path.
- Extract frames. To speed up data loading during training, we extract the frames of each video and save them to frame_path. The command is:
python tools/extract_frames.py --video_root video_path --frame_root frame_path
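The repository ships tools/extract_frames.py for this step. For reference, the sketch below only illustrates the general idea, assuming OpenCV can decode the Sth-V2 .webm videos; it is not the repository's script, and the naming scheme simply mirrors the frame layout shown further down.

```python
# Hedged sketch of per-video frame extraction (NOT the repo's tools/extract_frames.py).
# Assumes OpenCV (pip install opencv-python) with a backend that can decode .webm.
import argparse
import os
import cv2

def extract_video(video_file, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_file)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        # Frames are named 000001.jpg, 000002.jpg, ... as in the structure below.
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
    cap.release()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--video_root", required=True)
    parser.add_argument("--frame_root", required=True)
    args = parser.parse_args()
    for name in os.listdir(args.video_root):
        video_id = os.path.splitext(name)[0]  # e.g. "54463.webm" -> "54463"
        extract_video(os.path.join(args.video_root, name),
                      os.path.join(args.frame_root, video_id))
```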
- Download the dataset annotations. We provide the Sth-com annotation files in the data_split dir. The format is as follows:
  [
      {
          "id": "54463",                  # the sample name
          "action": "opening a book",     # the composition
          "verb": "Opening [something]",  # the verb component
          "object": "book"                # the object component
      },
      { ... },
      { ... },
  ]

  Please download these files to annotation_path.
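As a quick sanity check that the annotations are in place, a small snippet like the following can load one split file and summarize its components. The file path assumes the data_split/generalized layout shown below; this is only an illustration and is not required by the training code.

```python
# Hedged example: inspect one Sth-com split file (path is an assumption based on the
# directory structure shown below).
import json

with open("annotation_path/data_split/generalized/train_pairs.json") as f:
    samples = json.load(f)

verbs = {s["verb"] for s in samples}
objects = {s["object"] for s in samples}
compositions = {(s["verb"], s["object"]) for s in samples}
print(f"{len(samples)} samples, {len(verbs)} verbs, "
      f"{len(objects)} objects, {len(compositions)} seen compositions")
```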
- Finally, the dataset is built. The directory structure should look like this:
  - annotation_path
    - data_split
      - generalized
        - train_pairs.json
        - val_pairs.json
        - test_pairs.json
  - frame_path
    - 0
      - 000001.jpg
      - 000002.jpg
      - ......
    - 1
      - 000001.jpg
      - 000002.jpg
      - ......
    - ......
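Before training, it can be worth verifying that the extracted frames and the annotations line up. The script below is a hedged sketch of such a check, not part of the repository; annotation_path and frame_path are placeholders for your own paths.

```python
# Hedged sanity check: every annotated sample should have a non-empty frame folder.
# annotation_path / frame_path are placeholders; adapt them to your setup.
import json
import os

annotation_path = "annotation_path/data_split/generalized"
frame_path = "frame_path"

for split in ("train_pairs.json", "val_pairs.json", "test_pairs.json"):
    with open(os.path.join(annotation_path, split)) as f:
        samples = json.load(f)
    missing = [s["id"] for s in samples
               if not os.path.isdir(os.path.join(frame_path, s["id"]))
               or not os.listdir(os.path.join(frame_path, s["id"]))]
    print(f"{split}: {len(samples)} samples, {len(missing)} missing frame folders")
```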
🚀 Train and test
🔔 From here on, treat the codes dir as the project root.
Before running
- Prepare the word embedding models. We recommend following Compcos to download them; a short loading example follows this list.
- Modify the following paths (for example, when running C2C_vanilla with TSM-18 as the backbone):
- dataset_path in ./config/c2c_vanilla_tsm.yml
- save_path in ./config/c2c_vanilla_tsm.yml
- The code line: t=fasttext.load_model('YOUR_PATH/cc.en.300.bin') in models/vm_models/word_embedding.py
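As a rough illustration of what that code line does (an assumption-level sketch, not the repository's word_embedding.py), the fastText model can be loaded and queried for component words like this:

```python
# Hedged sketch: load the fastText English vectors and embed verb/object components.
# 'YOUR_PATH/cc.en.300.bin' is the same placeholder used in word_embedding.py;
# replace it with the file downloaded following Compcos.
import fasttext

t = fasttext.load_model("YOUR_PATH/cc.en.300.bin")

verb_vec = t.get_word_vector("open")   # 300-d vector for a verb component
obj_vec = t.get_word_vector("book")    # 300-d vector for an object component
print(verb_vec.shape, obj_vec.shape)   # (300,) (300,)
```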
Train
- Train a model with the command:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEX python train.py --config config/c2c_vm/c2c_vanilla_tsm.yml
Test
- To test, suppose you have trained a model and set its log dir to YOUR_LOG_PATH. Then you can test it with:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEX python test_for_models.py --logpath YOUR_LOG_PATH
📝 TODO List
- Add training code for the VM + word embedding paradigm.
- Add training code for the VLM paradigm.