Object-aware Gaze Target Detection

Official repo of the paper "Object-aware Gaze Target Detection" (ICCV 2023).

Method

Description

This repo contains all the code to train and evaluate our method. The code is based on PyTorch Lightning and Hydra.

Follow the instructions below to install the dependencies and run the code. We provide configurations to train the model on GazeFollow and VideoAttentionTarget. You can adjust them by editing the parameters of each module in the configs/ folder.

Prerequisites

Environment and dependencies

We provide a pip requirements file to install all the dependencies. We recommend installing them in a conda environment.

# Clone project and submodules
git clone --recursive https://github.com/francescotonini/object-aware-gaze-target-detection.git
cd object-aware-gaze-target-detection

# Create conda environment
conda create -n object-aware-gaze-target-detection python=3.9
conda activate object-aware-gaze-target-detection

# Install requirements
pip install -r requirements.txt

(Optional) Set up wandb

cp .env.example .env

# Add token to .env

Dataset preprocessing

The code expects the datasets to be placed under the data/ folder. To use a different location, override the data_dir parameter, for example with a local configuration file:

cat <<EOT >> configs/local/default.yaml
# @package _global_

paths:
  data_dir: "{PATH TO DATASETS}"
EOT
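The same override can also be written from Python. This is a minimal sketch: the helper name write_local_config is hypothetical and not part of the repo, and it assumes only the file path and keys shown in the command above. Unlike the shell command, which appends, it overwrites the file.

```python
from pathlib import Path


def write_local_config(data_dir: str, config_path: str = "configs/local/default.yaml") -> Path:
    """Write a Hydra local override that sets paths.data_dir.

    Hypothetical helper, not part of the repo. It overwrites the target
    file, whereas the shell command above appends to it.
    """
    path = Path(config_path)
    # Create configs/local/ if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    content = (
        "# @package _global_\n"
        "\n"
        "paths:\n"
        f'  data_dir: "{data_dir}"\n'
    )
    path.write_text(content)
    return path
```

For example, write_local_config("/path/to/datasets") run from the repository root produces the same file content as the command above with the placeholder filled in.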

The implementation requires object annotations, face annotations, and depth maps from MiDaS. Therefore, you need to run the following scripts to extract them.

# GazeFollow
python scripts/gazefollow_get_aux_faces.py --dataset_dir /path/to/gazefollow --subset train
python scripts/gazefollow_get_aux_faces.py --dataset_dir /path/to/gazefollow --subset test
python scripts/gazefollow_get_objects.py --dataset_dir /path/to/gazefollow --subset train
python scripts/gazefollow_get_objects.py --dataset_dir /path/to/gazefollow --subset test
python scripts/gazefollow_get_depth.py --dataset_dir /path/to/gazefollow

# VideoAttentionTarget
cp data/videoattentiontarget_extended/*.csv /path/to/videoattentiontarget

python scripts/videoattentiontarget_get_aux_faces.py --dataset_dir /path/to/videoattentiontarget --subset train
python scripts/videoattentiontarget_get_aux_faces.py --dataset_dir /path/to/videoattentiontarget --subset test
python scripts/videoattentiontarget_get_objects.py --dataset_dir /path/to/videoattentiontarget --subset train
python scripts/videoattentiontarget_get_objects.py --dataset_dir /path/to/videoattentiontarget --subset test
python scripts/videoattentiontarget_get_depth.py --dataset_dir /path/to/videoattentiontarget
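
The preprocessing commands above follow a regular pattern, so they can be generated instead of typed one by one. The sketch below is one way to do that. The helper names preprocessing_commands and run_preprocessing are hypothetical (not part of the repo); the script names and flags are copied from the commands above, and the code assumes it is run from the repository root.

```python
import subprocess
import sys

# Steps that are run once per subset, in the order used above.
SUBSET_STEPS = ("get_aux_faces", "get_objects")
SUBSETS = ("train", "test")


def preprocessing_commands(dataset: str, dataset_dir: str) -> list[list[str]]:
    """Return the preprocessing commands for one dataset, in the order listed above."""
    commands = []
    for step in SUBSET_STEPS:
        for subset in SUBSETS:
            commands.append([
                sys.executable, f"scripts/{dataset}_{step}.py",
                "--dataset_dir", dataset_dir, "--subset", subset,
            ])
    # Depth extraction is called without --subset in the commands above.
    commands.append([
        sys.executable, f"scripts/{dataset}_get_depth.py",
        "--dataset_dir", dataset_dir,
    ])
    return commands


def run_preprocessing(dataset: str, dataset_dir: str) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in preprocessing_commands(dataset, dataset_dir):
        subprocess.run(command, check=True)
```

For VideoAttentionTarget, copy the extended CSV files first (the cp command above), then call run_preprocessing("videoattentiontarget", "/path/to/videoattentiontarget").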

Training

We provide configurations to train on GazeFollow and VideoAttentionTarget (see configs/experiment/). First, pretrain the model on object detection only.

python src/train.py experiment=gotd_gazefollow_pretrain_od

The pretraining initializes the object detection head of the model for face detection. Then, you can train the model on GazeFollow or VideoAttentionTarget. Training on VideoAttentionTarget starts from the model trained on GazeFollow.

# GazeFollow
python src/train.py experiment=gotd_gazefollow model.net_pretraining={URL/PATH TO GAZEFOLLOW OD PRETRAINING}

# VideoAttentionTarget
python src/train.py experiment=gotd_videoattentiontarget model.net_pretraining={URL/PATH TO GAZEFOLLOW TRAINED MODEL}
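
The three training stages depend on each other: each later stage takes the weights produced by the previous one. The sketch below only makes that ordering explicit by building the command strings. The function name training_commands is hypothetical, and because the location of the saved weights is not specified here, both paths are passed in explicitly.

```python
def training_commands(od_pretraining: str, gazefollow_model: str) -> list[str]:
    """Return the three training commands in the order described above.

    od_pretraining: URL/path of the GazeFollow object detection pretraining.
    gazefollow_model: URL/path of the model trained on GazeFollow.
    Hypothetical helper, not part of the repo.
    """
    base = "python src/train.py"
    return [
        # Stage 1: object detection pretraining on GazeFollow.
        f"{base} experiment=gotd_gazefollow_pretrain_od",
        # Stage 2: gaze training on GazeFollow, initialized from stage 1.
        f"{base} experiment=gotd_gazefollow model.net_pretraining={od_pretraining}",
        # Stage 3: VideoAttentionTarget training, initialized from stage 2.
        f"{base} experiment=gotd_videoattentiontarget model.net_pretraining={gazefollow_model}",
    ]
```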

Evaluation

The same experiment configurations are used to evaluate the model.

# GazeFollow
python src/eval.py experiment=gotd_gazefollow ckpt_path={PATH TO CHECKPOINT}

# VideoAttentionTarget
python src/eval.py experiment=gotd_videoattentiontarget ckpt_path={PATH TO CHECKPOINT}

Checkpoints

We provide checkpoints for GazeFollow and VideoAttentionTarget. NOTE: when evaluating the provided checkpoints, replace ckpt_path={PATH TO CHECKPOINT} in the evaluation commands with +model.net_pretraining={PATH TO CHECKPOINT}.
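
The difference between evaluating your own checkpoint and a provided one comes down to which override carries the path. A minimal sketch of that choice, assuming only the two command forms given in this README (the function name eval_command and its provided flag are hypothetical):

```python
def eval_command(experiment: str, checkpoint: str, provided: bool = False) -> str:
    """Build the evaluation command for a checkpoint.

    provided=True follows the note above for the provided checkpoints:
    the path is passed via +model.net_pretraining instead of ckpt_path.
    Hypothetical helper, not part of the repo.
    """
    key = "+model.net_pretraining" if provided else "ckpt_path"
    return f"python src/eval.py experiment={experiment} {key}={checkpoint}"
```

For example, eval_command("gotd_gazefollow", "/path/to/checkpoint", provided=True) yields the GazeFollow evaluation command with the override swapped as the note describes.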

Acknowledgements

This code is based on PyTorch Lightning, Hydra, and the official DETR implementation.

Cite us

@inproceedings{tonini2023objectaware,
  title={Object-aware Gaze Target Detection},
  author={Tonini, Francesco and Dall'Asen, Nicola and Beyan, Cigdem and Ricci, Elisa},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={21860--21869},
  year={2023}
}