Event Penguins (CVPR 2024)
This is the official repository for Low-power, Continuous Remote Behavioral Localization with Event Cameras, accepted at CVPR 2024, by Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, and Guillermo Gallego.
<h2 align="left">Project Page | Paper | Video | Data</h2>

Table of Contents
- Citation
- Quickstart
- Details
- Acknowledgements
Citation
If you use this work in your research, please consider citing:
@InProceedings{Hamann24cvpr,
author = {Hamann, Friedhelm and Ghosh, Suman and Martinez, Ignacio Juarez and Hart, Tom and Kacelnik, Alex and Gallego, Guillermo},
title = {Low-power Continuous Remote Behavioral Localization with Event Cameras},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {18612-18621}
}
Quickstart
Setup
You can use Miniconda to set up an environment:
conda create --name eventpenguins python=3.8
conda activate eventpenguins
Install PyTorch by choosing a command that matches your CUDA version. You can find the compatible commands on the PyTorch official website (tested with PyTorch 2.2.2), e.g.:
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
Install other required packages:
pip install -r requirements.txt
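After installing the requirements, you can optionally verify that your PyTorch build sees the GPU (a quick sanity check, not part of the repository's scripts):

```python
import torch

print(torch.__version__)          # e.g. 2.2.2
print(torch.cuda.is_available())  # should be True if the CUDA build matches your driver
```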
Preprocessing the data
- Create a folder for the data:
  `cd <project-root>`
  `mkdir data`
- Download the data and save it in `<project-root>/data`.
- Create the pre-processed dataset with the following command:
  `python scripts/preprocess.py --data_root data/EventPenguins --output_dir data --recording_info_path config/annotations/recording_info.csv`

This crops the events according to the pre-annotated nests and stores the recordings according to the split specified in the paper.
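For intuition, the cropping step conceptually looks like the sketch below. The function name, the event array layout, and the ROI format are illustrative assumptions, not the exact implementation of `scripts/preprocess.py`:

```python
import numpy as np

def crop_events_to_roi(events, roi):
    """Keep only events inside a nest ROI and shift coordinates to the ROI origin.

    events: (N, 4) array of [x, y, t, p] rows (illustrative layout)
    roi:    (x_min, y_min, width, height) in pixels (illustrative format)
    """
    x_min, y_min, w, h = roi
    mask = (
        (events[:, 0] >= x_min) & (events[:, 0] < x_min + w) &
        (events[:, 1] >= y_min) & (events[:, 1] < y_min + h)
    )
    cropped = events[mask].copy()
    cropped[:, 0] -= x_min   # express x relative to the ROI origin
    cropped[:, 1] -= y_min   # express y relative to the ROI origin
    return cropped
```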
Inference
- Create a folder for models:
  `mkdir models`
- Download the pre-trained model weights from here and save them in the `models` folder.
- Run inference with the following command:
  `python scripts/inference.py --config config/exp/inference.yaml --verbose`
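The inference script is driven by the YAML configuration file. If you want to inspect or tweak it programmatically, a minimal sketch with PyYAML looks like this (the key layout of the file is not assumed here):

```python
import yaml

# Load the experiment configuration passed to scripts/inference.py.
with open("config/exp/inference.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)  # inspect the configuration before running inference
```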
Details
Original Data
The EventPenguins dataset contains 24 ten-minute recordings, with 16 annotated nests.
An overview of the data can be found in `config/annotations/recording_info.csv`.
Each recording has a `roi_group_id`, which links to the locations of the 16 pre-annotated regions of interest in `config/annotations/rois` (a new set of ROIs was annotated whenever the camera was moved).
The dataset is structured as follows:
EventPenguins/
├── <yy-mm-dd>_<hh-mm-ss>/ # (these folders are referred to as "recordings")
│ ├── frames/
│ │ ├── 000000000000.png
│ │ ├── 000000000001.png
│ │ └── ...
│ ├── events.h5
│ ├── frame_timestamps.txt # [us]
│ └── metadata.yaml
└── ...
Please note that we do not use the grayscale frames in our method but provide them for completeness.
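As a starting point for exploring a raw recording, the sketch below lists the grayscale frames and loads their timestamps. The recording name is an example, the timestamps are assumed to be one per line, and the internal layout of `events.h5` is not assumed here:

```python
from pathlib import Path

import numpy as np

# Example recording name; any folder under data/EventPenguins works.
recording = Path("data/EventPenguins/22-01-12_17-26-00")

# One frame timestamp per line, in microseconds (assumed format).
frame_ts_us = np.loadtxt(recording / "frame_timestamps.txt", dtype=np.int64)

frames = sorted((recording / "frames").glob("*.png"))
print(f"{len(frames)} frames spanning {(frame_ts_us[-1] - frame_ts_us[0]) / 1e6:.1f} s")
```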
Pre-processed Data
Structure
The processed data is stored in a single HDF5 file named `preprocessed.h5`. The file structure is organized as follows:
- Each ten-minute recording is stored in a group labeled by its timestamp (e.g., `22-01-12_17-26-00`).
- Each group (timestamp) contains multiple subgroups, each corresponding to a specific ROI (nest) identified by an ID (e.g., `N01`).
- Each ROI subgroup contains:
  - An `events` dataset, where each event is represented as a row `[x, y, t, p]` indicating the event's x-position, y-position, timestamp (us), and polarity, respectively.
  - Attributes `height` and `width` indicating the dimensions of the ROI.
Attributes
Each subgroup (ROI) has the following attributes:
- `height`: The height of the ROI in pixels.
- `width`: The width of the ROI in pixels.
Each main group (recording timestamp) has the following attribute:
- `split`: Indicates the data split (e.g., `train`, `test`, `validate`) that the recording belongs to.
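A minimal sketch of reading this layout with `h5py`, assuming the file was written to `data/` as in the preprocessing command above (the recording timestamp and nest ID are the example values from above):

```python
import h5py

with h5py.File("data/preprocessed.h5", "r") as f:
    rec = f["22-01-12_17-26-00"]       # one ten-minute recording (example ID)
    print(rec.attrs["split"])          # "train", "test", or "validate"

    roi = rec["N01"]                   # one nest ROI (example ID)
    events = roi["events"][()]         # (N, 4) array of [x, y, t, p] rows
    print(events.shape, roi.attrs["height"], roi.attrs["width"])
```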
Annotations
The annotations are in `config/annotations/annotations.json`.
The structure is very similar to ActivityNet, with an additional layer to consider different nests.
{
  "version": "VERSION 0.0",
  "database": {
    "<yy-mm-dd>_<hh-mm-ss>": {
      "annotations": {
        "<roi_id>": [
          {
            "label": <label>,
            "segment": [
              <t_start>,
              <t_end>
            ]
          },
          ...
        ]
      }
    }
  }
}
- `<yy-mm-dd>_<hh-mm-ss>` is the identifier for a ten-minute recording.
- `roi_id` is an integer number encoding the nest.
- `t_start` and `t_end` are the start and end times of an action in seconds.
- `label` is one of `["ed", "adult_flap", "chick_flap"]`.

`"adult_flap"` and `"chick_flap"` are other types of wing flapping that are easily confused with the ecstatic display (`ed`).
We provide these labels for completeness, but they are not considered in our method.
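Following the structure shown above, a short sketch for loading the annotations and printing all ecstatic-display segments:

```python
import json

with open("config/annotations/annotations.json") as f:
    database = json.load(f)["database"]

# Iterate over recordings, nests, and their annotated segments.
for rec_id, rec in database.items():
    for roi_id, segments in rec["annotations"].items():
        for ann in segments:
            if ann["label"] == "ed":               # ecstatic display only
                t_start, t_end = ann["segment"]    # seconds
                print(rec_id, roi_id, t_start, t_end)
```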
Acknowledgements
The evaluation for activity detection is largely inspired by ActivityNet. We thank the authors for their excellent work.