<!-- PROJECT LOGO -->
<div align="center">
  <a href="https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset">
    <img src="images/logo.png" alt="Logo" width="240" height="240">
  </a>
  <h3 align="center">Temporal Logic Video (TLV) Dataset</h3>
  <p align="center">
    Synthetic and real video dataset with temporal logic annotation
    <br />
    <a href="https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset"><strong>Explore the docs »</strong></a>
    <br />
    <br />
    <a href="https://anoymousu1.github.io/nsvs-anonymous.github.io/">NSVS-TL Project Webpage</a>
    ·
    <a href="https://github.com/UTAustin-SwarmLab/Neuro-Symbolic-Video-Search-Temploral-Logic">NSVS-TL Source Code</a>
  </p>
</div>

Overview
The Temporal Logic Video (TLV) Dataset addresses the scarcity of state-of-the-art video datasets for long-horizon, temporally extended activity and object detection. It comprises two main components:
- Synthetic datasets: Generated by concatenating static images from established computer vision datasets (COCO and ImageNet), allowing for the introduction of a wide range of Temporal Logic (TL) specifications.
- Real-world datasets: Based on open-source autonomous vehicle (AV) driving datasets, specifically NuScenes and Waymo.
Table of Contents
- Dataset Composition
- Dataset
- Installation
- Usage
- Data Generation
- License
- Citation
Dataset Composition
Synthetic Datasets
- Sources: COCO and ImageNet
- Purpose: Introduce artificial Temporal Logic specifications
- Generation Method: Image stitching from static datasets
Real-world Datasets
- Sources: NuScenes and Waymo
- Purpose: Provide real-world autonomous vehicle scenarios
- Annotation: Temporal Logic specifications added to existing data
Dataset
<div align="center">
  <a href="https://github.com/UTAustin-SwarmLab/temporal-logic-video-dataset">
    <img src="images/teaser.png" alt="Teaser" width="840" height="440">
  </a>
</div>

Although we provide the source code to generate datasets from different types of data sources, we release dataset v1 as a proof of concept.
Dataset Structure
The data is provided as serialized objects, each containing a set of frames with annotations. You can download the dataset from our dataset repository on Hugging Face.
File Naming Convention
```
<tlv_data_type>:source:<datasource>-number_of_frames:<number_of_frames>-<uuid>.pkl
```
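As an illustrative sketch (this helper is not part of the released code), a filename that follows the convention above can be split back into its fields like so:

```python
import re
from pathlib import Path

# Hypothetical helper: splits a TLV filename following the convention above
# into its individual fields.
_TLV_NAME = re.compile(
    r"(?P<tlv_data_type>.+?):source:(?P<datasource>.+?)"
    r"-number_of_frames:(?P<number_of_frames>\d+)-(?P<uuid>.+)\.pkl"
)

def parse_tlv_filename(path: str) -> dict:
    """Return the fields encoded in a TLV dataset filename."""
    match = _TLV_NAME.fullmatch(Path(path).name)
    if match is None:
        raise ValueError(f"Filename does not follow the TLV convention: {path}")
    fields = match.groupdict()
    fields["number_of_frames"] = int(fields["number_of_frames"])
    return fields
```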
Object Attributes
Each serialized object contains the following attributes:
- `ground_truth`: Boolean indicating whether the dataset contains ground-truth labels
- `ltl_formula`: Temporal logic formula applied to the dataset
- `proposition`: Set of propositions used in `ltl_formula`
- `number_of_frame`: Total number of frames in the dataset
- `frames_of_interest`: Frames of interest, i.e., the frames that satisfy `ltl_formula`
- `labels_of_frames`: Labels for each frame
- `images_of_frames`: Image data for each frame
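A minimal inspection sketch, assuming the `.pkl` files were written with Python's `pickle` module and expose the attributes listed above (the exact object type inside each file is not specified in this README, so it is accessed defensively here):

```python
import pickle
from pathlib import Path

# Note: if the pickled object is a custom class, the package that defines it
# must be importable before unpickling.
dataset_dir = Path("tlv-dataset-v1/tlv_synthetic_dataset/Fprop1")

for pkl_path in sorted(dataset_dir.glob("*.pkl")):
    with pkl_path.open("rb") as f:
        record = pickle.load(f)

    # Works whether the record is a plain dict or an object with attributes.
    get = record.get if isinstance(record, dict) else (lambda k, d=None: getattr(record, k, d))

    print(pkl_path.name)
    print("  ltl_formula:       ", get("ltl_formula"))
    print("  proposition:       ", get("proposition"))
    print("  number_of_frame:   ", get("number_of_frame"))
    print("  frames_of_interest:", get("frames_of_interest"))
```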
You can download the dataset from the Hugging Face repository mentioned above. The structure of the dataset is as follows:
```
tlv-dataset-v1/
├── tlv_real_dataset/
│   ├── prop1Uprop2/
│   └── (prop1&prop2)Uprop3/
└── tlv_synthetic_dataset/
    ├── Fprop1/
    ├── Gprop1/
    ├── prop1&prop2/
    ├── prop1Uprop2/
    └── (prop1&prop2)Uprop3/
```
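The folder names above encode the ground-truth TL specification of each split using standard linear temporal logic (LTL) operators. As a quick reference (with prop1, prop2, prop3 standing for frame-level propositions, i.e., "Event A", "Event B", "Event C"):

```latex
% Standard LTL readings of the specification names used in this README.
F\,p_1                            % Fprop1  ("Eventually Event A"): p_1 holds in some frame
G\,p_1                            % Gprop1  ("Always Event A"): p_1 holds in every frame
p_1 \land p_2                     % prop1&prop2 ("Event A And Event B"): both hold in the same frame
p_1\,\mathsf{U}\,p_2              % prop1Uprop2 ("Event A Until Event B"): p_1 holds until a frame where p_2 holds
(p_1 \land p_2)\,\mathsf{U}\,p_3  % (prop1&prop2)Uprop3 ("(Event A And Event B) Until Event C")
```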
Dataset Statistics
- Total Number of Frames

| Ground Truth TL Specification | Synthetic TLV (COCO) | Synthetic TLV (ImageNet) | Real TLV (Waymo) | Real TLV (NuScenes) |
| --- | --- | --- | --- | --- |
| Eventually Event A | - | 15,750 | - | - |
| Always Event A | - | 15,750 | - | - |
| Event A And Event B | 31,500 | - | - | - |
| Event A Until Event B | 15,750 | 15,750 | 8,736 | 19,808 |
| (Event A And Event B) Until Event C | 5,789 | - | 7,459 | 7,459 |
- Total Number of Datasets

| Ground Truth TL Specification | Synthetic TLV (COCO) | Synthetic TLV (ImageNet) | Real TLV (Waymo) | Real TLV (NuScenes) |
| --- | --- | --- | --- | --- |
| Eventually Event A | - | 60 | - | - |
| Always Event A | - | 60 | - | - |
| Event A And Event B | 120 | - | - | - |
| Event A Until Event B | 60 | 60 | 45 | 494 |
| (Event A And Event B) Until Event C | 97 | - | 30 | 186 |
Installation
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip build
python -m pip install --editable ".[dev, test]"
```
Prerequisites
- ImageNet (ILSVRC 2017):

  ```
  ILSVRC/
  ├── Annotations/
  ├── Data/
  ├── ImageSets/
  └── LOC_synset_mapping.txt
  ```

- COCO (2017):

  ```
  COCO/
  └── 2017/
      ├── annotations/
      ├── train2017/
      └── val2017/
  ```
Usage
The following configuration options control data loading and synthetic data generation.
Data Loader Configuration
- `data_root_dir`: Root directory of the dataset
- `mapping_to`: Label mapping scheme (default: `"coco"`)
- `save_dir`: Output directory for processed data
Synthetic Data Generator Configuration
- `initial_number_of_frame`: Starting frame count per video
- `max_number_frame`: Maximum frame count per video
- `number_video_per_set_of_frame`: Number of videos to generate per frame-count setting
- `increase_rate`: Frame-count increment rate
- `ltl_logic`: Temporal logic specification (e.g., `"F prop1"`, `"G prop1"`)
- `save_images`: Boolean flag for saving individual frames
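For illustration only, these options could be collected as plain Python values as in the sketch below. The placeholder values and the exact argument names accepted by each run script are assumptions, so consult the scripts in `run_scripts/` for the authoritative interface.

```python
# Illustrative configuration values only; the actual argument names and defaults
# are defined by the generator scripts in run_scripts/.
data_loader_config = {
    "data_root_dir": "../COCO/2017",  # root directory of the source dataset
    "mapping_to": "coco",             # label mapping scheme
    "save_dir": "./tlv_output",       # output directory for processed data
}

synthetic_generator_config = {
    "initial_number_of_frame": 25,       # starting frame count per video
    "max_number_frame": 200,             # maximum frame count per video
    "number_video_per_set_of_frame": 5,  # videos generated per frame-count setting
    "increase_rate": 25,                 # frame-count increment between settings
    "ltl_logic": "F prop1",              # TL specification to encode
    "save_images": False,                # whether to also dump individual frames
}
```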
Data Generation
COCO Synthetic Data Generation
```bash
python3 run_scripts/run_synthetic_tlv_coco.py --data_root_dir "../COCO/2017" --save_dir "<output_dir>"
```
ImageNet Synthetic Data Generation
```bash
python3 run_synthetic_tlv_imagenet.py --data_root_dir "../ILSVRC" --save_dir "<output_dir>"
```
Note: The ImageNet generator does not support LTL formulae containing the '&' (conjunction) operator.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Connect with Me
<p align="center"><em>Feel free to connect with me through these professional channels:</em></p>
<p align="center">
  <a href="https://www.linkedin.com/in/mchoi07/" target="_blank"><img src="https://img.shields.io/badge/-LinkedIn-0077B5?style=flat-square&logo=Linkedin&logoColor=white" alt="LinkedIn"/></a>
  <a href="mailto:minkyu.choi@utexas.edu"><img src="https://img.shields.io/badge/-Email-D14836?style=flat-square&logo=Gmail&logoColor=white" alt="Email"/></a>
  <a href="https://scholar.google.com/citations?user=ai4daB8AAAAJ&hl" target="_blank"><img src="https://img.shields.io/badge/-Google%20Scholar-4285F4?style=flat-square&logo=google-scholar&logoColor=white" alt="Google Scholar"/></a>
  <a href="https://minkyuchoi-07.github.io" target="_blank"><img src="https://img.shields.io/badge/-Website-00C7B7?style=flat-square&logo=Internet-Explorer&logoColor=white" alt="Website"/></a>
  <a href="https://x.com/MinkyuChoi7" target="_blank"><img src="https://img.shields.io/badge/-Twitter-1DA1F2?style=flat-square&logo=Twitter&logoColor=white" alt="Twitter"/></a>
</p>

Citation
If you find this repo useful, please cite our paper:
```bibtex
@inproceedings{Choi_2024_ECCV,
  author    = {Choi, Minkyu and Goel, Harsh and Omama, Mohammad and Yang, Yunhao and Shah, Sahil and Chinchali, Sandeep},
  title     = {Towards Neuro-Symbolic Video Understanding},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  month     = {September},
  year      = {2024}
}
```