<!-- # SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -->

TokenBench

Cosmos-Tokenizer Code | Technical Report

https://github.com/user-attachments/assets/72536cfc-5cb5-4b48-88fa-b06f3c8c4495

TokenBench is a comprehensive benchmark that standardizes evaluation for Cosmos-Tokenizer. It covers a wide variety of domains, including robotic manipulation, driving, egocentric, and web videos. It consists of high-resolution, long-duration videos and is designed to evaluate the performance of video tokenizers. The videos are drawn from existing datasets that are commonly used for various tasks: BDD100K, EgoExo-4D, BridgeData V2, and Panda-70M. This repo provides instructions on how to download and preprocess the videos for TokenBench.

Instructions to build TokenBench

  1. Download the datasets (BDD100K, EgoExo-4D, BridgeData V2, and Panda-70M) from their official websites.
  2. Pick the videos as specified in the video/list.txt file.
  3. Preprocess the videos using the script video/preprocessing_script.py.
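
Step 2 above can be sketched in Python as follows. This is a minimal illustration, not the repo's own tooling: it assumes that the list file contains one relative video path per line, which is not confirmed here, and the helper names `read_video_list` and `select_videos` are hypothetical. Check the actual layout of video/list.txt before relying on it.

```python
import shutil
from pathlib import Path


def read_video_list(list_path):
    """Return the non-empty lines of a list file, stripped of whitespace.

    Assumption: one relative video path per line.
    """
    text = Path(list_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_videos(entries, dataset_root, output_dir):
    """Copy each listed video from dataset_root into output_dir.

    Returns (copied, missing) so that absent files are reported
    rather than silently skipped.
    """
    dataset_root = Path(dataset_root)
    output_dir = Path(output_dir)
    copied, missing = [], []
    for rel in entries:
        src = dataset_root / rel
        if not src.is_file():
            missing.append(rel)
            continue
        dst = output_dir / rel
        # Preserve the relative directory layout under output_dir.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied, missing
```

Calling `select_videos(read_video_list("video/list.txt"), dataset_root, output_dir)` with your own download and output directories would then leave only the listed videos in `output_dir`, ready for the preprocessing script in step 3. Returning the missing entries makes an incomplete download easy to spot.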

Continuous video tokenizer leaderboard

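| Tokenizer | Compression Ratio (T × H × W) | Formulation | PSNR | SSIM | rFVD |
| --- | --- | --- | --- | --- | --- |
| CogVideoX | 4 × 8 × 8 | VAE | 33.149 | 0.908 | 6.970 |
| OmniTokenizer | 4 × 8 × 8 | VAE | 29.705 | 0.830 | 35.867 |
| Cosmos-CV | 4 × 8 × 8 | AE | 37.270 | 0.928 | 6.849 |
| Cosmos-CV | 8 × 8 × 8 | AE | 36.856 | 0.917 | 11.624 |
| Cosmos-CV | 8 × 16 × 16 | AE | 35.158 | 0.875 | 43.085 |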
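
To compare rows with different compression ratios, it can help to reduce each ratio to a single number. The sketch below assumes that the ratio column lists per-axis downsampling factors (time, height, width) and simply multiplies them. The helper names are hypothetical, and the result ignores latent channel count, codebook size, and any special handling of the first frame, so it counts input positions per latent position rather than measuring bitrate.

```python
from math import prod


def parse_ratio(text):
    """Parse a ratio such as '4 × 8 × 8' (or '4 x 8 x 8') into a tuple of ints."""
    parts = text.replace("×", "x").lower().split("x")
    return tuple(int(part) for part in parts)


def overall_factor(ratio):
    """Multiply the per-axis factors into one spatio-temporal factor."""
    return prod(ratio)


for label in ("4 × 8 × 8", "8 × 8 × 8", "8 × 16 × 16"):
    print(label, "->", overall_factor(parse_ratio(label)))
# prints:
# 4 × 8 × 8 -> 256
# 8 × 8 × 8 -> 512
# 8 × 16 × 16 -> 2048
```

Under that assumption, the 8 × 16 × 16 setting maps eight times as many input positions onto each latent position as the 4 × 8 × 8 setting.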

Discrete video tokenizer leaderboard

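| Tokenizer | Compression Ratio (T × H × W) | Quantization | PSNR | SSIM | rFVD |
| --- | --- | --- | --- | --- | --- |
| VideoGPT | 4 × 4 × 4 | VQ | 35.119 | 0.914 | 13.855 |
| OmniTokenizer | 4 × 8 × 8 | VQ | 30.152 | 0.827 | 53.553 |
| Cosmos-DV | 4 × 8 × 8 | FSQ | 35.137 | 0.887 | 19.672 |
| Cosmos-DV | 8 × 8 × 8 | FSQ | 34.746 | 0.872 | 43.865 |
| Cosmos-DV | 8 × 16 × 16 | FSQ | 33.718 | 0.828 | 113.481 |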
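
Both leaderboards report PSNR. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. It is not the evaluation code behind the numbers above: the function name, the flat-sequence input, and the default 8-bit peak value of 255 are assumptions made for illustration, and the benchmark's exact protocol (value range, color space, per-frame averaging) is not stated here.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: inputs are flat sequences of pixel values in [0, max_value].
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the reconstruction is closer to the original. With 8-bit pixels, a mean squared error of 1 corresponds to roughly 48.13 dB.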

Core contributors

Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu