
Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression

This is the official PyTorch implementation of our ICLR 2024 paper:

Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression

<img src="imgs/main.png" width="800">

Finite-State Autoregressive (FSAR) Entropy Coding is a VAE-based compression method designed for a better compression ratio and higher computational efficiency. It extends Asymmetric Numeral Systems (ANS) with a lookup-table-based autoregressive model, which performs autoregressive encoding/decoding efficiently enough to improve the compression ratio even without parallel computation. In addition, Straight-Through Hardmax Quantization (STHQ) is proposed to improve the optimization of the discrete latent space in the VAE.
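
To make the two ideas above concrete, here is a minimal, hedged sketch in PyTorch. It is an illustration only, not the implementation in this repository: the names LookupAutoregressivePrior and sthq_quantize are hypothetical, and the actual models (see configs/presets) are more involved.

import torch
import torch.nn.functional as F

class LookupAutoregressivePrior(torch.nn.Module):
    # Order-1 autoregressive prior backed by a lookup table (hypothetical).
    # The previous discrete symbol indexes a row of logits for the current
    # symbol, so each decoding step is a single table lookup rather than a
    # neural-network forward pass.
    def __init__(self, num_states: int):
        super().__init__()
        self.table = torch.nn.Parameter(torch.zeros(num_states, num_states))

    def next_symbol_logits(self, prev_symbol: torch.Tensor) -> torch.Tensor:
        return self.table[prev_symbol]

def sthq_quantize(logits: torch.Tensor) -> torch.Tensor:
    # Straight-Through Hardmax Quantization, sketched as the standard
    # straight-through pattern: a hard one-hot vector in the forward pass,
    # softmax gradients in the backward pass. The paper's STHQ may add
    # further terms beyond this basic estimator.
    soft = F.softmax(logits, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return hard + soft - soft.detach()

Because each autoregressive step is only an indexing operation, sequential decoding stays cheap even on a CPU, which is what allows the autoregressive model to improve the compression ratio without parallel computation.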

Setup

Hardware Requirements

Software Requirements

Recommended environment setup with conda:

conda create -n cbench python=3.7
conda activate cbench
conda install -c pytorch pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.2
# conda install -c pytorch pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3
# if gcc version < 7
conda install -c conda-forge gcc gxx
pip install -r requirements.txt
python setup.py build develop

(Optional) Machine-specific environment setup

See configs/env.py

(Optional) 3rdparty setup

git submodule update --init --recursive
cd 3rdparty/craystack
python setup.py build develop

Dataset Preparation

We use 5 datasets in our experiments:

Code Structure

Experiments

To run any experiment (training, validation, and testing are all supported, thanks to pytorch-lightning and our BasicLosslessCompressionBenchmark):

python tools/run_benchmark.py [config_file]
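
For example (a hypothetical path; pick an actual preset config file from configs/presets):

# hypothetical config path shown for illustration only
python tools/run_benchmark.py configs/presets/example_config.py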

You can use TensorBoard to visualize the training process:

tensorboard --logdir experiments

Experiment List

NOTE: If a GPU out-of-memory error occurs, reduce batch_size_total in the corresponding config file. In our experiments we used 8 A100 GPUs for most configs, so the default batch_size_total may be too large for your hardware.

Model Implementation List

See configs/presets.

Our full model is tagged as "V2DVQ-c2-FSAR-O2S-catreduce1.5".

Pretrained Models

TBA

tools/run_benchmark.py automatically looks for a config.pth in the given directory to build the benchmark. Therefore, to test a pretrained model, simply run:

python tools/run_benchmark.py [model_directory]

Citation

@inproceedings{zhang2024finitestate,
  title={Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression},
  author={Yufeng Zhang and Hang Yu and Jianguo Li and Weiyao Lin},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=D5mJSNtUtv}
}

Contact

TBA