VQ-BeT: Behavior Generation with Latent Actions

Official implementation of VQ-BeT: Behavior Generation with Latent Actions.

project website: https://sjlee.cc/vq-bet

<img src="https://github.com/jayLEE0301/vq_bet/assets/30570922/da0654cf-a15a-4ea3-9f90-c389f06e8796">

Installation

Usage

Step 0: Download dataset and set dataset path / saving path

Step 1: pretrain vq-vae

Step 2: train and evaluate vq-bet

Training visual observation envs:

In this repo, we provide pre-processed ResNet18 embedding vectors for the PushT and Kitchen environments. To train VQ-BeT with visual observations, set visual_input: true in ./examples/train_[env name].yaml. Please note that using frozen embeddings is much faster than fine-tuning ResNet18 but may give lower performance (we will release additional modules for fine-tuning ResNet with VQ-BeT soon).

(Optional) quick start: evaluating VQ-BeT with pretrained weights (on goal-cond Kitchen env)

If you want to quickly see the performance of VQ-BeT on goal-cond Kitchen env without training it from scratch, please check the description below.

How can I train VQ-BeT using my own Env?

NOTE: You should make your own ./examples/configs/train_[env name].yaml and ./examples/configs/pretrain_[env name].yaml

Tips for hyperparameter tuning on your own env.

During Residual VQ pretraining, the hyperparameters to be determined (in order of importance, with the most important at the top):

  1. action_window_size:

    • 1 (single-step prediction): Generally sufficient for most environments.

    • 3~5 (multi-step prediction): Can be helpful in environments where correlation between consecutive actions is important, such as PushT.

  2. encoder_loss_multiplier: Adjust this value when the action scale is not between -1 and 1. For example, if the action scale is -100 to 100, a value of 0.01 could be used. If action data is normalized, the default value can be used without adjustment.

  3. vqvae_n_embed: (10~16 or more) The total number of possible modes is vqvae_n_embed^vqvae_groups. VQ-BeT is robust to the dictionary size as long as it is large enough to capture the major modes in the dataset (this depends on the task, but usually >= 10). Please refer to <em>Section B.1</em> of the manuscript for the performance of VQ-BeT with various Residual VQ dictionary sizes.
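The two calculations behind the tips above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from this repo: `count_modes` and `normalize_actions` are hypothetical helper names, `vqvae_groups = 2` is an assumed example value, and min-max scaling is just one possible way to bring actions into the -1 to 1 range so that the default encoder_loss_multiplier can be kept.

```python
def count_modes(vqvae_n_embed: int, vqvae_groups: int) -> int:
    """Total number of code combinations: vqvae_n_embed ** vqvae_groups."""
    return vqvae_n_embed ** vqvae_groups


def normalize_actions(actions, low: float, high: float):
    """Linearly map action values from [low, high] to [-1, 1].

    `actions` is a list of action vectors (lists of floats).
    Hypothetical helper: min-max scaling is an assumption, not the repo's method.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    mid = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return [[(x - mid) / half_range for x in action] for action in actions]


# Assumed example values (vqvae_groups = 2 is an illustration only).
modes_small = count_modes(vqvae_n_embed=10, vqvae_groups=2)
modes_large = count_modes(vqvae_n_embed=16, vqvae_groups=2)

# Actions recorded in the range -100 to 100, rescaled to -1 to 1.
raw_actions = [[-100.0, 0.0], [100.0, 50.0]]
scaled_actions = normalize_actions(raw_actions, low=-100.0, high=100.0)
```

If you normalize actions this way, remember to apply the inverse mapping to predicted actions before sending them to the environment.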

Hyperparameters to be determined during the VQ-BeT training (in order of importance, with the most important at the top):

  1. window_size: 10 ~ 100: While 10 is suitable in most cases, consider increasing it if a longer observation history is deemed beneficial.

  2. offset_loss_multiplier: If the action scale is around -1 to 1, the most common value of offset_loss_multiplier is 100 (default). Adjust this value if the action scale is not between -1 and 1. For example, if the action scale is -100 to 100, a value of 1 could be used.

  3. secondary_code_multiplier: The default value is 0.5. Experimenting with values between 0.5 and 3 is recommended. A larger value emphasizes predictions for the secondary code more than offset predictions.
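As a rough starting point, the guidance above can be turned into a small helper and a search grid. This is a sketch under stated assumptions, not part of the repo: `suggest_offset_loss_multiplier` and `build_sweep` are hypothetical names, and the inverse scaling rule (100 divided by the largest absolute action value) is only an interpolation between the two data points given above (scale 1 gives 100, scale 100 gives 1). The grid values are example picks inside the recommended ranges.

```python
import itertools


def suggest_offset_loss_multiplier(max_abs_action: float, base: float = 100.0) -> float:
    """Assumed heuristic: scale the default (100 for actions in -1..1)
    inversely with the largest absolute action value."""
    if max_abs_action <= 0:
        raise ValueError("max_abs_action must be positive")
    return base / max_abs_action


def build_sweep(window_sizes=(10, 50, 100),
                secondary_code_multipliers=(0.5, 1.0, 3.0)):
    """Return every combination of the candidate values as a list of dicts."""
    return [
        {"window_size": w, "secondary_code_multiplier": s}
        for w, s in itertools.product(window_sizes, secondary_code_multipliers)
    ]


normalized_env = suggest_offset_loss_multiplier(1.0)    # actions in -1..1
large_scale_env = suggest_offset_loss_multiplier(100.0)  # actions in -100..100
sweep = build_sweep()
```

Each entry in `sweep` can then be copied into your train_[env name].yaml for a separate run; starting with window_size 10 and the default secondary_code_multiplier of 0.5 follows the order of importance listed above.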

Common errors and solutions


Our code is sourced and modified from the miniBeT implementation of the conditional and unconditional Behavior Transformer algorithm. We also use residual VQ-VAE code from the Vector Quantization - Pytorch repo, the PushT env from Diffusion Policy, the Ant env base from DHRL, and the UR3 env from here.