Language-Integrated Value Iteration

Code for the paper "How Can LLM Guide RL? A Value-Based Approach".

Authors: Shenao Zhang*, Sirui Zheng*, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang (* indicates equal contribution)

ALFWorld

Environment setup

Clone the repository:

git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/alfworld

Create a virtual environment and install the required packages:

pip install -r requirements.txt

Install the ALFWorld environment by following the instructions at https://github.com/alfworld/alfworld.
Set the OPENAI_API_KEY environment variable to your OpenAI API key:

export OPENAI_API_KEY=<your key>
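
Before starting a long run, it can help to confirm that the key is actually visible to Python. The snippet below is a minimal sketch and not part of this repository; `require_api_key` is a hypothetical helper name, and it assumes the code reads the key from the `OPENAI_API_KEY` environment variable as described above.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise if it is missing or empty.

    Hypothetical helper for illustration; not part of this repository.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running ./run.sh")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY is set.")
    except RuntimeError as err:
        print(err)
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.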

Run the code

./run.sh

InterCode

Steps to run our algorithm in the InterCode environment.

Environment setup

Clone the repository, create a virtual environment, and install necessary dependencies:

git clone https://github.com/agentification/Language-Integrated-VI.git
cd Language-Integrated-VI/intercode
conda env create -f environment.yml
conda activate intercode

Run setup.sh to create the Docker images for the InterCode Bash, SQL, and CTF environments.
Set the OPENAI_API_KEY environment variable to your OpenAI API key:

export OPENAI_API_KEY=<your key>
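
Because setup.sh builds Docker images, the `docker` command needs to be available on your PATH. The following is a small preflight sketch, not part of this repository; `missing_tools` is a hypothetical helper, and the check only confirms that the executable can be found, not that the Docker daemon is running.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; `which` is injectable so the
    lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["docker"])
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("docker found on PATH.")
```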

Run the code

For InterCode-SQL, run

./scripts/expr_slinvit_sql.sh

For InterCode-Bash, run

./scripts/expr_slinvit_bash.sh

BlocksWorld

Environment setup

Our experiments use Vicuna-13B and Vicuna-33B (v1.3). Install the required packages with:
```
pip install -r requirements.txt
```

Run the code

To run the RAP experiments, here is an example shell command:

CUDA_VISIBLE_DEVICES=0,1,2 nohup python -m torch.distributed.run --master_port 1034 --nproc_per_node 1 run_mcts.py \
--task mcts \
--model_name Vicuna \
--verbose False \
--data data/blocksworld/step_6.json \
--max_depth 6 \
--name m6ct_roll60 \
--rollouts 60 \
--model_path lmsys/vicuna-33b-v1.3 \
--num_gpus 3

To run the SLINVIT experiments, here is an example shell command:

CUDA_VISIBLE_DEVICES=3,4,5 nohup python -m torch.distributed.run --master_port 39855 --nproc_per_node 1 run.py \
--model_name Vicuna \
--name planning_step6_13b \
--data data/blocksworld/step_6.json \
--horizon 6 \
--search_depth 5 \
--alpha 0 \
--sample_per_node 2 \
--model_path lmsys/vicuna-13b-v1.3 \
--num_gpus 3 \
--use_lang_goal
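
In both example commands, the number of device IDs in `CUDA_VISIBLE_DEVICES` equals the value passed to `--num_gpus` (three in each case). Assuming the two are meant to agree, a quick check like the sketch below can catch a mismatch before the model is loaded. `visible_gpu_count` and `check_gpu_args` are hypothetical helpers and not part of this repository.

```python
import os


def visible_gpu_count(cuda_visible_devices):
    """Count the device IDs in a CUDA_VISIBLE_DEVICES string such as "0,1,2"."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def check_gpu_args(cuda_visible_devices, num_gpus):
    """Return True when num_gpus equals the number of listed device IDs.

    Assumption: --num_gpus should match the device list, as it does in
    both example commands.
    """
    return visible_gpu_count(cuda_visible_devices) == num_gpus


if __name__ == "__main__":
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    print(f"{visible_gpu_count(devices)} device ID(s) listed in CUDA_VISIBLE_DEVICES")
```

For example, `check_gpu_args("3,4,5", 3)` holds for the SLINVIT command, while `check_gpu_args("0,1", 3)` would flag a mismatch.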

Citation

@article{zhang2024can,
  title={How Can LLM Guide RL? A Value-Based Approach},
  author={Zhang, Shenao and Zheng, Sirui and Ke, Shuqi and Liu, Zhihan and Jin, Wanxin and Yuan, Jianbo and Yang, Yingxiang and Yang, Hongxia and Wang, Zhaoran},
  journal={arXiv preprint arXiv:2402.16181},
  year={2024}
}