
Rememberer: Large Language Models Are Semi-Parametric Reinforcement Learning Agents

Code repository for Rememberer, an RLEM (Reinforcement Learning with Experience Memory) agent. The corresponding paper is available on arXiv and has been accepted by NeurIPS 2023.

Launch Test

launchw.sh is the launcher for the WebShop experiments. The corresponding main program is webshop.py. The WebShop environment must be set up before launching the experiment.

launch.sh is the launcher for the WikiHow experiments. The corresponding main program is wikihow.py. The Mobile-Env environment must be set up before launching the program. WikiHow task set v1.2 is used. Additionally, the VhIoWrapper wrapper requires a tokenizer, which can be downloaded from Hugging Face. The bert-base-uncased tokenizer works fine.

To launch a test with static exemplars, add the --static option in the script.

To train a Rememberer agent, add the --train option in the script. When training, you may want to shrink the test set so that the program does not run a complete evaluation every epoch.

The exemplars and prompt templates are stored under prompts and the initial history memories are stored under history-pools.

The OpenAI API key is configured through openaiconfig.yaml.

About Training Set

In the paper, two training sets are used for the WebShop experiments:

S0: [500, 510)
S1: [510, 520)

These training sets lie completely outside the test sets of ReAct and this paper. To enable them, use --trainseta 0 --trainsetb 10 for S0 or --trainseta 10 --trainsetb 20 for S1. You can also try other training sets.
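
The relation between the two options and the index ranges above can be sketched as follows. This is only an illustration: the helper name train_set_indices and the offset of 500 are assumptions inferred from the two examples in this README, not taken from webshop.py, which may compute the range differently.

```python
def train_set_indices(trainseta: int, trainsetb: int, offset: int = 500) -> range:
    """Map the two option values to a half-open range of task indices.

    The default offset of 500 is an assumption inferred from the README
    examples (0..10 -> [500, 510), 10..20 -> [510, 520)).
    """
    if trainseta >= trainsetb:
        raise ValueError("trainseta must be smaller than trainsetb")
    return range(offset + trainseta, offset + trainsetb)


s0 = train_set_indices(0, 10)   # corresponds to S0: [500, 510)
s1 = train_set_indices(10, 20)  # corresponds to S1: [510, 520)
print(list(s0))
print(list(s1))
```

Under the same assumption, another training set such as --trainseta 20 --trainsetb 30 would cover indices [520, 530).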

The training sets for the WikiHow experiments are selected from the complement of the micro canonical set within the canonical set of WikiHow. They are:

S0:
- add_a_contact_on_whatsapp-8
- avoid_misgendering-0
- become_a_grandmaster-7
- become_a_hooters_girl-8
- become_a_pro_footballP28soccerP29_manager-7
- become_a_specialist_physician-4
- be_cool_in_high_school_P28boysP29-0
- care_for_florida_white_rabbits-4
- fix_wet_suede_shoes-6
- get_zorua_in_pokPC3PA9mon_white-6
S1:
- be_free-0
- build_a_robot_car-8
- change_an_excel_sheet_from_read_only-4
- choose_a_swiss_army_knife-8
- color_streak_a_ponytail-0
- come_up_with_a_movie_idea-4
- contact_avast_customer_support-7
- drink_mezcal-7
- identify_hickory_nuts-6
- wear_a_dress_to_school-6

The selection simply keeps the balance of task categories and applies no other filtering.
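
The balance can be inspected with a short script. The sketch below assumes that the number after the final hyphen of each task ID identifies the task category; this README does not state that, so treat it as a guess. The helper name suffix_counts is hypothetical.

```python
from collections import Counter

S0 = [
    "add_a_contact_on_whatsapp-8",
    "avoid_misgendering-0",
    "become_a_grandmaster-7",
    "become_a_hooters_girl-8",
    "become_a_pro_footballP28soccerP29_manager-7",
    "become_a_specialist_physician-4",
    "be_cool_in_high_school_P28boysP29-0",
    "care_for_florida_white_rabbits-4",
    "fix_wet_suede_shoes-6",
    "get_zorua_in_pokPC3PA9mon_white-6",
]
S1 = [
    "be_free-0",
    "build_a_robot_car-8",
    "change_an_excel_sheet_from_read_only-4",
    "choose_a_swiss_army_knife-8",
    "color_streak_a_ponytail-0",
    "come_up_with_a_movie_idea-4",
    "contact_avast_customer_support-7",
    "drink_mezcal-7",
    "identify_hickory_nuts-6",
    "wear_a_dress_to_school-6",
]


def suffix_counts(task_ids):
    """Count tasks per numeric suffix (assumed here to be the category index)."""
    return Counter(int(task_id.rsplit("-", 1)[1]) for task_id in task_ids)


counts_s0 = suffix_counts(S0)
counts_s1 = suffix_counts(S1)
print(sorted(counts_s0.items()))
print(sorted(counts_s1.items()))
```

For the IDs listed above, both sets contain exactly two tasks for each of the suffixes 0, 4, 6, 7 and 8, which is consistent with a per-category balance.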

Customized Code for WebShop

As stated in the paper and the supplementary material, the text_rich observation format of WebShop is further simplified in the same way as in ReAct. In addition, two typos in closing tags in the HTML templates are corrected. The customized code is provided at zdy023/WebShop.

Citation

@article{DanyangZhang2023_Rememberer,
  author       = {Danyang Zhang and
                  Lu Chen and
                  Situo Zhang and
                  Hongshen Xu and
                  Zihan Zhao and
                  Kai Yu},
  title        = {Large Language Model Is Semi-Parametric Reinforcement Learning Agent},
  journal      = {CoRR},
  volume       = {abs/2306.07929},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2306.07929},
  doi          = {10.48550/arXiv.2306.07929},
  eprinttype   = {arXiv},
  eprint       = {2306.07929},
}