muzero-pytorch

PyTorch implementation of MuZero: "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model", based on the pseudo-code provided by the authors.

Note: This implementation has only been tested on CartPole-v1 and would require modifications (in the config folder) for other environments.

Installation
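A typical setup sketch; the repository URL and the presence of a `requirements.txt` at the repo root are assumptions:

```bash
# Clone the repository (URL assumed) and install Python dependencies
git clone https://github.com/koulanurag/muzero-pytorch.git
cd muzero-pytorch
pip install -r requirements.txt
```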

Usage

| Required Arguments | Description |
|---|---|
| `--env` | Name of the environment |
| `--case {atari,classic_control,box2d}` | Used for switching between different domains (default: None) |
| `--opr {train,test}` | Select the operation to be performed |

| Optional Arguments | Description |
|---|---|
| `--value_loss_coeff` | Scale for the value loss (default: None) |
| `--revisit_policy_search_rate` | Rate at which the target policy is re-estimated (default: None) (only valid if `--use_target_model` is enabled) |
| `--use_priority` | Use priority for data sampling in the replay buffer; priority for new data is calculated based on loss (default: False) |
| `--use_max_priority` | Force max-priority assignment for new incoming data in the replay buffer (only valid if `--use_priority` is enabled) (default: False) |
| `--use_target_model` | Use a target model for bootstrapped value estimation (default: False) |
| `--result_dir` | Directory path to store results (default: current working directory) |
| `--no_cuda` | Disable CUDA usage (default: False) |
| `--no_mps` | Disable MPS (Metal Performance Shaders) usage (default: False) |
| `--debug` | If enabled, logs additional values (default: False) |
| `--render` | Render the environment (default: False) |
| `--force` | Override past results (default: False) |
| `--seed` | Random seed (default: 0) |
| `--num_actors` | Number of actors running concurrently (default: 32) |
| `--test_episodes` | Evaluation episode count (default: 10) |
| `--use_wandb` | Log console and TensorBoard data to wandb (default: False) |

Note: default: None means the value is loaded from the corresponding config.
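As a sketch of how these arguments combine, a train/test pair for the tested environment might look like the following (the `main.py` entry point is an assumption; adjust to the repository's actual script name):

```bash
# Train on CartPole-v1 in the classic_control domain; --force overrides past results
python main.py --env CartPole-v1 --case classic_control --opr train --force

# Evaluate the trained model over the default 10 episodes
python main.py --env CartPole-v1 --case classic_control --opr test
```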

Training

CartPole-v1
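A sketch of a CartPole-v1 training run that enables the replay-buffer priority and target-model options described above (`main.py` and the rate value 0.8 are illustrative assumptions, not the repository's prescribed settings):

```bash
# Prioritized replay sampling plus target-model bootstrapping;
# --revisit_policy_search_rate is only valid with --use_target_model
python main.py --env CartPole-v1 --case classic_control --opr train \
    --use_priority --use_max_priority \
    --use_target_model --revisit_policy_search_rate 0.8 \
    --force
```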