RLCard: A Toolkit for Reinforcement Learning in Card Games

<img width="500" src="https://dczha.com/files/rlcard/logo.jpg" alt="Logo" />


Chinese documentation (中文文档)

RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces for implementing various reinforcement learning and search algorithms. The goal of RLCard is to bridge reinforcement learning and imperfect-information games. RLCard is developed by DATA Lab at Rice and Texas A&M University, along with community contributors.

Community:

News:

Contributors

Several of the games in RLCard were mainly developed and are maintained by community contributors. Thank you!

Thanks to all the contributors!

<a href="https://github.com/daochenzha"><img src="https://github.com/daochenzha.png" width="40px" alt="daochenzha" /></a>   <a href="https://github.com/hsywhu"><img src="https://github.com/hsywhu.png" width="40px" alt="hsywhu" /></a>   <a href="https://github.com/CaoYuanpu"><img src="https://github.com/CaoYuanpu.png" width="40px" alt="CaoYuanpu" /></a>   <a href="https://github.com/billh0420"><img src="https://github.com/billh0420.png" width="40px" alt="billh0420" /></a>   <a href="https://github.com/ruzhwei"><img src="https://github.com/ruzhwei.png" width="40px" alt="ruzhwei" /></a>   <a href="https://github.com/adrianpgob"><img src="https://github.com/adrianpgob.png" width="40px" alt="adrianpgob" /></a>   <a href="https://github.com/Zhigal"><img src="https://github.com/Zhigal.png" width="40px" alt="Zhigal" /></a>   <a href="https://github.com/aypee19"><img src="https://github.com/aypee19.png" width="40px" alt="aypee19" /></a>   <a href="https://github.com/Clarit7"><img src="https://github.com/Clarit7.png" width="40px" alt="Clarit7" /></a>   <a href="https://github.com/lhenry15"><img src="https://github.com/lhenry15.png" width="40px" alt="lhenry15" /></a>   <a href="https://github.com/ismael-elatifi"><img src="https://github.com/ismael-elatifi.png" width="40px" alt="ismael-elatifi" /></a>   <a href="https://github.com/mjudell"><img src="https://github.com/mjudell.png" width="40px" alt="mjudell" /></a>   <a href="https://github.com/jkterry1"><img src="https://github.com/jkterry1.png" width="40px" alt="jkterry1" /></a>   <a href="https://github.com/kaanozdogru"><img src="https://github.com/kaanozdogru.png" width="40px" alt="kaanozdogru" /></a>   <a href="https://github.com/junyuGuo"><img src="https://github.com/junyuGuo.png" width="40px" alt="junyuGuo" /></a>   <br /> <a href="https://github.com/Xixo99"><img src="https://github.com/Xixo99.png" width="40px" alt="Xixo99" /></a>   <a href="https://github.com/rodrigodelazcano"><img src="https://github.com/rodrigodelazcano.png" width="40px" alt="rodrigodelazcano" /></a>   <a href="https://github.com/Michael1015198808"><img src="https://github.com/Michael1015198808.png" width="40px" alt="Michael1015198808" /></a>   <a href="https://github.com/mia1996"><img src="https://github.com/mia1996.png" width="40px" alt="mia1996" /></a>   <a href="https://github.com/kaiks"><img src="https://github.com/kaiks.png" width="40px" alt="kaiks" /></a>   <a href="https://github.com/claude9493"><img src="https://github.com/claude9493.png" width="40px" alt="claude9493" /></a>   <a href="https://github.com/SonSang"><img src="https://github.com/SonSang.png" width="40px" alt="SonSang" /></a>   <a href="https://github.com/rishabhvarshney14"><img src="https://github.com/rishabhvarshney14.png" width="40px" alt="rishabhvarshney14" /></a>   <a href="https://github.com/aetheryang"><img src="https://github.com/aetheryang.png" width="40px" alt="aetheryang" /></a>   <a href="https://github.com/rxng8"><img src="https://github.com/rxng8.png" width="40px" alt="rxng8" /></a>   <a href="https://github.com/nondecidibile"><img src="https://github.com/nondecidibile.png" width="40px" alt="nondecidibile" /></a>   <a href="https://github.com/benblack769"><img src="https://github.com/benblack769.png" width="40px" alt="benblack769" /></a>   <a href="https://github.com/zhengsx"><img src="https://github.com/zhengsx.png" width="40px" alt="zhengsx" /></a>   <a href="https://github.com/andrewnc"><img src="https://github.com/andrewnc.png" width="40px" alt="andrewnc" /></a>  

Cite this work

If you find this repo useful, you may cite:

Zha, Daochen, et al. "RLCard: A Platform for Reinforcement Learning in Card Games." IJCAI. 2020.

@inproceedings{zha2020rlcard,
  title={RLCard: A Platform for Reinforcement Learning in Card Games},
  author={Zha, Daochen and Lai, Kwei-Herng and Huang, Songyi and Cao, Yuanpu and Reddy, Keerthana and Vargas, Juan and Nguyen, Alex and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
  booktitle={IJCAI},
  year={2020}
}

Installation

Make sure that you have Python 3.6+ and pip installed. We recommend installing the stable version of rlcard with pip:

pip3 install rlcard

The default installation will only include the card environments. To use the PyTorch implementations of the training algorithms, run

pip3 install rlcard[torch]

If you are in China and the above command is too slow, you can use the mirror provided by Tsinghua University:

pip3 install rlcard -i https://pypi.tuna.tsinghua.edu.cn/simple

Alternatively, you can clone the latest version (if you are in China and GitHub is slow, you can use the mirror on Gitee):

git clone https://github.com/datamllab/rlcard.git

or clone only one branch to make it faster:

git clone -b master --single-branch --depth=1 https://github.com/datamllab/rlcard.git

Then install with

cd rlcard
pip3 install -e .
pip3 install -e .[torch]

We also provide a conda installation method:

conda install -c toubun rlcard

The conda installation only includes the card environments; you need to install PyTorch manually as needed.

Examples

A short example is shown below.

import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('blackjack')
env.set_agents([RandomAgent(num_actions=env.num_actions)])

print(env.num_actions) # 2
print(env.num_players) # 1
print(env.state_shape) # [[2]]
print(env.action_shape) # [None]

trajectories, payoffs = env.run()

RLCard can be flexibly connected to various algorithms. See the examples directory in the repository; a minimal training sketch is also shown below.
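For instance, connecting the built-in DQN agent to an environment and training it in a loop might look like the following sketch, modeled on examples/run_rl.py. The choice of environment, the mlp_layers, and the number of episodes are illustrative assumptions, and the torch extra (pip3 install rlcard[torch]) is required.

import rlcard
from rlcard.agents import DQNAgent        # requires rlcard[torch]
from rlcard.utils import reorganize

# Make an environment and a DQN agent (layer sizes are an illustrative choice)
env = rlcard.make('leduc-holdem')
agent = DQNAgent(
    num_actions=env.num_actions,
    state_shape=env.state_shape[0],
    mlp_layers=[64, 64],
)
env.set_agents([agent for _ in range(env.num_players)])

for episode in range(1000):
    # Collect data by self-play
    trajectories, payoffs = env.run(is_training=True)
    # Reorganize the data into (state, action, reward, next_state, done) transitions
    trajectories = reorganize(trajectories, payoffs)
    # Feed the transitions into the agent's replay memory and train
    for ts in trajectories[0]:
        agent.feed(ts)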

Demo

Run examples/human/leduc_holdem_human.py to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found here.

>> Leduc Hold'em pre-trained model

>> Start a new game!
>> Agent 1 chooses raise

=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
===============   Your Hand    ===============
┌─────────┐
│J        │
│         │
│         │
│    ♥    │
│         │
│         │
│        J│
└─────────┘
===============     Chips      ===============
Yours:   +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold

>> You choose action (integer):

We also provide a GUI for easy debugging. Please check here. Some demos:

[Demo GIFs: doudizhu-replay, leduc-replay]

Available Environments

We provide a complexity estimation for the games on several aspects. InfoSet Number: the number of information sets; InfoSet Size: the average number of states in a single information set; Action Size: the size of the action space. Name: the name that should be passed to rlcard.make to create the game environment. We also provide links to the documentation and to an example with random agents.

| Game | InfoSet Number | InfoSet Size | Action Size | Name | Usage |
| ---- | -------------- | ------------ | ----------- | ---- | ----- |
| Blackjack (wiki, baike) | 10^3 | 10^1 | 10^0 | blackjack | doc, example |
| Leduc Hold'em (paper) | 10^2 | 10^2 | 10^0 | leduc-holdem | doc, example |
| Limit Texas Hold'em (wiki, baike) | 10^14 | 10^3 | 10^0 | limit-holdem | doc, example |
| Dou Dizhu (wiki, baike) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | doc, example |
| Mahjong (wiki, baike) | 10^121 | 10^48 | 10^2 | mahjong | doc, example |
| No-limit Texas Hold'em (wiki, baike) | 10^162 | 10^3 | 10^4 | no-limit-holdem | doc, example |
| UNO (wiki, baike) | 10^163 | 10^10 | 10^1 | uno | doc, example |
| Gin Rummy (wiki, baike) | 10^52 | - | - | gin-rummy | doc, example |
| Bridge (wiki, baike) | - | - | - | bridge | doc, example |

Supported Algorithms

| Algorithm | Example | Reference |
| --------- | ------- | --------- |
| Deep Monte-Carlo (DMC) | examples/run_dmc.py | [paper] |
| Deep Q-Learning (DQN) | examples/run_rl.py | [paper] |
| Neural Fictitious Self-Play (NFSP) | examples/run_rl.py | [paper] |
| Counterfactual Regret Minimization (CFR) | examples/run_cfr.py | [paper] |

Pre-trained and Rule-based Models

We provide a model zoo to serve as baselines; a short loading sketch follows the table below.

| Model | Explanation |
| ----- | ----------- |
| leduc-holdem-cfr | Pre-trained CFR (chance sampling) model on Leduc Hold'em |
| leduc-holdem-rule-v1 | Rule-based model for Leduc Hold'em, v1 |
| leduc-holdem-rule-v2 | Rule-based model for Leduc Hold'em, v2 |
| uno-rule-v1 | Rule-based model for UNO, v1 |
| limit-holdem-rule-v1 | Rule-based model for Limit Texas Hold'em, v1 |
| doudizhu-rule-v1 | Rule-based model for Dou Dizhu, v1 |
| gin-rummy-novice-rule | Gin Rummy novice rule model |
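As a rough sketch of how these models can be used (based on rlcard.models.load; details may differ across versions), a rule-based model can be loaded and plugged into an environment:

import rlcard
from rlcard import models

# Load a rule-based Leduc Hold'em model from the model zoo
env = rlcard.make('leduc-holdem')
rule_model = models.load('leduc-holdem-rule-v1')
env.set_agents(rule_model.agents[:env.num_players])

# Play one game with the rule-based agents
trajectories, payoffs = env.run(is_training=False)
print(payoffs)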

API Cheat Sheet

How to create an environment

You can use the following interface to make an environment. You may optionally specify some configurations with a dictionary.

Once the environment is made, we can access some information about the game.
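As a small sketch (seed and allow_step_back are common configuration keys; the available keys depend on the environment):

import rlcard

# Make an environment with an explicit configuration dictionary
env = rlcard.make('leduc-holdem', config={'seed': 42, 'allow_step_back': True})

# Access basic information about the game
print(env.num_players)    # number of players
print(env.num_actions)    # size of the action space
print(env.state_shape)    # observation shape for each player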

What is state in RLCard

State is a Python dictionary. It consists of the observation state['obs'], the legal actions state['legal_actions'], the raw observation state['raw_obs'], and the raw legal actions state['raw_legal_actions'].
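For example, resetting an environment returns such a state for the first player to act (a sketch; the exact contents depend on the game):

import rlcard

env = rlcard.make('leduc-holdem')
state, player_id = env.reset()

print(player_id)                   # id of the player to act
print(state['obs'].shape)          # encoded observation (a numpy array)
print(state['legal_actions'])      # dict mapping action ids to raw actions
print(state['raw_obs'])            # human-readable observation
print(state['raw_legal_actions'])  # human-readable legal actions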

Basic interfaces

The following interfaces provide basic usage. They are easy to use, but they make assumptions about the agent: the agent must follow the agent template.
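A minimal agent following the template might look like the sketch below; the class name is hypothetical, and rlcard.agents.RandomAgent is the reference implementation:

import numpy as np

class MyAgent:
    ''' A hypothetical agent that follows the RLCard agent template. '''

    def __init__(self, num_actions):
        self.use_raw = False            # this agent consumes encoded (non-raw) states
        self.num_actions = num_actions

    def step(self, state):
        ''' Choose an action during training (here: uniformly at random). '''
        legal_actions = list(state['legal_actions'].keys())
        return np.random.choice(legal_actions)

    def eval_step(self, state):
        ''' Choose an action during evaluation; also return extra info. '''
        return self.step(state), {}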

Advanced interfaces

For advanced usage, the following interfaces allow flexible operations on the game tree. These interfaces do not make any assumptions about the agent.
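As a sketch of this style of usage (the environment must be created with allow_step_back=True for step_back to work):

import rlcard

env = rlcard.make('leduc-holdem', config={'allow_step_back': True})
state, player_id = env.reset()

while not env.is_over():
    # Pick any legal action; a search algorithm would choose more carefully
    action = list(state['legal_actions'].keys())[0]
    state, player_id = env.step(action)

print(env.get_payoffs())   # payoffs of all players at the end of the game

# The last step can be undone, which is useful for tree traversal
env.step_back()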

Library Structure

The purposes of the main modules are listed below:

More Documents

For more documentation, please refer to the Documents for general introductions. API documents are available on our website.

Contributing

Contributions to this project are greatly appreciated! Please create an issue for feedback or bug reports. If you want to contribute code, please refer to the Contributing Guide. If you have any questions, please contact Daochen Zha at daochen.zha@rice.edu.

Acknowledgements

We would like to thank JJ World Network Technology Co., Ltd for their generous support, and all the contributions from the community contributors.