CORL (Clean Offline Reinforcement Learning)

🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. CORL is heavily inspired by cleanrl for online RL; check it out too!



Getting started

git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
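
The single-file scripts train on D4RL datasets, so the d4rl package and a working MuJoCo setup are also needed (both are assumed installed in the sketch below). As a minimal illustration of the data these scripts consume, a D4RL dataset can be loaded directly; the environment name is only an example:

```python
# Minimal sketch: load one of the D4RL datasets that the single-file
# implementations train on. Assumes d4rl and MuJoCo are installed.
import gym
import d4rl  # noqa: F401  # importing d4rl registers the offline envs with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays

print(dataset["observations"].shape)  # (N, obs_dim) transitions
print(dataset["actions"].shape)       # (N, act_dim)
print(dataset["rewards"].shape)       # (N,)
```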

Algorithms Implemented

| Algorithm | Variants Implemented | Wandb Report |
|-----------|----------------------|--------------|
| **Offline and Offline-to-Online** | | |
| Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL) | `offline/cql.py` <br> `finetune/cql.py` | Offline <br> Offline-to-online |
| Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC) | `offline/awac.py` <br> `finetune/awac.py` | Offline <br> Offline-to-online |
| Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL) | `offline/iql.py` <br> `finetune/iql.py` | Offline <br> Offline-to-online |
| **Offline-to-Online only** | | |
| Supported Policy Optimization for Offline Reinforcement Learning <br>(SPOT) | `finetune/spot.py` | Offline-to-online |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning <br>(Cal-QL) | `finetune/cal_ql.py` | Offline-to-online |
| **Offline only** | | |
| ✅ Behavioral Cloning <br>(BC) | `offline/any_percent_bc.py` | Offline |
| ✅ Behavioral Cloning-10% <br>(BC-10%) | `offline/any_percent_bc.py` | Offline |
| A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC) | `offline/td3_bc.py` | Offline |
| Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT) | `offline/dt.py` | Offline |
| Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(SAC-N) | `offline/sac_n.py` | Offline |
| Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(EDAC) | `offline/edac.py` | Offline |
| Revisiting the Minimalist Approach to Offline Reinforcement Learning <br>(ReBRAC) | `offline/rebrac.py` | Offline |
| Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size <br>(LB-SAC) | `offline/lb_sac.py` | Offline Gym-MuJoCo |

D4RL Benchmarks

You can check the links above for learning curves and details. Here, we report the reproduced final and best scores. Note that the two can differ by a significant margin, and papers may use either reporting methodology without always making the choice explicit. If you want to re-collect our results in a more structured/nuanced manner, see the results directory.
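
All reported numbers are D4RL normalized scores (roughly 0 for a random policy and 100 for the D4RL reference policy). As a minimal sketch of how a raw evaluation return maps onto this scale, d4rl's built-in helper can be used; the raw return value below is made up purely for illustration:

```python
# Minimal sketch: convert a raw episode return into the D4RL normalized
# score reported in the tables below (0 ~ random policy, 100 ~ reference policy).
import gym
import d4rl  # noqa: F401  # importing d4rl registers the offline envs with gym

env = gym.make("halfcheetah-medium-v2")
raw_return = 5000.0  # hypothetical undiscounted return of one evaluation episode
normalized_score = env.get_normalized_score(raw_return) * 100.0
print(f"normalized score: {normalized_score:.2f}")
```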

Offline

Last Scores

Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| **locomotion average** | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± 6.11 | 42.11 ± 0.58 | 106.87 ± 22.16 | 130.59 ± 16.52 | 95.26 ± 6.39 | 18.08 ± 25.42 |
| maze2d-medium-v1 | 0.79 ± 3.25 | 14.25 ± 2.33 | 59.45 ± 36.25 | 52.88 ± 55.12 | 86.11 ± 9.68 | 34.85 ± 2.72 | 105.11 ± 31.67 | 88.61 ± 18.72 | 57.04 ± 3.45 | 31.71 ± 26.33 |
| maze2d-large-v1 | 2.26 ± 4.39 | 11.32 ± 5.10 | 97.10 ± 25.41 | 209.13 ± 8.19 | 23.75 ± 36.70 | 61.72 ± 3.50 | 78.33 ± 61.77 | 204.76 ± 1.19 | 95.60 ± 22.92 | 35.66 ± 28.20 |
| **maze2d average** | 1.13 | 12.58 | 61.99 | 114.89 | 33.65 | 46.23 | 96.77 | 141.32 | 82.64 | 28.48 |
Antmaze
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| antmaze-umaze-v2 | 55.25 ± 4.15 | 65.75 ± 5.26 | 70.75 ± 39.18 | 57.75 ± 10.28 | 92.75 ± 1.92 | 77.00 ± 5.52 | 97.75 ± 1.48 | 0.00 ± 0.00 | 0.00 ± 0.00 | 57.00 ± 9.82 |
| antmaze-umaze-diverse-v2 | 47.25 ± 4.09 | 44.00 ± 1.00 | 44.75 ± 11.61 | 58.00 ± 7.68 | 37.25 ± 3.70 | 54.25 ± 5.54 | 83.50 ± 7.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 51.75 ± 0.43 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 | 2.00 ± 0.71 | 0.25 ± 0.43 | 0.00 ± 0.00 | 65.75 ± 11.61 | 65.75 ± 11.71 | 89.50 ± 3.35 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-medium-diverse-v2 | 0.75 ± 0.83 | 5.75 ± 9.39 | 0.25 ± 0.43 | 0.00 ± 0.00 | 67.25 ± 3.56 | 73.75 ± 5.45 | 83.50 ± 8.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-play-v2 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.75 ± 7.26 | 42.00 ± 4.53 | 52.25 ± 29.01 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 | 0.75 ± 0.83 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.50 ± 13.24 | 30.25 ± 3.63 | 64.00 ± 5.43 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| **antmaze average** | 17.21 | 19.71 | 19.33 | 19.29 | 50.71 | 57.17 | 78.42 | 0.00 | 0.00 | 18.12 |
Adroit
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| pen-human-v1 | 71.03 ± 6.26 | 26.99 ± 9.60 | -3.88 ± 0.21 | 81.12 ± 13.47 | 13.71 ± 16.98 | 78.49 ± 8.21 | 103.16 ± 8.49 | 6.86 ± 5.93 | 5.07 ± 6.16 | 67.68 ± 5.48 |
| pen-cloned-v1 | 51.92 ± 15.15 | 46.67 ± 14.25 | 5.13 ± 5.28 | 89.56 ± 15.57 | 1.04 ± 6.62 | 83.42 ± 8.19 | 102.79 ± 7.84 | 31.35 ± 2.14 | 12.02 ± 1.75 | 64.43 ± 1.43 |
| pen-expert-v1 | 109.65 ± 7.28 | 114.96 ± 2.96 | 122.53 ± 21.27 | 160.37 ± 1.21 | -1.41 ± 2.34 | 128.05 ± 9.21 | 152.16 ± 6.33 | 87.11 ± 48.95 | -1.55 ± 0.81 | 116.38 ± 1.27 |
| door-human-v1 | 2.34 ± 4.00 | -0.13 ± 0.07 | -0.33 ± 0.01 | 4.60 ± 1.90 | 5.53 ± 1.31 | 3.26 ± 1.83 | -0.10 ± 0.01 | -0.38 ± 0.00 | -0.12 ± 0.13 | 4.44 ± 0.87 |
| door-cloned-v1 | -0.09 ± 0.03 | 0.29 ± 0.59 | -0.34 ± 0.01 | 0.93 ± 1.66 | -0.33 ± 0.01 | 3.07 ± 1.75 | 0.06 ± 0.05 | -0.33 ± 0.00 | 2.66 ± 2.31 | 7.64 ± 3.26 |
| door-expert-v1 | 105.35 ± 0.09 | 104.04 ± 1.46 | -0.33 ± 0.01 | 104.85 ± 0.24 | -0.32 ± 0.02 | 106.65 ± 0.25 | 106.37 ± 0.29 | -0.33 ± 0.00 | 106.29 ± 1.73 | 104.87 ± 0.39 |
| hammer-human-v1 | 3.03 ± 3.39 | -0.19 ± 0.02 | 1.02 ± 0.24 | 3.37 ± 1.93 | 0.14 ± 0.11 | 1.79 ± 0.80 | 0.24 ± 0.24 | 0.24 ± 0.00 | 0.28 ± 0.18 | 1.28 ± 0.15 |
| hammer-cloned-v1 | 0.55 ± 0.16 | 0.12 ± 0.08 | 0.25 ± 0.01 | 0.21 ± 0.24 | 0.30 ± 0.01 | 1.50 ± 0.69 | 5.00 ± 3.75 | 0.14 ± 0.09 | 0.19 ± 0.07 | 1.82 ± 0.55 |
| hammer-expert-v1 | 126.78 ± 0.64 | 121.75 ± 7.67 | 3.11 ± 0.03 | 127.06 ± 0.29 | 0.26 ± 0.01 | 128.68 ± 0.33 | 133.62 ± 0.27 | 25.13 ± 43.25 | 28.52 ± 49.00 | 117.45 ± 6.65 |
| relocate-human-v1 | 0.04 ± 0.03 | -0.14 ± 0.08 | -0.29 ± 0.01 | 0.05 ± 0.03 | 0.06 ± 0.03 | 0.12 ± 0.04 | 0.16 ± 0.30 | -0.31 ± 0.01 | -0.17 ± 0.17 | 0.05 ± 0.01 |
| relocate-cloned-v1 | -0.06 ± 0.01 | -0.00 ± 0.02 | -0.30 ± 0.01 | -0.04 ± 0.04 | -0.29 ± 0.01 | 0.04 ± 0.01 | 1.66 ± 2.59 | -0.01 ± 0.10 | 0.17 ± 0.35 | 0.16 ± 0.09 |
| relocate-expert-v1 | 107.58 ± 1.20 | 97.90 ± 5.21 | -1.73 ± 0.96 | 108.87 ± 0.85 | -0.30 ± 0.02 | 106.11 ± 4.02 | 107.52 ± 2.28 | -0.36 ± 0.00 | 71.94 ± 18.37 | 104.28 ± 0.42 |
| **adroit average** | 48.18 | 42.69 | 10.40 | 56.75 | 1.53 | 53.43 | 59.39 | 12.43 | 18.78 | 49.21 |

Best Scores

Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| halfcheetah-medium-v2 | 43.60 ± 0.14 | 43.90 ± 0.13 | 48.93 ± 0.11 | 50.06 ± 0.50 | 47.62 ± 0.03 | 48.84 ± 0.07 | 65.62 ± 0.46 | 72.21 ± 0.31 | 69.72 ± 0.92 | 42.73 ± 0.10 |
| halfcheetah-medium-replay-v2 | 40.52 ± 0.19 | 42.27 ± 0.46 | 45.84 ± 0.26 | 46.35 ± 0.29 | 46.43 ± 0.19 | 45.35 ± 0.08 | 52.22 ± 0.31 | 67.29 ± 0.34 | 66.55 ± 1.05 | 40.31 ± 0.28 |
| halfcheetah-medium-expert-v2 | 79.69 ± 3.10 | 94.11 ± 0.22 | 96.59 ± 0.87 | 96.11 ± 0.37 | 97.04 ± 0.17 | 95.38 ± 0.17 | 108.89 ± 1.20 | 111.73 ± 0.47 | 110.62 ± 1.04 | 93.40 ± 0.21 |
| hopper-medium-v2 | 69.04 ± 2.90 | 73.84 ± 0.37 | 70.44 ± 1.18 | 97.90 ± 0.56 | 70.80 ± 1.98 | 80.46 ± 3.09 | 103.19 ± 0.16 | 101.79 ± 0.20 | 103.26 ± 0.14 | 69.42 ± 3.64 |
| hopper-medium-replay-v2 | 68.88 ± 10.33 | 90.57 ± 2.07 | 98.12 ± 1.16 | 100.91 ± 1.50 | 101.63 ± 0.55 | 102.69 ± 0.96 | 102.57 ± 0.45 | 103.83 ± 0.53 | 103.28 ± 0.49 | 88.74 ± 3.02 |
| hopper-medium-expert-v2 | 90.63 ± 10.98 | 113.13 ± 0.16 | 113.22 ± 0.43 | 103.82 ± 12.81 | 112.84 ± 0.66 | 113.18 ± 0.38 | 113.16 ± 0.43 | 111.24 ± 0.15 | 111.80 ± 0.11 | 111.18 ± 0.21 |
| walker2d-medium-v2 | 80.64 ± 0.91 | 82.05 ± 0.93 | 86.91 ± 0.28 | 83.37 ± 2.82 | 84.77 ± 0.20 | 87.58 ± 0.48 | 87.79 ± 0.19 | 90.17 ± 0.54 | 95.78 ± 1.07 | 74.70 ± 0.56 |
| walker2d-medium-replay-v2 | 48.41 ± 7.61 | 76.09 ± 0.40 | 91.17 ± 0.72 | 86.51 ± 1.15 | 89.39 ± 0.88 | 89.94 ± 0.93 | 91.11 ± 0.63 | 85.18 ± 1.63 | 89.69 ± 1.39 | 68.22 ± 1.20 |
| walker2d-medium-expert-v2 | 109.95 ± 0.62 | 109.90 ± 0.09 | 112.21 ± 0.06 | 108.28 ± 9.45 | 111.63 ± 0.38 | 113.06 ± 0.53 | 112.49 ± 0.18 | 116.93 ± 0.42 | 116.52 ± 0.75 | 108.71 ± 0.34 |
| **locomotion average** | 70.15 | 80.65 | 84.83 | 85.92 | 84.68 | 86.28 | 93.00 | 95.60 | 96.36 | 77.49 |
Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| maze2d-umaze-v1 | 16.09 ± 0.87 | 22.49 ± 1.52 | 99.33 ± 16.16 | 136.61 ± 11.65 | 92.05 ± 13.66 | 50.92 ± 4.23 | 162.28 ± 1.79 | 153.12 ± 6.49 | 149.88 ± 1.97 | 63.83 ± 17.35 |
| maze2d-medium-v1 | 19.16 ± 1.24 | 27.64 ± 1.87 | 150.93 ± 3.89 | 131.50 ± 25.38 | 128.66 ± 5.44 | 122.69 ± 30.00 | 150.12 ± 4.48 | 93.80 ± 14.66 | 154.41 ± 1.58 | 68.14 ± 12.25 |
| maze2d-large-v1 | 20.75 ± 6.66 | 41.83 ± 3.64 | 197.64 ± 5.26 | 227.93 ± 1.90 | 157.51 ± 7.32 | 162.25 ± 44.18 | 197.55 ± 5.82 | 207.51 ± 0.96 | 182.52 ± 2.68 | 50.25 ± 19.34 |
| **maze2d average** | 18.67 | 30.65 | 149.30 | 165.35 | 126.07 | 111.95 | 169.98 | 151.48 | 162.27 | 60.74 |
Antmaze
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| antmaze-umaze-v2 | 68.50 ± 2.29 | 77.50 ± 1.50 | 98.50 ± 0.87 | 78.75 ± 6.76 | 94.75 ± 0.83 | 84.00 ± 4.06 | 100.00 ± 0.00 | 0.00 ± 0.00 | 42.50 ± 28.61 | 64.50 ± 2.06 |
| antmaze-umaze-diverse-v2 | 64.75 ± 4.32 | 63.50 ± 2.18 | 71.25 ± 5.76 | 88.25 ± 2.17 | 53.75 ± 2.05 | 79.50 ± 3.35 | 96.75 ± 2.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 60.50 ± 2.29 |
| antmaze-medium-play-v2 | 4.50 ± 1.12 | 6.25 ± 2.38 | 3.75 ± 1.30 | 27.50 ± 9.39 | 80.50 ± 3.35 | 78.50 ± 3.84 | 93.50 ± 2.60 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.75 ± 0.43 |
| antmaze-medium-diverse-v2 | 4.75 ± 1.09 | 16.50 ± 5.59 | 5.50 ± 1.50 | 33.25 ± 16.81 | 71.00 ± 4.53 | 83.50 ± 1.80 | 91.75 ± 2.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.50 ± 0.50 |
| antmaze-large-play-v2 | 0.50 ± 0.50 | 13.50 ± 9.76 | 1.25 ± 0.43 | 1.00 ± 0.71 | 34.75 ± 5.85 | 53.50 ± 2.50 | 68.75 ± 13.90 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.75 ± 0.43 | 6.25 ± 1.79 | 0.25 ± 0.43 | 0.50 ± 0.50 | 36.25 ± 3.34 | 53.00 ± 3.00 | 69.50 ± 7.26 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| **antmaze average** | 23.96 | 30.58 | 30.08 | 38.21 | 61.83 | 72.00 | 86.71 | 0.00 | 7.08 | 21.04 |
Adroit
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| pen-human-v1 | 99.69 ± 7.45 | 59.89 ± 8.03 | 9.95 ± 8.19 | 121.05 ± 5.47 | 58.91 ± 1.81 | 106.15 ± 10.28 | 127.28 ± 3.22 | 56.48 ± 7.17 | 35.84 ± 10.57 | 77.83 ± 2.30 |
| pen-cloned-v1 | 99.14 ± 12.27 | 83.62 ± 11.75 | 52.66 ± 6.33 | 129.66 ± 1.27 | 14.74 ± 2.31 | 114.05 ± 4.78 | 128.64 ± 7.15 | 52.69 ± 5.30 | 26.90 ± 7.85 | 71.17 ± 2.70 |
| pen-expert-v1 | 128.77 ± 5.88 | 134.36 ± 3.16 | 142.83 ± 7.72 | 162.69 ± 0.23 | 14.86 ± 4.07 | 140.01 ± 6.36 | 157.62 ± 0.26 | 116.43 ± 40.26 | 36.04 ± 4.60 | 119.49 ± 2.31 |
| door-human-v1 | 9.41 ± 4.55 | 7.00 ± 6.77 | -0.11 ± 0.06 | 19.28 ± 1.46 | 13.28 ± 2.77 | 13.52 ± 1.22 | 0.27 ± 0.43 | -0.10 ± 0.06 | 2.51 ± 2.26 | 7.36 ± 1.24 |
| door-cloned-v1 | 3.40 ± 0.95 | 10.37 ± 4.09 | -0.20 ± 0.11 | 12.61 ± 0.60 | -0.08 ± 0.13 | 9.02 ± 1.47 | 7.73 ± 6.80 | -0.21 ± 0.10 | 20.36 ± 1.11 | 11.18 ± 0.96 |
| door-expert-v1 | 105.84 ± 0.23 | 105.92 ± 0.24 | 4.49 ± 7.39 | 106.77 ± 0.24 | 59.47 ± 25.04 | 107.29 ± 0.37 | 106.78 ± 0.04 | 0.05 ± 0.02 | 109.22 ± 0.24 | 105.49 ± 0.09 |
| hammer-human-v1 | 12.61 ± 4.87 | 6.23 ± 4.79 | 2.38 ± 0.14 | 22.03 ± 8.13 | 0.30 ± 0.05 | 6.86 ± 2.38 | 1.18 ± 0.15 | 0.25 ± 0.00 | 3.49 ± 2.17 | 1.68 ± 0.11 |
| hammer-cloned-v1 | 8.90 ± 4.04 | 8.72 ± 3.28 | 0.96 ± 0.30 | 14.67 ± 1.94 | 0.32 ± 0.03 | 11.63 ± 1.70 | 48.16 ± 6.20 | 12.67 ± 15.02 | 0.27 ± 0.01 | 2.74 ± 0.22 |
| hammer-expert-v1 | 127.89 ± 0.57 | 128.15 ± 0.66 | 33.31 ± 47.65 | 129.66 ± 0.33 | 0.93 ± 1.12 | 129.76 ± 0.37 | 134.74 ± 0.30 | 91.74 ± 47.77 | 69.44 ± 47.00 | 127.39 ± 0.10 |
| relocate-human-v1 | 0.59 ± 0.27 | 0.16 ± 0.14 | -0.29 ± 0.01 | 2.09 ± 0.76 | 1.03 ± 0.20 | 1.22 ± 0.28 | 3.70 ± 2.34 | -0.18 ± 0.14 | 0.05 ± 0.02 | 0.08 ± 0.02 |
| relocate-cloned-v1 | 0.45 ± 0.31 | 0.74 ± 0.45 | -0.02 ± 0.04 | 0.94 ± 0.68 | -0.07 ± 0.02 | 1.78 ± 0.70 | 9.25 ± 2.56 | 0.10 ± 0.04 | 4.11 ± 1.39 | 0.34 ± 0.09 |
| relocate-expert-v1 | 110.31 ± 0.36 | 109.77 ± 0.60 | 0.23 ± 0.27 | 111.56 ± 0.17 | 0.03 ± 0.10 | 110.12 ± 0.82 | 111.14 ± 0.23 | -0.07 ± 0.08 | 98.32 ± 3.75 | 106.49 ± 0.30 |
| **adroit average** | 58.92 | 54.58 | 20.51 | 69.42 | 13.65 | 62.62 | 69.71 | 27.49 | 33.88 | 52.60 |

Offline-to-Online

Scores (offline pre-training → after online fine-tuning)

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|-----------|------|-----|-----|------|--------|
| antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
| antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
| antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
| antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
| **antmaze average** | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
| pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
| door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
| hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
| relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
| **adroit average** | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |

Regrets
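
Lower regret is better. As an assumption for orientation (this follows the convention popularized by Cal-QL, rather than anything stated on this page), cumulative regret can be read as the average failure rate over the whole online fine-tuning phase:

$$\text{Regret} = \mathbb{E}\left[1 - \frac{1}{T}\sum_{t=1}^{T} \text{success}_t\right]$$

so a value of 1.00 means the policy never succeeded during online fine-tuning, while values near 0 mean it succeeded almost from the start.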

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|-----------|------|-----|-----|------|--------|
| antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
| antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
| antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
| antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
| antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
| **antmaze average** | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |
| pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
| door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
| hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
| relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| **adroit average** | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |

Citing CORL

If you use CORL in your work, please use the following BibTeX entry:

@inproceedings{
tarasov2022corl,
  title={{CORL}: Research-oriented Deep Offline Reinforcement Learning Library},
  author={Denis Tarasov and Alexander Nikulin and Dmitry Akimov and Vladislav Kurenkov and Sergey Kolesnikov},
  booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
  year={2022},
  url={https://openreview.net/forum?id=SyAS49bBcv}
}