From 100bc9ee6d2f7b793951fced9a3c7df8f6f0ec81 Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Mon, 2 Sep 2024 14:17:55 +0100
Subject: [PATCH 1/3] Update

[ghstack-poisoned]
---
 README.md | 284 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 271 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 64559f7af37..c3db6eec5a4 100644
--- a/README.md
+++ b/README.md
@@ -523,19 +523,277 @@ If you would like to contribute to new features, check our [call for contributio
 ## Examples, tutorials and demos
 
 A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are provided with an illustrative purpose:
-- [DQN](https://github.com/pytorch/rl/blob/main/sota-implementations/dqn)
-- [DDPG](https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py)
-- [IQL](https://github.com/pytorch/rl/blob/main/sota-implementations/iql/iql_offline.py)
-- [CQL](https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py)
-- [TD3](https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py)
-- [TD3+BC](https://github.com/pytorch/rl/blob/main/sota-implementations/td3+bc/td3+bc.py)
-- [A2C](https://github.com/pytorch/rl/blob/main/examples/a2c_old/a2c.py)
-- [PPO](https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/ppo.py)
-- [SAC](https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py)
-- [REDQ](https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py)
-- [Dreamer](https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py)
-- [Decision Transformers](https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer)
-- [RLHF](https://github.com/pytorch/rl/blob/main/examples/rlhf)
+
+| Algorithm | Compile Support | Tensordict-free API | Modular Losses | Continuous and Discrete |
+|---|---|---|---|---|
+| DQN | + | + | NA | + (through ActionDiscretizer transform) |
+| DDPG | + | + | + | - (continuous only) |
+| IQL | + | + | + | + |
+| CQL | + | + | + | + |
+| TD3 | + | + | + | - (continuous only) |
+| TD3+BC | untested | + | + | - (continuous only) |
+| A2C | + | + | - | + |
+| PPO | + | + | - | + |
+| SAC | + | + | - | + |
+| REDQ | + | + | - | - (continuous only) |
+| Dreamer v1 | untested | + | + (different classes) | - (continuous only) |
+| Decision Transformers | untested | + | NA | - (continuous only) |
+| CrossQ | untested | + | + | - (continuous only) |
+| Gail | untested | + | NA | + |
+| Impala | untested | + | - | + |
+| IQL (MARL) | untested | + | + | + |
+| DDPG (MARL) | untested | + | + | - (continuous only) |
+| PPO (MARL) | untested | + | - | + |
+| QMIX-VDN (MARL) | untested | + | NA | + |
+| SAC (MARL) | untested | + | - | + |
+| RLHF | NA | + | NA | NA |
+
 and many more to come!

From c527bbe72867669e720dd576dd4267d34a41a83f Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Mon, 2 Sep 2024 16:02:05 +0100
Subject: [PATCH 2/3] Update

[ghstack-poisoned]
---
 README.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index c3db6eec5a4..e663a376270 100644
--- a/README.md
+++ b/README.md
@@ -528,7 +528,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 Algorithm
- Compile Support
+ Compile Support**
 Tensordict-free API
@@ -540,7 +540,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 DQN
- +
+ 1.53x
 +
@@ -552,7 +552,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 DDPG
- +
+ 1.54x
 +
@@ -564,7 +564,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 IQL
- +
+ 2.55x
 +
@@ -576,7 +576,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 CQL
- +
+ 1.91x
 +
@@ -588,7 +588,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 TD3
- +
+ 1.79x
 +
@@ -614,7 +614,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 A2C
- +
+ 1.76x
 +
@@ -627,7 +627,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 PPO
- +
+ 2.67x
 +
@@ -639,7 +639,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 SAC
- +
+ 2.01x
 +
@@ -651,7 +651,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 REDQ
- +
+ 2.35x
 +
@@ -794,6 +794,8 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
+** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on
+ architecture and device
 and many more to come!

From 8820706de674fce73955df2e747ba61a8de9239d Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Tue, 17 Sep 2024 11:52:31 -0700
Subject: [PATCH 3/3] Update

[ghstack-poisoned]
---
 README.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 8234d21bc2f..8e9ea840d39 100644
--- a/README.md
+++ b/README.md
@@ -540,7 +540,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 DQN
- 1.53x
+ 1.9x
 +
@@ -552,7 +552,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 DDPG
- 1.54x
+ 1.87x
 +
@@ -564,7 +564,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 IQL
- 2.55x
+ 3.22x
 +
@@ -576,7 +576,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 CQL
- 1.91x
+ 2.68x
 +
@@ -588,7 +588,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 TD3
- 1.79x
+ 2.27x
 +
@@ -614,7 +614,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 A2C
- 1.76x
+ 2.67x
 +
@@ -627,7 +627,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 PPO
- 2.67x
+ 2.42x
 +
@@ -639,7 +639,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 SAC
- 2.01x
+ 2.62x
 +
@@ -651,7 +651,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 REDQ
- 2.35x
+ 2.28x
 +
@@ -795,7 +795,7 @@ A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are pr
 ** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on
- architecture and device
+ architecture and device.
 and many more to come!