
Proximal Policy Optimization (Continuous)

Overview

🚧 🛠️👷‍♀️ 🛑 Under construction...

This is a PyTorch implementation of Proximal Policy Optimization (PPO) for continuous action spaces.
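For reference, the heart of PPO is the clipped surrogate objective from Schulman et al. (2017). The sketch below is a minimal PyTorch version of that loss; the function and argument names are illustrative and are not necessarily the ones used in this repo's code.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log space for numerical stability.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives from the PPO paper.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum gives the pessimistic bound; negate because
    # optimizers minimize.
    return -torch.min(surr1, surr2).mean()
```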

Setup

Required Dependencies

Install the required dependencies using the following command:

```
pip install -r requirements.txt
```

Running the Algorithm

You can run the algorithm on any supported Gymnasium environment. For example:

```
python main.py --env 'LunarLanderContinuous-v2'
```
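Continuous PPO requires an environment with a Box (continuous) action space. If you want to verify compatibility before training, here is a minimal Gymnasium check, independent of this repo's code (assumes the environment's extra dependencies, e.g. Box2D, are installed):

```python
import gymnasium as gym

env = gym.make("LunarLanderContinuous-v2")
# Discrete-action environments won't work with a continuous policy.
assert isinstance(env.action_space, gym.spaces.Box)
print(env.observation_space.shape, env.action_space.shape)
print(env.action_space.low, env.action_space.high)  # action bounds
```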

Notes: Reward scaling appears to work very well for some environments (e.g., BipedalWalker) but may be limiting the upper bound of performance on others. I've increased the number of episodes to 50k for the MuJoCo environments; if that gives the agent enough time to learn, I'll rerun the Gymnasium environments as well. The examples in the paper train for millions of timesteps...
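For context, a common form of reward scaling in PPO implementations divides each reward by a running standard deviation of the discounted return (as in VecNormalize-style wrappers). The sketch below shows that variant; whether it matches this repo's exact scaling is an assumption.

```python
import numpy as np

class RewardScaler:
    """Scales rewards by a running std of the discounted return.

    A common PPO trick; this sketch is illustrative and may differ
    from the scaling actually used in this repo.
    """

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0   # running discounted return
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations (Welford)

    def scale(self, reward, done):
        self.ret = self.gamma * self.ret + reward
        # Welford's online update of the return variance.
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        if done:
            self.ret = 0.0  # reset the return at episode boundaries
        return reward / (np.sqrt(var) + self.eps)
```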

Environments:

- Pendulum-v1
- MountainCarContinuous-v0
- LunarLanderContinuous-v2
- Pusher-v4
- Reacher-v4
- InvertedPendulum-v4
- BipedalWalker-v3
- InvertedDoublePendulum-v4
- Walker2d-v4
- Ant-v4
- HalfCheetah-v4
- Swimmer-v3

Acknowledgements

Special thanks to Phil Tabor, an excellent teacher! I highly recommend his YouTube channel.
