@@ -7,14 +7,47 @@ for now and under heavy development.
 
 Currently the only supported algorithm is Proximal Policy Optimization - PPO.
 
-## Sample usage - training in the Pendulum-v0 environment.
-
-``` python rl/t2t_rl_trainer.py --problems=Pendulum-v0 --hparams_set continuous_action_base [--output_dir dir_location] ```
-
-## Sample usage - training in the PongNoFrameskip-v0 environment.
-
-``` python tensor2tensor/rl/t2t_rl_trainer.py --problem stacked_pong --hparams_set atari_base --hparams num_agents=5 [--output_dir dir_location] ```
-
-## Sample usage - generation of trajectories data
-
-``` python tensor2tensor/bin/t2t-datagen --data_dir=~/t2t_data --tmp_dir=~/t2t_data/tmp --problem=gym_pong_trajectories_from_policy --model_path [model] ```
+# Sample usages
+
+## Training an agent in the Pendulum-v0 environment
+
+```
+python rl/t2t_rl_trainer.py \
+  --problems=Pendulum-v0 \
+  --hparams_set continuous_action_base \
+  [--output_dir dir_location]
+```
+
+## Training an agent in the PongNoFrameskip-v0 environment
+
+```
+python tensor2tensor/rl/t2t_rl_trainer.py \
+  --problem stacked_pong \
+  --hparams_set atari_base \
+  --hparams num_agents=5 \
+  [--output_dir dir_location]
+```
+
+## Generating trajectory data
+
+```
+python tensor2tensor/bin/t2t-datagen \
+  --data_dir=~/t2t_data \
+  --tmp_dir=~/t2t_data/tmp \
+  --problem=gym_pong_trajectories_from_policy \
+  --model_path [model]
+```
+
+## Training a model for frame generation based on randomly played games
+
+```
+python tensor2tensor/bin/t2t-trainer \
+  --generate_data \
+  --data_dir=~/t2t_data \
+  --output_dir=~/t2t_data/output \
+  --problems=gym_pong_random5k \
+  --model=basic_conv_gen \
+  --hparams_set=basic_conv_small \
+  --train_steps=1000 \
+  --eval_steps=10
+```