If the policy is not used within an episode, the policy improvement step can be moved outside the loop.
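I'd like to ask: in these two pseudocode listings, shouldn't the Policy improvement step be placed outside the second for loop?
The second for loop iterates over one complete episode, so updating the policy inside that loop doesn't seem to accomplish anything. Shouldn't policy improvement be done once, after the whole episode has been traversed?
That matches what the book says: "Then, the policy can be improved in an episode-by-episode fashion."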
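A minimal sketch in Python of why the two placements give the same result, assuming every-visit Monte Carlo updates applied backward over an episode of (s_t, a_t, r_{t+1}) triples, followed by greedy improvement. The function names, the 0.0 default for unvisited state-action pairs, and the randomly generated episodes are assumptions for illustration, not the book's pseudocode. On identical data, it runs improvement inside the loop and, separately, once after the loop, then checks that the resulting policies agree. The reason is that the last time a state is improved inside the backward loop, its action values are already final for that episode.

```python
import random
from collections import defaultdict


def greedy_action(state, returns, counts, actions):
    # Average return q(s, a); unvisited pairs default to 0.0 (an assumption).
    def q(a):
        n = counts[(state, a)]
        return returns[(state, a)] / n if n else 0.0
    # max() keeps the first maximal action, so ties break deterministically.
    return max(actions, key=q)


def process_episode(episode, returns, counts, policy, actions, gamma, improve_inside):
    """Every-visit MC update over one episode of (s_t, a_t, r_{t+1}) triples."""
    g = 0.0
    visited = set()
    for state, action, reward in reversed(episode):  # t = T-1, ..., 0
        g = gamma * g + reward
        returns[(state, action)] += g
        counts[(state, action)] += 1
        visited.add(state)
        if improve_inside:
            # Improvement inside the loop, as in the pseudocode being discussed.
            policy[state] = greedy_action(state, returns, counts, actions)
    if not improve_inside:
        # Improvement once per episode, for every state the episode visited.
        for state in visited:
            policy[state] = greedy_action(state, returns, counts, actions)


def compare(num_episodes=200, seed=0):
    rng = random.Random(seed)
    states, actions = range(5), range(3)
    # One (returns, counts, policy) triple per placement of the improvement step.
    runs = {flag: (defaultdict(float), defaultdict(int), {}) for flag in (True, False)}
    for _ in range(num_episodes):
        # Random episodes stand in for data already generated before the loop runs.
        episode = [(rng.choice(states), rng.choice(actions), rng.uniform(-1, 1))
                   for _ in range(rng.randint(1, 10))]
        for flag, (returns, counts, policy) in runs.items():
            process_episode(episode, returns, counts, policy, actions, 0.9, flag)
        if runs[True][2] != runs[False][2]:
            return False
    return True


print(compare())  # True
```

The agreement relies on the episode being fully generated before the loop runs. If the policy were consulted to choose actions while the loop is still running, the two placements could diverge. When it is not, improving inside the loop is harmless but redundant.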