QikeLi commented Jul 5, 2018

The solution provided for Policy Evaluation does not agree with the equation on page 8 of David Silver's slides for Lecture 3.

amobiny commented Nov 30, 2018

What you are saying is correct, but Denny is implementing a more general case.
David Silver's slides assume that taking action a in state s yields a reward R regardless of which state the environment transitions to. Denny's implementation allows the reward for an action to depend on the state the environment puts you in. Since this environment is deterministic, both implementations give the same answer.
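
To make the comparison concrete, here is a minimal sketch of the two backups side by side. It assumes a toy 3-state deterministic MDP that I made up for illustration (it is not the gridworld from the repository), a Gym-like layout `P[s][a] = [(prob, next_state, reward), ...]`, and a uniform random policy. The function names are hypothetical too.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, state 2 is absorbing with zero reward.
n_states, n_actions, gamma = 3, 2, 0.9
next_state = np.array([[1, 2], [0, 2], [2, 2]])
reward = np.array([[-1.0, -2.0], [-1.0, 0.0], [0.0, 0.0]])

# Assumed layout: P[s][a] is a list of (prob, next_state, reward) tuples.
# Deterministic, so every list holds exactly one transition.
P = {
    s: {a: [(1.0, int(next_state[s, a]), float(reward[s, a]))]
        for a in range(n_actions)}
    for s in range(n_states)
}
policy = np.full((n_states, n_actions), 1.0 / n_actions)


def backup_per_transition(V, s):
    """Reward sits inside the sum over successor states."""
    return sum(
        policy[s, a] * prob * (r + gamma * V[s2])
        for a in range(n_actions)
        for prob, s2, r in P[s][a]
    )


def backup_expected_reward(V, s):
    """Reward R(s, a) is pulled out in front of the transition sum."""
    total = 0.0
    for a in range(n_actions):
        R_sa = sum(prob * r for prob, _, r in P[s][a])
        future = sum(prob * V[s2] for prob, s2, _ in P[s][a])
        total += policy[s, a] * (R_sa + gamma * future)
    return total


def evaluate(backup, sweeps=200):
    """Synchronous iterative policy evaluation with a fixed number of sweeps."""
    V = np.zeros(n_states)
    for _ in range(sweeps):
        V = np.array([backup(V, s) for s in range(n_states)])
    return V


V_per_transition = evaluate(backup_per_transition)
V_expected_reward = evaluate(backup_expected_reward)
print(np.allclose(V_per_transition, V_expected_reward))
```

Because each `(s, a)` pair has a single successor here, the reward is effectively a function of `(s, a)` alone, so the two backups compute the same quantity at every sweep.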
