[TensorFlow2] Critic Loss Calculation for actor_critic #41

Open
@srihari-humbarwadi

Description

If I understand correctly, the code in `tensorflow2/actor_critic.py` implements the One-step Actor-Critic (episodic) algorithm given on page 332 of RLbook2020 by Sutton & Barto (pseudocode shown below).

[Image: pseudocode for One-step Actor-Critic (episodic), Sutton & Barto, p. 332]

Here we can see that the critic parameters w are updated using only the gradient of the value function at the current state S, written as `grad(V(S, w))` in the pseudocode above. The update skips the gradient of the value function at the next state S': there is no `grad(V(S', w))` term in the update rule for the critic parameters w.
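For reference, the critic step in that pseudocode is the semi-gradient TD(0) update (writing V for the value estimate and δ for the TD error):

$$
\delta = R + \gamma V(S', w) - V(S, w), \qquad w \leftarrow w + \alpha^{w}\,\delta\,\nabla V(S, w)
$$

The target R + γV(S', w) is treated as a constant with respect to w, which is exactly why no ∇V(S', w) term appears.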

In the code given below, including `state_value_, _ = self.actor_critic(state_)` (L43) inside the `tf.GradientTape` would result in `grad(V(S', w))` appearing in the update for w, which contradicts the pseudocode shown above.

```python
reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN
with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value_, _ = self.actor_critic(state_)  # V(S', w) is also recorded on the tape
    state_value = tf.squeeze(state_value)
    state_value_ = tf.squeeze(state_value_)
```
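
One way to make the critic update match the pseudocode, sketched against the snippet above (this is my suggestion, not code from the repository), is to block the gradient through V(S', w) with `tf.stop_gradient`, or equivalently to evaluate `self.actor_critic(state_)` outside the tape:

```python
reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN
with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value_, _ = self.actor_critic(state_)
    # Treat the bootstrap value V(S', w) as a constant target, so the
    # critic update contains grad(V(S, w)) but no grad(V(S', w)).
    state_value_ = tf.stop_gradient(state_value_)
    state_value = tf.squeeze(state_value)
    state_value_ = tf.squeeze(state_value_)
```

With this change, a TD error built as `reward + gamma * state_value_ - state_value` only back-propagates through `state_value`, matching the semi-gradient update in the pseudocode.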

Please let me know if there are any gaps in my understanding!
