[TensorFlow2] Critic Loss Calculation for actor_critic #41

Open
@srihari-humbarwadi

Description

If I understand correctly, the code in `tensorflow2/actor_critic.py` implements the One-step Actor-Critic (episodic) algorithm given on page 332 of RLbook2020 by Sutton & Barto (pseudocode shown below).

[Image: pseudocode for One-step Actor-Critic (episodic), Sutton & Barto, p. 332]

Here we can see that the critic parameters w are updated using only the gradient of the value function at the current state S, written as `grad(V(S, w))` in the pseudocode above. The update skips the gradient of the value function at the next state S': there is no `grad(V(S', w))` term in the update rule for the critic parameters w.
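For reference, the critic step in that pseudocode is the semi-gradient TD(0) update (writing V for the value estimate and δ for the TD error):

$$
\delta = R + \gamma V(S', w) - V(S, w), \qquad w \leftarrow w + \alpha^{w}\,\delta\,\nabla V(S, w)
$$

The target R + γV(S', w) is treated as a constant with respect to w, which is exactly why no ∇V(S', w) term appears.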

In the code given below, including `state_value_, _ = self.actor_critic(state_)` (L43) inside the `tf.GradientTape` would result in `grad(V(S', w))` appearing in the update for w, which contradicts the pseudocode shown above.

```python
reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN
with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value_, _ = self.actor_critic(state_)  # V(S', w) is also recorded on the tape
    state_value = tf.squeeze(state_value)
    state_value_ = tf.squeeze(state_value_)
```
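
One way to make the critic update match the pseudocode, sketched against the snippet above (this is my suggestion, not code from the repository), is to block the gradient through V(S', w) with `tf.stop_gradient`, or equivalently to evaluate `self.actor_critic(state_)` outside the tape:

```python
reward = tf.convert_to_tensor(reward, dtype=tf.float32)  # not fed to NN
with tf.GradientTape(persistent=True) as tape:
    state_value, probs = self.actor_critic(state)
    state_value_, _ = self.actor_critic(state_)
    # Treat the bootstrap value V(S', w) as a constant target, so the
    # critic update contains grad(V(S, w)) but no grad(V(S', w)).
    state_value_ = tf.stop_gradient(state_value_)
    state_value = tf.squeeze(state_value)
    state_value_ = tf.squeeze(state_value_)
```

With this change, a TD error built as `reward + gamma * state_value_ - state_value` only back-propagates through `state_value`, matching the semi-gradient update in the pseudocode.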

Please let me know if there are any gaps in my understanding!
