Clarification on DQN testing rewards on Atari games #235

I would like some help clearing up my confusion:

1. It seems we have agreed (and the DQN Nature paper confirms) that during training, whenever a life is lost, even if the agent still has lives left, we send a terminal flag to the DQN. This cuts off the bootstrapped Q-value target so that it contains only the reward from the last step, for the state leading up to the life loss. Is this correct?
2. Also during training, after losing a life and sending the terminal flag, we still carry on with the agent's remaining lives rather than resetting the game. This is beneficial because the agent gets deeper into the game and sees more advanced game states. Is this correct?
3. When evaluating the agent, how are the rewards reported in the DQN papers computed?
(3.1) Are they summed over a single life or over all lives? The papers mention "episodic rewards" in testing; does "one episode" mean just "one life"?
(3.2) If that is indeed the case (the rewards reported in testing are summed over a single life), then after a life is lost during testing, do they reset the environment, or do they let the agent use its remaining lives to obtain more "episodic rewards" as separate trials, to be averaged or compared for the maximum?
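
To make the question concrete, here is a minimal sketch of the bookkeeping I have in mind. It is not taken from the DQN paper or from this repository: `ToyLivesEnv` and `play_one_game` are hypothetical names, and the toy environment (one reward per step, a 20% chance of losing a life) is only an assumption used to show the accounting. The terminal flag stored for the learner is set on every life loss (point 1), the environment is reset only on game over (point 2), and the returns are tracked per life so that both readings in (3.1) can be compared.

```python
import random


class ToyLivesEnv:
    """Hypothetical stand-in for an Atari game with several lives."""

    def __init__(self, lives=3, seed=0):
        self.start_lives = lives
        self.lives = lives
        self.rng = random.Random(seed)

    def reset(self):
        self.lives = self.start_lives
        return 0  # dummy observation

    def step(self, action):
        reward = 1.0  # assumed: one point per step survived
        if self.rng.random() < 0.2:  # assumed: 20% chance of losing a life
            self.lives -= 1
        game_over = self.lives == 0
        return 0, reward, game_over, {"lives": self.lives}


def play_one_game(env, policy):
    """Play until game over; return stored transitions and per-life returns."""
    obs = env.reset()
    lives = env.lives
    transitions = []
    per_life_returns = [0.0]
    game_over = False
    while not game_over:
        action = policy(obs)
        next_obs, reward, game_over, info = env.step(action)
        life_lost = info["lives"] < lives
        lives = info["lives"]
        # Point 1: the learner sees a terminal flag on every life loss.
        terminal_for_learner = life_lost or game_over
        transitions.append((obs, action, reward, next_obs, terminal_for_learner))
        per_life_returns[-1] += reward
        # Point 2: no reset here; play continues with the remaining lives.
        if life_lost and not game_over:
            per_life_returns.append(0.0)
        obs = next_obs
    return transitions, per_life_returns


transitions, per_life = play_one_game(ToyLivesEnv(lives=3, seed=0), lambda obs: 0)
print("per-life returns:", per_life)        # "one episode = one life" reading
print("full-game return:", sum(per_life))   # "one episode = all lives" reading
```

In this sketch, (3.1) asks whether the reported score corresponds to an entry of `per_life` or to `sum(per_life)`, and (3.2) asks whether the later entries of `per_life` would count as separate trials.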

Thanks a lot to anyone who can confirm this!
