From 90490d874aaa01f4376d14aeff26f3b1b47da242 Mon Sep 17 00:00:00 2001
From: Doug Fulop
Date: Mon, 16 Jun 2025 22:43:26 -0700
Subject: [PATCH] Update training_jetbot_reward_exploration.rst

tiny doc formatting issue

Signed-off-by: Doug Fulop
---
 .../setup/walkthrough/training_jetbot_reward_exploration.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst
index 25f39ee8c93..f35cc087fb5 100644
--- a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst
+++ b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst
@@ -20,7 +20,7 @@ the x axis, so we apply the ``root_link_quat_w`` to ``[1,0,0]`` to get the forwa
         observations = {"policy": obs}
         return observations
 
-    So now what should the reward be?
+So now what should the reward be?
 When the robot is behaving as desired, it will be driving at full speed in the direction of
 the command. If we reward both "driving forward" and "alignment to the command", then maximizing
 that combined signal should result in driving to the command... right?
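For context on the walkthrough text this patch touches: it describes applying the robot's root orientation quaternion (``root_link_quat_w``) to ``[1,0,0]`` to obtain the body-frame forward vector in world coordinates. Below is a minimal pure-Python sketch of that quaternion-vector rotation. The ``quat_apply`` name and the ``(w, x, y, z)`` component ordering are assumptions chosen to mirror Isaac Sim conventions, not code from the patched document.

```python
import math


def quat_apply(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z).

    Uses the identity v' = v + 2 * qv x (qv x v + w * v), where qv = (x, y, z).
    """
    w, x, y, z = q
    qv = (x, y, z)

    def cross(a, b):
        # Standard 3D cross product.
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    t = cross(qv, v)
    t = (t[0] + w * v[0], t[1] + w * v[1], t[2] + w * v[2])
    c = cross(qv, t)
    return (v[0] + 2 * c[0], v[1] + 2 * c[1], v[2] + 2 * c[2])


# A 90-degree yaw about z should map the body forward axis [1, 0, 0]
# to the world y axis [0, 1, 0].
half = math.radians(90.0) / 2.0
q_yaw90 = (math.cos(half), 0.0, 0.0, math.sin(half))
forward = quat_apply(q_yaw90, (1.0, 0.0, 0.0))
```

In the actual walkthrough this rotation is done with batched tensor utilities rather than scalar tuples; the sketch only illustrates the math being applied per robot.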