Dear author(s),
Thank you for your invaluable work on this paper. While studying your design of the "Perception reward," I came across an interesting detail that I would like to discuss further.
During the Reinforcement Learning (RL) training phase, how did you prompt the Vision-Language Model (VLM) to include the predicted bounding box (i.e., in the box: [...] format) in its generation? I examined your dataset and noticed that this information does not appear in the Gemini Chain-of-Thought (CoT) data. I therefore suspect that you use a specific prompting technique to achieve this.
I look forward to your insights. Thank you again for your time and contribution.