A discussion about Perception reward

Dear author(s),

Thank you for your invaluable work on this paper. While studying your design of the "Perception reward," I came across an interesting detail that I would like to discuss further.

During the Reinforcement Learning (RL) training phase, how did you prompt the Vision-Language Model (VLM) to include the predicted bounding box (i.e., in the box: [...] format) in its generation? I examined your dataset and noticed that this information does not seem to be included in Gemini's Chain-of-Thought (CoT) data. I therefore suspect that you use a specific prompting technique to achieve this.

I look forward to your insights. Thank you again for your time and contribution.

A discussion about Perception reward #23