Add Humanline #4261
Added humanline logic for log ratio calculation and clipping.
Added a new callback to synchronize the model with a reference model during training.
Added HumanlineSyncRefModelCallback to support humanline synchronization.
Added humanline for DPO and KTO.
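For readers unfamiliar with reference-model synchronization, here is a minimal sketch of the general idea behind such a callback. It is not the code from this PR: the function name `maybe_sync_reference`, the `sync_every` parameter, and the choice of a hard weight copy on a fixed step interval are all assumptions made for illustration. The actual `HumanlineSyncRefModelCallback` may use a different schedule or a soft mixing rule.

```python
import torch
from torch import nn


def maybe_sync_reference(policy: nn.Module, reference: nn.Module,
                         step: int, sync_every: int) -> bool:
    """Copy the policy weights into the reference model every `sync_every` steps.

    Hypothetical helper: a hard copy on a fixed interval is an assumption,
    not the schedule used by the PR.
    """
    if step == 0 or step % sync_every != 0:
        return False
    with torch.no_grad():
        reference.load_state_dict(policy.state_dict())
    return True


# Toy stand-ins for the policy and the frozen reference model.
policy = nn.Linear(2, 2)
reference = nn.Linear(2, 2)

synced_at_3 = maybe_sync_reference(policy, reference, step=3, sync_every=4)
synced_at_4 = maybe_sync_reference(policy, reference, step=4, sync_every=4)
```

In a Trainer-based setup this logic would typically live in a callback hook that runs at the end of each optimizer step.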
Thanks @Muennighoff! Do you have some datasets we can test this on?
Thanks! For GRPO on math, we used just MATH500 (this repo: https://github.com/kawine/open-r1-humanline). Maybe @sijial430 can comment regarding KTO/DPO.
For GRPO on instruction following, we mainly use princeton-nlp/llama3-ultrafeedback-armorm (repo: https://github.com/ContextualAI/HALOs/tree/research). Thanks!
To provide some more context, all our math experiments were done in HF's open-r1 repo and were done by subclassing GRPOTrainer. All the major changes are in this file, save for adding some new arguments to configs.py (prefixed with 'humanline') and adding some recipes to the recipes/ folder. The results, as discussed in the paper, are:
Our instruction-following experiments were done in the research branch of the HALOs repo.
kl = (policy_KL_logps - reference_KL_logps).mean().detach()
if self.humanline:
    policy_KL_logps.clamp_(min=self.humanline_log_eps_P, max=self.humanline_log_eps_R)
    reference_KL_logps.clamp_(min=self.humanline_log_eps_P, max=self.humanline_log_eps_R)
@sijial430 This is incorrect. The logp ratios should be clamped, not the raw logps.
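To make the distinction in this review comment concrete, here is a small sketch comparing the two orderings. The helper name `clamped_log_ratio` and the toy values are illustrative assumptions. The bounds `log_eps_P` and `log_eps_R` mirror the `humanline_log_eps_P` / `humanline_log_eps_R` arguments in the diff. Whether the clamp is applied per token or per sequence in the final implementation is not shown here.

```python
import torch


def clamped_log_ratio(policy_logps: torch.Tensor, reference_logps: torch.Tensor,
                      log_eps_P: float, log_eps_R: float) -> torch.Tensor:
    """Form the log ratio first, then clamp it (what the review asks for)."""
    log_ratio = policy_logps - reference_logps
    # Out-of-place clamp, so the caller's tensors are left untouched.
    return log_ratio.clamp(min=log_eps_P, max=log_eps_R)


# Toy log-probabilities (assumed values, for illustration only).
policy_logps = torch.tensor([-1.0, -5.0, -0.5])
reference_logps = torch.tensor([-3.0, -1.0, -0.6])
log_eps_P, log_eps_R = -1.5, 1.5

# Reviewer's intent: clamp the ratio.
ratio_clamped = clamped_log_ratio(policy_logps, reference_logps, log_eps_P, log_eps_R)

# What the quoted diff does: clamp each raw log-prob, then subtract.
raw_clamped = (policy_logps.clamp(min=log_eps_P, max=log_eps_R)
               - reference_logps.clamp(min=log_eps_P, max=log_eps_R))
```

On these inputs the ratio-clamped result is `[1.5, -1.5, 0.1]`, while clamping the raw log-probs first gives `[0.5, -0.5, 0.1]`, so the two orderings are not interchangeable.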
@staticmethod
def get_batch_logps(
    self,
@sijial430 get_batch_logps should be kept a static method. humanline should be passed as an argument.
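A sketch of the shape this comment suggests: no `self` parameter under `@staticmethod`, with `humanline` supplied by the caller. The class `SketchTrainer` is a hypothetical stand-in, not a TRL class. What the flag changes inside the function is also an assumption, since the quoted diff does not show it. Here it returns per-token log-probs instead of their sum, purely for illustration.

```python
import torch


class SketchTrainer:  # hypothetical stand-in, not a TRL class
    @staticmethod
    def get_batch_logps(logits: torch.Tensor, labels: torch.Tensor,
                        label_pad_token_id: int = -100,
                        humanline: bool = False) -> torch.Tensor:
        """Log-probs of `labels` under `logits`; no `self`, so it stays static."""
        # Shift so that position t predicts token t + 1.
        labels = labels[:, 1:].clone()
        logits = logits[:, :-1, :]
        mask = labels != label_pad_token_id
        labels[~mask] = 0  # dummy index for padded positions, masked out below
        per_token = torch.gather(
            logits.log_softmax(-1), dim=2, index=labels.unsqueeze(2)
        ).squeeze(2)
        per_token = per_token * mask
        if humanline:
            # ASSUMPTION: keep per-token values so the caller can work with
            # token-level ratios; the PR's actual behavior is not shown.
            return per_token
        return per_token.sum(-1)


logits = torch.zeros(2, 4, 5)  # uniform distribution over 5 tokens
labels = torch.tensor([[1, 2, 3, -100], [0, -100, -100, -100]])

# Callable on the class itself, with the flag passed explicitly.
summed = SketchTrainer.get_batch_logps(logits, labels)
per_token = SketchTrainer.get_batch_logps(logits, labels, humanline=True)
```

With `self` left in the signature of a static method, the first positional argument (`logits`) would be bound to `self` and every later argument would shift by one, which is one reason to drop it.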
@Muennighoff will you open another PR or just fix this one?
@kashif Thanks! We will open a new PR soon.



Adds Humanline from https://arxiv.org/abs/2509.24207
cc @kawine @sijial430