Conversation

@Muennighoff
Muennighoff and others added 8 commits October 7, 2025 11:02
Added humanline logic for log ratio calculation and clipping.
Added a new callback to synchronize the model with a reference model during training.
Added HumanlineSyncRefModelCallback to support humanline synchronization.
Added humanline support for DPO and KTO.
@kashif kashif self-requested a review October 15, 2025 08:26
@lewtun
Member

lewtun commented Oct 15, 2025

Thanks @Muennighoff ! Do you have some datasets we can test this on?

@Muennighoff
Author

Thanks! For GRPO on math, we used just MATH500 (this repo: https://github.com/kawine/open-r1-humanline). Maybe @sijial430 can comment regarding KTO/DPO.

@sijial430

For instruction following, we mainly use princeton-nlp/llama3-ultrafeedback-armorm (repo: https://github.com/ContextualAI/HALOs/tree/research). Thanks!

@kawine
Contributor

kawine commented Oct 19, 2025

To provide some more context: all our math experiments were run in HF's open-r1 repo by subclassing GRPOTrainer. All the major changes are in this file, aside from adding some new arguments to configs.py (prefixed with 'humanline') and adding some recipes to the recipes/ folder. The results, as discussed in the paper, are:

[Plots: accuracy_reward and format_reward over training]

Our instruction-following experiments were done in the research branch of the HALOs repo.
We are working on some runs to ensure that the offline -> offline+humanline improvements like the one below for gemma2-27B also hold up in TRL.

[Screenshot: offline vs. offline+humanline comparison for gemma2-27B]

kl = (policy_KL_logps - reference_KL_logps).mean().detach()
if self.humanline:
    policy_KL_logps.clamp_(min=self.humanline_log_eps_P, max=self.humanline_log_eps_R)
    reference_KL_logps.clamp_(min=self.humanline_log_eps_P, max=self.humanline_log_eps_R)
Contributor
@sijial430 This is incorrect. The logp ratios should be clamped, not the raw logps.
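A minimal, dependency-free sketch of the fix being suggested here, assuming the reviewer's intent: compute the per-token log ratio (policy minus reference) first, and clamp that ratio, not the raw log-probabilities. The function name `humanline_kl` and its plain-list signature are hypothetical, introduced only for illustration; the PR itself operates on torch tensors.

```python
def humanline_kl(policy_KL_logps, reference_KL_logps,
                 log_eps_P, log_eps_R, humanline=True):
    """Hypothetical sketch: clamp the log *ratios*, not the raw logps."""
    # Per-token log ratio between policy and reference.
    ratios = [p - r for p, r in zip(policy_KL_logps, reference_KL_logps)]
    if humanline:
        # Humanline clipping is applied to the ratio itself.
        ratios = [min(max(x, log_eps_P), log_eps_R) for x in ratios]
    return sum(ratios) / len(ratios)
```

With clipping enabled, ratios outside [log_eps_P, log_eps_R] are pulled back to the bounds before averaging, which is what the snippet above fails to do when it clamps `policy_KL_logps` and `reference_KL_logps` directly.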


@staticmethod
def get_batch_logps(
    self,
Contributor
@sijial430 get_batch_logps should be kept a static method. humanline should be passed as an argument.
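A sketch of this suggestion, assuming the reviewer's intent: keep `get_batch_logps` a `@staticmethod` (no `self`) and plumb `humanline` through as an explicit argument. The class name, the plain-list input, and the unused-flag placement are assumptions for illustration only; in the PR the method works on torch tensors, and the humanline clipping itself is applied to log ratios downstream, as noted in the comment above.

```python
class TrainerSketch:
    """Illustrative only; not TRL's actual API."""

    @staticmethod
    def get_batch_logps(batch_per_token_logps, humanline=False):
        # Sum per-token log-probabilities into one logp per sequence.
        # `humanline` arrives as an explicit argument rather than being
        # read from `self`, so the method can stay a @staticmethod; the
        # humanline-specific clipping happens on log ratios elsewhere.
        return [sum(seq) for seq in batch_per_token_logps]
```

Keeping the method static also keeps it trivially testable, since it depends only on its arguments and not on trainer state.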

@kashif
Collaborator

kashif commented Oct 22, 2025

@Muennighoff will you open another PR or just fix this one?

@sijial430

@kashif Thanks! We will open a new PR soon.
