Release v0.24.0 · huggingface/trl

Features

Add accuracy reward by @pramodith in #4270
Add support for token_type_ids in DPOTrainer by @aweers in #4285
💰 RichProgressCallback enhancement by @qgallouedec in #4245
Include chat_template_kwargs in apply_chat_template by @cmpatino in #4233
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling by @qgallouedec in #4190
🎨 Support mixing image+text and text-only examples by @qgallouedec in #4203
🎁 RewardTrainer refactor by @qgallouedec in #4093
🎞️ Support sequence classification models in clone_chat_template by @qgallouedec in #4097
✨ Add logging for training completion and model saving in training scripts by @qgallouedec in #4048
🖨️ Print rich table for messages by @qgallouedec in #4160
😴 Add vllm_enable_sleep_mode to RLOO Trainer by @sergiopaniego in #4107
📽 Multi image support for GRPO/RLOO by @qgallouedec in #4113
👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in #4067
ℹ️ Enable XPU for vLLM client by @jiqing-feng in #4031
🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in #4089

Fixes

[Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in #4193
Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in #4196
Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in #4201
🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in #4163
Fix handling of f_divergence_type in DPO by @albertvillanova in #4171
⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in #4170
Pass required token_type_ids by @albertvillanova in #4148
👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in #4080
⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in #4057
📤 Fix a dataset loading bug in scripts by @singing-cat in #4124
🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in #4087
[GKD] Fix batchmean reduce op in GKDTrainer's loss by @cmpatino in #4105
Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in #4081
Aux loss is already included in the loss returned by Transformers by @pramodith in #4078
♨️ [GRPO] Fix potential hang in get_high_entropy_mask by @akakakakakaa in #4041

Documentation

Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in #4269
Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in #4268
Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in #4267
Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in #4275
Remove obsolete research_projects directory by @behroozazarkhalili in #4243
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in #4219
Add trainers taxonomy to docs by @sergiopaniego in #4195
Updated vLLM integration guide by @sergiopaniego in #4162
[DOCS] Lora without regret by @burtenshaw in #4181
Add docstring for OnlineTrainerState by @albertvillanova in #4166
⚖️ Align SFT and DPO for model creation and deprecate DPOConfig.padding_value in favour or pad_token_id by @qgallouedec in #4006
🏞️ Context Parallelism benchmark guide by @sergiopaniego in #4075
▶️ Add video to community tutorials by @qgallouedec in #4090
Reviewed HF jobs updated docs by @sergiopaniego in #4088

Deprecations

Deprecate BestOfNSampler by @qgallouedec in #4291
Raise deprecation warning for Python 3.9 by @albertvillanova in #4226
Deprecate unused dataset_formatting module by @behroozazarkhalili in #4242
Warnings pointing to RFC by @qgallouedec in #4224
🅰️ Remove apex by @qgallouedec in #4139
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer by @qgallouedec in #4068

Experimental

🧪 Add trl.experimental Submodule by @August-murr in #4073
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in #4060
🪙 [Experimental] Support GSPO-token by @hjh0119 in #3820
🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in #3989
🌾 [Experimental] BEMA for ref model by @qgallouedec in #3898

What's Changed

⬆️ Bump dev version by @qgallouedec in #4054
Remove redundant 'None' from docstrings by @albertvillanova in #4058
Hotfix: Add ParallelismConfig fallback for transformers with old accelerate by @albertvillanova in #4063
Fix CI failure in slow GRPO test due to missing pillow dependency by @albertvillanova in #4064
💡 Fix type hint to make_parser function in multiple scripts by @qgallouedec in #4050
Improve docstring of AlignPropTrainer by @albertvillanova in #4059
♨️ [GRPO] Fix potential hang in get_high_entropy_mask by @akakakakakaa in #4041
Set Ruff src for first-party imports by @albertvillanova in #4074
🧪 Add trl.experimental Submodule by @August-murr in #4073
🌾 [Experimental] BEMA for ref model by @qgallouedec in #3898
✂️ [GRPO VLM] Update split sizes to generalize by @zucchini-nlp in #4032
🛠️ Fix CI by @qgallouedec in #4076
🐳 Docker update + Simplify Jobs doc by @qgallouedec in #3931
Aux loss is already included in the loss returned by Transformers by @pramodith in #4078
Reviewed HF jobs updated docs by @sergiopaniego in #4088
🗑️ Remove deprecated AlignPropTrainer, DDPOTrainer and IterativeSFTTrainer by @qgallouedec in #4068
▶️ Add video to community tutorials by @qgallouedec in #4090
Align slow tests with regular tests by @albertvillanova in #4085
Add support for testing experimental features by @albertvillanova in #4082
Community Tutorials design adaptation for videos by @sergiopaniego in #4095
🏞️ Context Parallelism benchmark guide by @sergiopaniego in #4075
⌨️ Pin num2words by @lewtun in #4094
Add deprecation warnings to docstrings by @albertvillanova in #4083
📜 Convert set to list of tags by @qgallouedec in #4092
🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in #4089
⚖️ Align SFT and DPO for model creation and deprecate DPOConfig.padding_value in favour or pad_token_id by @qgallouedec in #4006
🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in #3989
ℹ️ feat: Add NPU and XPU support for activation offloading by @zilongzheng in #4056
ℹ️ Enable XPU for vLLM client by @jiqing-feng in #4031
Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in #4081
[GKD] Fix batchmean reduce op in GKDTrainer's loss by @cmpatino in #4105
👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in #4067
Some nits GRPO and RLOO trainer docs by @sergiopaniego in #4108
Fix typos by @cyyever in #4106
Fix typos by @qgallouedec in #4109
Fix VLM configs in generate_tiny_models by @albertvillanova in #4101
docs: correct option name to enable vllm sleep mode by @muupan in #4102
CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 by @albertvillanova in #4120
Fix code style with make precommit by @albertvillanova in #4119
🟩 Drop image_split_sizes in favour of image_grid_thw by @qgallouedec in #4111
🔭 Align param passing to VLM configs in generate_tiny_models by @albertvillanova in #4118
📽 Multi image support for GRPO/RLOO by @qgallouedec in #4113
😴 Add vllm_enable_sleep_mode to RLOO Trainer by @sergiopaniego in #4107
🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in #4087
📤 Fix a dataset loading bug in scripts by @singing-cat in #4124
⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in #4057
📌 Pin vLLM version by @qgallouedec in #4122
👋 Remove backend parameter from GuidedDecodingParams by @qgallouedec in #4123
🧹 Remove max_batch_tokens, num_blocks and block_size from generation kwargs by @qgallouedec in #4065
Remove Python version < 3.13 constraint from vllm extra dependencies by @albertvillanova in #4125
👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in #4080
[SFTrainer]: Fix DFT Loss by @pramodith in #4112
Improve typing of SFT trainer by @cyyever in #4007
🌺 Fix GPT-OSS test by @qgallouedec in #4134
🪙 [Experimental] Support GSPO-token by @hjh0119 in #3820
Fix CI: torch.AcceleratorError: CUDA error: device-side assert triggered by @albertvillanova in #4138
🤸‍♀️ Fix DFT test by @qgallouedec in #4135
🌵 Mark GKD trainer test as expected failure due to OOM issue by @qgallouedec in #4126
[GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in #4060
Fix import statement and GRPO test case by @qgallouedec in #4141
Refactor trainers classes to use BaseTrainer with shared functionality by @albertvillanova in #4128
Fixed some rendering issues by @sergiopaniego in #4143
😷 Refactor GRPO/RLOO to isolate _generate by @qgallouedec in #4114
🟩 Drop image_split_sizes in favour of image_grid_thw by @qgallouedec in #4156
📽 Multi image support for GRPO replay buffer by @qgallouedec in #4157
😷 Refactor GRPO/RLOO to isolate _generate for GRPO with replay buffer by @qgallouedec in #4158
Add docstring for OnlineTrainerState by @albertvillanova in #4166
Pass required token_type_ids by @albertvillanova in #4148
💡 Replace <Tip> with new markdown syntax by @qgallouedec in #4161
Remove unnecessary list comprehensions by @albertvillanova in #4164
Add missing FDivergenceType docstring by @albertvillanova in #4165
Fix docstrings with 'deprecated' Sphinx directive by @albertvillanova in #4174
Fix docstring interlink to parent class for NashMDTrainer and XPOTrainer by @albertvillanova in #4179
Fix link in docstring of RLOOTrainer by @albertvillanova in #4180
🖨️ Print rich table for messages by @qgallouedec in #4160
🅰️ Remove apex by @qgallouedec in #4139
Fix CI ValueError: Unknown loss type: dapo by @albertvillanova in #4173
Fix PEFT interlinks in docstrings by @albertvillanova in #4178
✨ Add logging for training completion and model saving in training scripts by @qgallouedec in #4048
👾 Use our own require_bitsandbytes by @qgallouedec in #4137
🎞️ Support sequence classification models in clone_chat_template by @qgallouedec in #4097
⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in #4170
🎁 RewardTrainer refactor by @qgallouedec in #4093
🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors by @qgallouedec in #4146
Fix handling of f_divergence_type in DPO by @albertvillanova in #4171
🔣 Fix test: replace trainer.tokenizer by trainer.processing_class by @qgallouedec in #4185
Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests by @albertvillanova in #4176
Hotfix wrong formatting of docstrings with blockquote tips by @albertvillanova in #4187
🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in #4163
Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test by @albertvillanova in #4192
[DOCS] Lora without regret by @burtenshaw in #4181
[DOCS/FIX] lora without regrets - fix lr by @burtenshaw in #4207
Remove custome_container for building the docs by @albertvillanova in #4198
Remove tokenizer creation from sft example script by @sergiopaniego in #4197
Hotfix: Exclude transformers 4.57.0 for Python 3.9 by @albertvillanova in #4209
Replace unittest with pytest by @albertvillanova in #4188
Updated vLLM integration guide by @sergiopaniego in #4162
Remove Optional from processing_class in PPOTrainer by @sergiopaniego in #4212
Replace setup with pyproject and fix packaging unintended modules by @albertvillanova in #4194
Removed tokenizer/processor creation from example scripts by @sergiopaniego in #4211
Apply style and revert change in sft_video_llm example by @qgallouedec in #4214
Fix trl-internal-testing/tiny-DbrxForCausalLM by @qgallouedec in #4213
Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in #4201
Fix LoRA params in Python in LoRA without regret by @sergiopaniego in #4215
[DOCS] fix prose in lora guide by @burtenshaw in #4217
Add trainers taxonomy to docs by @sergiopaniego in #4195
🎨 Support mixing image+text and text-only examples by @qgallouedec in #4203
🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation by @qgallouedec in #4152
Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in #4196
Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in #4219
🏷️ Account for token_type_ids in DataCollatorForVisionLanguageModeling by @qgallouedec in #4190
Exclude vllm dependencies from dev extra by @albertvillanova in #4229
Fix CI unittest asserts by @albertvillanova in #4234
Fix callable annotations by @albertvillanova in #4216
Remove unused Path import in init.py by @albertvillanova in #4227
Update CI Docker image to pytorch/pytorch:2.8.0 by @albertvillanova in #4232
Replace setup with pyproject in CI tests paths by @albertvillanova in #4230
Fix CI IndentationError for Python 3.13.8 by @albertvillanova in #4240
Remove unused log_example_reports.py script by @behroozazarkhalili in #4241
🧘 Enhance markdown style by @qgallouedec in #4235
Warnings pointing to RFC by @qgallouedec in #4224
Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors by @sywangyi in #4236
Fix CI CUDA out of memory errors by improving GPU memory management by @albertvillanova in #4238
Install peft from main for CI tests with dev dependencies by @albertvillanova in #4250
Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' by @albertvillanova in #4253
Fix CI slow test ValueError: Unknown loss type: dapo by @albertvillanova in #4254
🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation by @qgallouedec in #4153
Remove obsolete research_projects directory by @behroozazarkhalili in #4243
Deprecate unused dataset_formatting module by @behroozazarkhalili in #4242
Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' by @albertvillanova in #4255
[Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in #4193
Include chat_template_kwargs in apply_chat_template by @cmpatino in #4233
Fix Python version check for skipping tests on Python 3.13.8 by @albertvillanova in #4246
Raise deprecation warning for Python 3.9 by @albertvillanova in #4226
Fix docstring interlinks by @albertvillanova in #4221
Use FutureWarning instead of DeprecationWarning by @albertvillanova in #4266
Fix style with make precommit by @albertvillanova in #4265
Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in #4275
Fix typo in Colab link by @sergiopaniego in #4276
Fix docstrings with Sphinx 'deprecated' directive by @albertvillanova in #4279
Fix CI slow test OSError: You are trying to access a gated repo by @albertvillanova in #4283
💰 RichProgressCallback enhancement by @qgallouedec in #4245
Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' by @albertvillanova in #4262
Replace unittest skipTest with pytest.skip by @albertvillanova in #4263
Fix CI slow tests: ImportError: vLLM is not installed by @albertvillanova in #4287
Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in #4269
Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in #4268
Add support for token_type_ids in DPOTrainer by @aweers in #4285
Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in #4267
Add accuracy reward by @pramodith in #4270
Remove unused commands directory by @behroozazarkhalili in #4258
Deprecate BestOfNSampler by @qgallouedec in #4291
Release: v0.24 by @qgallouedec in #4292

New Contributors

@zucchini-nlp made their first contribution in #4032
@parambharat made their first contribution in #4089
@zilongzheng made their first contribution in #4056
@jiqing-feng made their first contribution in #4031
@Hoesu made their first contribution in #4081
@cmpatino made their first contribution in #4105
@singing-cat made their first contribution in #4124
@SamuelBarryCS made their first contribution in #4080
@YonatanGideoni made their first contribution in #4163
@aweers made their first contribution in #4285

Full Changelog: v0.23.0...v0.24.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v0.24.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Features

Fixes

Documentation

Deprecations

Experimental

What's Changed

New Contributors

Contributors

Uh oh!