v0.24.0
Features
- Add accuracy reward by @pramodith in #4270
- Add support for `token_type_ids` in `DPOTrainer` by @aweers in #4285
- 💰 `RichProgressCallback` enhancement by @qgallouedec in #4245
- Include `chat_template_kwargs` in `apply_chat_template` by @cmpatino in #4233
- 🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` by @qgallouedec in #4190
- 🎨 Support mixing image+text and text-only examples by @qgallouedec in #4203
- 🎁 `RewardTrainer` refactor by @qgallouedec in #4093
- 🎞️ Support sequence classification models in `clone_chat_template` by @qgallouedec in #4097
- ✨ Add logging for training completion and model saving in training scripts by @qgallouedec in #4048
- 🖨️ Print rich table for messages by @qgallouedec in #4160
- 😴 Add `vllm_enable_sleep_mode` to RLOO Trainer by @sergiopaniego in #4107
- 📽 Multi image support for GRPO/RLOO by @qgallouedec in #4113
- 👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in #4067
- ℹ️ Enable XPU for vLLM client by @jiqing-feng in #4031
- 🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in #4089
Fixes
- [Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in #4193
- Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in #4196
- Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in #4201
- 🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in #4163
- Fix handling of f_divergence_type in DPO by @albertvillanova in #4171
- ⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in #4170
- Pass required token_type_ids by @albertvillanova in #4148
- 👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in #4080
- ⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in #4057
- 📤 Fix a dataset loading bug in scripts by @singing-cat in #4124
- 🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in #4087
- [GKD] Fix `batchmean` reduce op in GKDTrainer's loss by @cmpatino in #4105
- Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in #4081
- Aux loss is already included in the loss returned by Transformers by @pramodith in #4078
- ♨️ [GRPO] Fix potential hang in `get_high_entropy_mask` by @akakakakakaa in #4041
Documentation
- Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in #4269
- Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in #4268
- Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in #4267
- Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in #4275
- Remove obsolete research_projects directory by @behroozazarkhalili in #4243
- Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in #4219
- Add trainers taxonomy to docs by @sergiopaniego in #4195
- Updated vLLM integration guide by @sergiopaniego in #4162
- [DOCS] Lora without regret by @burtenshaw in #4181
- Add docstring for OnlineTrainerState by @albertvillanova in #4166
- ⚖️ Align SFT and DPO for model creation and deprecate `DPOConfig.padding_value` in favour of `pad_token_id` by @qgallouedec in #4006
- 🏞️ Context Parallelism benchmark guide by @sergiopaniego in #4075
- ▶️ Add video to community tutorials by @qgallouedec in #4090
- Reviewed HF jobs updated docs by @sergiopaniego in #4088
Deprecations
- Deprecate `BestOfNSampler` by @qgallouedec in #4291
- Raise deprecation warning for Python 3.9 by @albertvillanova in #4226
- Deprecate unused dataset_formatting module by @behroozazarkhalili in #4242
- Warnings pointing to RFC by @qgallouedec in #4224
- 🅰️ Remove apex by @qgallouedec in #4139
- 🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` by @qgallouedec in #4068
Experimental
- 🧪 Add `trl.experimental` Submodule by @August-murr in #4073
- [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in #4060
- 🪙 [Experimental] Support GSPO-token by @hjh0119 in #3820
- 🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in #3989
- 🌾 [Experimental] BEMA for ref model by @qgallouedec in #3898
What's Changed
- ⬆️ Bump dev version by @qgallouedec in #4054
- Remove redundant 'None' from docstrings by @albertvillanova in #4058
- Hotfix: Add ParallelismConfig fallback for transformers with old accelerate by @albertvillanova in #4063
- Fix CI failure in slow GRPO test due to missing pillow dependency by @albertvillanova in #4064
- 💡 Fix type hint to `make_parser` function in multiple scripts by @qgallouedec in #4050
- Improve docstring of AlignPropTrainer by @albertvillanova in #4059
- ♨️ [GRPO] Fix potential hang in `get_high_entropy_mask` by @akakakakakaa in #4041
- Set Ruff src for first-party imports by @albertvillanova in #4074
- 🧪 Add `trl.experimental` Submodule by @August-murr in #4073
- 🌾 [Experimental] BEMA for ref model by @qgallouedec in #3898
- ✂️ [GRPO VLM] Update split sizes to generalize by @zucchini-nlp in #4032
- 🛠️ Fix CI by @qgallouedec in #4076
- 🐳 Docker update + Simplify Jobs doc by @qgallouedec in #3931
- Aux loss is already included in the loss returned by Transformers by @pramodith in #4078
- Reviewed HF jobs updated docs by @sergiopaniego in #4088
- 🗑️ Remove deprecated `AlignPropTrainer`, `DDPOTrainer` and `IterativeSFTTrainer` by @qgallouedec in #4068
- ▶️ Add video to community tutorials by @qgallouedec in #4090
- Align slow tests with regular tests by @albertvillanova in #4085
- Add support for testing experimental features by @albertvillanova in #4082
- Community Tutorials design adaptation for videos by @sergiopaniego in #4095
- 🏞️ Context Parallelism benchmark guide by @sergiopaniego in #4075
- ⌨️ Pin num2words by @lewtun in #4094
- Add deprecation warnings to docstrings by @albertvillanova in #4083
- 📜 Convert `set` to `list` of tags by @qgallouedec in #4092
- 🧶 feat: Add WeaveCallback for W&B Weave integration by @parambharat in #4089
- ⚖️ Align SFT and DPO for model creation and deprecate `DPOConfig.padding_value` in favour of `pad_token_id` by @qgallouedec in #4006
- 🌪️ [GFPO]: implement GFPO in GRPOTrainer by @Peter-Chou in #3989
- ℹ️ feat: Add NPU and XPU support for activation offloading by @zilongzheng in #4056
- ℹ️ Enable XPU for vLLM client by @jiqing-feng in #4031
- Fix get_peft_model() so that prepare_model_for_kbit_training does not reapply to an instance of PeftModel, thus freezing all the layers by @Hoesu in #4081
- [GKD] Fix `batchmean` reduce op in GKDTrainer's loss by @cmpatino in #4105
- 👁️ Add VLM support to RLOO trainer by @behroozazarkhalili in #4067
- Some nits GRPO and RLOO trainer docs by @sergiopaniego in #4108
- Fix typos by @cyyever in #4106
- Fix typos by @qgallouedec in #4109
- Fix VLM configs in generate_tiny_models by @albertvillanova in #4101
- docs: correct option name to enable vllm sleep mode by @muupan in #4102
- CI hotfix: xfail test_training_with_transformers_paged for transformers<4.57.0 by @albertvillanova in #4120
- Fix code style with make precommit by @albertvillanova in #4119
- 🟩 Drop `image_split_sizes` in favour of `image_grid_thw` by @qgallouedec in #4111
- 🔭 Align param passing to VLM configs in generate_tiny_models by @albertvillanova in #4118
- 📽 Multi image support for GRPO/RLOO by @qgallouedec in #4113
- 😴 Add `vllm_enable_sleep_mode` to RLOO Trainer by @sergiopaniego in #4107
- 🐯 fix: use_liger_kernel with IterableDataset by @jue-jue-zi in #4087
- 📤 Fix a dataset loading bug in scripts by @singing-cat in #4124
- ⚓ [vllm] ensure MASTER_ADDR/MASTER_PORT are set safely by @kashif in #4057
- 📌 Pin vLLM version by @qgallouedec in #4122
- 👋 Remove `backend` parameter from `GuidedDecodingParams` by @qgallouedec in #4123
- 🧹 Remove `max_batch_tokens`, `num_blocks` and `block_size` from generation kwargs by @qgallouedec in #4065
- Remove Python version < 3.13 constraint from vllm extra dependencies by @albertvillanova in #4125
- 👩‍🦯 Fix usage of VLM using text only by @SamuelBarryCS in #4080
- [SFTrainer]: Fix DFT Loss by @pramodith in #4112
- Improve typing of SFT trainer by @cyyever in #4007
- 🌺 Fix GPT-OSS test by @qgallouedec in #4134
- 🪙 [Experimental] Support GSPO-token by @hjh0119 in #3820
- Fix CI: torch.AcceleratorError: CUDA error: device-side assert triggered by @albertvillanova in #4138
- 🤸‍♀️ Fix DFT test by @qgallouedec in #4135
- 🌵 Mark GKD trainer test as expected failure due to OOM issue by @qgallouedec in #4126
- [GRPO]: Sample from a Replay Buffer To Substitute Groups with 0 std. by @pramodith in #4060
- Fix import statement and GRPO test case by @qgallouedec in #4141
- Refactor trainers classes to use BaseTrainer with shared functionality by @albertvillanova in #4128
- Fixed some rendering issues by @sergiopaniego in #4143
- 😷 Refactor GRPO/RLOO to isolate `_generate` by @qgallouedec in #4114
- 🟩 Drop `image_split_sizes` in favour of `image_grid_thw` by @qgallouedec in #4156
- 📽 Multi image support for GRPO replay buffer by @qgallouedec in #4157
- 😷 Refactor GRPO/RLOO to isolate `_generate` for GRPO with replay buffer by @qgallouedec in #4158
- Add docstring for OnlineTrainerState by @albertvillanova in #4166
- Pass required token_type_ids by @albertvillanova in #4148
- 💡 Replace `<Tip>` with new markdown syntax by @qgallouedec in #4161
- Remove unnecessary list comprehensions by @albertvillanova in #4164
- Add missing FDivergenceType docstring by @albertvillanova in #4165
- Fix docstrings with 'deprecated' Sphinx directive by @albertvillanova in #4174
- Fix docstring interlink to parent class for NashMDTrainer and XPOTrainer by @albertvillanova in #4179
- Fix link in docstring of RLOOTrainer by @albertvillanova in #4180
- 🖨️ Print rich table for messages by @qgallouedec in #4160
- 🅰️ Remove apex by @qgallouedec in #4139
- Fix CI ValueError: Unknown loss type: dapo by @albertvillanova in #4173
- Fix PEFT interlinks in docstrings by @albertvillanova in #4178
- ✨ Add logging for training completion and model saving in training scripts by @qgallouedec in #4048
- 👾 Use our own `require_bitsandbytes` by @qgallouedec in #4137
- 🎞️ Support sequence classification models in `clone_chat_template` by @qgallouedec in #4097
- ⚡ Fix Flash Attention x Padding-Free loss by @qgallouedec in #4170
- 🎁 `RewardTrainer` refactor by @qgallouedec in #4093
- 🧺 [1/N] Refactor `_generate` in GRPO/RLOO: list of ints instead of tensors by @qgallouedec in #4146
- Fix handling of f_divergence_type in DPO by @albertvillanova in #4171
- 🔣 Fix test: replace `trainer.tokenizer` by `trainer.processing_class` by @qgallouedec in #4185
- Fix CI ImportError: FlashAttention2 and decorator order for all parameterized tests by @albertvillanova in #4176
- Hotfix wrong formatting of docstrings with blockquote tips by @albertvillanova in #4187
- 🌡️ Have vLLM return processed (temperature scaled) log probs by @YonatanGideoni in #4163
- Replace remaining trainer.tokenizer with trainer.processing_class in GRPO test by @albertvillanova in #4192
- [DOCS] Lora without regret by @burtenshaw in #4181
- [DOCS/FIX] lora without regrets - fix lr by @burtenshaw in #4207
- Remove custome_container for building the docs by @albertvillanova in #4198
- Remove tokenizer creation from `sft` example script by @sergiopaniego in #4197
- Hotfix: Exclude transformers 4.57.0 for Python 3.9 by @albertvillanova in #4209
- Replace unittest with pytest by @albertvillanova in #4188
- Updated vLLM integration guide by @sergiopaniego in #4162
- Remove `Optional` from `processing_class` in `PPOTrainer` by @sergiopaniego in #4212
- Replace setup with pyproject and fix packaging unintended modules by @albertvillanova in #4194
- Removed tokenizer/processor creation from example scripts by @sergiopaniego in #4211
- Apply style and revert change in `sft_video_llm` example by @qgallouedec in #4214
- Fix `trl-internal-testing/tiny-DbrxForCausalLM` by @qgallouedec in #4213
- Fix prompt-completion labeling with add_generation_prompt and warning by @behroozazarkhalili in #4201
- Fix LoRA params in Python in LoRA without regret by @sergiopaniego in #4215
- [DOCS] fix prose in lora guide by @burtenshaw in #4217
- Add trainers taxonomy to docs by @sergiopaniego in #4195
- 🎨 Support mixing image+text and text-only examples by @qgallouedec in #4203
- 🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation by @qgallouedec in #4152
- Fix entropy and accuracy calculation for prompt_tuning techniques. by @pramodith in #4196
- Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials by @sergiopaniego in #4219
- 🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` by @qgallouedec in #4190
- Exclude vllm dependencies from dev extra by @albertvillanova in #4229
- Fix CI unittest asserts by @albertvillanova in #4234
- Fix callable annotations by @albertvillanova in #4216
- Remove unused Path import in `__init__.py` by @albertvillanova in #4227
- Update CI Docker image to pytorch/pytorch:2.8.0 by @albertvillanova in #4232
- Replace setup with pyproject in CI tests paths by @albertvillanova in #4230
- Fix CI IndentationError for Python 3.13.8 by @albertvillanova in #4240
- Remove unused log_example_reports.py script by @behroozazarkhalili in #4241
- 🧘 Enhance markdown style by @qgallouedec in #4235
- Warnings pointing to RFC by @qgallouedec in #4224
- Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors by @sywangyi in #4236
- Fix CI CUDA out of memory errors by improving GPU memory management by @albertvillanova in #4238
- Install peft from main for CI tests with dev dependencies by @albertvillanova in #4250
- Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' by @albertvillanova in #4253
- Fix CI slow test ValueError: Unknown loss type: dapo by @albertvillanova in #4254
- 🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation by @qgallouedec in #4153
- Remove obsolete research_projects directory by @behroozazarkhalili in #4243
- Deprecate unused dataset_formatting module by @behroozazarkhalili in #4242
- Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' by @albertvillanova in #4255
- [Online-DPO] fix the completion_len == max_new_tokens crash by @kashif in #4193
- Include `chat_template_kwargs` in `apply_chat_template` by @cmpatino in #4233
- Fix Python version check for skipping tests on Python 3.13.8 by @albertvillanova in #4246
- Raise deprecation warning for Python 3.9 by @albertvillanova in #4226
- Fix docstring interlinks by @albertvillanova in #4221
- Use FutureWarning instead of DeprecationWarning by @albertvillanova in #4266
- Fix style with make precommit by @albertvillanova in #4265
- Add Qwen3-VL notebooks (SFT, GRPO) by @sergiopaniego in #4275
- Fix typo in Colab link by @sergiopaniego in #4276
- Fix docstrings with Sphinx 'deprecated' directive by @albertvillanova in #4279
- Fix CI slow test OSError: You are trying to access a gated repo by @albertvillanova in #4283
- 💰 `RichProgressCallback` enhancement by @qgallouedec in #4245
- Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' by @albertvillanova in #4262
- Replace unittest skipTest with pytest.skip by @albertvillanova in #4263
- Fix CI slow tests: ImportError: vLLM is not installed by @albertvillanova in #4287
- Remove logging.md: trainer-specific metrics documentation by @behroozazarkhalili in #4269
- Remove using_llama_models.md: outdated Llama2-specific documentation by @behroozazarkhalili in #4268
- Add support for `token_type_ids` in `DPOTrainer` by @aweers in #4285
- Remove how_to_train.md: outdated training FAQ by @behroozazarkhalili in #4267
- Add accuracy reward by @pramodith in #4270
- Remove unused commands directory by @behroozazarkhalili in #4258
- Deprecate `BestOfNSampler` by @qgallouedec in #4291
- Release: v0.24 by @qgallouedec in #4292
New Contributors
- @zucchini-nlp made their first contribution in #4032
- @parambharat made their first contribution in #4089
- @zilongzheng made their first contribution in #4056
- @jiqing-feng made their first contribution in #4031
- @Hoesu made their first contribution in #4081
- @cmpatino made their first contribution in #4105
- @singing-cat made their first contribution in #4124
- @SamuelBarryCS made their first contribution in #4080
- @YonatanGideoni made their first contribution in #4163
- @aweers made their first contribution in #4285
Full Changelog: v0.23.0...v0.24.0