
Conversation

GAD-cell

This PR adds support for VLMs in GRPO, which is currently not supported by HF.

I've implemented a working version that does not yet include vLLM or video input support (mainly due to limited resources for testing video inputs haha).
I added a new variable, use_vision, to the GRPO config. Setting use_vision = True enables vision inputs, while use_vision = False keeps the default GRPO behavior. The default is False.
I also had to change a function in unsloth_zoo.peft_utils (requires_grad_post_hook) to make it work.
I've tested the implementation with Qwen2.5-VL 7B for 250 steps, and training appears to proceed correctly (see the TensorBoard screenshots for reference).
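Roughly, this is how the flag is meant to be used (a minimal sketch; apart from use_vision these are standard GRPOConfig arguments, and the values are just placeholders):

from trl import GRPOConfig  # unsloth should be imported before trl, as in the Unsloth notebooks

training_args = GRPOConfig(
    use_vision = True,       # new flag from this PR; False (the default) keeps the text-only GRPO behavior
    learning_rate = 5e-6,
    max_steps = 250,
    output_dir = "outputs",
)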

@GAD-cell
Author

GAD-cell commented Jun 17, 2025

It's implemented for a specific input type:

{
    "prompt": [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # repeated N times if you have an image sequence of length N
                {"type": "text", "text": "Your prompt"},
            ],
        }
    ],
    "image": [a, list, of, images],  # len == N
    "answer": "assistant expected answer according to the prompt",
}
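For a single image, one sample could be built roughly like this (the blank PIL image and the texts are just placeholders):

from PIL import Image

# Illustrative single-image sample in the format described above
sample = {
    "prompt": [{
        "role": "user",
        "content": [
            {"type": "image"},                      # one entry per image in the sequence
            {"type": "text", "text": "Describe the image."},
        ],
    }],
    "image": [Image.new("RGB", (224, 224))],        # list of images, len == number of {"type": "image"} entries
    "answer": "The expected assistant answer for this prompt.",
}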

There are still tasks to complete, particularly regarding the compute loss.

@danielhanchen
Contributor

Fantastic work!

Comment on lines 181 to 190
if not self.use_vision:
    pixel_values = None
    image_grid_thw = None
    prompt_inputs = self.processing_class(text=prompts_text, return_tensors='pt', padding=True, padding_side="left", add_special_tokens=False)
    prompt_inputs = super()._prepare_inputs(prompt_inputs)
else:
    images = [x['image'] for x in inputs]  # Only image inputs supported for now
    prompt_inputs = self.processing_class(images=images, text=prompts_text, return_tensors='pt', padding=True, padding_side="left", add_special_tokens=False)
    prompt_inputs = super()._prepare_inputs(prompt_inputs)
    pixel_values, image_grid_thw = prompt_inputs['pixel_values'], prompt_inputs['image_grid_thw']
Collaborator

Looks like this is the only change along with this

"completion_mask": completion_mask,
"advantages": advantages,
"old_per_token_logps": old_per_token_logps,
}
Author

@GAD-cell GAD-cell Jun 17, 2025

Yes, and also the return output, as well as here.

So there are 4 changes in total.

@Datta0
Collaborator

Datta0 commented Jun 18, 2025

Hey @GAD-cell, thanks for the changes. Can you please provide a screenshot of the generated UnslothGRPOTrainer.py changes and paste it in the PR description?
Once that is done and validated, I'll approve this PR.

@GAD-cell
Author

Ok @Datta0, here are the generated changes in UnslothGRPOTrainer:

[Screenshots of the generated UnslothGRPOTrainer.py changes, taken 2025-06-18]

@GAD-cell
Author

GAD-cell commented Jun 18, 2025

BTW, I still need to implement the code for grpo_accumulated_loss. My version currently assumes the slow loss computation is used.

I also have a question:

There are two paths for computing the loss:
- grpo_compute_loss_slow (the regular path),
- grpo_accumulated_loss (a sort of Liger GRPO loss, if I understand correctly, using UnslothEfficientGRPO).

The code uses grpo_accumulated_loss if os.environ.get('UNSLOTH_USE_NEW_MODEL', '0') == '0'.
However, 'UNSLOTH_USE_NEW_MODEL' is set to '1' by default for models inheriting from FastBaseModel (basically vision models, see here),
and it's set to '0' for language models inheriting from FastLlamaModel (see here).

So I'm not entirely sure when grpo_accumulated_loss is actually supposed to be used. My guess: since GRPO was initially developed only for language models, this was never an issue.
So for now, if you use vision inputs, it will use grpo_compute_loss_slow.
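To make the question concrete, here is a tiny sketch of the dispatch as I understand it (the real check lives inside the compiled UnslothGRPOTrainer, not in a standalone helper like this one):

import os

def choose_grpo_loss_path() -> str:
    # Simplified sketch of how the loss path appears to be chosen
    if os.environ.get("UNSLOTH_USE_NEW_MODEL", "0") == "0":
        # FastLlamaModel (text-only) models leave the flag at '0' -> efficient/accumulated path
        return "grpo_accumulated_loss"
    else:
        # FastBaseModel (vision) models set the flag to '1' -> regular "slow" path
        return "grpo_compute_loss_slow"

print(choose_grpo_loss_path())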

@GAD-cell
Author

GAD-cell commented Jun 19, 2025

Update @Datta0 @danielhanchen:

logits_to_keep is not a parameter of the forward pass for Qwen-VL models (see the Qwen-VL forward parameters), so I had to slice manually (see the last commit).
Since vision models currently use the "slow path", I also had to add slicing for the final logits; this was part of a previous PR (see here) and should not affect the fast path (which is the regular path).

Can you confirm?

I can also add VLM support for the fast path; I just need to change a few things in grpo_accumulated_loss and UnslothEfficientGRPO.

@Datta0
Collaborator

Datta0 commented Jun 19, 2025

Hey @GAD-cell, I haven't looked at the entire code yet, just the last commit.
Yeah, logits_to_keep is the right way to address the issue (just make sure the ±1 is handled correctly). I've had a case where I had to do something similar (don't fully recollect).
I'll check the whole PR sometime tonight.

@GAD-cell
Author

Hey @GAD-cell, I haven't looked at the entire code yet, just the last commit. Yeah, logits_to_keep is the right way to address the issue (just make sure the ±1 is handled correctly). I've had a case where I had to do something similar (don't fully recollect). I'll check the whole PR sometime tonight.

Thank you!
The last commit is the working version; I tested it and trained a VL model with it.

@GAD-cell GAD-cell requested a review from Datta0 June 21, 2025 09:49
@danielhanchen
Contributor

@GAD-cell Nice work again! Would it be possible to confirm if say the original Unsloth Qwen 4B GRPO notebook on our main Github page works as expected after your changes? Appreciate it

Also would it be possible to provide a full working notebook? The goal is to highlight your work in the notebook itself (ie made by you), and we'll post about it!

@GAD-cell
Author

@GAD-cell Nice work again! Would it be possible to confirm if say the original Unsloth Qwen 4B GRPO notebook on our main Github page works as expected after your changes? Appreciate it

Also would it be possible to provide a full working notebook? The goal is to highlight your work in the notebook itself (ie made by you), and we'll post about it!

Thank you! I just tested with your Qwen3 4B notebook and everything works correctly, including the training!
Yes, of course, I'll work on a clean notebook this weekend.

@GAD-cell
Author

GAD-cell commented Jun 22, 2025

Hey @danielhanchen!
I've put together a clean notebook: Qwen2.5_VL_(3B)_GRPO.
Let me know if you think I should add anything. I followed the Unsloth notebook templates.

@danielhanchen
Contributor

danielhanchen commented Jun 23, 2025

@GAD-cell Oh the notebook looks very nice - great work! There are some spelling errors :) Also maybe add a sentence somewhere like "notebook contributed by GAD-cell" with a hyperlink (if you want).

Then also move the notebook to the notebooks repo in Unsloth :)

There are also some merge conflicts :)

After that, @Datta0, could you maybe run the notebook once and see if everything functions well?

@GAD-cell
Author

GAD-cell commented Jun 24, 2025

@GAD-cell is something missing in the notebook? I did

%%capture
import os
! pip install git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO vllm==0.8.5.post1 trl==0.18.2

and getting

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[/tmp/ipython-input-14-4031639753.py](https://localhost:8080/#) in <cell line: 0>()
      7 
      8 FastVisionModel.for_training(model)
----> 9 training_args = GRPOConfig(
     10     use_vision = True, # Enable VLM GRPO
     11     learning_rate = 5e-6,

[/content/unsloth_compiled_cache/UnslothGRPOTrainer.py](https://localhost:8080/#) in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, eval_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, eval_delay, torch_empty_cache_steps, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, log_level, log_level_replica, log_on_each_node, logging_dir, logging_strategy, logging_first_step, logging_steps, logging_nan_inf_filter, save_strategy, save_steps, save_total_limit, save_safetensors, save_on_each_node, save_only_model, restore_callback_states_from_checkpoint, no_cuda, use_cpu, use_mps_device, seed, data_seed, jit_mode_eval, use_ipex, bf16, fp16, fp16_opt_level, half_precision_backend, bf16_full_eval, fp16_full_eval, tf32, local_rank, ddp_backend, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, dataloader_prefetch_factor, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, fsdp, fsdp_min_num_params, fsdp_config, fsdp_transformer_layer_cls_to_wrap, accelerator_config, deepspeed, label_smoothing_factor, optim, optim_args, adafactor, group_by_length, length_colu...
    826 
    827 
--> 828         super().__init__(
    829             output_dir = output_dir,
    830             overwrite_output_dir = overwrite_output_dir,

TypeError: GRPOConfig.__init__() got an unexpected keyword argument 'use_vision'

Ok, that's strange, I reproduced it and it's working for me.
The fact that use_vision is missing makes me think that you forgot to remove unsloth_compiled_cache, but maybe it's something else? Can you confirm?
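If it is the cache, deleting the compiled cache directory before re-running the config cell should force Unsloth to regenerate it, e.g. something like:

# hypothetical cleanup cell: remove the stale compiled trainer so it gets regenerated with use_vision
!rm -rf /content/unsloth_compiled_cache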

@Datta0
Collaborator

Datta0 commented Jun 24, 2025

I just opened the notebook in a new instance. It's the first install of unsloth in the session, so it should not be an unsloth_compiled_cache thingy.

I am using a T4, but I don't think that should matter anyhow, right?

@GAD-cell
Author

GAD-cell commented Jun 24, 2025

I just opened the notebook in a new instance. It's the first install of unsloth in the session, so it should not be an unsloth_compiled_cache thingy.

I am using a T4, but I don't think that should matter anyhow, right?

I tried again in a new instance and it's still working for me haha. Can't figure out what's going wrong.

I did this (you can find the second part in the cell "Colab Extra install"):

import os
! pip install git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO

!pip install --no-deps unsloth vllm==0.8.5.post1
import sys, re, requests; modules = list(sys.modules.keys())
for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer

#added for this specific notebook
!pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
!pip install --no-deps -U transformers
!pip install --no-deps -U accelerate
!pip install --no-deps trl==0.18.2

# vLLM requirements - vLLM breaks Colab due to reinstalling numpy
f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
with open("vllm_requirements.txt", "wb") as file:
    file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))
!pip install -r vllm_requirements.txt

I'm going to try with a T4. Let me know if this works for you.

@Datta0
Collaborator

Datta0 commented Jun 24, 2025

Ok my bad :)
Idk what I missed. I see it working now...

@GAD-cell
Author

Ok my bad :) Idk what I missed. I see it working now...

Ok ok perfect :) glad it worked haha

Collaborator

@Datta0 Datta0 left a comment

LGTM
Great work :)

@GAD-cell
Author

LGTM
Great work :)

Thank you for your time!

@GAD-cell
Author

Hey @danielhanchen. I've resolved all the conflicts and tested the VL GRPO notebook and the regular GRPO notebook again.
Everything looks good to me :)

hidden_states = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1).logits
# logits = logits[:, :-1, :]  # (B, L-1, V), exclude the last logit: it corresponds to the next token pred

if hidden_states.size(1) != logits_to_keep + 1:  # Some models like Qwen-VL don't have a logits_to_keep parameter, so the output needs to be trimmed manually
Contributor

Wait I think we do this automatically in kernels

Author

Yes, you do it in grpo_accumulated_loss, but for now VLM GRPO uses grpo_compute_loss_slow, and in that case it needs to be trimmed. I've commented about this here. We can implement it for the fast path, but I didn't want to touch that part yet since the flag is not clear to me.
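For reference, a rough standalone sketch of that trim (the tensor and its shapes here are made up purely for illustration):

import torch

# made-up shapes for illustration: batch 2, full sequence length 10, hidden size 16
hidden_states = torch.randn(2, 10, 16)
logits_to_keep = 4

# Qwen-VL ignores logits_to_keep, so the forward pass returns the full sequence;
# keep only the last logits_to_keep + 1 positions before computing per-token log-probs.
if hidden_states.size(1) != logits_to_keep + 1:
    hidden_states = hidden_states[:, -(logits_to_keep + 1):, :]

print(hidden_states.shape)  # torch.Size([2, 5, 16])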

Author

Oh, and sorry for the spacing, I forgot to double-check it :)

@GAD-cell
Author

GAD-cell commented Jul 2, 2025

Ok @danielhanchen, I've made the necessary changes. Apologies again, there was some confusion around get_per_token_logps, and I didn’t realize it now always returns None.
I’ve now properly implemented VLM GRPO for the "efficient path" (I highlighted the changes) and tested it both with my own notebook and with the regular GRPO notebook (Qwen 4B).
I also had to update grpo_accumulated_loss in unsloth_zoo. See PR 188.

input_ids = _input_ids,
pixel_values = pixel_values,
image_grid_thw = image_grid_thw,
logits_to_keep = logits_to_keep,
Author

changes for efficient path here

input_ids = _input_ids,
pixel_values = pixel_values,
image_grid_thw = image_grid_thw,
logits_to_keep = logits_to_keep,
Author

and here

@Sweaterdog

Hey there! I was testing out this branch to use GRPO for text-based tasks on a model that supports vision (A Qwen2.5-VL 3B model that I had primed already). I keep getting this error. I will paste my notebook here to see if it is an issue with my code (Might be, I Frankensteined it) but it is an issue with HF Transformers, so I don't know.

https://github.com/Sweaterdog/curly-goggles

@GAD-cell
Author

Hey there! I was testing out this branch to use GRPO for text-based tasks on a model that supports vision (A Qwen2.5-VL 3B model that I had primed already). I keep getting this error. I will paste my notebook here to see if it is an issue with my code (Might be, I Frankensteined it) but it is an issue with HF Transformers, so I don't know.

https://github.com/Sweaterdog/curly-goggles

Hey!
I think you forgot to paste the error haha.
However, in your notebook I didn't see any installation of the dependencies (look at this).

@Sweaterdog

Hey there! I was testing out this branch to use GRPO for text-based tasks on a model that supports vision (A Qwen2.5-VL 3B model that I had primed already). I keep getting this error. I will paste my notebook here to see if it is an issue with my code (Might be, I Frankensteined it) but it is an issue with HF Transformers, so I don't know.

https://github.com/Sweaterdog/curly-goggles

Hey!
I think you forgot to paste the error haha.
However, in your notebook I didn't see any installation of the dependencies (look at this).

Sorry! I can paste the errors ASAP. I am using it all locally, not running this on Google Colab, which is why I don't have the other dependencies installed.

@Sweaterdog

This was the error that I am getting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/unsloth/models/vision.py:227, in unsloth_base_fast_generate(self, *args, **kwargs)
    226     with torch.inference_mode(), autocaster:
--> 227         output = self._old_generate(*args, **kwargs)
    228 except:

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    115 with ctx_factory():
--> 116     return func(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2625, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, use_model_defaults, custom_generate, **kwargs)
   2624     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2625     result = self._sample(
   2626         input_ids,
   2627         logits_processor=prepared_logits_processor,
   2628         stopping_criteria=prepared_stopping_criteria,
   2629         generation_config=generation_config,
   2630         synced_gpus=synced_gpus,
   2631         streamer=streamer,
   2632         **model_kwargs,
   2633     )
   2635 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2636     # 11. interleave input_ids with `num_beams` additional sequences per batch

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/generation/utils.py:3606, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3605 if is_prefill:
-> 3606     outputs = self(**model_inputs, return_dict=True)
   3607     is_prefill = False

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py:743, in Qwen2_5_VLForConditionalGeneration.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
    723 def forward(
    724     self,
    725     input_ids: torch.LongTensor = None,
   (...)    741     **kwargs: Unpack[KwargsForCausalLM],
    742 ) -> Union[tuple, Qwen2_5_VLCausalLMOutputWithPast]:
--> 743     return Qwen2_5_VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/utils/generic.py:943, in can_return_tuple.<locals>.wrapper(self, *args, **kwargs)
    942 try:
--> 943     output = func(self, *args, **kwargs)
    944     if is_requested_to_return_tuple or (is_configured_to_return_tuple and is_top_level_module):

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py:566, in Qwen2_5_VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
    562 output_hidden_states = (
    563     output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    564 )
--> 566 outputs = self.model(
    567     input_ids=input_ids,
    568     pixel_values=pixel_values,
    569     pixel_values_videos=pixel_values_videos,
    570     image_grid_thw=image_grid_thw,
    571     video_grid_thw=video_grid_thw,
    572     second_per_grid_ts=second_per_grid_ts,
    573     position_ids=position_ids,
    574     attention_mask=attention_mask,
    575     past_key_values=past_key_values,
    576     inputs_embeds=inputs_embeds,
    577     use_cache=use_cache,
    578     output_attentions=output_attentions,
    579     output_hidden_states=output_hidden_states,
    580     return_dict=True,
    581     cache_position=cache_position,
    582     **kwargs,
    583 )
    585 hidden_states = outputs[0]

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py:1291, in Qwen2_5_VLModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
   1290 attention_mask_tensor = torch.diagonal(attention_mask_tensor[:, 0], dim1=1, dim2=2)
-> 1291 attention_mask_tensor = attention_mask_tensor / torch.finfo(attention_mask_tensor.dtype).min
   1292 attention_mask_tensor = (1.0 - attention_mask_tensor).int()

TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 trainer.train()

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/trainer.py:2206, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2204         hf_hub_utils.enable_progress_bars()
   2205 else:
-> 2206     return inner_training_loop(
   2207         args=args,
   2208         resume_from_checkpoint=resume_from_checkpoint,
   2209         trial=trial,
   2210         ignore_keys_for_eval=ignore_keys_for_eval,
   2211     )

File <string>:321, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File <string>:28, in _unsloth_training_step(self, model, inputs, num_items_in_batch)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/trl/extras/profiling.py:98, in profiling_decorator.<locals>.wrapper(self, *args, **kwargs)
     95 @functools.wraps(func)
     96 def wrapper(self, *args, **kwargs):
     97     with profiling_context(self, func.__name__):
---> 98         return func(self, *args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/UnslothGRPOTrainer.py:1613, in _UnslothGRPOTrainer._prepare_inputs(self, generation_batch)
   1610 generate_every = self.args.steps_per_generation * self.num_iterations
   1611 if self._step % generate_every == 0 or self._buffered_inputs is None:
   1612     # self._buffered_inputs=None can occur when resuming from a checkpoint
-> 1613     generation_batch = self._generate_and_score_completions(generation_batch)
   1614     if self.use_vision : generation_batch['pixel_values']=generation_batch['pixel_values'].view(generation_batch['prompt_ids'].size(0), -1, generation_batch['pixel_values'].size(1)) # (batch_size * n_patches, dim embedding)->(batch_size,n_patches,dim embeddding)
   1615     generation_batch = shuffle_tensor_dict(generation_batch)

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/UnslothGRPOTrainer.py:1804, in _UnslothGRPOTrainer._generate_and_score_completions(self, inputs)
   1798     with (
   1799         FSDP.summon_full_params(self.model_wrapped, recurse=False)
   1800         if self.is_fsdp_enabled
   1801         else nullcontext()
   1802     ):
   1803         if self.use_vision : prompt_completion_ids = unwrapped_model.generate(prompt_ids, attention_mask=prompt_mask,pixel_values = pixel_values,image_grid_thw=image_grid_thw, generation_config=self.generation_config)
-> 1804         else : prompt_completion_ids = unwrapped_model.generate(prompt_ids, attention_mask=prompt_mask, generation_config=self.generation_config)
   1806 # Compute prompt length and extract completion ids
   1807 prompt_length = prompt_ids.size(1)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/unsloth/models/rl.py:70, in PatchRL.<locals>.unsloth_unwrap_model_for_generation.<locals>.generate_with_clone(*args, **kwargs)
     69 def generate_with_clone(*args, **kwargs):
---> 70     out = original_generate(*args, **kwargs)
     71     if isinstance(out, torch.Tensor):
     72         return out.clone()

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/peft/peft_model.py:1968, in PeftModelForCausalLM.generate(self, *args, **kwargs)
   1966     with self._enable_peft_forward_hooks(*args, **kwargs):
   1967         kwargs = {k: v for k, v in kwargs.items() if k not in self.special_peft_forward_args}
-> 1968         outputs = self.base_model.generate(*args, **kwargs)
   1969 else:
   1970     outputs = self.base_model.generate(**kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/unsloth/models/vision.py:232, in unsloth_base_fast_generate(self, *args, **kwargs)
    230     kwargs.pop("prompt_lookup_num_tokens", None)
    231     with torch.inference_mode(), autocaster:
--> 232         output = self._old_generate(*args, **kwargs)
    233 finally:
    234     pass

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/generation/utils.py:2625, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, use_model_defaults, custom_generate, **kwargs)
   2617     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2618         input_ids=input_ids,
   2619         expand_size=generation_config.num_return_sequences,
   2620         is_encoder_decoder=self.config.is_encoder_decoder,
   2621         **model_kwargs,
   2622     )
   2624     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2625     result = self._sample(
   2626         input_ids,
   2627         logits_processor=prepared_logits_processor,
   2628         stopping_criteria=prepared_stopping_criteria,
   2629         generation_config=generation_config,
   2630         synced_gpus=synced_gpus,
   2631         streamer=streamer,
   2632         **model_kwargs,
   2633     )
   2635 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2636     # 11. interleave input_ids with `num_beams` additional sequences per batch
   2637     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2638         input_ids=input_ids,
   2639         expand_size=generation_config.num_beams,
   2640         is_encoder_decoder=self.config.is_encoder_decoder,
   2641         **model_kwargs,
   2642     )

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/generation/utils.py:3606, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3603 model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
   3605 if is_prefill:
-> 3606     outputs = self(**model_inputs, return_dict=True)
   3607     is_prefill = False
   3608 else:

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1757 # If we don't have any hooks, we want to skip the rest of the logic in
   1758 # this function, and just call forward.
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None
   1765 called_always_called_hooks = set()

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py:743, in Qwen2_5_VLForConditionalGeneration.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
    723 def forward(
    724     self,
    725     input_ids: torch.LongTensor = None,
   (...)    741     **kwargs: Unpack[KwargsForCausalLM],
    742 ) -> Union[tuple, Qwen2_5_VLCausalLMOutputWithPast]:
--> 743     return Qwen2_5_VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/utils/generic.py:943, in can_return_tuple.<locals>.wrapper(self, *args, **kwargs)
    940     set_attribute_for_modules(self, "_is_top_level_module", False)
    942 try:
--> 943     output = func(self, *args, **kwargs)
    944     if is_requested_to_return_tuple or (is_configured_to_return_tuple and is_top_level_module):
    945         output = output.to_tuple()

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/unsloth_compiled_module_qwen2_5_vl.py:566, in Qwen2_5_VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
    561 output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    562 output_hidden_states = (
    563     output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    564 )
--> 566 outputs = self.model(
    567     input_ids=input_ids,
    568     pixel_values=pixel_values,
    569     pixel_values_videos=pixel_values_videos,
    570     image_grid_thw=image_grid_thw,
    571     video_grid_thw=video_grid_thw,
    572     second_per_grid_ts=second_per_grid_ts,
    573     position_ids=position_ids,
    574     attention_mask=attention_mask,
    575     past_key_values=past_key_values,
    576     inputs_embeds=inputs_embeds,
    577     use_cache=use_cache,
    578     output_attentions=output_attentions,
    579     output_hidden_states=output_hidden_states,
    580     return_dict=True,
    581     cache_position=cache_position,
    582     **kwargs,
    583 )
    585 hidden_states = outputs[0]
    586 logits = EMPTY_LOGITS

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1757 # If we don't have any hooks, we want to skip the rest of the logic in
   1758 # this function, and just call forward.
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None
   1765 called_always_called_hooks = set()

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py:1291, in Qwen2_5_VLModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, second_per_grid_ts, **kwargs)
   1289 if attention_mask_tensor is not None and attention_mask_tensor.ndim == 4:
   1290     attention_mask_tensor = torch.diagonal(attention_mask_tensor[:, 0], dim1=1, dim2=2)
-> 1291     attention_mask_tensor = attention_mask_tensor / torch.finfo(attention_mask_tensor.dtype).min
   1292     attention_mask_tensor = (1.0 - attention_mask_tensor).int()
   1294 # Calculate RoPE index once per generation in the pre-fill stage only.
   1295 # When compiling, we can't check tensor values thus we check only input length
   1296 # It is safe to assume that `length!=1` means we're in pre-fill because compiled
   1297 # models currently cannot do asssisted decoding

TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'

@GAD-cell
Author

GAD-cell commented Jul 11, 2025

This was the error that I am getting: [same traceback as quoted above]

This is due to the transformers version.
You should use transformers 4.52.4.
I recommend installing all the dependencies from the link I provided in the last comment, even if you are not on a Colab session, and it should work. :)
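For example, pinning it explicitly in the install cell should do it:

!pip install --no-deps transformers==4.52.4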

@Sweaterdog

Ah! Thank you so much. It is working now!

@Sweaterdog

One thing I noticed: when I went to run SFT fine-tuning for a different model, I ended up getting this error with this version.

from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=6,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=1,      # Reduce further for stability
        gradient_accumulation_steps=1,      # Effective batch size = 4
        warmup_ratio=0.1,                   # Double the warmup
        num_train_epochs=1,                 
        learning_rate=6e-5,                 # HALF the current rate
        max_grad_norm=0.5,                  # Tighter gradient clipping
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=25,                   
        optim="adamw_8bit",
        weight_decay=0.005,                 # Reduce weight decay
        lr_scheduler_type="cosine",         
        seed=3407,
        output_dir="outputs",
        save_steps=10000                     # More frequent saves
    ),
)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 5
      2 from trl import SFTTrainer
      3 from unsloth import is_bfloat16_supported
----> 5 trainer = SFTTrainer(
      6     model=model,
      7     tokenizer=tokenizer,
      8     train_dataset=dataset,
      9     dataset_text_field="text",
     10     max_seq_length=max_seq_length,
     11     dataset_num_proc=6,
     12     packing=False,
     13     args=TrainingArguments(
     14         per_device_train_batch_size=1,      # Reduce further for stability
     15         gradient_accumulation_steps=1,      # Effective batch size = 4
     16         warmup_ratio=0.1,                   # Double the warmup
     17         num_train_epochs=1,                 
     18         learning_rate=6e-5,                 # HALF the current rate
     19         max_grad_norm=0.5,                  # Tighter gradient clipping
     20         fp16=not is_bfloat16_supported(),
     21         bf16=is_bfloat16_supported(),
     22         logging_steps=25,                   
     23         optim="adamw_8bit",
     24         weight_decay=0.005,                 # Reduce weight decay
     25         lr_scheduler_type="cosine",         
     26         seed=3407,
     27         output_dir="outputs",
     28         save_steps=10000                     # More frequent saves
     29     ),
     30 )

File ~/Desktop/Coding_Projects/Unsloth/.venv/lib/python3.12/site-packages/unsloth/trainer.py:209, in _backwards_compatible_trainer.<locals>.new_init(self, *args, **kwargs)
    207     kwargs["args"] = config
    208 pass
--> 209 original_init(self, *args, **kwargs)

File ~/Desktop/Coding_Projects/Unsloth/unsloth_compiled_cache/UnslothSFTTrainer.py:1005, in UnslothSFTTrainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, processing_class, compute_loss_func, compute_metrics, callbacks, optimizer_cls_and_kwargs, preprocess_logits_for_metrics, peft_config, formatting_func, **kwargs)
    987 def __init__(
    988     self,
    989     model,
   (...)   1002     **kwargs
   1003 ):
   1004     if args is None: args = UnslothSFTConfig()
-> 1005     self.use_vision = args.use_vision
   1006     use_bf16 = getattr(args, 'bf16', False)
   1007     if type(use_bf16) is not bool: use_bf16 = False

AttributeError: 'TrainingArguments' object has no attribute 'use_vision'

And when I add use_vision and set it to None, I get an error about an unexpected argument, which is use_vision. Mind you, this is all on a language-only model using FastLanguageModel.

@Larry-Gan

@GAD-cell Does this support Gemma 3n? It seems it's not compatible with the newest transformers version that's needed for 3n:
GAD-cell/vlm-grpo#13
