
RuntimeError When Saving Phi 3.5 Vision Due to Shared Tensors #223

@jjbuck

I’m trying to fine-tune Phi 3.5 Vision using transformers, but saving the model fails with a RuntimeError, both during and after training. See below for a minimal reproducible example.

My example below seems to do essentially the same thing as the official "cookbook" example: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_docvqa.py#L482-L485.

However, I also see that this other example (https://github.com/microsoft/Phi-3CookBook/blob/6566572c38d53f384801a09dabdd26ad4f7bf76a/code/04.Finetuning/Phi-3-vision-Trainingscript.py#L256) passes safe_serialization=False. Is that strictly required? The example in finetune_hf_trainer_docvqa.py doesn't seem to use it, and it's not clear to me how it saves successfully without it.

Does anyone have any pointers? This issue has been reported in a few other locations, but I haven't come across any solutions - see below.

  1. Saving Phi 3 vision fails due to tensor sharing huggingface/transformers#32354
  2. https://discuss.huggingface.co/t/using-trainer-to-save-a-bartforsequenceclassification-model/81606
  3. https://discuss.huggingface.co/t/runtimeerror-when-saving-phi-3-5-vision-due-to-shared-tensors/116457/1 (My own post on the HF forums earlier today)

The error suggests “saving using safe_serialization=False”…but I’m not sure what the implications of that are.
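For context on that flag: as I understand it, safe_serialization=False makes save_pretrained fall back to the pickle-based torch.save path instead of writing safetensors. The sketch below is not the transformers code path. It is a toy illustration, with made-up key names, that torch.save accepts a state dict in which two keys point at the same tensor and that the loaded result still shares memory.

```python
import io

import torch

# Hypothetical state dict: two names backed by one tensor (tied weights).
base = torch.zeros(4, 2)
state = {"a.weight": base, "b.weight": base}

# Serialize with the pickle-based format into an in-memory buffer.
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

# weights_only=True restricts unpickling to tensors and plain containers.
loaded = torch.load(buffer, weights_only=True)

# Compare the starting addresses of the two loaded tensors.
still_shared = loaded["a.weight"].data_ptr() == loaded["b.weight"].data_ptr()
print(still_shared)
```

The trade-off is that a pickle-based checkpoint can execute arbitrary code when loaded without weights_only, so it should only be loaded from trusted sources.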

Minimal Reproducible Example

from transformers import AutoModelForCausalLM
model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
model.save_pretrained("out")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/AWSBedrockScienceModelDistillationTraining/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2958, in save_pretrained
    raise RuntimeError(
RuntimeError: The weights trying to be saved contained shared tensors [{'model.embed_tokens.weight', 'model.vision_embed_tokens.wte.weight'}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.
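To make the "shared tensors" part of the message concrete, here is a minimal sketch on a toy module rather than on Phi 3.5 Vision. TinyTied and find_shared are hypothetical names made up for illustration, and comparing data_ptr() values is a simplification of the storage-level check that real serializers perform. It shows how two state dict keys end up backed by the same memory, and how cloning one of them removes the sharing.

```python
from collections import defaultdict

import torch
from torch import nn


class TinyTied(nn.Module):
    """Hypothetical stand-in for a model with two tied embedding tables."""

    def __init__(self):
        super().__init__()
        self.embed_tokens = nn.Embedding(8, 4)
        self.wte = nn.Embedding(8, 4)
        # Tie the tables: both names now refer to the same Parameter.
        self.wte.weight = self.embed_tokens.weight


def find_shared(state_dict):
    """Group state dict keys whose tensors start at the same address."""
    groups = defaultdict(list)
    for name, tensor in state_dict.items():
        groups[tensor.data_ptr()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


model = TinyTied()
shared = find_shared(model.state_dict())
print(shared)  # [['embed_tokens.weight', 'wte.weight']]

# "Remove this tensor sharing": give one name its own copy of the data.
model.wte.weight = nn.Parameter(model.embed_tokens.weight.detach().clone())
untied = find_shared(model.state_dict())
print(untied)  # []
```

Untying this way means the two tables are stored and updated separately afterwards, so whether it is appropriate for a real model depends on whether the tie is part of its intended architecture.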
