
Conversation

openvino-dev-samples (Contributor)

No description provided.

echarlaix (Collaborator) left a comment:

Thanks for the PR @openvino-dev-samples

    return causal_mask


def _glm4v_update_causal_mask(
echarlaix (Collaborator):

Why is this needed? I was thinking that we could remove _glm4v_update_causal_mask and _glm4v_prepare_4d_causal_attention_mask_with_cache_position, as we should be compatible with create_causal_mask https://github.com/huggingface/transformers/blob/v4.53.0/src/transformers/models/glm4v/modular_glm4v.py#L982 (#1377 will soon be merged).
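For reference, a minimal sketch of what relying on the upstream helper could look like (assuming transformers >= 4.53, where create_causal_mask lives in transformers.masking_utils; this is not code from the PR, and the helper name is only illustrative):

from transformers.masking_utils import create_causal_mask

def _build_causal_mask(self, inputs_embeds, attention_mask, cache_position, past_key_values):
    # create_causal_mask derives the 4D mask from the config's attention
    # implementation, so the model-specific helpers would no longer be needed.
    return create_causal_mask(
        config=self.config,
        input_embeds=inputs_embeds,
        attention_mask=attention_mask,
        cache_position=cache_position,
        past_key_values=past_key_values,
    )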

openvino-dev-samples (Contributor, author):

Currently there is an accuracy issue when converting torch.vmap, which is used inside create_causal_mask.
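To illustrate the concern with a simplified example (not the transformers implementation): the upstream helper expands a mask predicate over query/key positions with torch.vmap, which is where the reported conversion issue appears, whereas a plain broadcast produces the same mask without vmap:

import torch

def causal_mask_vmap(cache_position, kv_length):
    # vmap-style expansion, similar in spirit to what masking_utils does;
    # converting this kind of graph is where the accuracy issue shows up.
    kv_arange = torch.arange(kv_length)
    mask_fn = lambda q_idx, kv_idx: kv_idx <= q_idx
    return torch.vmap(torch.vmap(mask_fn, in_dims=(None, 0)), in_dims=(0, None))(
        cache_position, kv_arange
    )

def causal_mask_broadcast(cache_position, kv_length):
    # Equivalent mask built with broadcasting only.
    kv_arange = torch.arange(kv_length)
    return kv_arange[None, :] <= cache_position[:, None]

q = torch.arange(4)
assert torch.equal(causal_mask_vmap(q, 6), causal_mask_broadcast(q, 6))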

def init_model_configs():
    if "open_clip" not in TasksManager._LIBRARY_TO_SUPPORTED_MODEL_TYPES:
        TasksManager._LIBRARY_TO_SUPPORTED_MODEL_TYPES["open_clip"] = {}
    TasksManager._CUSTOM_CLASSES[("pt", "glm4v", "image-text-to-text")] = (
echarlaix (Collaborator):

Why not use AutoModelForImageTextToText directly here to load all the image-text-to-text task models?
https://github.com/huggingface/transformers/blob/5dba4bc7b2c1ef517ed44bba76bb70b59001c737/src/transformers/models/auto/modeling_auto.py#L941
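For context, the mapping linked above is what AutoModelForImageTextToText resolves against, so loading could look like this (a sketch; the checkpoint id is only illustrative):

from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "THUDM/GLM-4.1V-9B-Thinking"  # illustrative checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)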

"phi4_multimodal": _OVPhi4MMForCausalLM,
"llama4": _OVLlama4ForCausalLM,
"glm4v": _OVGlm4vForCausalLM,
}
echarlaix (Collaborator):

Would you mind adding a test with a tiny random model?
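A minimal sketch of what such a test could look like (the tiny model id is a placeholder, roughly following the pattern of the existing VLM export tests):

from optimum.intel import OVModelForVisualCausalLM

def test_glm4v_tiny_export():
    # Placeholder id for a tiny randomly initialized GLM-4V checkpoint.
    model_id = "tiny-random/glm4v"
    model = OVModelForVisualCausalLM.from_pretrained(model_id, export=True)
    assert model.config.model_type == "glm4v"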

)

if input_name == "seqlens":
    return torch.tensor([grid_t * grid_h * grid_w], dtype=torch.int64)
echarlaix (Collaborator):

Question: do we need to generate seqlens in the input generator, or should we infer it directly in the patch instead (given hidden_states, for example)?

openvino-dev-samples (Contributor, author), Aug 1, 2025:

@echarlaix Sorry, I don't understand what "infer it directly in the patch instead" means. Could you share a link to some example code? Thanks.
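One reading of the suggestion is that seqlens would be computed inside the patched vision forward from inputs that are already available (grid_thw, for example) instead of being produced by the dummy input generator; a rough sketch under that assumption, with placeholder names:

import torch

def patched_vision_forward(self, hidden_states, grid_thw):
    # Derive per-image sequence lengths from the grid instead of taking
    # seqlens as a separate model input (assumes grid_thw has shape
    # [n_images, 3] holding temporal/height/width patch counts).
    seqlens = grid_thw[:, 0] * grid_thw[:, 1] * grid_thw[:, 2]
    cu_seqlens = torch.cat([seqlens.new_zeros(1), seqlens.cumsum(dim=0)])
    # ... rest of the original forward, using seqlens / cu_seqlens ...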

[grid_t * grid_h * grid_w, 2], max_value=grid_h, framework=framework, dtype=int_dtype
)

if input_name == "grid_thw":
echarlaix (Collaborator):

Same question: shouldn't it be inferred in the patch?

openvino-dev-samples (Contributor, author):

It comes directly from the forward function.

dim = self.embed_dim // self.num_heads // 2
return self.random_float_tensor([grid_h * grid_t * grid_w, dim], framework=framework, dtype=float_dtype)

if input_name == "image_type_ids":
echarlaix (Collaborator):

Looks like DummyGlm4vVisionEmbedInputGenerator could inherit from DummyQwen2VLVisionEmbedInputGenerator, as the two are very close (we would just need to override generate and add image_type_ids). What do you think?
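If that reading is right, something along these lines (a sketch only; the grid attribute names are placeholders for whatever the Qwen2-VL generator actually exposes):

class DummyGlm4vVisionEmbedInputGenerator(DummyQwen2VLVisionEmbedInputGenerator):
    SUPPORTED_INPUT_NAMES = DummyQwen2VLVisionEmbedInputGenerator.SUPPORTED_INPUT_NAMES + ("image_type_ids",)

    def generate(self, input_name, framework="pt", int_dtype="int64", float_dtype="fp32"):
        # Handle only the extra GLM-4V input here and delegate everything
        # else to the Qwen2-VL generator.
        if input_name == "image_type_ids":
            num_patches = self.grid_t * self.grid_h * self.grid_w  # placeholder attributes
            return self.random_int_tensor([num_patches], max_value=1, framework=framework, dtype=int_dtype)
        return super().generate(input_name, framework, int_dtype, float_dtype)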

f"Initialization model for {self.config.model_type} required at least transformers >= 4.45"
)

def get_rope_index(
echarlaix (Collaborator):

Would you mind adding a link to the original code https://github.com/huggingface/transformers/blob/v4.53.3/src/transformers/models/glm4v/modular_glm4v.py#L1014 (and highlighting any modifications, if any)?

openvino-dev-samples changed the title from "[OpenVINO][Draft] Add support for GLM-4.1V-9B-Thinkin" to "[OpenVINO] Add support for GLM-4.1V-9B-Thinkin" on Aug 6, 2025.