
Python demos requirement incompatibility #3205

Open
@RH-steve-grubb

Description


Describe the bug
There's still one more issue caused by the transformers upgrade aimed at the 2025.1 release. Running a test program designed to confirm compatibility between the transformers library and the Intel-optimized optimum.intel.openvino backend produces a traceback:

Traceback (most recent call last):
  File "//./smoke-2.py", line 34, in <module>
    output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=40)
  File "/usr/local/lib64/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 2092, in generate
    self._prepare_cache_for_generation(
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1714, in _prepare_cache_for_generation
    if not self._supports_default_dynamic_cache():
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1665, in _supports_default_dynamic_cache
    self._supports_cache_class
AttributeError: 'OVModelForCausalLM' object has no attribute '_supports_cache_class'

The _supports_cache_class attribute was introduced recently (transformers 4.42.x), and the Optimum Intel OVModelForCausalLM class does not yet implement the newer caching API that transformers expects. Upstream noticed this and added support in the optimum 1.18.1 release.
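For illustration of the mechanism only (upgrading is the real fix): _supports_cache_class is a class-level flag that _supports_default_dynamic_cache() reads, so a hypothetical stop-gap in a demo script could define it on the model class before calling generate(). This sketch assumes the legacy cache path still works for OVModelForCausalLM:

from optimum.intel.openvino import OVModelForCausalLM

# Hypothetical stop-gap, not the upstream fix: define the flag that
# transformers >= 4.42 expects. False makes _supports_default_dynamic_cache()
# return False, so generate() leaves cache handling to the OpenVINO model.
if not hasattr(OVModelForCausalLM, "_supports_cache_class"):
    OVModelForCausalLM._supports_cache_class = False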

So, the requirements should be optimum[diffusers]==1.18.1. Would upgrading optimum cause any other problems?
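A minimal sketch of the proposed change in demos/python_demos/requirements.txt (the other entries in that file are unchanged and not shown here):

optimum[diffusers]==1.18.1

For a quick manual check inside the image, pip install "optimum[diffusers]==1.18.1" can be run before launching the reproducer below.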

To Reproduce
Run the following program in the image after installing the Python modules from demos/python_demos/requirements.txt.

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

# Model name compatible with OpenVINO optimizations
model_name = "gpt2"

# Load tokenizer (Transformers API)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load optimized model (Optimum Intel API with OpenVINO backend)
model = OVModelForCausalLM.from_pretrained(model_name, export=True)

# Prepare input text
prompt = "Testing transformers and optimum.intel integration"
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
input_ids = inputs.input_ids
attention_mask = inputs.attention_mask

# Generate output (testing both transformers tokenization & OpenVINO inference)
output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=40)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Prompt:", prompt)
print("Generated text:", generated_text)

Expected behavior
The program should generate and print text without raising an exception, and it does with optimum==1.18.1.

Configuration
OVMS 2025.1

Labels

bug (Something isn't working)
