
Issue With Running Phi3 vision on NPU #2858

Open
Harsha0056 opened this issue Apr 2, 2025 · 3 comments
Labels: category: NPU (OpenVINO NPU plugin), PSE (Escalate to PSE for further investigate), support_request

Harsha0056 commented Apr 2, 2025

I installed the NPU drivers as per the instructions, created the OpenVINO version of Phi 3 Vision model, set the device to NPU, and attempted to run the model. Initially, in Task Manager, I observed the model utilizing the NPU, reaching 100% usage. However, it immediately threw the following error:

RuntimeError:
Check 'prompt_ids.get_size() >= tokenized_history.size()' failed at C:\Jenkins\workspace\private-ci\ie\build-windows-vs2022\b\repos\openvino.genai\src\cpp\src\visual_language\pipeline.cpp:201:
Prompt IDs size is less than tokenized history size.
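For context, the failed check in pipeline.cpp asserts that the freshly tokenized prompt is at least as long as the tokenized chat history the pipeline has accumulated; the history is expected to be a prefix of the new prompt. A minimal pure-Python sketch of that invariant (the function name and shapes are illustrative, not the actual openvino.genai internals):

```python
def check_prompt_vs_history(prompt_ids, tokenized_history):
    """Mimics the C++ check
    'prompt_ids.get_size() >= tokenized_history.size()'
    from src/cpp/src/visual_language/pipeline.cpp.

    If the pipeline's stored history is longer than the newly
    tokenized prompt, its bookkeeping is inconsistent and it aborts
    with the error reported above.
    """
    if len(prompt_ids) < len(tokenized_history):
        raise RuntimeError(
            "Prompt IDs size is less than tokenized history size."
        )


# A fresh single-turn prompt with no accumulated history passes:
check_prompt_vs_history(prompt_ids=[1, 2, 3], tokenized_history=[])

# Stale history longer than the new prompt reproduces the reported error:
try:
    check_prompt_vs_history(prompt_ids=[1, 2], tokenized_history=[1, 2, 3])
except RuntimeError as e:
    print(e)  # Prompt IDs size is less than tokenized history size.
```

Seeing this on the very first generate call suggests the pipeline believes it already has history, which is why a fresh cache and environment (tried below) is a reasonable first check.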

Expected behavior
Run Phi3 vision on NPU

Laptop Specs
Processor: Intel(R) Core(TM) Ultra 7 165U, 2.10 GHz
RAM: 16 GB
NPU: Intel(R) AI Boost

Screenshots
(three screenshots attached)


brmarkus commented Apr 2, 2025

May I ask which tool you use to visualize the NPU load, as shown in your screenshot?
EDIT: OK, I see; it looks like you are using HWiNFO64.


Do you use this notebook https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/phi-3-vision/phi-3-vision.ipynb ?

Which of the two models have you selected in the drop-down menu?
  • "microsoft/Phi-3.5-vision-instruct"
  • "microsoft/Phi-3-vision-128k-instruct"

Have you enabled the checkbox "Compress model" to apply model compression?

Which version of the NPU driver do you have installed? Have you rebooted your system afterwards?
Which version of this notebooks repository do you use?
Have you created a new Python virtual environment?

I just cloned the current version of this repo, created a new virtual environment and started the Jupyter notebook. Downloading and compression will take a while; I used the first model from the dropdown, "microsoft/Phi-3.5-vision-instruct" (the default), and kept the "Compress model" checkbox enabled (also the default).

Keep monitoring the system memory consumption during compression and conversion. Are you sure it finished successfully?
(My system has 64 GB of memory, and usage came close to the maximum.)
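Since compression can exhaust system memory (as noted above with 64 GB nearly maxed out), it can help to check total physical memory before starting the conversion. A small stdlib-only sketch; `total_memory_gib` is an illustrative helper, POSIX-only via `os.sysconf`, and on Windows the same figure can simply be read from Task Manager:

```python
import os


def total_memory_gib():
    """Return total physical memory in GiB (POSIX systems only).

    Multiplies the page size by the number of physical pages, both
    queried through sysconf.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / (1024 ** 3)


if __name__ == "__main__":
    gib = total_memory_gib()
    print(f"Total physical memory: {gib:.1f} GiB")
    if gib < 32:
        print("Warning: compressing Phi-3-vision may exhaust memory; "
              "close other applications or add swap.")
```

The 32 GiB threshold in the warning is a rough guess based on the observation above, not a documented requirement.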

@Harsha0056 (Author)


Hi @brmarkus

  1. Yes, I was using HWInfo.
  2. I was using Phi-3 Vision 128K from the same notebook.
  3. I selected the checkbox for weight compression.
  4. I installed the NPU driver version 32.0.100.3714 about a month ago. Since it worked for the Phi-3 LLM on NPU, I didn't update to the newer drivers.
  5. I created a new Python environment, cloned the notebooks again, and also deleted the existing Phi-3 Vision model from the cache.
  6. The compression was successful, as I tested it on both the CPU and iGPU.
  7. I encountered this issue when setting the device to NPU.


brmarkus commented Apr 2, 2025

Ok, thank you.
I started with "microsoft/Phi-3.5-vision-instruct" - tested CPU and GPU successfully.
But when using NPU I get a different exception, from the call "result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)":

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[23], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!

Let me now try "microsoft/Phi-3-vision-128k-instruct" as well - will take a while.
UPDATE:
I have now also downloaded, compressed and converted the model "microsoft/Phi-3-vision-128k-instruct".
Inference works with CPU and GPU.

However, when using NPU, in the step "pipe = ov_genai.VLMPipeline(model_dir, device.value)" I now get an exception:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 4
      1 import openvino_genai as ov_genai
----> 4 pipe = ov_genai.VLMPipeline(model_dir, device.value)

RuntimeError: Exception from src\inference\src\cpp\core.cpp:129:
Exception from src\inference\src\dev\plugin.cpp:58:
Check 'unregistered_parameters.str().empty()' failed at src\core\src\model.cpp:60:
Model references undeclared parameters: opset1::Parameter past_key_values.9.value () -> (f16[1,32,96,1151])

This is also different from the exception you showed.

My environment:

  • MS-Win-11-Pro
  • Intel Core Ultra 7 155H
  • NPU driver version 32.0.100.3714 (17.01.2025); I haven't checked whether there is a newer version available...
  • 64GB system memory
  • Python v3.12.4

Someone from the OpenVINO notebooks team needs to take a closer look.

UPDATE:
After testing CPU and GPU successfully, I shut down the Jupyter server, started it again, and re-ran all cells manually with "Phi-3-vision-128k-instruct" selected. This time I was able to run until the last step, where I get the same exception as you:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 14
     11 prompt = "What is unusual on this picture? Give me full explanation."
     12 print("Answer:")
---> 14 result = pipe.generate(prompt=prompt, images=rgbs, generation_config=config, streamer=streamer)

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Check '*roi_begin <= *max_dim' failed at src\inference\src\dev\make_tensor.cpp:33
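This last check comes from OpenVINO's tensor-slicing helper: every begin coordinate of a region of interest (ROI) must lie within the corresponding dimension of the parent tensor. A pure-Python sketch of the invariant (illustrative only, not the actual make_tensor.cpp code):

```python
def check_roi(roi_begin, max_dims):
    """Mimics the check '*roi_begin <= *max_dim' from
    src/inference/src/dev/make_tensor.cpp: each ROI start
    coordinate must not exceed the parent tensor's extent
    in that dimension."""
    for begin, max_dim in zip(roi_begin, max_dims):
        if begin > max_dim:
            raise RuntimeError(
                f"Check 'roi_begin <= max_dim' failed: {begin} > {max_dim}"
            )


# A valid ROI inside a hypothetical [1, 32, 96, 1151] KV-cache tensor:
check_roi(roi_begin=(0, 0, 0, 0), max_dims=(1, 32, 96, 1151))

# An out-of-range start coordinate reproduces this kind of failure:
try:
    check_roi(roi_begin=(0, 0, 0, 2048), max_dims=(1, 32, 96, 1151))
except RuntimeError as e:
    print(e)
```

That this fires deep inside inference, with no user-supplied coordinates involved, points to an internal bookkeeping problem in the NPU path rather than a usage error.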

@avitial avitial added support_request category: NPU OpenVINO NPU plugin labels Apr 7, 2025
@YuChern-Intel YuChern-Intel added the PSE Escalate to PSE for further investigate label May 13, 2025