Issue with running Flux.1.dev on iGPU #2926


Open
stsxxx opened this issue May 8, 2025 · 14 comments

Comments

@stsxxx

stsxxx commented May 8, 2025

Describe the bug
I'm trying to run the Flux.1.dev model using fp16 on the integrated GPU of an Intel® Core™ Ultra 7 165H. I attempted to generate an image with 50 inference steps, but one single step takes an extremely long time (forever) to complete.
Additionally, I ran Stable Diffusion XL in fp16 under the same setup (50 steps, 1024×1024), and it completed in about 3 minutes per image, which is significantly faster than Flux.1.dev but still slow.

Expected behavior
I want to verify whether my Intel integrated GPU (iGPU) is correctly activated and being used during inference. Could you guide me on how to check its status and ensure it's being utilized properly in my setup? Or is the performance I'm seeing simply expected for this hardware?

Screenshots
The code I am using:
Image

**Installation instructions**
I am not using the notebook.
Models have all been converted using `optimum-cli export openvino`.
GPU and NPU can be detected:
Image

**Environment information**
openai==1.77.0
opencv-python==4.11.0.86
openvino==2025.1.0
openvino-genai==2025.1.0.0
optimum==1.25.0.dev0
optimum-intel==1.23.0.dev0+590692f

@brmarkus

brmarkus commented May 8, 2025

Instead of a screenshot, could you show the code as text, please? That would make reproduction much easier.
Is the code based on another sample or on a Jupyter notebook?

Can you provide more details about your environment (operating system, version information)?

@stsxxx
Author

stsxxx commented May 8, 2025

Thank you for your reply. The code is shown below.
It's based on a simple diffusion model example from the OpenVINO repository.
I'm using Ubuntu 24.04.2 LTS, Python 3.12.3, torch 2.7.0


```python
import argparse
import sys

import openvino_genai
from PIL import Image
from tqdm import tqdm

seed = 42
num_inference_steps = 50
random_generator = openvino_genai.TorchGenerator(seed)
pbar = tqdm(total=num_inference_steps)

def callback(step, num_steps, latent):
    if num_steps != pbar.total:
        pbar.reset(num_steps)
    pbar.update(1)
    sys.stdout.flush()
    return False

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_dir')
    parser.add_argument('prompt')
    args = parser.parse_args()

    pipe = openvino_genai.Text2ImagePipeline(args.model_dir, device="GPU")
    print(pipe)

    result = pipe.generate(args.prompt, num_inference_steps=num_inference_steps,
                           generator=random_generator, callback=callback,
                           height=1024, width=1024)
    pbar.close()

    final_image = Image.fromarray(result.data[0])
    final_image.save("output.png")

if __name__ == '__main__':
    main()
```

command: `python3 test.py "model directory" "cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting"`

@brmarkus

brmarkus commented May 8, 2025

I just tried to run the Jupyter notebook under
https://github.com/openvinotoolkit/openvino_notebooks/blob/e5a8aa127c9464a356a6767d2fb62b88ed21be3c/notebooks/flux.1-image-generation/flux.1-image-generation.ipynb

but it fails for me with

!huggingface-cli download {ov_model_id} --local-dir {model_dir}

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/OpenVINO/FLUX.1-dev-int4-ov/revision/main
Repository Not Found for url: https://huggingface.co/api/models/OpenVINO/FLUX.1-dev-int4-ov/revision/main.

(being logged into HuggingFace, accepted license agreement, access-token provided)

EDIT: According to issue #2792, it was working in March this year.

@eaidova was the model recently moved, renamed, or removed, do you know? Or do my Hugging Face credentials (I'm located in Europe/Germany) not allow access to the model?
On Hugging Face, I see "Gated model. You have been granted access to this model".

@eaidova
Collaborator

eaidova commented May 8, 2025

@brmarkus the dev model was never uploaded to the Hugging Face Hub under the OpenVINO account; unfortunately its license agreement does not allow that. We only have schnell:
https://huggingface.co/OpenVINO/FLUX.1-schnell-int4-ov
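For reference, the pre-converted schnell IR can also be fetched without the notebook; a minimal sketch using `huggingface_hub` (assumes the package is installed and, for gated access, that you are logged in):

```python
from huggingface_hub import snapshot_download

# Download the pre-converted OpenVINO IR of FLUX.1-schnell (INT4)
# into a local folder and print where it landed.
model_dir = snapshot_download(
    repo_id="OpenVINO/FLUX.1-schnell-int4-ov",
    local_dir="FLUX.1-schnell-int4-ov",
)
print(model_dir)
```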

@stsxxx
Author

stsxxx commented May 8, 2025

I was using `optimum-cli export openvino --model black-forest-labs/FLUX.1-dev --task text-to-image --weight-format fp16 ov_model_flux/` to convert the Hugging Face model into OpenVINO IR format, and loaded the model from the `ov_model_flux/` directory by calling `pipe = openvino_genai.Text2ImagePipeline('ov_model_flux/', device="GPU")`.

@brmarkus

brmarkus commented May 8, 2025

> @brmarkus dev model was never uploaded to huggingface hub under openvino account, it has license agreement limitations for that unfortunately, we have only schnell https://huggingface.co/OpenVINO/FLUX.1-schnell-int4-ov

OK, thank you. I have now initiated download, conversion, and compression using the Jupyter notebook for the model "black-forest-labs/FLUX.1-schnell".
This is going to take a while.

Then I will try to reproduce inference on CPU and GPU on my MS Win11 Pro system (64 GB RAM, Intel Core Ultra 7 155H), using the prompt "cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting".

@brmarkus

brmarkus commented May 8, 2025

With the default parameters:

Pipeline settings
Input text: cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting
Image size: 256 x 256
Seed: 42
Number of steps: 4

With the default checkbox "Use compressed models" activated.

Using the CPU
Progress-bar:
100% 4/4 [01:44<00:00, 22.83s/it]

Image

@brmarkus

brmarkus commented May 8, 2025

Using the GPU, same parameters, same checkboxes, using "black-forest-labs/FLUX.1-schnell":

Progress-bar:
100% 4/4 [00:21<00:00,  2.98s/it]

Image

Task-Manager showing GPU-utilization:

Image

@stsxxx your code uses `num_inference_steps = 50`, while the Jupyter notebook uses only 4 steps.

Is there a bigger difference between "FLUX.1-schnell" and "FLUX.1-dev"?
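A back-of-envelope comparison, scaling the per-iteration times from the schnell runs above to a 50-step run (this assumes the per-step cost is similar between schnell and dev, which is only a rough approximation):

```python
cpu_s_per_it = 22.83  # CPU progress bar above: 4/4 [01:44<00:00, 22.83s/it]
gpu_s_per_it = 2.98   # GPU progress bar above: 4/4 [00:21<00:00, 2.98s/it]

for name, s_per_it in [("CPU", cpu_s_per_it), ("GPU", gpu_s_per_it)]:
    print(f"{name}: 4 steps ~ {4 * s_per_it:.0f} s, "
          f"50 steps ~ {50 * s_per_it / 60:.1f} min")
```

On these numbers, 50 steps would take roughly 19 minutes on CPU and about 2.5 minutes on GPU, so even a healthy iGPU run at 50 steps is far from instant, but nowhere near "forever".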

@brmarkus

brmarkus commented May 8, 2025

I think the term "Non-Commercial Use Only" wouldn't allow me to use "FLUX.1-dev"...

@stsxxx do you see similar values in your environment when using "FLUX.1-schnell" instead, to compare and check regarding your initial question "how to check its status and ensure it's being utilized properly in my setup"?

@stsxxx
Author

stsxxx commented May 8, 2025

@brmarkus Thank you for your reply.
The reason I’m using FLUX.1-dev in FP16 with 50 total steps is due to my experimental setup. I’ll give FLUX.1-schnell with INT4 a try, but it would be ideal if FLUX.1-dev is supported.

Also, if possible, could you try running Stable Diffusion XL in FP16 with image size 1024x1024? Since I was able to run it successfully, we could use it as a comparison to check whether my GPU is being utilized correctly.

It’s late night on my end, so I’ll run FLUX.1-schnell tomorrow. Thank you again for your help!

@stsxxx
Author

stsxxx commented May 9, 2025

@brmarkus
Hi, I tried to run FLUX.1-schnell according to the code provided here: https://huggingface.co/OpenVINO/FLUX.1-schnell-fp16-ov. I used only 4 steps, but the issue persists: it hasn't completed even after 30 minutes.

@brmarkus

brmarkus commented May 9, 2025

Would you have a chance to run the Jupyter notebook
https://github.com/openvinotoolkit/openvino_notebooks/blob/e5a8aa127c9464a356a6767d2fb62b88ed21be3c/notebooks/flux.1-image-generation/flux.1-image-generation.ipynb
?
The notebook uses a compressed and quantized variant in OpenVINO IR format. That could really make a difference!

On my CPU (and MS-Win11-Pro, 64GB RAM), I got "4/4 [01:44<00:00, 22.83s/it]" - less than 2 minutes for 4 iterations.
And on the GPU I got "4/4 [00:21<00:00,  2.98s/it]" - less than 30 seconds for 4 iterations.

@stsxxx
Author

stsxxx commented May 9, 2025

I'll give it a try. Does this mean that other model formats aren't supported here? Even the provided notebook mentions that you can use FLUX.1-dev by simply switching models. I'm using the weights from your model hub, but it's not working, so I believe there may be another issue at play.

@brmarkus

OpenVINO supports different formats (like ONNX and others), but there is also an optimized format called Intermediate Representation ("IR").
In this case INT4 is used, but INT8, FP16, FP32, or BF16 could be used as well.
In addition, the model was compressed.
You could use tools like "model_analyzer" to compare the different variants.
Depending on the underlying hardware, specific (CPU) instructions are used.
In the IR format, there are two files, an XML file and a BIN file. Please replace both when switching models.
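The XML/BIN pairing can be sanity-checked with a few lines; this is a sketch, and `check_ir_model` is a hypothetical helper, not part of OpenVINO:

```python
from pathlib import Path

def check_ir_model(model_dir: str) -> None:
    """Verify that every IR topology file (.xml) has its matching weights file (.bin)."""
    for xml in sorted(Path(model_dir).rglob("*.xml")):
        bin_file = xml.with_suffix(".bin")
        status = "ok" if bin_file.exists() else "MISSING matching .bin"
        print(f"{xml.relative_to(model_dir)}: {status}")

# Directory name taken from the export command earlier in this thread.
check_ir_model("ov_model_flux")
```

A mixed directory, where the XML of one model sits next to the BIN of another, can load but behave incorrectly, which is why both files should always be replaced together.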
