Description

Running the portable ComfyUI build on Windows with the comfyui_HiDream-Sampler node: startup completes and the dev-nf4 model loads and caches normally, but every generation attempt (both txt2img and img2img) fails during prompt encoding with:

ImportError: DLL load failed while importing cuda_utils: The specified module could not be found.

The error is raised from Triton's NVIDIA backend, which compiles driver.c into a cuda_utils extension the first time GPTQModel's TritonV2 dequant kernels run. There is also a separate, non-fatal import failure at startup for a stray custom_nodes\ComfyUI folder. Full log:
D:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-flash-attention
Adding extra search path checkpoints D:\webui_forge_cu121_torch231\webui\models\Stable-diffusion
Adding extra search path configs D:\webui_forge_cu121_torch231\webui\models\Stable-diffusion
Adding extra search path vae D:\webui_forge_cu121_torch231\webui\models\VAE
Adding extra search path vae_approx D:\webui_forge_cu121_torch231\webui\models\VAE-approx
Adding extra search path loras D:\webui_forge_cu121_torch231\webui\models\Lora
Adding extra search path loras D:\webui_forge_cu121_torch231\webui\models\LyCORIS
Adding extra search path hypernetworks D:\webui_forge_cu121_torch231\webui\models\hypernetworks
Adding extra search path diffusers D:\webui_forge_cu121_torch231\webui\models\diffusers
Adding extra search path controlnet D:\webui_forge_cu121_torch231\webui\models\ControlNet
Adding extra search path clip D:\webui_forge_cu121_torch231\webui\models\text_encoder
Adding extra search path embeddings D:\webui_forge_cu121_torch231\webui\embeddings
Adding extra search path upscale_models D:\webui_forge_cu121_torch231\webui\models\ESRGAN
Adding extra search path upscale_models D:\webui_forge_cu121_torch231\webui\models\RealESRGAN
Adding extra search path upscale_models D:\webui_forge_cu121_torch231\webui\models\SwinIR
[START] Security scan
[DONE] Security scan
ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-05-01 14:50:32.336
** Platform: Windows
** Python version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
** Python executable: D:\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: D:\ComfyUI_windows_portable\ComfyUI
** ComfyUI Base Folder Path: D:\ComfyUI_windows_portable\ComfyUI
** User directory: D:\ComfyUI_windows_portable\ComfyUI\user
** ComfyUI-Manager config path: D:\ComfyUI_windows_portable\ComfyUI\user\default\ComfyUI-Manager\config.ini
** Log path: D:\ComfyUI_windows_portable\ComfyUI\user\comfyui.log
Prestartup times for custom nodes:
1.5 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-manager
Checkpoint files will always be loaded safely.
Total VRAM 24564 MB, total RAM 65129 MB
pytorch version: 2.7.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using Flash Attention
Python version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
ComfyUI version: 0.3.30
ComfyUI frontend version: 1.17.11
[Prompt Server] web root: D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Traceback (most recent call last):
File "D:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 2128, in load_custom_node
module_spec.loader.exec_module(module)
File "", line 995, in exec_module
File "", line 1132, in get_code
File "", line 1190, in get_data
FileNotFoundError: [Errno 2] No such file or directory: 'D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI\__init__.py'
Cannot import D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI module for custom nodes: [Errno 2] No such file or directory: 'D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI\__init__.py'
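(Side note: the traceback above is ComfyUI treating a stray custom_nodes\ComfyUI folder as a node pack. A minimal check sketch, assuming the folder is an accidental leftover rather than a real node pack; the path is copied from the error above:)

```python
# Check whether the folder behind the FileNotFoundError above exists and
# whether it actually looks like a node pack (i.e. contains an __init__.py).
import os

stray = r"D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI"
print("exists:", os.path.isdir(stray))
print("has __init__.py:", os.path.isfile(os.path.join(stray, "__init__.py")))
# If it exists without __init__.py, moving it out of custom_nodes should
# silence the "Cannot import ... custom_nodes\ComfyUI" startup error.
```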
Loading: ComfyUI-Manager (V3.31.13)
[ComfyUI-Manager] network_mode: public
ComfyUI Revision: 163 [a97f2f85] *DETACHED | Released on '2025-04-24'
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
PyTorch version 2.7.0+cu128 available.
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
Optimum library found. GPTQ model loading enabled (requires suitable backend).
HiDream: Successfully registered with ComfyUI memory management
HiDream Sampler Node Initialized
Available Models: ['full-nf4', 'dev-nf4', 'fast-nf4', 'full', 'dev', 'fast']
Import times for custom nodes:
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.0 seconds (IMPORT FAILED): D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI
0.1 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-manager
0.8 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler
Starting server
To see the GUI go to: http://127.0.0.1:8188
FETCH ComfyRegistry Data: 5/83
FETCH ComfyRegistry Data: 10/83
FETCH ComfyRegistry Data: 15/83
FETCH ComfyRegistry Data: 20/83
FETCH ComfyRegistry Data: 25/83
FETCH ComfyRegistry Data: 30/83
FETCH ComfyRegistry Data: 35/83
FETCH ComfyRegistry Data: 40/83
got prompt
Failed to validate prompt for output 2:
- HiDreamSamplerAdvanced 19:
- Value 77.0 bigger than max of 5.0: llama_weight
- Value 256 bigger than max of 218: max_length_openclip
Output will be ignored
Successfully parsed resolution: 1024x1024
Using fixed resolution: 1024x1024 (1024 × 1024 (Square))
HiDream: Initial VRAM usage: 0.00 MB
Loading model for dev-nf4...
--- Loading Model Type: dev-nf4 ---
Model Path: azaneko/HiDream-I1-Dev-nf4
NF4: True, Requires BNB: False, Requires GPTQ deps: True
Using Uncensored LLM: None
(Start VRAM: 0.00 MB)
Cache check for key: dev-nf4_standard
Cache contains: []
[1a] Preparing LLM (GPTQ): ModelCloud/Meta-Llama-3.1-8B-Instruct-gptq-4bit
Setting max memory limit: 9GiB of 24.0GiB
Using device_map='auto'.
[1b] Loading Tokenizer: ModelCloud/Meta-Llama-3.1-8B-Instruct-gptq-4bit...
D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\fdlou\.cache\huggingface\hub\models--ModelCloud--Meta-Llama-3.1-8B-Instruct-gptq-4bit. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Tokenizer loaded.
FETCH ComfyRegistry Data: 45/83
[1c] Loading Text Encoder: ModelCloud/Meta-Llama-3.1-8B-Instruct-gptq-4bit... (May download files)
FETCH ComfyRegistry Data: 50/83
FETCH ComfyRegistry Data: 55/83
FETCH ComfyRegistry Data: 60/83
FETCH ComfyRegistry Data: 65/83
FETCH ComfyRegistry Data: 70/83
FETCH ComfyRegistry Data: 75/83
FETCH ComfyRegistry Data: 80/83
FETCH ComfyRegistry Data [DONE]
[ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
FETCH DATA from: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json [DONE]
[ComfyUI-Manager] All startup tasks have been completed.
INFO Kernel: Auto-selection: adding candidate TritonV2QuantLinear
loss_type=None was set in the config but it is unrecognised. Using the default loss: ForCausalLMLoss.
INFO Format: Converting checkpoint_format from gptq to internal gptq_v2.
INFO Format: Converting GPTQ v1 to v2
INFO Format: Conversion complete: 0.008009910583496094s
INFO Optimize: TritonV2QuantLinear compilation triggered.
✅ Text encoder loaded! (VRAM: 5467.26 MB)
[2] Preparing Transformer from: azaneko/HiDream-I1-Dev-nf4
Type: NF4
Loading Transformer... (May download files)
D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\fdlou\.cache\huggingface\hub\models--azaneko--HiDream-I1-Dev-nf4. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Moving Transformer to CUDA...
✅ Transformer loaded! (VRAM: 14646.96 MB)
[3] Preparing Scheduler: FlashFlowMatchEulerDiscreteScheduler (Default shift: 6.0)
Using Scheduler: FlashFlowMatchEulerDiscreteScheduler
[4] Loading Pipeline from: azaneko/HiDream-I1-Dev-nf4
Passing pre-loaded components...
Fetching 24 files: 100%|█████████████████████████████████████████████████████████████████████████████████████| 24/24 [07:49<00:00, 19.56s/it]
Keyword arguments {'transformer': None} are not expected by HiDreamImagePipeline and will be ignored.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 99.86it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 24.68it/s]
Pipeline structure loaded.
[5] Finalizing Pipeline...
Assigning transformer...
Moving pipeline object to CUDA (final check)...
Attempting CPU offload for NF4...
✅ CPU offload enabled.
✅ Pipeline ready! (VRAM: 12642.45 MB)
Model dev-nf4 loaded & cached!
Selected Shift Value: 0.0 (Override: 0.0, Default: 6.0)
Using model's default scheduler type: FlashFlowMatchEulerDiscreteScheduler with shift=0.0
Creating Generator on: cuda:0
--- Starting Generation ---
Model: dev-nf4, Res: 1024x1024, Steps: 28, CFG: 0.0, Shift: 0.0, Seed: 42
Using standard sequence lengths: CLIP-L: 77, OpenCLIP: 150, T5: 256, Llama: 256
Skipping pipe.to(cuda:0) (CPU offload enabled).
Executing pipeline inference...
!!! ERROR during execution: DLL load failed while importing cuda_utils: The specified module could not be found.
Traceback (most recent call last):
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hidreamsampler.py", line 679, in generate
pipeline_output = pipe(
^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 646, in call
) = self.encode_prompt(
^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 331, in encode_prompt
prompt_embeds, pooled_prompt_embeds = self._encode_prompt(
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 480, in _encode_prompt
llama3_prompt_embeds = self._get_llama3_prompt_embeds(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 278, in _get_llama3_prompt_embeds
outputs = self.text_encoder_4(
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 821, in forward
outputs: BaseModelOutputWithPast = self.model(
^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 571, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 318, in forward
hidden_states, self_attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 252, in forward
query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\qlinear\tritonv2.py", line 146, in forward
out = QuantLinearFunction.apply(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\autograd\function.py", line 575, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\amp\autocast_mode.py", line 510, in decorate_fwd
return fwd(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 134, in forward
output = quant_matmul(input, qweight, scales, qzeros, g_idx, bits, pack_bits, maxq)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 125, in quant_matmul
W = dequant(input.dtype, qweight, scales, qzeros, g_idx, bits, pack_bits, maxq)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 109, in dequant
dequant_kernel[grid](
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\autotuner.py", line 171, in run
ret = self.fn.run(
^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 607, in run
device = driver.active.get_current_device()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 23, in getattr
self._initialize_obj()
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 9, in _create_driver
return actives[0]()
^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 412, in init
self.utils = CudaUtils() # TODO: make static
^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 90, in init
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 72, in compile_module_from_src
mod = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 813, in module_from_spec
File "", line 1293, in create_module
File "", line 488, in _call_with_frames_removed
ImportError: DLL load failed while importing cuda_utils: The specified module could not be found.
Original image dimensions: 1024x1024, aspect ratio: 1.000
Selected target resolution: 1024x1024
Processed to: 1024x1024 (divisible by 16)
HiDream: Initial VRAM usage: 12682.58 MB
Clearing img2img cache before loading dev-nf4...
Removing 'dev-nf4'...
Cache cleared.
Loading model for dev-nf4 img2img...
--- Loading Model Type: dev-nf4 ---
Model Path: azaneko/HiDream-I1-Dev-nf4
NF4: True, Requires BNB: False, Requires GPTQ deps: True
Using Uncensored LLM: True
(Start VRAM: 48.45 MB)
Cache check for key: dev-nf4_uncensored
Cache contains: []
[1a] Preparing Uncensored LLM (GPTQ): shuttercat/DarkIdol-Llama3.1-NF4-GPTQ
Setting max memory limit: 9GiB of 24.0GiB
Using device_map='auto'.
[1b] Loading Tokenizer: shuttercat/DarkIdol-Llama3.1-NF4-GPTQ...
D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\fdlou\.cache\huggingface\hub\models--shuttercat--DarkIdol-Llama3.1-NF4-GPTQ. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Tokenizer loaded.
[1c] Loading Text Encoder: shuttercat/DarkIdol-Llama3.1-NF4-GPTQ... (May download files)
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:56<00:00, 118.40s/it]
INFO Kernel: Auto-selection: adding candidate TritonV2QuantLinear
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.18s/it]
INFO Format: Converting checkpoint_format from gptq to internal gptq_v2.
INFO Format: Conversion complete: 0.003000974655151367s
✅ Text encoder loaded! (VRAM: 5515.72 MB)
[2] Preparing Transformer from: azaneko/HiDream-I1-Dev-nf4
Type: NF4
Loading Transformer... (May download files)
Moving Transformer to CUDA...
✅ Transformer loaded! (VRAM: 14695.41 MB)
[3] Preparing Scheduler: FlashFlowMatchEulerDiscreteScheduler (Default shift: 6.0)
Using Scheduler: FlashFlowMatchEulerDiscreteScheduler
[4] Loading Pipeline from: azaneko/HiDream-I1-Dev-nf4
Passing pre-loaded components...
Keyword arguments {'transformer': None} are not expected by HiDreamImagePipeline and will be ignored.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 153.62it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33.98it/s]
Pipeline structure loaded.
[5] Finalizing Pipeline...
Assigning transformer...
Moving pipeline object to CUDA (final check)...
Attempting CPU offload for NF4...
✅ CPU offload enabled.
✅ Pipeline ready! (VRAM: 12690.91 MB)
Creating img2img pipeline from loaded txt2img pipeline...
Model dev-nf4 loaded & cached for img2img!
Selected Shift Value: 0.0 (Override: 0.0, Default: 6.0)
Using model's default scheduler: FlashFlowMatchEulerDiscreteScheduler with shift=0.0
Creating Generator on: cuda:0
--- Starting Img2Img Generation ---
Model: dev-nf4 (uncensored), Input Size: 1024x1024
Denoising: 0.8000000000000002, Steps: 28, CFG: 0.0, Shift: 0.0, Seed: 532986874756016
Skipping pipe.to(cuda:0) (CPU offload enabled).
Executing pipeline inference...
!!! ERROR during execution: DLL load failed while importing cuda_utils: The specified module could not be found.
Traceback (most recent call last):
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hidreamsampler.py", line 1472, in generate
output_images = pipe(
^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image_to_image.py", line 96, in call
) = self.encode_prompt(
^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 331, in encode_prompt
prompt_embeds, pooled_prompt_embeds = self._encode_prompt(
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 480, in _encode_prompt
llama3_prompt_embeds = self._get_llama3_prompt_embeds(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_HiDream-Sampler\hi_diffusers\pipelines\hidream_image\pipeline_hidream_image.py", line 278, in _get_llama3_prompt_embeds
outputs = self.text_encoder_4(
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 821, in forward
outputs: BaseModelOutputWithPast = self.model(
^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\utils\generic.py", line 965, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 571, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 318, in forward
hidden_states, self_attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 252, in forward
query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\accelerate\hooks.py", line 176, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\qlinear\tritonv2.py", line 146, in forward
out = QuantLinearFunction.apply(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\autograd\function.py", line 575, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\amp\autocast_mode.py", line 510, in decorate_fwd
return fwd(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 134, in forward
output = quant_matmul(input, qweight, scales, qzeros, g_idx, bits, pack_bits, maxq)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 125, in quant_matmul
W = dequant(input.dtype, qweight, scales, qzeros, g_idx, bits, pack_bits, maxq)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\gptqmodel\nn_modules\triton_utils\dequant.py", line 109, in dequant
dequant_kernel[grid](
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\autotuner.py", line 171, in run
ret = self.fn.run(
^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 607, in run
device = driver.active.get_current_device()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 23, in getattr
self._initialize_obj()
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\runtime\driver.py", line 9, in _create_driver
return actives[0]()
^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 412, in init
self.utils = CudaUtils() # TODO: make static
^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 90, in init
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\triton\backends\nvidia\driver.py", line 72, in compile_module_from_src
mod = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 813, in module_from_spec
File "", line 1293, in create_module
File "", line 488, in _call_with_frames_removed
ImportError: DLL load failed while importing cuda_utils: The specified module could not be found.
Prompt executed in 1402.83 seconds
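To isolate the failure from ComfyUI, here is a minimal repro sketch. It drives the same attribute chain that appears in the traceback (triton.runtime.driver.active.get_current_device(), which lazily compiles driver.c into a cuda_utils extension in triton\backends\nvidia\driver.py and imports it). Note this leans on Triton's internal API as shown above, so it is an assumption that may not hold across Triton versions:

```python
# repro_triton.py - run with the portable interpreter:
#   D:\ComfyUI_windows_portable\python_embeded\python.exe repro_triton.py
import triton
from triton.runtime import driver

print("triton version:", triton.__version__)

# Touching driver.active triggers Triton's lazy NVIDIA-backend init, which
# compiles and imports the cuda_utils extension; on this machine that import
# is what raises:
#   ImportError: DLL load failed while importing cuda_utils
device = driver.active.get_current_device()
print("current CUDA device:", device)
```

If this fails the same way outside ComfyUI, the problem is the Triton/Windows toolchain in the embedded Python rather than anything in the HiDream Sampler node itself.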