
Manually converted Qwen2.5-Coder-0.5B-Instruct does not work, but onnx-community/Qwen2.5-0.5B-Instruct does #1415

@visitsb

System Info

Test environment

Component           Version
transformers.js     ^3.7.2
Chrome browser      latest

Environment used to convert the HF model to ONNX:

Component           Version
transformers.js     main
scripts/convert.py  per requirements.txt

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I am trying to run the webgpu-chat example from the main branch to simply generate text output in the browser.

When I use the onnx-community/Qwen2.5-0.5B-Instruct ONNX model, it works and I get an output from the model.

However, when I use convert.py from the scripts folder to convert the HF Qwen2.5-Coder-0.5B-Instruct model to ONNX and use the resulting custom model, an exception is thrown and no output is generated.

Reproduction

I cloned the HF model locally into a new folder named custom and successfully generated the ONNX model using the command below:

python -m scripts.convert --model-id custom/Qwen2.5-Coder-0.5B-Instruct --quantize --task text-generation
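
For completeness, this is roughly how the locally converted folder is loaded in the web app (a minimal sketch; the /models/ path and serving setup are assumptions, not part of the original run):

import { env, pipeline } from '@huggingface/transformers';

// Serve the converted folder (config.json, tokenizer files, onnx/) from the web app,
// then tell transformers.js to resolve the model id against local files instead of the Hub.
env.allowLocalModels = true;
env.localModelPath = '/models/'; // assumption: the custom folder is served under /models/

const generator = await pipeline('text-generation', 'custom/Qwen2.5-Coder-0.5B-Instruct', {
  dtype: 'fp16',
  device: 'webgpu',
});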

Below is the code snippet I used:

import { pipeline } from '@huggingface/transformers';

const model_id = "onnx-community/Qwen2.5-0.5B-Instruct";
const generator = await pipeline("text-generation", model_id, { dtype: "fp16", device: "webgpu" });

const text = 'def hello_world():';
const output = await generator(text, {
  max_new_tokens: 45,
  temperature: 0.5,
  top_k: 0.5,
  do_sample: false,
});
console.debug(output[0].generated_text);
  • onnx-community/Qwen2.5-Coder-0.5B-Instruct works correctly and I get an output.
  • custom/Qwen2.5-Coder-0.5B-Instruct throws an exception from await generator with only the message 2374903832 :-/ (see the try/catch sketch just below)
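
A minimal sketch of how that raw error surfaces, assuming the same generator and prompt as above (not output from an actual run):

try {
  const output = await generator(text, { max_new_tokens: 45 });
  console.debug(output[0].generated_text);
} catch (err) {
  // With the custom model this only logs a pointer-like number such as 2374903832.
  console.error('generation failed:', err);
}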

I also tried an alternative code snippet:

import { AutoTokenizer, AutoModelForCausalLM } from '@huggingface/transformers';

const model_id = "onnx-community/Qwen2.5-0.5B-Instruct";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModelForCausalLM.from_pretrained(model_id, { dtype: 'fp16', device: 'webgpu' });

// Tokenize the prompt and generate token ids (this step was missing from the
// snippet as pasted, but is implied by the use of `outputs` below).
const inputs = tokenizer('def hello_world():');
const outputs = await model.generate({ ...inputs, max_new_tokens: 45 });

const generated_text = tokenizer.decode(outputs[0], { skip_special_tokens: true });
console.debug(generated_text);
  • Again onnx-community/Qwen2.5-0.5B-Instruct works correctly and I get an output.
  • But using custom/Qwen2.5-Coder-0.5B-Instruct gives the runtime error below :-/ (an isolation sketch follows after the trace)
Uncaught (in promise) RuntimeError: Aborted(). Build with -sASSERTIONS for more info.
    at L (ort-wasm-simd-threaded.jsep.mjs:22:101)
    at tb (ort-wasm-simd-threaded.jsep.mjs:60:98)
    at ort-wasm-simd-threaded.jsep.wasm:0x1365fa
    at ort-wasm-simd-threaded.jsep.wasm:0x404b9
    at ort-wasm-simd-threaded.jsep.wasm:0x75f9b
    at ort-wasm-simd-threaded.jsep.wasm:0xd24b
    at ort-wasm-simd-threaded.jsep.wasm:0x1bdf9
    at ort-wasm-simd-threaded.jsep.wasm:0x9d8cc
    at ort-wasm-simd-threaded.jsep.wasm:0xed51eb
    at ort-wasm-simd-threaded.jsep.wasm:0x114f19
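
As an isolation step (not something from the failing run above), the same custom model could be tried on the wasm backend with a quantized dtype to see whether the abort is specific to fp16 on WebGPU; the dtype and device values here are assumptions:

import { pipeline } from '@huggingface/transformers';

// Assumption: --quantize produced a q8 variant alongside the fp16 one.
const generator = await pipeline('text-generation', 'custom/Qwen2.5-Coder-0.5B-Instruct', {
  dtype: 'q8',
  device: 'wasm',
});
const output = await generator('def hello_world():', { max_new_tokens: 45 });
console.debug(output[0].generated_text);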

Expected behavior

  • ONNX models generated using scripts/convert.py should work without errors.
