Overview
In our roadmap, we plan to support guided decoding in 2025 Q1 as shown here (#71).
Currently:
I have tested vllm/examples/offline_inference/structured_outputs.py directly on an NPU device, and the experimental results show that guided decoding is natively supported on NPU with the outlines backend.
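For reference, the experiment above boils down to a minimal sketch like the one below, using vLLM's GuidedDecodingParams API. The model name and JSON schema are placeholders, and the exact arguments may differ across vLLM versions:

```python
# Minimal sketch of offline guided decoding with the outlines backend.
# The model name and schema are placeholders; arguments may vary by vLLM version.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=128,
    guided_decoding=GuidedDecodingParams(json=json_schema),
)

# Select the grammar engine; outlines is the backend verified on NPU above.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",
          guided_decoding_backend="outlines")

outputs = llm.generate(
    ["Generate a JSON object describing a person with a name and an age."],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```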
In addition, I have analysed the code in vLLM and found that the tensors involved in guide logits computation all reside on the npu device, which further demonstrates that guided decoding is natively supported on NPU (see the sketch below).
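At its core, guided decoding applies a mask that sets the logits of grammar-disallowed tokens to -inf before sampling. The following is a generic illustration of that idea, not the actual vLLM/outlines implementation; it simply shows why building the mask on logits.device keeps the whole computation on the NPU:

```python
# Generic illustration of guided-decoding logits masking (not vLLM's actual code).
import torch


def apply_guided_mask(logits: torch.Tensor, allowed_token_ids: torch.Tensor) -> torch.Tensor:
    # logits: (vocab_size,) tensor produced by the model, e.g. on an "npu" device.
    # allowed_token_ids: token ids the grammar currently permits.
    # The mask is created with full_like, so it lives on logits.device and the
    # masking step runs entirely on the NPU without host <-> device copies.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask


# Toy usage: in practice the logits come from the model's forward pass on the NPU.
vocab_size = 32000
logits = torch.randn(vocab_size)        # would live on the NPU in a real run
allowed = torch.tensor([11, 42, 7000])  # ids allowed by the current grammar state
masked = apply_guided_mask(logits, allowed)
```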
However, there are still some problems that need to be fixed, such as incomplete JSON output and slow inference speed.
Feel free to report any issues you encounter when using guided decoding with vllm-ascend, and we will do our best to fix them.
Usage
Coming soon ...
Roadmap
Community news
- [RFC][core][V1] generalize structured output manager and backends vllm#17503
- [V1] Add `structural_tag` support using xgrammar vllm#17085
- [V0][V1][Core] Add outlines integration for V1, and update V0 integration. vllm#15975
- [V1][Experimental] Jump-forward decoding vllm#15490
- [V1][Feature] Enable Speculative Decoding with Structured Outputs vllm#14702
- Add support for xgrammar backend on aarch64:
- Add support for reasoning model (DeepSeek-R1):
Adaptation for vllm-ascend
- [5/N] Refactor for structured output module:
- [V1][Structured Output] Minor modification to `_validate_structured_output()` vllm#16748
- [V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` vllm#16578
- [V1][Platform] Remove `supports_structured_output()` in platform #531
- [V1][Platform] Add `supports_structured_output()` method to Platform #475
- [V1][Structured Output] Add `supports_structured_output()` method to Platform vllm#16148
- [2/N] Bugfix for xgrammar backend:
- [1/N] Bugfix for guidance backend: