[RFC]: Response format extensions for structured outputs

### Motivation.

Currently, users can pass additional format constraints through `extra_body` in the OpenAI client:

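```python
from openai import OpenAI

# Assumption: a vLLM OpenAI-compatible server is running locally;
# adjust base_url and api_key to match your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# Use the first model the server exposes.
model = client.models.list().data[0].id

simplified_sql_grammar = """
        root ::= select_statement

        select_statement ::= "SELECT " column " from " table " where " condition

        column ::= "col_1 " | "col_2 "

        table ::= "table_1 " | "table_2 "

        condition ::= column "= " number

        number ::= "1 " | "2 "
    """

# Adjacent string literals are concatenated, so keep the trailing space.
prompt = (
    "Generate an SQL query to show the 'username' and 'email' "
    "from the 'users' table."
)

completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    # vLLM-specific field, passed through outside the OpenAI schema.
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)
```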

The same applies to `guided_json`, `structural_tag`, and `guided_regex`.

While this is convenient for most developers, these fields still use v0 terminology for guided decoding.

With the upcoming v0 deprecation, I think it is time to update this usage pattern.


### Proposed Change.

OpenAI already recommends that users use `response_format` with [json_schema](https://platform.openai.com/docs/guides/structured-outputs?api-mode=chat).

Given that we already support `structural_tag` via `response_format` ([example](https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_structured_outputs_structural_tag.html?h=structural+tags)), I propose extending `response_format` to cover the remaining fields:

```python
completion = client.chat.completions.create(
  model=model, messages=messages,
  response_format={
    "type": "vllm_regex",
    "regex": r"\w+@\w+\.com\n"
  }
)

completion = client.chat.completions.create(
  model=model, messages=messages,
  response_format={
    "type": "vllm_grammar",
    "grammar": """
root ::= select_statement
select_statement ::= "SELECT " column " from " table " where " condition
column ::= "col_1 " | "col_2 "
table ::= "table_1 " | "table_2 "
condition ::= column "= " number
number ::= "1 " | "2 "
"""
  }
)

completion = client.chat.completions.create(
  model=model, messages=messages,
  response_format={
    "type": "vllm_choice",
    "choice": ["Positive", "Negative", "Neutral"],
  }
)
```

The existing `json_schema` and `structural_tag` usage remains the same.

The `guided_*` fields will still work as before, but will be reserved for more advanced usage and won't be documented.
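One way the frontend could keep both spellings working is to translate the proposed `response_format` types into the existing `guided_*` fields before the request reaches the decoding backend. The sketch below illustrates that idea only; it is not vLLM's implementation. The function name `response_format_to_guided` and the mapping table are hypothetical, the `vllm_*` type names and payload keys come from the examples above, and `guided_choice` is assumed to be the legacy counterpart of `vllm_choice`.

```python
from typing import Any

# Hypothetical table: proposed response_format type -> (payload key, legacy field).
_VLLM_FORMATS: dict[str, tuple[str, str]] = {
    "vllm_regex": ("regex", "guided_regex"),
    "vllm_grammar": ("grammar", "guided_grammar"),
    "vllm_choice": ("choice", "guided_choice"),
}


def response_format_to_guided(response_format: dict[str, Any]) -> dict[str, Any]:
    """Translate a proposed `vllm_*` response_format into a legacy guided_* field.

    Returns an empty dict for types this sketch does not handle
    (for example `json_schema`), so callers can leave those untouched.
    """
    fmt_type = response_format.get("type")
    if fmt_type not in _VLLM_FORMATS:
        return {}
    payload_key, legacy_field = _VLLM_FORMATS[fmt_type]
    if payload_key not in response_format:
        raise ValueError(
            f"response_format type {fmt_type!r} requires a {payload_key!r} field"
        )
    return {legacy_field: response_format[payload_key]}
```

With a table like this, supporting another constraint type would mean adding one entry rather than a new request field.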

### Feedback Period.

1 week for revision, then 2-3 days for the implementation plan (mostly frontend + protocol updates).

We can also add a debug log that recommends the new pattern wherever `guided_*` is still used, so that we don't break production.
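The debug log could be as small as a helper that inspects the incoming extra fields and points at the replacement. This is a sketch under assumptions, not existing vLLM code: the logger name, the helper `log_legacy_guided_fields`, and the message wording are all hypothetical.

```python
import logging
from typing import Any

logger = logging.getLogger("structured_outputs_sketch")  # hypothetical logger name

# Legacy field -> response_format type that would replace it.
# `json_schema` already exists; the `vllm_*` types are the ones proposed here.
_LEGACY_TO_NEW = {
    "guided_json": "json_schema",
    "guided_regex": "vllm_regex",
    "guided_grammar": "vllm_grammar",
    "guided_choice": "vllm_choice",
}


def log_legacy_guided_fields(extra_body: dict[str, Any]) -> list[str]:
    """Log one debug message per legacy field in use and return those fields.

    The request itself is left untouched, so existing callers keep working.
    """
    found = [name for name in _LEGACY_TO_NEW if extra_body.get(name) is not None]
    for name in found:
        logger.debug(
            "`%s` still works, but consider response_format type `%s` instead.",
            name,
            _LEGACY_TO_NEW[name],
        )
    return found
```

Logging at debug level keeps production logs quiet by default while still giving a migration hint to anyone who turns verbosity up.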

### CC List.

@russellb @mgoin @simon-mo @hmellor 

### Any Other Things.

These are mostly frontend changes.

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
