Create a server client object in Outlines #1541
Replies: 3 comments
-
To clarify, is this running a server that Outlines handles, or do we assume the user has an external vLLM server they're managing?

General idea notes

If we assume the user has their own server, I think this is a great idea. I've found it to be best practice to separate inference from logic, so that it's easy to swap your inference backend from a dev server running a tiny model to a production inference server just by changing the server URL and model name. This would also allow us to replicate the actual value of Instructor, which is serving a consistent API across model providers.

I think we would want to target vLLM first. As you mentioned, vLLM exposes the same API the OpenAI client expects, though vLLM supports additional features on top of it. I would delegate most or all of the API to the underlying client; in principle we could just have a thin wrapper class.

Async vs. sync

I would provide a sync client first and then an async variant separately, though I feel like the async variant is a lower priority, mostly because sync is easier to work with. That said, I'm not a big async user, so opinions are welcome there. @RobinPicard, it seems like you have a preference for async-first, is that right?
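A minimal sketch of what such a sync client could look like, under these assumptions: the user points the standard OpenAI client at a vLLM server at `http://localhost:8000/v1`, the output type is a Pydantic model, and the schema is passed through vLLM's `guided_json` extra parameter. The class name `ServerClient` and everything else here are illustrative, not an existing Outlines API.

```python
import openai
from pydantic import BaseModel


class ServerClient:
    """Hypothetical wrapper: delegates the request to the OpenAI client
    and only translates the output type into server arguments."""

    def __init__(self, client: openai.OpenAI, model_name: str):
        self.client = client
        self.model_name = model_name

    def __call__(self, prompt: str, output_type: type[BaseModel]) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
            # Assumes a vLLM OpenAI-compatible server, which accepts a JSON
            # schema through the `guided_json` extra-body parameter.
            extra_body={"guided_json": output_type.model_json_schema()},
        )
        return response.choices[0].message.content


class Character(BaseModel):
    name: str
    age: int


client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
model = ServerClient(client, "TheBloke/Mistral-7B-OpenOrca-AWQ")
print(model("Create a character.", Character))
```

An async variant would mirror this with `openai.AsyncOpenAI` and an `async def __call__`, which is part of why the sync version seems like the natural starting point.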
-
It also occurs to me that we could maybe create a separate package to fake OpenAI compatibility, similar to our vllm wrapper but across all model types. We'd basically have a FastAPI wrapper that managed a few common arguments, but primarily:

```python
import outlines
import outlines_server
import transformers

MODEL_NAME = "TheBloke/Mistral-7B-OpenOrca-AWQ"

model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(MODEL_NAME),
    transformers.AutoTokenizer.from_pretrained(MODEL_NAME),
)

outlines_server.serve(model)
```

which would handle incoming schemas/grammars/etc. This would let us provide a sufficient OpenAI wrapper around any of our backends.
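If such an `outlines_server` existed, the consumer side could be the stock OpenAI client; the URL and the shape of the structured-output argument below are assumptions about this hypothetical server, not anything that exists today.

```python
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="TheBloke/Mistral-7B-OpenOrca-AWQ",
    messages=[{"role": "user", "content": "Create a character."}],
    # The hypothetical server would translate this schema into whichever
    # structured-generation mechanism the underlying Outlines backend uses.
    extra_body={
        "json_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        }
    },
)
print(response.choices[0].message.content)
```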
-
We want this to be as simple as possible. In the spirit of our other model integrations, the user would be in charge of passing the client (the OpenAI client for vLLM, for instance) and we would only do the output type translation. I think that should keep the integrations very low maintenance.
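Under that design, the only code Outlines would own is the output type translation itself. A rough sketch, assuming a Pydantic model as the output type and vLLM's `guided_json` parameter as the target; the function name is hypothetical.

```python
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    age: int


def translate_output_type(output_type: type[BaseModel]) -> dict:
    # Map the Outlines-style output type to the extra request arguments
    # understood by the vLLM OpenAI-compatible server.
    return {"guided_json": output_type.model_json_schema()}


# The user-supplied OpenAI client would receive this via `extra_body`.
print(translate_output_type(Character))
```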
-
LLM users often like to use a language model that runs on a server and to which they then make requests (using vLLM, for instance). An issue is that talking to a server directly is quite inconvenient, as you need to build the requests, parse the responses, handle errors, etc. yourself.
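To illustrate the inconvenience, here is roughly what the manual workflow looks like today against a vLLM OpenAI-compatible server; the URL, model name, and the `guided_json` field are assumptions about a locally running server.

```python
import requests

payload = {
    "model": "TheBloke/Mistral-7B-OpenOrca-AWQ",
    "messages": [{"role": "user", "content": "Create a character."}],
    # Structured output has to be spelled out as a raw JSON schema by hand.
    "guided_json": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    },
}
response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```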
@rlouf suggested we create an object in Outlines that would be used to call an LLM running on a server. This object would be similar to a `Model`, as it would also be called with a prompt and an output type and would return the completion. On top of providing convenience to users, this feature would help strengthen Outlines' position as the best interface for using LLMs.

A few questions on this topic: