Force the model to write some tokens mid-generation? #37771

Open
blazgocompany opened this issue Apr 24, 2025 · 3 comments
Labels
Feature request Request for a new feature

Comments

@blazgocompany

Feature request

Here’s an example:

User: Hello, make a Python function for something
Assistant: Here's a function for that:

def function():
    pass                    ← this is a line we tuned the model to generate
import pytest
assert foo == bar           ← execute the tests right after this token was predicted
Result: tests succeeded     ← THESE are the forced tokens; we also tuned the model to generate this
Ok, looks like the function is working…

EDIT:

The LLM is trained to respond with the block shown above. However, since LLMs are bad at detecting when they have made a mistake, they will lean towards saying "succeeded" for everything.
After the inference pass for the token "succeeded" there will be a probability distribution, e.g.

succeeded 0.5
failed 0.3
etc.

So I want to "force" the model to pick "failed" (or "succeeded") even though it is the less likely token. It seems like something very simple, but there is no support for it.
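
To make the idea concrete, here is a minimal sketch of what I mean (the "gpt2" checkpoint and the token strings are just placeholders): run one forward pass manually, look at the next-token distribution at the "Result:" position, and append the less likely token by hand before continuing generation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt_ids = tokenizer("Result: tests", return_tensors="pt").input_ids
failed_id = tokenizer(" failed", add_special_tokens=False).input_ids[0]

with torch.no_grad():
    logits = model(prompt_ids).logits[:, -1, :]            # distribution over the next token
probs = logits.softmax(dim=-1)
print(probs[0, failed_id])                                 # e.g. 0.3 while " succeeded" gets 0.5

# force the less likely token by appending it ourselves, then keep generating
prompt_ids = torch.cat([prompt_ids, torch.tensor([[failed_id]])], dim=-1)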

Motivation

In case the point isn't clear already: doing this could make open-source LLMs significantly better for agentic workflows. Unlike stopping generation, calling tools, and otherwise creating delays, this works right between inference passes. Agentic workflows on proprietary LLMs can add up costs fast.

Your contribution

I'm not familiar with this codebase and it seems very complex, but the feature is very simple. Maybe someone could give me some pointers.

blazgocompany added the Feature request label on Apr 24, 2025
@blazgocompany
Author

Asked ChatGPT about it:

Below are some practical “entry points” in 🤗 Transformers that let you overwrite the next-token choice, even if the token you want (e.g. "failed") has lower probability than the model’s top guess.
Everything can be done without changing any C++/CUDA kernels – just a few Python classes and a call to model.generate().

Understand where the decision is made
model.generate()
  └─► _generate()
        └── _greedy_search() / _sample()
              ├── logits = model(...)
              ├── logits = logits_processor(logits)   ← place to hack
              ├── logits = logits_warper(logits)
              ├── next_token = sample(argmax)         ← after hack
              └── append next_token, loop …

Key locations in the source tree (v4.39.*):

src/transformers/generation/utils.py – generation loops
src/transformers/generation/logits_process.py – all LogitsProcessor / Warper classes
src/transformers/generation/configuration_utils.py – GenerationConfig helpers

Three out-of-the-box hooks you can already use
A. forced_decoder_ids / ForcedTokensLogitsProcessor

Works when you know in advance that at position N the token must be X.
Example:
gen_cfg = model.generation_config
gen_cfg.forced_decoder_ids = [(offset_from_prompt, tokenizer('failed', add_special_tokens=False).input_ids[0])]  # add_special_tokens=False avoids picking up a BOS token
model.generate(..., generation_config=gen_cfg)
B. prefix_allowed_tokens_fn

Callable invoked every step; you receive (batch_id, input_ids) and return the set of IDs that are legal next tokens.
If you return only [failed_id] at the step after your asserts, the model is forced to output “failed”.
def constraint(batch_id, input_ids):
    # input_ids is the 1-D sequence generated so far for this batch item
    text = tokenizer.decode(input_ids)
    if text.endswith("Result: "):
        return [failed_id]                       # only one legal token
    return list(range(tokenizer.vocab_size))     # everything allowed otherwise

model.generate(..., prefix_allowed_tokens_fn=constraint)

C. stop-generate-then-append

Let generation stop after the code block, run pytest, then feed the original prompt + "Result: failed\n" back into generate() with past_key_values to keep it fast.
Costs one extra forward pass but zero code inside Transformers.
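
A hedged sketch of option C, assuming a test runner is already available (run_pytest below is a placeholder) and that model, tokenizer and input_ids come from the surrounding code:

import torch

# 1) generate until the code block and the asserts have been written
first = model.generate(input_ids, max_new_tokens=128)

# 2) run the tests outside the model (run_pytest is a placeholder)
tests_passed = run_pytest(tokenizer.decode(first[0], skip_special_tokens=True))
verdict = "succeeded" if tests_passed else "failed"

# 3) append the verdict as plain text and let the model continue from there;
#    reusing past_key_values is an optional optimization to avoid re-encoding the prefix
verdict_ids = tokenizer(f"Result: tests {verdict}\n",
                        return_tensors="pt", add_special_tokens=False).input_ids
second = model.generate(torch.cat([first, verdict_ids], dim=-1), max_new_tokens=128)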
If you need total control: write a custom LogitsProcessor
A LogitsProcessor is only ~15 lines of code; it receives the raw logits and must return modified logits of the same shape.

Example: force a single token at a particular time step

from transformers import LogitsProcessor

class ForceTokenAtPosition(LogitsProcessor):
    def __init__(self, pos: int, token_id: int):
        self.pos = pos                           # absolute length (prompt included) at which to force
        self.token_id = token_id

    def __call__(self, input_ids, scores):
        cur_len = input_ids.shape[-1]            # length incl. prompt
        if cur_len == self.pos:
            mask = scores.new_full(scores.size(), float("-inf"))
            mask[:, self.token_id] = 0.0         # only token_id allowed
            return mask
        return scores

Usage:

processor = ForceTokenAtPosition(
    pos=prompt_len + code_block_len + 1,               # where “failed” must appear
    token_id=tokenizer.convert_tokens_to_ids("failed"),
)
outputs = model.generate(
    input_ids,
    logits_processor=[processor],
    do_sample=False,   # or True – doesn’t matter, there’s only one choice
)

Dynamic forcing (decide after pytest)

class DynamicForceNext(LogitsProcessor):
    def __init__(self):
        self.force_next: int | None = None

    def set(self, token_id: int | None):
        self.force_next = token_id

    def __call__(self, input_ids, scores):
        if self.force_next is not None:
            mask = scores.new_full(scores.size(), float("-inf"))
            mask[:, self.force_next] = 0
            self.force_next = None          # reset for later steps
            return mask
        return scores

In your outer loop run dyn_proc.set(failed_id) immediately after executing the tests.
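
A hedged sketch of that outer loop, stepping generation one token at a time so the tests can run between forward passes (prompt, max_new_tokens, run_pytest, failed_id and succeeded_id are placeholders):

import torch
from transformers import LogitsProcessorList

dyn_proc = DynamicForceNext()
processors = LogitsProcessorList([dyn_proc])

ids = tokenizer(prompt, return_tensors="pt").input_ids
for _ in range(max_new_tokens):
    with torch.no_grad():
        logits = model(ids).logits[:, -1, :]      # next-token logits
    logits = processors(ids, logits)              # applies DynamicForceNext if armed
    next_id = logits.argmax(dim=-1, keepdim=True) # greedy decoding for simplicity
    ids = torch.cat([ids, next_id], dim=-1)

    text = tokenizer.decode(ids[0])
    if text.endswith("assert foo == bar\n"):      # the tests have just been written out
        verdict_id = succeeded_id if run_pytest(text) else failed_id
        dyn_proc.set(verdict_id)                  # forces exactly the next token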

Where to drop the code in the repo
• src/transformers/generation/logits_process.py
– add the new class(es) next to MinLengthLogitsProcessor.
• src/transformers/generation/__init__.py
– export the symbol so users can import it.
• Optionally extend GenerationConfig (add a flag and construct the processor inside GenerationMixin).

Then unit-test:

# tests/generation/test_force_token.py
import torch
# import ForceTokenAtPosition from wherever you place it, e.g.
# from transformers.generation.logits_process import ForceTokenAtPosition

def test_force_token():
    proc = ForceTokenAtPosition(3, 42)
    scores = torch.randn(1, 50)
    ids = torch.tensor([[1, 2, 3]])
    out = proc(ids, scores.clone())
    assert out[0, 42] == 0 and torch.isinf(out).sum() == 49

Performance notes
• A LogitsProcessor runs on the same device as the logits (CUDA tensor) – no CPU-GPU sync if you keep your math inside the tensor world.
• For agentic workflows you usually step through generation anyway, so the cost is negligible compared to another call to the model or, worse, spawning a sandboxed pytest.

TL;DR
You don’t need a new feature in Transformers – the library already has three hooks (forced_decoder_ids, prefix_allowed_tokens_fn, custom LogitsProcessor) that let you deterministically pick “failed” (or any token) even when its probability is low.
Start by prototyping a tiny LogitsProcessor in your own script; once it works, upstream it to src/transformers/generation/logits_process.py so everyone can use it.

This doesn't make much sense to me, but someone who's familiar may know what to do.

@Rocketknight1
Member

It seems like you're asking for structured generation here - cc @gante what do we recommend for people these days? Just using Outlines?

@blazgocompany
Author

Oh, so we don't need a custom Logits Processor?
