Force the model to write some tokens mid-generation? #37771

Open
@blazgocompany

Feature request

Here’s an example:

User: Hello, make a Python function for something
Assistant: Here's a function for that:

def function():
    pass
← This is a line we tuned the model to generate
import pytest
assert foo == bar
← Execute the tests right after this token was predicted
Result: tests succeeded ← THIS is the forced token; we also tuned the model to generate this line
Ok, looks like the function is working…

EDIT:

The LLM is trained to respond with the block shown above. However, since LLMs are bad at detecting when they have made a mistake, they will lean towards saying "succeeded" for everything.
After the inference pass that predicts the token "succeeded", there will be a probability distribution, e.g.

succeeded 0.5
failed 0.3
etc.

So I want to "force" the model to pick "failed" (or "succeeded") even though it is a less likely token. It seems like something very simple, but there is no built-in support for it.
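
Here is a minimal sketch of how this can already be approximated with a custom LogitsProcessor, which transformers calls on every decoding step. The model name ("gpt2"), the trigger string, and tests_passed() are placeholder assumptions for illustration, not an existing API; the idea is to overwrite the next-token scores the moment the generated text ends with the trigger, so that only the externally chosen token can be sampled.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

def tests_passed() -> bool:
    # Placeholder: run pytest (or any external check) here and return the outcome.
    return False

class ForceOutcomeToken(LogitsProcessor):
    # If the text generated so far ends with `trigger`, mask the scores so that
    # only the externally chosen token can be sampled at the next position.
    def __init__(self, tokenizer, trigger="Result: tests"):
        self.tokenizer = tokenizer
        self.trigger = trigger

    def __call__(self, input_ids, scores):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        if text.rstrip().endswith(self.trigger):
            outcome = " succeeded" if tests_passed() else " failed"
            # Note: only the first token of `outcome` is forced in this sketch.
            forced_id = self.tokenizer.encode(outcome, add_special_tokens=False)[0]
            mask = torch.full_like(scores, float("-inf"))
            mask[:, forced_id] = 0.0
            return mask
        return scores

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Result: tests", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=10,
    logits_processor=LogitsProcessorList([ForceOutcomeToken(tokenizer)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Because the processor runs between inference passes, generation never has to stop; a first-class, documented way to do this (or to pick among the top candidate tokens from outside the model) is what this request is about.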

Motivation

In case the point isn't clear already: this could make open-source LLMs significantly better for agentic workflows. Unlike stopping generation, calling tools, and otherwise creating delays, this works right between inference passes. Agentic workflows on proprietary LLMs can add up costs FAST.

Your contribution

I'm not used to this codebase and it seems very complex, but the feature itself is very simple. Maybe someone could give me some pointers.
