Description
Feature request
Here’s an example:
User: Hello, make a Python function for something
Assistant: Here's a function for that:
    def function():
        pass
← This is a line we tuned the model to generate
    import pytest
    assert foo == bar
← Execute the tests right after this token was predicted
Result: tests succeeded ← THIS is the forced token; we also tuned the model to generate this line
Ok, looks like the function is working…
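For concreteness, here is a minimal sketch of that loop written against Hugging Face transformers with a plain Python decoding loop (greedy, no KV cache, for brevity). The model name, the "Result: tests" marker, and the `generated_test.py` file are illustrative assumptions, not an existing API:

```python
import subprocess

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the issue assumes a model tuned to emit the block above
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Hello, make a Python function for something\nAssistant:"
ids = tok(prompt, return_tensors="pt").input_ids

# 1) Generate token by token until the model has written the test block
#    and the "Result: tests" prefix (detected with a plain substring check).
while True:
    with torch.no_grad():
        next_id = model(ids).logits[0, -1].argmax()
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    if tok.decode(ids[0]).endswith("Result: tests"):
        break

# 2) Execute the generated tests outside the model (hypothetical file name).
passed = subprocess.run(["pytest", "generated_test.py"]).returncode == 0

# 3) Force the ground-truth token(s) into the sequence, ignoring whatever
#    the model itself would have sampled, then resume normal generation.
forced = tok(" succeeded" if passed else " failed",
             add_special_tokens=False, return_tensors="pt").input_ids
ids = torch.cat([ids, forced], dim=-1)
```

The point is that step 3 happens right between two inference passes: the forced tokens simply become ordinary context for the next forward pass.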
EDIT:
The LLM is trained to respond with the block given above; however, since LLMs are bad at detecting when they have made a mistake, they will lean towards saying "succeeded" for everything.
However, at the inference pass where "succeeded" would be predicted, there is a probability distribution over the next token, e.g.:
succeeded 0.5
failed 0.3
etc.
So I want to "force" the model to pick "failed" (or "succeeded") even though it is a less likely token. This seems like something very simple, but there is no support for it.
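To make that concrete, here is a hedged sketch of what "forcing" means at the logits level, reusing the `model`, `tok`, and `ids` names from the sketch above (all of them my own illustrative names, and the token strings are assumptions about the tokenizer):

```python
import torch

# Distribution over the next token at the position where "succeeded"/"failed" goes.
with torch.no_grad():
    logits = model(ids).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

succ_id = tok(" succeeded", add_special_tokens=False).input_ids[0]
fail_id = tok(" failed", add_special_tokens=False).input_ids[0]
print(f"p(succeeded)={probs[succ_id]:.2f}  p(failed)={probs[fail_id]:.2f}")

# Instead of sampling from `probs`, append the token we KNOW is correct
# (here: the tests failed), however unlikely the model finds it.
ids = torch.cat([ids, torch.tensor([[fail_id]])], dim=-1)
```

A sampler-level hook that lets the caller override the chosen token id at a given step would be enough to cover this use case.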
Motivation
In case the point isn't clear already: doing this could make open-source LLMs significantly better for agentic workflows. Unlike stopping generation, calling tools, and otherwise creating delays, this works right between inference passes. Agentic workflows with proprietary LLMs can add up costs FAST.
Your contribution
I'm not familiar with this codebase; it seems very complex, but the feature is very simple. Maybe someone could give me pointers.