Description
I'm trying to use GenerateREADME while maximizing the underlying LLM's context window. Unfortunately, I can't easily figure out what that magic value is, because model_max_tokens isn't the length of the final input sent to the LLM.
For instance, I'm trying to consume the entire 128k context window, so I've been running a series of trials:
patchwork GenerateREADME ... model_max_tokens=128_000
===>Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens
patchwork GenerateREADME ... model_max_tokens=64_000
===>Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens
patchwork GenerateREADME ... model_max_tokens=30_000
===>Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens
So I need to keep guessing. Notably, every failure exceeds the model_max_tokens I set by the same 127,511 tokens (the prompt patchwork assembles under the hood), so the only value that would fit here is 128,000 - 127,511 = 489, and I can only discover that from the errors themselves.
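Here's a rough sketch of that back-computation (plain Python over the error messages above; no patchwork internals assumed):

```python
# Back-compute the hidden overhead from the failed trials above.
CONTEXT_WINDOW = 128_000

# (model_max_tokens I passed, total tokens the API said I requested)
trials = [
    (128_000, 255_511),
    (64_000, 191_511),
    (30_000, 157_511),
]

for max_tokens, total_requested in trials:
    overhead = total_requested - max_tokens  # tokens added under the hood
    print(f"overhead={overhead}, "
          f"largest valid model_max_tokens={CONTEXT_WINDOW - overhead}")
# Every trial prints overhead=127511, i.e. the only setting that fits is 489.
```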
Proposed solution
Support a sentinel such as model_max_tokens=-1, meaning: use the maximum window allowed by the underlying LLM, once all the other tokens sent under the hood are accounted for.
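A minimal sketch of what the -1 handling could look like, assuming an OpenAI-style tokenizer via tiktoken. resolve_max_tokens and its parameters are illustrative names, not patchwork's actual internals:

```python
import tiktoken  # assumption: an OpenAI-compatible tokenizer applies here

def resolve_max_tokens(model_max_tokens: int, prompt: str,
                       context_window: int, model: str) -> int:
    """Turn the -1 sentinel into the largest completion budget that fits."""
    if model_max_tokens != -1:
        return model_max_tokens
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    # Whatever the assembled prompt doesn't consume is left for the completion.
    return context_window - prompt_tokens

# Example: with a 127,511-token prompt and a 128,000-token window, this
# resolves to 489 instead of a 400 error.
```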
Alternatives considered
n/a