Allow model_max_tokens to be set to whatever the LLM maximum is #1233

Open
@iuliaturc

Description

I'm trying to use GenerateREADME and maximize the underlying LLM's context window. Unfortunately, I can't easily figure out what that magic value is, because model_max_tokens isn't the length of the final input sent to the LLM.

For instance, I'm trying to consume the entire 128k context window, and I've been running a bunch of trials:

  1. patchwork GenerateREADME ... model_max_tokens=128_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens
  2. patchwork GenerateREADME ... model_max_tokens=64_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens
  3. patchwork GenerateREADME ... model_max_tokens=30_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens

So I need to keep guessing.
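
For what it's worth, the numbers in those errors do follow a pattern: in every trial the requested total is exactly model_max_tokens plus 127,511 tokens, which is presumably the prompt being assembled under the hood. A quick back-of-the-envelope check (plain Python, nothing patchwork-specific assumed):

```python
# Reverse-engineer the hidden prompt size from the error messages above,
# assuming "requested" = prompt tokens + model_max_tokens.
CONTEXT_WINDOW = 128_000

trials = {128_000: 255_511, 64_000: 191_511, 30_000: 157_511}  # model_max_tokens -> requested
for max_tokens, requested in trials.items():
    prompt_tokens = requested - max_tokens
    headroom = CONTEXT_WINDOW - prompt_tokens
    print(f"model_max_tokens={max_tokens:>7,} -> prompt ~{prompt_tokens:,} tokens, "
          f"so at most {headroom:,} tokens left for the response")
```

Every trial implies the same ~127,511-token prompt, so the only value that would have fit is somewhere around 489, but there's no way to know that up front short of parsing error messages.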

Proposed solution

Add an option, e.g. model_max_tokens=-1, meaning "use the maximum window allowed by the underlying LLM, once all the other tokens you're sending under the hood are accounted for."
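
A minimal sketch of how that sentinel could behave, assuming a per-model context-size table and a token-counting helper (approximated here with tiktoken); none of these names are the actual patchwork API:

```python
import tiktoken

# Illustrative assumptions, not patchwork internals: context sizes and helper
# names are made up for this sketch.
CONTEXT_WINDOWS = {"gpt-4o": 128_000}

def count_tokens(text: str, model: str) -> int:
    """Approximate the prompt size with the model's tiktoken tokenizer."""
    return len(tiktoken.encoding_for_model(model).encode(text))

def resolve_max_tokens(model: str, prompt: str, model_max_tokens: int) -> int:
    """Treat model_max_tokens=-1 as 'whatever completion budget still fits
    next to the prompt inside the model's context window'."""
    if model_max_tokens != -1:
        return model_max_tokens
    remaining = CONTEXT_WINDOWS[model] - count_tokens(prompt, model)
    if remaining <= 0:
        raise ValueError("Prompt alone already exceeds the model's context window")
    return remaining
```

With something along these lines, `patchwork GenerateREADME ... model_max_tokens=-1` would have worked on the first try for all three trials above instead of erroring out.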

Alternatives considered

n/a
