Identify and resolve discrepancies in prompt token calculation for chat completion requests in the vLLM client. Currently, there is a consistent mismatch of roughly 100–150 tokens between the actual usage reported by the vLLM server and the count estimated by the custom tokenizer (a minimal comparison sketch is included below).

This issue will track improvements separately from PR #43 to avoid scope creep.
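A minimal reproduction sketch, assuming a local vLLM server exposing the OpenAI-compatible API and a Hugging Face tokenizer; the endpoint, model name, and use of `apply_chat_template` are illustrative assumptions, not the project's actual client code. It compares the server-reported `usage.prompt_tokens` against a naive per-message estimate and a chat-template-based estimate, since chat-template role markers and special tokens are one plausible contributor to the gap:

```python
# Not the project's actual client code: the endpoint, model name, and tokenizer
# below are assumptions made purely to reproduce the mismatch.
from openai import OpenAI
from transformers import AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model served by vLLM
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local vLLM server

tokenizer = AutoTokenizer.from_pretrained(MODEL)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the benefits of unit tests."},
]

# Server-side count, taken from the usage block of the chat completion response.
response = client.chat.completions.create(model=MODEL, messages=messages, max_tokens=1)
server_prompt_tokens = response.usage.prompt_tokens

# Naive client-side estimate: tokenizes only the raw message text, missing the
# chat template's role markers, special tokens, and the generation prompt.
naive_estimate = sum(len(tokenizer.encode(m["content"])) for m in messages)

# Client-side estimate with the chat template applied, which should land much
# closer to what the server actually tokenizes.
templated_estimate = len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))

print(f"server usage.prompt_tokens: {server_prompt_tokens}")
print(f"naive tokenizer estimate:   {naive_estimate}")
print(f"chat-template estimate:     {templated_estimate}")
```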
One of the other ideas here would be to convert the current tokenizer component into a more specific token counter component. That way we can have different kinds of token counters (for example, one backed by the usage reported in the API response and one based on configuration) and simplify the vLLM client code in a way that can be extended to other model clients. As an example,
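a minimal sketch of that idea, using hypothetical names (`TokenCounter`, `ApiUsageTokenCounter`, `TokenizerTokenCounter`) that are not part of the current codebase:

```python
# Hypothetical sketch of a token-counter abstraction; the names below are
# illustrative and do not exist in the current codebase.
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class TokenCounter(Protocol):
    """Common interface so model clients don't care where the count comes from."""

    def count_prompt_tokens(self, request: dict, response: Optional[Any] = None) -> int:
        ...


class ApiUsageTokenCounter:
    """Prefers the authoritative usage block returned by the server (e.g. vLLM)."""

    def count_prompt_tokens(self, request: dict, response: Optional[Any] = None) -> int:
        usage = getattr(response, "usage", None)
        if usage is None:
            raise ValueError("response did not include a usage block")
        return usage.prompt_tokens


@dataclass
class TokenizerTokenCounter:
    """Estimates locally with a configured tokenizer when no usage is reported."""

    tokenizer: Any  # e.g. a Hugging Face tokenizer injected via configuration

    def count_prompt_tokens(self, request: dict, response: Optional[Any] = None) -> int:
        messages = request.get("messages", [])
        # Applying the chat template keeps the estimate close to what the server tokenizes.
        return len(self.tokenizer.apply_chat_template(messages, add_generation_prompt=True))
```

The vLLM client would then just depend on a `TokenCounter`, picking the API-usage implementation when the server returns a usage block and falling back to the tokenizer-based estimate otherwise; other model clients could reuse the same interface.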