
Calculate Accurate Prompt Tokens for Chat Completions in vLLM Client #45


Open
vivekk16 opened this issue Mar 31, 2025 · 2 comments


Identify and resolve discrepancies in prompt token calculation for chat completion requests in the vLLM client. Currently there is a consistent mismatch of roughly 100–150 tokens between the actual usage reported by the vLLM server and the token count estimated by the custom tokenizer.

This issue will track these improvements separately from PR #43 to avoid scope creep.
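One plausible source of the gap (an assumption, not something established in this issue) is that the server tokenizes the prompt only after rendering it through the model's chat template, which injects role markers and special tokens that a plain tokenizer call never sees. A minimal sketch of a more faithful client-side estimate, assuming a Hugging Face tokenizer is available for the served model (the model name below is only a placeholder):

from transformers import AutoTokenizer

# Placeholder model name; use the model actually served by vLLM.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

def estimate_chat_prompt_tokens(messages: list[dict]) -> int:
    # Render the conversation through the model's chat template (adding the
    # generation prompt, as the server does) and count the resulting token ids.
    token_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    return len(token_ids)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(estimate_chat_prompt_tokens(messages))

Comparing this estimate against the usage.prompt_tokens value the server returns for the same request should show whether the chat template accounts for the missing ~100–150 tokens.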


vivekk16 commented Apr 2, 2025

Kindly assign this to me.
@SachinVarghese I'd appreciate your input on this implementation.

@SachinVarghese (Contributor)

Sounds good @vivekk16

Another idea here would be to convert the current tokenizer component into a more specific token counter component. That way we can have different kinds of token counters (for example, one based on the API response and one based on configuration), which simplifies the vLLM client code and can be extended to other model clients. As an example,

class CustomTokenCounter:
    """Counts input and output tokens for a request, independent of backend."""

    def __init__(self):
        pass

    def count_tokens(self, api_type: "APIType", input: str, output: str) -> tuple[int, int]:
        # Returns (input_token_count, output_token_count); APIType is the
        # project's existing API-type identifier.
        raise NotImplementedError
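To make the "different kinds of token counters" idea concrete, here is a rough sketch under assumed names (UsageReportedTokenCounter, TokenizerEstimateCounter) and the usage fields of an OpenAI-compatible response; none of this is the project's actual API:

class UsageReportedTokenCounter:
    # Prefers the counts an OpenAI-compatible server such as vLLM already
    # reports in the `usage` block of its response.
    def count_tokens_from_response(self, response: dict) -> tuple[int, int]:
        usage = response.get("usage") or {}
        return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)


class TokenizerEstimateCounter:
    # Falls back to a local tokenizer estimate when the server omits usage data.
    def __init__(self, tokenizer):
        # `tokenizer` is assumed to expose encode(text) -> list[int],
        # as Hugging Face tokenizers do.
        self.tokenizer = tokenizer

    def count_tokens(self, api_type, input: str, output: str) -> tuple[int, int]:
        return len(self.tokenizer.encode(input)), len(self.tokenizer.encode(output))

Selecting between the two based on whether the response carries a usage block would keep the vLLM client code small while leaving room for other model clients.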
