This repository contains the code for the paper "Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives" by Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati and Manuel Gomez-Rodriguez.
State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it—they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof-of-concept, we introduce an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, to completely eliminate the financial incentive to strategize, we introduce a simple incentive-compatible token pricing mechanism. Under this mechanism, the price users pay for an output provided by a model depends on the number of characters of the output—they pay a fixed price per character. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the Llama, Gemma and Mistral families, and input prompts from the LMSYS Chatbot Arena platform.
All the experiments were performed using Python 3.11.2. To create a virtual environment and install the project dependencies, run the following commands:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
├── data
│   └── LMSYS.txt
├── figures
│   ├── fixed_string
│   └── heur
├── notebooks
├── outputs
│   ├── cpt
│   ├── fixed_string
│   └── heuristic
├── scripts
│   ├── script_slurm_heur.sh
│   └── script_slurm_lmsys.sh
└── src
    ├── heuristic_misreporting.py
    ├── LMSYS_generation.py
    ├── tokenizations_fixed_plausible.py
    ├── tokenizations_fixed.py
    ├── tokenizations.py
    └── utils.py
- `data` contains the processed set of LMSYS prompts used in the experiments.
- `figures` contains all the figures presented in the paper.
- `notebooks` contains Python notebooks to generate all the figures included in the paper:
  - `plots_fixed.ipynb` plots Figure 1.
  - `plots_heur.ipynb` plots all LMSYS experiment figures.
  - `process_ds.ipynb` builds the LMSYS dataset.
  - `cpt.ipynb` returns the number of characters per token from the LMSYS generations.
  - `appendix_example.ipynb` generates the examples in Appendix C.2.
- `outputs` contains intermediate output files generated by the experiments' scripts and analyzed in the notebooks. They can be generated using the scripts in the `src` folder.
  - `cpt` contains answers generated to the LMSYS prompts, used to estimate the number of characters per token.
  - `fixed_string` contains the results of `tokenizations_fixed_plausible.py` used to generate Figure 1, that is, counts of plausible tokenizations for the strings `language models` and `causal inference`.
  - `heuristic` contains the results of running the heuristic algorithm `heuristic_misreporting.py`.
- `scripts` contains a set of scripts used to run all the experiments presented in the paper.
- `src` contains all the code necessary to reproduce the results in the paper. Specifically:
  - `heuristic_misreporting.py` is the main script used to create all figures (except Figure 1) in the paper. It implements the misreporting heuristic based on token indices, runs it on prompts (taken from the LMSYS dataset) for multiple iterations, determines the plausibility in the last step, and returns the number of plausible longer tokenizations found.
  - `tokenizations_fixed_plausible.py` is used to create the data for Figure 1 in the paper. It computes all tokenizations of an output string and all top-p/k plausible tokenizations, given a prompt.
  - `tokenizations_fixed.py` computes all tokenizations of an output string and determines whether the longest one is also the most likely, given a prompt.
  - `tokenizations.py` contains auxiliary functions for tokenization operations, including finding all possible tokenizations of a string, computing the cumulative autoregressive probability of a token sequence, and verifying whether a token sequence is top-p/k plausible.
  - `utils.py` contains additional auxiliary functions.
Our experiments use LLMs from the Llama, Gemma and Mistral families, which are "gated" models, that is, they require accepting a license agreement to use.
You can request access at: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct, https://huggingface.co/google/gemma-3-4b-it and https://huggingface.co/mistralai/Ministral-8B-Instruct-2410.
Once you have access, you can download any model in the Llama, Gemma and Mistral families.
Before running the scripts, you need to authenticate with your Hugging Face account by running huggingface-cli login in the terminal.
Each model should be downloaded to the models/ folder.
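For example, assuming a recent version of the Hugging Face CLI (huggingface-cli, shipped with the huggingface_hub package), a model can be fetched into the models/ folder as follows; the target directory name is only illustrative:

# Authenticate with a Hugging Face token that has access to the gated repositories,
# then download the model weights into the models/ folder.
huggingface-cli login
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir models/Llama-3.2-3B-Instruct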
The script tokenizations_fixed_plausible.py generates the output needed to reproduce Figure 1 in the paper. It returns, for a given output string (and prompt), the number of top-p/k plausible tokenizations. To reproduce the figure, run the notebook plots_fixed.ipynb.
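For instance, assuming the script is run from the repository root with its default settings (any command-line options it accepts are defined in the script itself), its results end up in outputs/fixed_string, which the notebook then reads:

# Generate the data for Figure 1 (default settings assumed).
python src/tokenizations_fixed_plausible.py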
The script heuristic_misreporting.py generates the output needed to reproduce all figures (except Figure 1). You can run it in your local Python environment or submit it to a cluster using the Slurm script script_slurm_heur.sh, adapted to your particular machine specifications. Running it through script_slurm_heur.sh automatically uses the LMSYS prompts in the file LMSYS.txt. You can use the flag --model to set a specific model, such as meta-llama/Llama-3.2-1B-Instruct, --temperature to set the temperature, --p to set the top-p parameter, --prompts to pass a list of strings as prompts, and --splits to select how many iterations of the heuristic should be used; an example invocation is shown below.
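The following sketch shows what a local run might look like, using the flags listed above; the flag values and prompt strings are illustrative, and the exact way the script's argument parser expects the prompt list may differ slightly from this example.

# Example local run of the misreporting heuristic (illustrative values).
python src/heuristic_misreporting.py \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --temperature 0.7 \
    --p 0.9 \
    --prompts "What is causal inference?" "Explain tokenization." \
    --splits 10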
To reproduce all the figures, run the notebook plots_heur.ipynb.
If you have questions about the code, identify potential bugs, or would like us to include additional functionality, feel free to open an issue or contact Ander Artola Velasco.
If you use parts of the code in this repository for your own research, please consider citing:
@misc{velasco2025llmoverchargingyoutokenization,
title={Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives},
author={Ander Artola Velasco and Stratis Tsirtsis and Nastaran Okati and Manuel Gomez-Rodriguez},
year={2025},
eprint={2505.21627},
archivePrefix={arXiv},
primaryClass={cs.GT},
url={https://arxiv.org/abs/2505.21627},
}