Repository for the paper "Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives", Arxiv 2025

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

This repository contains the code for the paper "Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives" by Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati and Manuel Gomez-Rodriguez.

Paper abstract

State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it—they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof-of-concept, we introduce an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, to completely eliminate the financial incentive to strategize, we introduce a simple incentive-compatible token pricing mechanism. Under this mechanism, the price users pay for an output provided by a model depends on the number of characters of the output—they pay a fixed price per character. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the Llama, Gemma and Mistral families, and input prompts from the LMSYS Chatbot Arena platform.
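As a toy numerical illustration of the incentive at the heart of the paper (the prices and tokenizations below are made up for illustration, not figures from the paper): under per-token pricing, reporting a longer tokenization of the same output string increases the charge, whereas under per-character pricing the charge is fixed by the output string itself.

```python
# Toy illustration (hypothetical prices): the same output string can be
# tokenized in several ways, but it always has the same number of characters.
output = "language models"

# Two tokenizations of the same string: the model's actual one, and a
# longer one a strategic provider could report instead.
actual_tokens = ["language", " models"]               # 2 tokens
misreported_tokens = ["lang", "uage", " mod", "els"]  # 4 tokens
assert "".join(actual_tokens) == "".join(misreported_tokens) == output

PRICE_PER_TOKEN = 0.002  # hypothetical $/token
PRICE_PER_CHAR = 0.0003  # hypothetical $/character

# Per-token pricing: reporting the longer tokenization doubles the charge.
honest_charge = PRICE_PER_TOKEN * len(actual_tokens)
inflated_charge = PRICE_PER_TOKEN * len(misreported_tokens)
print(inflated_charge > honest_charge)  # True: incentive to misreport

# Per-character pricing: the charge depends only on the output string,
# so misreporting the tokenization changes nothing.
print(PRICE_PER_CHAR * len(output))
```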

Dependencies

All the experiments were performed using Python 3.11.2. To create a virtual environment and install the project dependencies, run the following commands:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Repository structure

├── data
│   └── LMSYS.txt
├── figures
│   ├── fixed_string
│   └── heur
├── notebooks
├── outputs
│   ├── cpt
│   ├── fixed_string
│   └── heuristic
├── scripts
│   ├── script_slurm_heur.sh
│   └── script_slurm_lmsys.sh
└── src
    ├── heuristic_misreporting.py
    ├── LMSYS_generation.py
    ├── tokenizations_fixed_plausible.py
    ├── tokenizations_fixed.py
    ├── tokenizations.py
    └── utils.py
  • data contains the processed set of LMSYS prompts used in the experiments.
  • figures contains all the figures presented in the paper.
  • notebooks contains Python notebooks to generate all the figures included in the paper:
    • plots_fixed.ipynb plots Figure 1.
    • plots_heur.ipynb plots all LMSYS experiment figures.
    • process_ds.ipynb builds the LMSYS dataset.
    • cpt.ipynb computes the number of characters per token from the LMSYS generations.
    • appendix_example.ipynb generates the examples in Appendix C.2.
  • outputs contains intermediate output files generated by the experiment scripts and analyzed in the notebooks. They can be generated using the scripts in the src folder.
    • cpt contains answers generated for the LMSYS prompts, used to estimate the number of characters per token.
    • fixed_string contains the results of tokenizations_fixed_plausible.py, used to generate Figure 1. That is, it contains counts of plausible tokenizations for the strings "language models" and "causal inference".
    • heuristic contains the results of running the heuristic algorithm heuristic_misreporting.py.
  • scripts contains the scripts used to run all the experiments presented in the paper.
  • src contains all the code necessary to reproduce the results in the paper. Specifically:
    • heuristic_misreporting.py is the main script used to create all figures (except Figure 1) in the paper. It implements the misreporting heuristic based on token indices, runs it for multiple iterations on prompts taken from the LMSYS dataset, determines the plausibility of the result in the last step, and returns the number of plausible longer tokenizations found.
    • tokenizations_fixed_plausible.py is used to create the data for Figure 1 in the paper. Given a prompt, it computes all tokenizations of an output string and all top-p/k plausible tokenizations.
    • tokenizations_fixed.py computes all tokenizations of an output string and determines whether the longest is also the most likely, given a prompt.
    • tokenizations.py contains auxiliary functions for tokenization operations, including finding all possible tokenizations of a string, computing the cumulative autoregressive probability of a token sequence, and verifying whether a token sequence is top-p/k plausible.
    • utils.py contains auxiliary functions.
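For intuition on the first operation listed above, here is an independent sketch with a toy vocabulary (not the repository's implementation): enumerating all tokenizations of a string amounts to finding every way to segment it into vocabulary entries.

```python
from functools import lru_cache

def all_tokenizations(s: str, vocab: frozenset) -> list:
    """Return every way to split s into a sequence of vocabulary tokens."""
    @lru_cache(maxsize=None)
    def split(i: int):
        # All tokenizations of the suffix s[i:].
        if i == len(s):
            return [[]]
        results = []
        for j in range(i + 1, len(s) + 1):
            piece = s[i:j]
            if piece in vocab:
                results.extend([piece] + rest for rest in split(j))
        return results
    return split(0)

# Toy vocabulary; a real BPE vocabulary has tens of thousands of entries.
vocab = frozenset({"c", "a", "t", "ca", "at", "cat"})
print(all_tokenizations("cat", vocab))
```

Note that the number of tokenizations can grow exponentially with the string length, which is why the paper's plausibility constraints matter for making the search tractable.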

Instructions

Downloading the models

Our experiments use LLMs from the Llama, Gemma and Mistral families, which are "gated" models; that is, they require accepting a license to use. You can request access at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct, https://huggingface.co/google/gemma-3-4b-it and https://huggingface.co/mistralai/Ministral-8B-Instruct-2410. Once you have access, you can download any model in the Llama, Gemma and Mistral families. Then, before running the scripts, you need to authenticate with your Hugging Face account by running huggingface-cli login in the terminal. Each model should be downloaded to the models/ folder.
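A minimal command sketch of the steps above (the local directory layout is an assumption; adjust it to your setup):

```shell
# Authenticate with the Hugging Face account that was granted access.
huggingface-cli login

# Download a model into the models/ folder (repeat for each model you need).
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct \
    --local-dir models/Llama-3.2-3B-Instruct
```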

Fixed string experiment

The script tokenizations_fixed_plausible.py generates the output needed to reproduce Figure 1 in the paper. For a given output string (and prompt), it returns the number of top-p/k plausible tokenizations. To reproduce the figure, run the notebook plots_fixed.ipynb.
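To make the notion of plausibility concrete, here is an independent sketch (not the repository's code, with made-up probabilities): a token sequence is top-k plausible if, at every generation step, the reported token is among the k most likely next tokens under the model.

```python
def is_top_k_plausible(step_probs, chosen, k):
    """step_probs[t]: next-token distribution (list of floats) at step t;
    chosen[t]: token id reported at step t. The sequence is top-k
    plausible iff every reported token ranks among the k most likely."""
    for probs, token in zip(step_probs, chosen):
        ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        if token not in ranked[:k]:
            return False
    return True

# Toy 4-token vocabulary, two generation steps.
steps = [[0.5, 0.3, 0.15, 0.05],
         [0.1, 0.2, 0.6, 0.1]]
print(is_top_k_plausible(steps, chosen=[1, 2], k=2))  # True
print(is_top_k_plausible(steps, chosen=[3, 2], k=2))  # False
```

Top-p plausibility is analogous, with the reported token required to lie in the nucleus (smallest set of tokens whose probabilities sum to at least p) at each step.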

LMSYS experiment

The script heuristic_misreporting.py generates the output needed to reproduce all figures (except Figure 1). You can run it in your local Python environment or submit it to a cluster with the Slurm script script_slurm_heur.sh, adapted to your particular machine specifications. Running the scripts through script_slurm_heur.sh automatically uses the LMSYS prompts in the file LMSYS.txt. You can use the flag --model to set a specific model, such as meta-llama/Llama-3.2-1B-Instruct, --temperature to set the temperature, --p to set the top-p parameter, --prompts to pass a list of strings as prompts, and --splits to select how many iterations of the heuristic should be used. To reproduce all the figures, run the notebook plots_heur.ipynb.
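For example, a local run might look as follows (the flag values are hypothetical; the flag names follow the description above):

```shell
python src/heuristic_misreporting.py \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --temperature 0.7 \
    --p 0.9 \
    --splits 4
```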

Contact & attribution

If you have questions about the code, identify potential bugs, or would like us to include additional functionality, feel free to open an issue or contact Ander Artola Velasco.

If you use parts of the code in this repository for your own research, please consider citing:

@misc{velasco2025llmoverchargingyoutokenization,
      title={Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives}, 
      author={Ander Artola Velasco and Stratis Tsirtsis and Nastaran Okati and Manuel Gomez-Rodriguez},
      year={2025},
      eprint={2505.21627},
      archivePrefix={arXiv},
      primaryClass={cs.GT},
      url={https://arxiv.org/abs/2505.21627}, 
}
