
ContextWindowHelper.cs

suncloudsmoon edited this page Feb 4, 2025 · 1 revision

ContextWindowHelper

Overview

The ContextWindowHelper class provides helper methods to:

  • Determine the context window size (number of tokens) for various language models.
  • Split text into token-based chunks.
  • Convert between character counts and token counts using simple heuristics.

Supported model providers include OpenAI and Ollama.

Supported Providers

  • OpenAI: Uses pre-defined dictionaries for known model identifiers and aliases.
  • Ollama: Retrieves model details from a remote API endpoint.

Classes and Enumerations

ContextWindowHelper

  • Namespace: LLMHelperFunctions
  • Purpose: Offers functions to obtain model context window sizes and to process text for tokenization.

Public Members

  • Enum: ModelProvider
    Identifiers for supported model providers:

    • OpenAI
    • Ollama
  • Method: GetContextWindow(ModelProvider provider, Uri endpoint, string model)
    Asynchronously retrieves the context window size (in tokens) for the specified model.

    • Parameters:
      • provider: The model provider (e.g. ModelProvider.OpenAI or ModelProvider.Ollama).
      • endpoint: The endpoint URI for accessing model information (used for Ollama).
      • model: The identifier or alias for the model.
    • Returns: A task that resolves to the context window size, in tokens.
    • Exceptions:
      • ArgumentException if the OpenAI model is unknown.
      • OllamaException if the context window cannot be retrieved from an Ollama provider.
      • NotImplementedException if the specified provider is unsupported.
  • Method: Chunkify(string content, int numTokens)
    Splits a string into chunks based on an estimated number of tokens.

    • Parameters:
      • content: The text to be split.
      • numTokens: The target number of tokens per chunk.
    • Returns: An enumerable of string chunks.
  • Method: CharToTokenCount(int charCount)
    Estimates token count based on the given character count.

    • Note: Uses a heuristic of roughly 1 token per 4 characters.
  • Method: TokenToCharCount(int tokenCount)
    Estimates character count from a token count.

    • Note: Uses a heuristic of roughly 4 characters per token.
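
The public members above can be combined as follows. This is a minimal sketch based on the method descriptions in this page; the exact signatures and the ~4-characters-per-token heuristic are taken from the descriptions, and the input string is a placeholder:

```csharp
using System;
using LLMHelperFunctions;

public class ChunkingExample
{
    public static void Main()
    {
        // Placeholder input: 1,000 characters of text.
        string content = new string('a', 1000);

        // Estimate how many tokens the content occupies (~1 token per 4 characters).
        int estimatedTokens = ContextWindowHelper.CharToTokenCount(content.Length);
        Console.WriteLine($"Estimated tokens: {estimatedTokens}");

        // Split the content into chunks of roughly 100 tokens (~400 characters) each.
        foreach (string chunk in ContextWindowHelper.Chunkify(content, 100))
        {
            Console.WriteLine($"Chunk length: {chunk.Length} characters");
        }
    }
}
```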

Internal Members

  • Class: ContextLenCacheSystem
    Provides a simple caching mechanism for context window values, organized by provider and model name.
    • Methods:
      • Cache(ModelProvider provider, string model, int contextWindow): Caches the context window value.
      • TryGetContextWindow(ModelProvider provider, string model, out int contextWindow): Attempts to retrieve a cached value.
      • CheckModelProviderValidity(ModelProvider provider): Ensures that only supported providers are used.

Usage Examples

Getting the Context Window Size

using System;
using System.Threading.Tasks;
using LLMHelperFunctions;

public class Example
{
    public async Task Run()
    {
        // The endpoint URI is only consulted for Ollama lookups; it is
        // passed here because GetContextWindow requires the argument.
        Uri endpoint = new Uri("https://your-ollama-api-endpoint.com/");
        string modelName = "gpt-4"; // Can also be a known alias

        // Retrieve the context window size for an OpenAI model
        int contextWindow = await ContextWindowHelper.GetContextWindow(
            ContextWindowHelper.ModelProvider.OpenAI,
            endpoint,
            modelName
        );

        Console.WriteLine($"Context window size: {contextWindow} tokens");
    }
}
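
For an Ollama-hosted model, the call is the same apart from the provider value and a model name known to the Ollama server. The endpoint below uses Ollama's default local port, and the model name is a placeholder; substitute your own:

```csharp
// Retrieve the context window for a model served by a local Ollama instance.
Uri ollamaEndpoint = new Uri("http://localhost:11434/");
int ollamaContextWindow = await ContextWindowHelper.GetContextWindow(
    ContextWindowHelper.ModelProvider.Ollama,
    ollamaEndpoint,
    "llama3"
);
Console.WriteLine($"Ollama context window: {ollamaContextWindow} tokens");
// Throws OllamaException if the context window cannot be retrieved.
```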