# ContextWindowHelper.cs
suncloudsmoon edited this page Feb 4, 2025 · 1 revision
The `ContextWindowHelper` class provides helper methods to:
- Determine the context window size (number of tokens) for various language models.
- Split text into token-based chunks.
- Convert between character counts and token counts using simple heuristics.
Supported model providers include OpenAI and Ollama:
- OpenAI: Uses pre-defined dictionaries for known model identifiers and aliases.
- Ollama: Retrieves model details from a remote API endpoint.
## Namespace: `LLMHelperFunctions`

Purpose: Offers functions to obtain model context window sizes and to process text for tokenization.
## Enum: `ModelProvider`

Identifiers for supported model providers:
- `OpenAI`
- `Ollama`
## Method: `GetContextWindow(ModelProvider provider, Uri endpoint, string model)`

Asynchronously retrieves the context window size (in tokens) for the specified model.
- Parameters:
  - `provider`: The model provider (e.g. `ModelProvider.OpenAI` or `ModelProvider.Ollama`).
  - `endpoint`: The endpoint URI for accessing model information (used for Ollama).
  - `model`: The identifier or alias for the model.
- Returns: A task that resolves to an integer representing the context window size in tokens.
- Exceptions:
  - `ArgumentException` if the OpenAI model is unknown.
  - `OllamaException` if the context window cannot be retrieved from an Ollama provider.
  - `NotImplementedException` if the specified provider is unsupported.
## Method: `Chunkify(string content, int numTokens)`

Splits a string into chunks based on an estimated number of tokens.
- Parameters:
  - `content`: The text to be split.
  - `numTokens`: The target number of tokens per chunk.
- Returns: An enumerable of string chunks.
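The page does not show how `Chunkify` performs the split. Below is a minimal self-contained sketch of a token-estimate-based chunker, assuming the heuristic documented on this page (roughly 4 characters per token); the class name and constant here are hypothetical, not the library's actual implementation:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkSketch
{
    // Assumed heuristic from this page: roughly 4 characters per token.
    private const int CharsPerToken = 4;

    // Splits content into chunks of at most numTokens estimated tokens each.
    public static IEnumerable<string> Chunkify(string content, int numTokens)
    {
        if (numTokens <= 0)
            throw new ArgumentOutOfRangeException(nameof(numTokens));

        int chunkChars = numTokens * CharsPerToken;
        for (int i = 0; i < content.Length; i += chunkChars)
        {
            // The final chunk may be shorter than chunkChars characters.
            yield return content.Substring(i, Math.Min(chunkChars, content.Length - i));
        }
    }
}
```

Note that splitting on raw character offsets can cut words (or surrogate pairs) in half; a production implementation might prefer to break on whitespace or sentence boundaries.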
## Method: `CharToTokenCount(int charCount)`

Estimates the token count for a given character count.
- Note: Uses a heuristic of roughly 1 token per 4 characters.
## Method: `TokenToCharCount(int tokenCount)`

Estimates the character count for a given token count.
- Note: Uses a heuristic of roughly 4 characters per token.
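Both heuristics amount to simple integer conversions. The following is an illustrative reimplementation, not the library source; in particular, rounding the token estimate up is an assumption, and the two conversions are not exact inverses for counts that are not multiples of 4:

```csharp
public static class TokenHeuristics
{
    // Assumed heuristic from this page: roughly 1 token per 4 characters.
    private const int CharsPerToken = 4;

    // Estimates tokens from characters. Rounds up (an assumption) so that
    // any non-empty string counts as at least one token.
    public static int CharToTokenCount(int charCount) =>
        (charCount + CharsPerToken - 1) / CharsPerToken;

    // Estimates characters from tokens.
    public static int TokenToCharCount(int tokenCount) =>
        tokenCount * CharsPerToken;
}
```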
## Class: `ContextLenCacheSystem`

Provides a simple caching mechanism for context window values, organized by provider and model name.
- Methods:
  - `Cache(ModelProvider provider, string model, int contextWindow)`: Caches the context window value.
  - `TryGetContextWindow(ModelProvider provider, string model, out int contextWindow)`: Attempts to retrieve a cached value.
  - `CheckModelProviderValidity(ModelProvider provider)`: Ensures that only supported providers are used.
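The `TryGetContextWindow` signature mirrors the familiar `Dictionary.TryGetValue` pattern. A minimal self-contained sketch of such a cache is shown below; keying a plain `Dictionary` by a provider/model tuple is an assumption about the implementation, and the `ModelProvider` enum is redeclared here only to keep the sketch compilable on its own:

```csharp
using System;
using System.Collections.Generic;

public enum ModelProvider { OpenAI, Ollama }

public class ContextLenCacheSketch
{
    // Cache keyed by (provider, model name) pairs.
    private readonly Dictionary<(ModelProvider, string), int> _cache = new();

    public void Cache(ModelProvider provider, string model, int contextWindow)
    {
        CheckModelProviderValidity(provider);
        _cache[(provider, model)] = contextWindow;
    }

    public bool TryGetContextWindow(ModelProvider provider, string model, out int contextWindow)
    {
        CheckModelProviderValidity(provider);
        return _cache.TryGetValue((provider, model), out contextWindow);
    }

    private static void CheckModelProviderValidity(ModelProvider provider)
    {
        // Rejects values outside the supported enum range.
        if (!Enum.IsDefined(typeof(ModelProvider), provider))
            throw new NotImplementedException($"Unsupported provider: {provider}");
    }
}
```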
## Example Usage
```csharp
using System;
using System.Threading.Tasks;
using LLMHelperFunctions;

public class Example
{
    public async Task Run()
    {
        // Example endpoint URI (required for Ollama calls)
        Uri endpoint = new Uri("https://your-ollama-api-endpoint.com/");
        string modelName = "gpt-4"; // Can also be an alias

        // Retrieve context window size for an OpenAI model
        int contextWindow = await ContextWindowHelper.GetContextWindow(
            ContextWindowHelper.ModelProvider.OpenAI,
            endpoint,
            modelName
        );

        Console.WriteLine($"Context window size: {contextWindow} tokens");
    }
}
```