This repository provides a simple LLM batch scheduler for the Langchain ecosystem.
It serves as a guide and is not intended for production use.
The scheduler is built from a few simple pieces (a minimal sketch follows the list):

- An in-memory FIFO queue to store LLM requests.
- Condition variables for signaling within the queue.
- Futures to signal task completion.
- A timeout to prevent starvation.
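The sketch below shows one way these pieces could fit together: a background worker drains the FIFO queue into batches, a condition variable wakes it when requests arrive, callers wait on futures, and a timeout caps how long the first request in a batch can wait. Names such as `BatchScheduler`, `fake_llm_call`, `max_batch_size`, and `max_wait_s` are illustrative placeholders, not part of this repository or the Langchain API.

```python
import threading
import time
from collections import deque
from concurrent.futures import Future


def fake_llm_call(prompts):
    # Stand-in for a real batched LLM invocation.
    return [f"echo: {p}" for p in prompts]


class BatchScheduler:
    def __init__(self, max_batch_size=8, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s          # timeout that prevents starvation
        self._queue = deque()                 # in-memory FIFO queue of (prompt, Future)
        self._cond = threading.Condition()    # condition variable for signaling
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, prompt: str) -> Future:
        """Enqueue a request; the caller waits on the returned Future."""
        fut = Future()
        with self._cond:
            self._queue.append((prompt, fut))
            self._cond.notify()               # wake the batching loop
        return fut

    def _run(self):
        while True:
            with self._cond:
                # Wait until at least one request arrives.
                while not self._queue:
                    self._cond.wait()
                deadline = time.monotonic() + self.max_wait_s
                # Keep collecting until the batch is full or the timeout expires,
                # so early requests are never starved by a trickle of arrivals.
                while len(self._queue) < self.max_batch_size:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0 or not self._cond.wait(timeout=remaining):
                        break
                batch_len = min(len(self._queue), self.max_batch_size)
                batch = [self._queue.popleft() for _ in range(batch_len)]
            prompts = [p for p, _ in batch]
            results = fake_llm_call(prompts)   # run the batched call outside the lock
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)         # signal completion to the caller
```

A caller would do `future = scheduler.submit(prompt)` and then block on `future.result()`; tuning `max_wait_s` trades batch fullness against the extra latency paid by the earliest request in each batch.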
In a production environment, these mechanisms would typically be replaced with message brokers and an optional persistence layer.
For self-hosted models, certain components, such as the KV cache, scale linearly with batch size and sequence length. Even with optimizations like Multi-Query Attention (MQA) and Grouped-Query Attention (GQA), the linear relationship remains.
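As a rough, back-of-the-envelope illustration of that linear growth, the helper below estimates KV cache size; every parameter (layer count, KV head count, head dimension, element size) is an assumed placeholder rather than a value from any specific model.

```python
# Illustrative KV cache estimate: memory grows linearly in both batch size
# and sequence length. All model parameters below are assumptions.
def kv_cache_bytes(batch_size, seq_len, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    # Factor 2 covers the separate key and value tensors stored per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * batch_size * seq_len


print(kv_cache_bytes(8, 4096) / 1e9)    # ~4.3 GB under these assumptions
print(kv_cache_bytes(16, 4096) / 1e9)   # ~8.6 GB: doubling batch size doubles it
```

GQA and MQA shrink `num_kv_heads`, which lowers the constant factor, but the product with batch size and sequence length stays linear.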
- Memory Capacity – Large batches or long sequences can quickly exhaust available memory.
- Memory Bandwidth – The KV cache must be re-read on every decode step, so large batches can saturate bandwidth and degrade throughput.
Account for these constraints when designing the batching policy.
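One way a batching policy might account for the capacity constraint is a simple admission check that caps batch size against a memory budget. The sketch below mirrors the per-token estimate from the previous example; the 20 GiB budget and all model parameters are assumptions.

```python
def kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Key and value tensors per layer; same illustrative parameters as above.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_admissible_batch(max_seq_len, budget_bytes=20 * 1024**3):
    """Largest batch whose worst-case KV cache fits the assumed memory budget."""
    per_sequence = kv_bytes_per_token() * max_seq_len
    return max(1, budget_bytes // per_sequence)


print(max_admissible_batch(4096))   # 40 sequences under the assumptions above
```

A real policy would also watch decode throughput, since bandwidth pressure shows up as slower steps rather than an out-of-memory error.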