LLM::Chat

Introduction

LLM::Chat is a Raku module for running inference against large language models.

It automatically prunes old messages to fit the context budget, retains the system prompt (:sysprompt) and other sticky (:sticky) messages, and supports inserting messages at a fixed depth (:depth).
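
A minimal sketch of sticky and depth-inserted messages, assuming sticky and depth are constructor attributes of LLM::Chat::Conversation::Message alongside sysprompt (check the module's documentation for the exact names):

my $persona = LLM::Chat::Conversation::Message.new(
	role    => 'user',
	content => 'Always answer in French.',
	sticky  => True,   # assumed: retained even when older messages are pruned
);

my $lore = LLM::Chat::Conversation::Message.new(
	role    => 'system',
	content => 'The conversation takes place in 1920s Paris.',
	depth   => 4,      # assumed: inserted a fixed number of messages from the end
);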

Example Usage

The following is a terminal-based conversation loop built with LLM::Chat:

#!/usr/bin/env raku

use lib 'lib';

use LLM::Chat::Backend::OpenAICommon;
use LLM::Chat::Backend::Settings;
use LLM::Chat::Conversation;
use LLM::Chat::Conversation::Message;
use LLM::Chat::Template::MistralV7;
use LLM::Chat::TokenCounter;
use Tokenizers;

## EDIT THESE TO MATCH YOUR ENVIRONMENT
constant $API_URL     = 'http://192.168.1.193:5001/v1';
constant $MAX_TOKENS  = 1024;
constant $MAX_CONTEXT = 32768;

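# Seed the conversation with a system prompt; :sysprompt messages are always
# retained when older messages are pruned.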
my @conversation = (
	LLM::Chat::Conversation::Message.new(
		role      => 'system',
		content   => 'You are a helpful assistant.',
		sysprompt => True
	),
);

my $template      = LLM::Chat::Template::MistralV7.new;

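# Load the tokenizer used to count tokens against the context budget.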
my $tokenizer     = Tokenizers.new-from-json(
	slurp('t/fixtures/tokenizer.json')
);

my $token-counter = LLM::Chat::TokenCounter.new(
	tokenizer => $tokenizer,
	template  => $template,
);

my $settings = LLM::Chat::Backend::Settings.new(
	max_tokens => $MAX_TOKENS,
	max_context => $MAX_CONTEXT,
);

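# Budget the context window, reserving $MAX_TOKENS of it for the model's reply.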
my $con = LLM::Chat::Conversation.new(
	token-counter  => $token-counter,
	context-budget => $MAX_CONTEXT - $MAX_TOKENS,
);

my $backend = LLM::Chat::Backend::OpenAICommon.new(
	api_url  => $API_URL,
	settings => $settings,
);

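# Main loop: read multi-line user input, prepare the conversation for
# inference, and stream the model's reply to the terminal.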
loop {
	my @lines;
	say "Enter your input. Type 'DONE' on a line by itself when finished:\n";

	loop {
		print "> ";
		my $line = $*IN.get // last;
		last if $line.trim eq 'DONE';
		@lines.push: $line;
	}

	last if @lines.elems == 0;
	@conversation.push: LLM::Chat::Conversation::Message.new(
		role    => 'user',
		content => @lines.join("\n"),
	);
	my @prompt = $con.prepare-for-inference(@conversation);
	my $resp   = $backend.chat-completion-stream(@prompt);

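	# Poll the streaming response and print only the text added since the last poll.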
	my $last = '';
	loop {
		my $new = $resp.latest.subst(/^$last/, '');
		$last   = $resp.latest;

		print $new if $new ne "";
		last if $resp.is-done;
		sleep(0.1);
	}

	print "\n";

	print "ERROR: {$resp.err}\n" if !$resp.is-success;
}

See examples/* and t/* for more usage examples.

Current Support

Inference Types

  • Chat completion (with or without streaming)
  • Text completion (with or without streaming)

API Types

  • OpenAI compatible (most backends) - LLM::Chat::Backend::OpenAICommon
  • KoboldCpp (additional samplers & cancel function) - LLM::Chat::Backend::KoboldCpp
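
For example, the backend in the script above can be swapped for a KoboldCpp one. A minimal sketch, assuming LLM::Chat::Backend::KoboldCpp accepts the same api_url and settings arguments as OpenAICommon (point the URL at your own KoboldCpp server):

my $backend = LLM::Chat::Backend::KoboldCpp.new(
	api_url  => 'http://192.168.1.193:5001/v1',
	settings => $settings,
);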

To implement more API types, just extend LLM::Chat::Backend.
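
A new backend might start from a skeleton like this; it assumes LLM::Chat::Backend is an inheritable class and that backends expose methods such as chat-completion-stream, as OpenAICommon does in the example above:

use LLM::Chat::Backend;

class LLM::Chat::Backend::MyServer is LLM::Chat::Backend {
	# Call your API here and return a response object the caller can poll,
	# as in the streaming loop of the example script.
	method chat-completion-stream(@prompt) { ... }
}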

Chat Templates

  • ChatML (LLM::Chat::Template::ChatML)
  • Gemma 2 (LLM::Chat::Template::Gemma2)
  • Llama 3 (LLM::Chat::Template::Llama3)
  • Llama 4 (LLM::Chat::Template::Llama4)
  • Mistral V7 (LLM::Chat::Template::MistralV7)
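
Switching templates is a one-line change to the example script; for instance, to target a Llama 3 model, swap the template and keep the rest of the pipeline as-is:

my $template = LLM::Chat::Template::Llama3.new;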

To implement more chat templates, just extend LLM::Chat::Template. A chat template that matches your model's expected format is required for accurate context shifting and for text completion.

Planned

  • Tool Calling
  • VLM Capabilities
  • More APIs & templates
  • Automatically fetching tokenizers & chat templates from HF model identifiers

Contributing

Pull requests and issues welcome.

License

Artistic License 2.0 (C) 2025 Matt Doughty <matt@apogee.guru>

The file at t/fixtures/tokenizer.json is (C) 2025 Mistral AI.

It is extracted from Mistral Nemo, which is an Apache 2.0 licensed model.
