llama-cpp
Here are 363 public repositories matching this topic...
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
Updated Mar 7, 2026 - TypeScript
Open-source AI camera skills platform, AI NVR, and CCTV surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, and MiniMax. An LLM-powered agentic security camera that watches, understands, remembers, and guards your home via Telegram, Discord, or Slack. Pluggable AI skills; works with OpenAI, Google, Anthropic, or local AI. Runs on a Mac Mini or AI PC.
Updated Mar 7, 2026 - JavaScript
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated Mar 6, 2026 - TypeScript
The Swiss Army knife of offline AI. Chat, speak, and generate images with privacy first and zero internet. Download an LLM and use it on your mobile device; no data ever leaves your phone. Supports text-to-text, vision, and text-to-image.
Updated Mar 7, 2026 - TypeScript
Build and run AI agents using Docker Compose. A collection of ready-to-use examples for orchestrating open-source LLMs, tools, and agent runtimes.
Updated Dec 12, 2025 - TypeScript
Rust bindings for llama.cpp.
Updated Jun 27, 2024 - Rust
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Updated May 21, 2025 - Python
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Updated Jul 12, 2024 - Python
A complete offline AI ecosystem for Android: chat (GGUF LLMs), images (Stable Diffusion 1.5), voice (TTS/STT), and knowledge (RAG data packs). Zero subscriptions, no data harvesting: open-source, privacy-first AI on your terms.
Updated Mar 7, 2026 - Kotlin
Review and check GGUF files, and estimate their memory usage and maximum tokens per second.
Updated Feb 11, 2026 - Go
Local ML voice chat using high-end models.
Updated Mar 5, 2026 - C++