Home
Welcome to the Voxtral-AI-Demo-Local-Interface wiki!
Voxtral‑AI‑Demo‑Local‑Interface is an open-source demonstration interface for Voxtral, Mistral AI’s speech understanding model. The repository offers a local GUI for interacting with the Voxtral Mini and Voxtral Small models on tasks such as transcription, question answering, summarization, translation, and function calling from spoken input.
- Purpose: Provides developers a runnable interface to test Voxtral’s capabilities locally, without requiring cloud APIs.
- Underlying Models: Targets both Voxtral variants: Voxtral Mini (3B) for edge/local use and Voxtral Small (≈24B) for production-scale tasks (source).
- Features Demonstrated:
  - Real-time speech-to-text transcription
  - Audio-based question answering and summarization using large (~32k tokens) context windows
  - Voice-triggered function-calling (e.g., “add this to to-do list”)
  - Automatic multilingual language detection
  - Speech translation capabilities
- Unified Voice Interface: Combines transcription and semantic understanding in a single model pipeline.
- Benchmark Performance: Outperforms Whisper Large-v3 and rivals GPT-4o Mini/Gemini 2.5 Flash in ASR, multilingual tasks, and speech translation (source).
- Low Cost & Open License: Apache 2.0 licensed. More affordable than many proprietary APIs (source).
- Long-Form Context: A 32k-token context window handles roughly 30 minutes of audio for transcription and roughly 40 minutes for summarization and QA.
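Recordings longer than the limits quoted above need to be split before they reach the model. The following is a minimal sketch of one way to plan that split, assuming the ~30-minute and ~40-minute figures are used as simple per-request caps. The names `plan_segments` and `Segment` are hypothetical and are not part of this repository.

```python
from dataclasses import dataclass

# Approximate limits taken from the figures quoted above (assumed caps, not hard guarantees).
MAX_TRANSCRIBE_SECONDS = 30 * 60   # ~30 minutes for transcription
MAX_UNDERSTAND_SECONDS = 40 * 60   # ~40 minutes for summarization and QA


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording (exclusive)


def plan_segments(total_seconds: float, max_seconds: float) -> list[Segment]:
    """Split a recording into consecutive segments of at most max_seconds each."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")
    total = float(total_seconds)
    segments: list[Segment] = []
    start = 0.0
    while start < total:
        end = min(start + max_seconds, total)
        segments.append(Segment(start, end))
        start = end
    return segments


# Example: a 75-minute recording planned for transcription.
plan = plan_segments(75 * 60, MAX_TRANSCRIBE_SECONDS)
```

Fixed-length cuts can land mid-sentence. A real pipeline would probably cut at silences or overlap adjacent segments slightly, which this sketch does not attempt.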
“The Voxtral models are capable of real-world interactions and downstream actions such as summaries, answers, analysis, and insights.”
“They are also cost-effective, with Voxtral Mini Transcribe outperforming OpenAI Whisper for less than half the price.”
— Reddit Users (source)
- Install dependencies: Python, local GPU tools, and environment setup.
- Download model weights from Hugging Face (Mini or Small).
- Launch the local demo UI via terminal or GUI (e.g., Streamlit).
- Interact via microphone to:
  - Transcribe voice
  - Ask questions like “What is this audio about?”
  - Generate summaries, translations, or voice-triggered actions
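Voice-triggered actions need a step that maps the model's function call onto local code. The sketch below shows one possible dispatcher. It assumes the model's tool call arrives as a JSON object with `name` and `arguments` fields. That payload shape, the `add_todo` tool, and the `dispatch` helper are illustrative assumptions, not this repository's actual interface.

```python
import json
from typing import Callable

# In-memory to-do list standing in for a real application backend.
TODO_LIST: list[str] = []


def add_todo(item: str) -> str:
    """Hypothetical local action triggered by a spoken request."""
    TODO_LIST.append(item)
    return f"Added: {item}"


# Hypothetical registry mapping tool names to local Python handlers.
HANDLERS: dict[str, Callable[..., str]] = {"add_todo": add_todo}


def dispatch(tool_call_json: str) -> str:
    """Parse an assumed {"name": ..., "arguments": {...}} payload and run the handler."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"No handler registered for tool {name!r}")
    return handler(**call.get("arguments", {}))


# Example: the payload a model might emit for "add buy milk to my to-do list".
result = dispatch('{"name": "add_todo", "arguments": {"item": "buy milk"}}')
```

Keeping the registry explicit means only functions that were deliberately exposed can be triggered by voice.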
| Model Variant | Use Case | Parameters | License |
|---|---|---|---|
| Voxtral Mini | Local/edge deployment | ~3B | Apache 2.0 |
| Voxtral Small | Production-scale usage | ~24B | Apache 2.0 |
Both are available from Hugging Face.
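As a rough sketch of the download step, the weights can be fetched with `snapshot_download` from the `huggingface_hub` package. The repository IDs below are assumptions and should be checked against the model pages on Hugging Face. `resolve_repo` and `download_weights` are hypothetical helper names, not functions provided by this project.

```python
from pathlib import Path

# Assumed Hugging Face repository IDs; verify them on the Hub before use.
MODEL_REPOS = {
    "mini": "mistralai/Voxtral-Mini-3B-2507",
    "small": "mistralai/Voxtral-Small-24B-2507",
}


def resolve_repo(variant: str) -> str:
    """Map a variant name ('mini' or 'small') to its assumed repository ID."""
    key = variant.strip().lower()
    if key not in MODEL_REPOS:
        raise ValueError(f"Unknown variant {variant!r}; expected one of {sorted(MODEL_REPOS)}")
    return MODEL_REPOS[key]


def download_weights(variant: str, target_dir: str = "models") -> Path:
    """Download a model snapshot into target_dir and return the local path."""
    # Imported lazily so the module loads without the optional dependency
    # (install with: pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    repo_id = resolve_repo(variant)
    local_dir = Path(target_dir) / repo_id.split("/")[-1]
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_weights("mini")` would place the files under `models/Voxtral-Mini-3B-2507`. The Small variant is a much larger download and needs correspondingly more disk space and GPU memory.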
The Voxtral‑AI‑Demo‑Local‑Interface project demonstrates how to locally deploy advanced voice-AI systems. It eliminates the need for separate ASR and LLM modules by integrating transcription, summarization, QA, and translation into a single workflow.