Releases: QuentinFuxa/WhisperLiveKit

0.2.8 (Latest)

Released 02 Sep 19:29 by QuentinFuxa

Dependency and Compatibility Changes

Removed the Triton < 3 version requirement
Tested compatibility with Python 3.14 and 3.15

Performance Improvements

The SimulStreaming backend now defaults to an MLX-Whisper encoder (if available) or a Faster-Whisper encoder (if available), paired with the Whisper decoder and its cross-attention under an AlignAtt policy, for increased speed. This can be disabled with --disable-fast-encoder.
In SimulStreaming, encoders are loaded once and shared, reducing VRAM usage.
When a different encoder is used, only the Whisper decoder is loaded, reducing VRAM usage further.
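The encoder choice in the first item above can be pictured as an availability check with a fixed preference order. This is a hypothetical sketch: `pick_encoder` is not part of WhisperLiveKit, and it assumes the import names `mlx_whisper` and `faster_whisper` for the two optional packages.

```python
import importlib.util
from typing import Callable


def _installed(module_name: str) -> bool:
    # True if the module could be imported, without actually importing it.
    return importlib.util.find_spec(module_name) is not None


def pick_encoder(
    disable_fast_encoder: bool = False,
    is_installed: Callable[[str], bool] = _installed,
) -> str:
    """Hypothetical helper: choose which encoder feeds the Whisper decoder.

    Preference order follows the release note: MLX-Whisper if available,
    otherwise Faster-Whisper if available, otherwise the stock Whisper encoder.
    """
    if disable_fast_encoder:  # analogous to --disable-fast-encoder
        return "whisper"
    if is_installed("mlx_whisper"):
        return "mlx-whisper"
    if is_installed("faster_whisper"):
        return "faster-whisper"
    return "whisper"


print(pick_encoder())  # depends on which packages are installed
```

When a fast encoder is picked, only the Whisper decoder (with its cross-attention) has to be loaded, which is where the VRAM saving in the last item comes from.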

Frontend Enhancements

Added a microphone picker
The UI is now served as a single HTML file with inlined CSS, JS and SVGs (instead of separate files), simplifying deployment.
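Serving the UI as one file amounts to inlining every referenced asset into the HTML. Below is a minimal sketch, assuming a plain-string template and a hypothetical `inline_assets` helper that receives each referenced path with its contents. WhisperLiveKit's own bundling may work differently.

```python
import base64
import re
from typing import Mapping


def inline_assets(html: str, assets: Mapping[str, str]) -> str:
    """Hypothetical helper: inline stylesheets, scripts and SVG images.

    `assets` maps each relative path, exactly as written in the HTML,
    to the text content of that file.
    """
    def css(m: "re.Match[str]") -> str:
        return f"<style>{assets[m.group(1)]}</style>"

    def js(m: "re.Match[str]") -> str:
        return f"<script>{assets[m.group(1)]}</script>"

    def svg(m: "re.Match[str]") -> str:
        # SVGs become data URIs so no extra HTTP request is needed.
        data = base64.b64encode(assets[m.group(1)].encode("utf-8")).decode("ascii")
        return f'src="data:image/svg+xml;base64,{data}"'

    html = re.sub(r'<link rel="stylesheet" href="([^"]+)"\s*/?>', css, html)
    html = re.sub(r'<script src="([^"]+)"></script>', js, html)
    html = re.sub(r'src="([^"]+\.svg)"', svg, html)
    return html


page = inline_assets(
    '<link rel="stylesheet" href="style.css"><script src="app.js"></script><img src="mic.svg">',
    {"style.css": "body{margin:0}", "app.js": "console.log('ready');", "mic.svg": "<svg/>"},
)
print(page)
```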

Bug Fixes and Improvements

Resolved warmup error when no connection is provided or when the language is set to auto
Added pip timeout and retries in Dockerfile when installing Torch/TorchVision/TorchAudio
Fixed an exception raised when the language is set to 'auto' and the task is set to 'translation'
Enabled auto-detection of language for warmup if not specified

0.2.7

Released 27 Aug 16:29 by QuentinFuxa

0.2.7: Diarization Improvements

New default backend: Sortformer is now the default diarization backend, replacing Diart
6x faster processing: Reduced latency from ~2s to ~0.3s on CPU
Significantly improved speaker detection (constraint: currently supports at most 4 speakers)
Shared model loading: a single Sortformer model (SortformerDiarization) is now shared across users and instances to reduce the memory footprint. Per-user state (speaker caches, frames, etc.) is handled in SortformerDiarizationOnline.
Enhanced alignment: Improved time and token synchronization between transcription and diarization results
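The split between one shared model and per-user streaming state is a common pattern. In the hypothetical sketch below, `SharedModel` and `UserSession` stand in for the roles that SortformerDiarization and SortformerDiarizationOnline play. The "model" is a dummy that only sums numbers, and nothing here reproduces the real classes.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


class SharedModel:
    """Hypothetical stand-in for a heavy model loaded once per process."""

    _instance: Optional["SharedModel"] = None
    _lock = threading.Lock()
    load_count = 0  # how many times the "weights" were loaded

    def __init__(self) -> None:
        SharedModel.load_count += 1  # pretend this is an expensive load

    @classmethod
    def get(cls) -> "SharedModel":
        # Double-checked locking so concurrent sessions load the model only once.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def infer(self, chunk: List[float], cache: List[float]) -> float:
        # Dummy inference: combine the new chunk with the caller's own cache.
        return sum(chunk) + sum(cache)


@dataclass
class UserSession:
    """Hypothetical per-user state: caches live here, not in the model."""

    model: SharedModel = field(default_factory=SharedModel.get)
    speaker_cache: List[float] = field(default_factory=list)

    def process(self, chunk: List[float]) -> float:
        result = self.model.infer(chunk, self.speaker_cache)
        self.speaker_cache.extend(chunk)  # only this session's cache grows
        return result


alice, bob = UserSession(), UserSession()
print(alice.model is bob.model, SharedModel.load_count)  # True 1
print(alice.process([1.0, 2.0]), bob.process([5.0]))  # 3.0 5.0
```

Memory then scales with the size of the per-user caches rather than with one model copy per user.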

0.2.6

Released 21 Aug 12:35 by QuentinFuxa

Voice Activity Control (VAC) by Default: VAC is now enabled by default to improve transcription accuracy by filtering out non-speech segments before transcription and diarization. You can disable it with the --no-vac flag.
SimulStreaming Backend Enhancements:
- SimulStreaming is now the default transcription backend.
- Improved timestamp accuracy for audio segments longer than 30 seconds.
- Backend models are now recycled at the end of a transcription, by removing the Whisper hooks, to optimize resource usage.
- Added the --preloaded_model_count argument to preload multiple backend models when several users are expected.
Diarization with Silences: The diart diarization backend now correctly handles pauses and silences, improving speaker turn detection.
Time Handling: Aligned time handling between the backend and the frontend for better synchronization.
WebSocket Communication: Buffering is disabled during silent periods.
Default Model: The default model is now base.
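Preloading several backend models and recycling them between transcriptions behaves like an object pool. The minimal sketch below uses a hypothetical `ModelPool` class and a dummy loader. In WhisperLiveKit the pool size comes from --preloaded_model_count, but the class, its methods and the `reset` callback are illustrative only. The callback stands in for removing the Whisper hooks.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

M = TypeVar("M")


class ModelPool(Generic[M]):
    """Hypothetical pool of preloaded models shared by concurrent sessions."""

    def __init__(self, load: Callable[[], M], reset: Callable[[M], None], count: int) -> None:
        self._reset = reset
        self._idle: "queue.Queue[M]" = queue.Queue()
        for _ in range(count):  # preload up front, e.g. count = --preloaded_model_count
            self._idle.put(load())

    @contextmanager
    def borrow(self, timeout: float = 30.0) -> Iterator[M]:
        # Blocks until a model is idle; raises queue.Empty after `timeout` seconds.
        model = self._idle.get(timeout=timeout)
        try:
            yield model
        finally:
            self._reset(model)  # clear per-transcription state before reuse
            self._idle.put(model)

    def idle_count(self) -> int:
        return self._idle.qsize()


loaded = []


def load_dummy() -> dict:
    model = {"id": len(loaded), "hooks": []}
    loaded.append(model)
    return model


pool = ModelPool(load_dummy, reset=lambda m: m["hooks"].clear(), count=2)
with pool.borrow() as model:
    model["hooks"].append("cross-attention hook")
    print(pool.idle_count())  # 1
print(pool.idle_count())  # 2
```

Because the pool reuses models instead of loading one per connection, a new user only waits when every preloaded model is busy.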

0.2.5

Released 13 Aug 08:24 by QuentinFuxa

Build & Dependencies

Migrated to pyproject.toml - Replaced setup.py with PEP-recommended packaging bda72b8
Removed NumPy version constraint - No longer restricted to numpy < 2.0.0 197293e

Backend Architecture

Refactored SimulStreaming backend separation - Improved architecture to allow multiple users to share the same backend Whisper model instance d098af3 197293e
Enhanced performance monitoring - Lag metrics now update every 0.1 seconds and are independent of token emission frequency 2bbdc70
Reduced hallucinations - SimulStreaming is now less likely to generate false transcriptions during silent periods 87b9ed6
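Decoupling the lag metric from token emission can be done with a separate periodic task. The asyncio sketch below assumes lag is measured as wall-clock audio elapsed minus the end time of the last emitted token. The `LagMonitor` class and that definition are assumptions, not WhisperLiveKit's code.

```python
import asyncio
import time
from typing import List


class LagMonitor:
    """Hypothetical lag tracker refreshed on a fixed period, not per token."""

    def __init__(self, period: float = 0.1) -> None:
        self.period = period
        self.start = time.monotonic()
        self.last_token_end = 0.0  # seconds of audio covered by emitted tokens
        self.samples: List[float] = []

    def on_token(self, end_time: float) -> None:
        # Called by the transcription side whenever tokens are emitted.
        self.last_token_end = max(self.last_token_end, end_time)

    def current_lag(self) -> float:
        # Assumed definition: audio elapsed so far minus audio already transcribed.
        return max(0.0, time.monotonic() - self.start - self.last_token_end)

    async def run(self, stop: asyncio.Event) -> None:
        # Ticks every `period` seconds even if no token arrives in between.
        while not stop.is_set():
            self.samples.append(self.current_lag())
            try:
                await asyncio.wait_for(stop.wait(), timeout=self.period)
            except asyncio.TimeoutError:
                pass


async def demo() -> LagMonitor:
    monitor = LagMonitor(period=0.1)
    stop = asyncio.Event()
    task = asyncio.create_task(monitor.run(stop))
    await asyncio.sleep(0.35)  # no tokens at all during this window
    stop.set()
    await task
    return monitor


monitor = asyncio.run(demo())
print(len(monitor.samples), round(monitor.samples[-1], 2))
```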

Frontend Improvements

Enhanced silence indicators - Now displays three distinct types of silences (38b4ebe):
- Model-detected silences ([BLANK_AUDIO])
- Token emission gaps
- End-of-transcription silences
Dark theme support - Added dark mode 4e56130
Improved UX during transcription by @davidgumberg
- Screen no longer goes to sleep while transcribing 7f93c4b
- Auto-scroll to latest transcription text 3b96fb8
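The three silence types listed above can be told apart from a list of timed tokens. The hypothetical classifier below assumes that gaps are measured between token timestamps and that anything shorter than `min_gap` seconds is ignored. Both the threshold value and the function name are illustrative.

```python
from enum import Enum
from typing import List, Tuple

Token = Tuple[str, float, float]  # (text, start, end) in seconds


class Silence(Enum):
    BLANK_AUDIO = "model-detected"  # the model emitted [BLANK_AUDIO]
    TOKEN_GAP = "token-gap"  # long pause between two emitted tokens
    END = "end-of-transcription"  # nothing follows the last token


def find_silences(
    tokens: List[Token], stream_end: float, min_gap: float = 2.0
) -> List[Tuple[Silence, float, float]]:
    """Hypothetical classifier; `min_gap` is an assumed threshold."""
    silences: List[Tuple[Silence, float, float]] = []
    for i, (text, start, end) in enumerate(tokens):
        if text.strip() == "[BLANK_AUDIO]":
            silences.append((Silence.BLANK_AUDIO, start, end))
        if i + 1 < len(tokens) and tokens[i + 1][1] - end >= min_gap:
            silences.append((Silence.TOKEN_GAP, end, tokens[i + 1][1]))
    last_end = tokens[-1][2] if tokens else 0.0
    if stream_end - last_end >= min_gap:
        silences.append((Silence.END, last_end, stream_end))
    return silences


tokens = [("Hello", 0.0, 0.5), ("[BLANK_AUDIO]", 0.5, 3.0), ("world", 6.0, 6.4)]
for kind, start, end in find_silences(tokens, stream_end=10.0):
    print(kind.value, start, end)
```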

[Screenshot, 2025-08-11 17:53:50]

Contributors

davidgumberg

0.2.4

Released 02 Aug 12:14 by QuentinFuxa

Bug Fixes

Diarization Queue Audio Overlap: Fixed a bug where diarization_queue was sent the entire self.pcm_buffer on every iteration instead of just the latest chunk. PR by @choomegan (commit)
License Display Error: Fixed the dual-license warning display when using the simulstreaming backend. 46efbdf
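The diarization queue fix above follows a general streaming rule: the consumer should receive only the samples appended since the last send, not the whole buffer. Below is a minimal sketch of the corrected pattern. It uses a hypothetical `ChunkForwarder` class, with an `asyncio.Queue` standing in for diarization_queue, and it is not the project's actual code.

```python
import asyncio
from typing import List


class ChunkForwarder:
    """Hypothetical: forward only new PCM samples to a diarization queue."""

    def __init__(self, queue: "asyncio.Queue[List[float]]") -> None:
        self.queue = queue
        self.pcm_buffer: List[float] = []
        self.sent_upto = 0  # index of the first sample not yet forwarded

    async def add(self, samples: List[float]) -> None:
        self.pcm_buffer.extend(samples)
        # Buggy variant: `await self.queue.put(self.pcm_buffer)` would re-send
        # every earlier sample on each iteration, so the diarizer sees overlaps.
        new_chunk = self.pcm_buffer[self.sent_upto:]
        self.sent_upto = len(self.pcm_buffer)
        if new_chunk:
            await self.queue.put(new_chunk)


async def demo() -> List[List[float]]:
    queue: "asyncio.Queue[List[float]]" = asyncio.Queue()
    forwarder = ChunkForwarder(queue)
    for chunk in ([0.1, 0.2], [0.3], [0.4, 0.5]):
        await forwarder.add(chunk)
    return [queue.get_nowait() for _ in range(queue.qsize())]


print(asyncio.run(demo()))  # [[0.1, 0.2], [0.3], [0.4, 0.5]]
```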

Enhancements

Improved Punctuation Splitting for Diarization: Enhanced the use_punctuation_split logic to improve diarization results. Commits: 3ad3683, 5b9977c, 56114d3
Deployment Guide Update: Fixed and clarified the Deployment Guide in the README. PR by @luisla-rivas (commit)
Architecture Update: e40b5a3
Dockerfile Improvements: The Dockerfile now installs build-essential and uses an updated PyTorch version (idea from @callumgarven). (commit)

Core Updates

Update to Latest Version of SimulStreaming: Fixes warmup with audio files longer than 30 s.
SimulStreaming Whisper Core Update: Updated the SimulStreaming Whisper core from version 20230918 to 20250625, which solves a tensor mismatch on some GPUs caused by the Triton version. Commits: 8e056cb, 4cfed6e

0.2.2

Released 04 Jul 15:10 by QuentinFuxa

New:

Replace ffmpeg-python with raw ffmpeg calls:
- Fixes systematic crashes after 9 minutes on some machines
- Improves reboot and restart handling
- Allows ffmpeg to restart without crashing the server on conversion errors
Update to latest SimulWhisper:
- Adds compatibility with English-only models
- Infers word-level timestamps for better diarization alignment
- Other improvements: https://github.com/ufal/SimulStreaming/commits/main/
Prevent buffer from growing indefinitely when no tokens are created
Fix Hugging Face token file handling in Docker
Remove default 8000 port in WebSocket when no port is provided
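Calling ffmpeg directly usually means spawning it as a subprocess that reads the browser's encoded audio on stdin and writes raw PCM on stdout. The asyncio sketch below assumes 16 kHz mono s16le as the output format. The `FFmpegDecoder` and `build_ffmpeg_cmd` names are illustrative, not WhisperLiveKit's. Keeping the process handle in one place is what allows a restart after a conversion error without taking the server down.

```python
import asyncio
from typing import List, Optional


def build_ffmpeg_cmd(sample_rate: int = 16000) -> List[str]:
    # Read any container/codec from stdin, write mono 16-bit PCM to stdout.
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "pipe:1",
    ]


class FFmpegDecoder:
    """Hypothetical wrapper that restarts ffmpeg instead of crashing.

    Run feed() and read_pcm() in separate tasks so neither pipe fills up.
    """

    def __init__(self, sample_rate: int = 16000) -> None:
        self.cmd = build_ffmpeg_cmd(sample_rate)
        self.proc: Optional[asyncio.subprocess.Process] = None

    async def start(self) -> None:
        self.proc = await asyncio.create_subprocess_exec(
            *self.cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.DEVNULL,  # avoid blocking on an unread stderr pipe
        )

    async def restart(self) -> None:
        # Kill a stuck or failed process, then spawn a fresh one.
        if self.proc is not None and self.proc.returncode is None:
            self.proc.kill()
            await self.proc.wait()
        await self.start()

    async def feed(self, data: bytes) -> None:
        assert self.proc is not None and self.proc.stdin is not None
        try:
            self.proc.stdin.write(data)
            await self.proc.stdin.drain()
        except (BrokenPipeError, ConnectionResetError):
            await self.restart()  # drop this chunk, keep the session alive

    async def read_pcm(self, n: int = 4096) -> bytes:
        assert self.proc is not None and self.proc.stdout is not None
        return await self.proc.stdout.read(n)
```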

0.2.1

Released 27 Jun 12:16 by QuentinFuxa

New SimulStreaming backend for transcription. Associated preprint: https://arxiv.org/abs/2506.17077

Up to 5 times faster with the tiny model: #134 (comment)
Requires installing the optional extra: pip install whisperlivekit[simulstreaming]. Dual-licensed: https://github.com/ufal/SimulStreaming?tab=readme-ov-file#-licence-and-contributions
Use it with --backend simulstreaming
SimulStreaming limitations for now:
- No buffer preview is available
- Diarization may be less precise
- Punctuation may be less accurate
- English-only models (tiny.en, base.en, medium.en) are not yet compatible
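SimulStreaming's AlignAtt policy looks at where each candidate token's cross-attention points in the audio to decide whether that token can be emitted yet. The sketch below is a simplified, hypothetical illustration of that rule on plain lists: emit while the most-attended frame is outside the last `frame_threshold` frames. It is not the SimulStreaming implementation, which works on Whisper attention tensors.

```python
from typing import Sequence


def alignatt_emit_count(attn: Sequence[Sequence[float]], frame_threshold: int) -> int:
    """Return how many leading candidate tokens may be emitted now.

    attn[i][j] is the cross-attention weight of candidate token i on audio
    frame j (e.g. averaged over heads). Emission stops at the first token
    whose most-attended frame lies within the last `frame_threshold` frames,
    since that token probably depends on audio that has not arrived yet.
    """
    if not attn:
        return 0
    n_frames = len(attn[0])
    for i, row in enumerate(attn):
        most_attended = max(range(n_frames), key=lambda j: row[j])
        if most_attended >= n_frames - frame_threshold:
            return i
    return len(attn)


# Three candidate tokens over five audio frames.
attn = [
    [0.7, 0.2, 0.1, 0.0, 0.0],  # attends to frame 0: safe to emit
    [0.1, 0.6, 0.2, 0.1, 0.0],  # attends to frame 1: safe to emit
    [0.0, 0.0, 0.1, 0.2, 0.7],  # attends to frame 4: wait for more audio
]
print(alignatt_emit_count(attn, frame_threshold=2))  # 2
```

A larger `frame_threshold` waits for more audio context, trading latency for stability.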

0.1.9

Released 19 Jun 14:44 by QuentinFuxa

Faster Diarization, Smarter Speaker Splitting

Faster diarization: buffering logic and fixed-size audio chunks, now aligned with --min-chunk-size, improve real-time performance
Punctuation-based speaker splitting (beta): enables more natural transitions using --punctuation-split
Custom diarization models: use --segmentation-model and --embedding-model to specify alternate backends. See here to get a list of available models
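One possible reading of --punctuation-split is this: when the diarizer switches speaker in the middle of a sentence, move the switch to a sentence boundary so that one sentence is not split between two speakers. The sketch below implements that reading with hypothetical names. Each sentence is assigned to the speaker covering most of its words, and the actual WhisperLiveKit logic may differ.

```python
from collections import Counter
from typing import List, Tuple

SENTENCE_END = (".", "?", "!")


def punctuation_split(words: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical: relabel speakers so they only change at sentence ends.

    `words` is a list of (word, speaker_id) pairs in time order.
    """
    result: List[Tuple[str, int]] = []
    sentence: List[Tuple[str, int]] = []

    def flush() -> None:
        if not sentence:
            return
        counts = Counter(spk for _, spk in sentence)
        # Counter keeps insertion order and max() returns the first maximum,
        # so ties go to the speaker seen first in the sentence.
        best = max(counts, key=counts.get)
        result.extend((word, best) for word, _ in sentence)
        sentence.clear()

    for word, speaker in words:
        sentence.append((word, speaker))
        if word.endswith(SENTENCE_END):
            flush()
    flush()  # trailing words without final punctuation
    return result


print(punctuation_split([("Hi", 0), ("there.", 0), ("How", 0), ("are", 1), ("you?", 1)]))
# [('Hi', 0), ('there.', 0), ('How', 1), ('are', 1), ('you?', 1)]
```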

0.1.8

Released 16 Jun 14:58 by QuentinFuxa

Changed

TranscriptionEngine (formerly WhisperLiveKit) can now be initialized by passing parameters directly to its constructor (e.g., TranscriptionEngine(backend="faster-whisper", model="small")), giving more flexibility for programmatic use in addition to command-line argument parsing.
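A common way to accept the same settings from both a constructor and the command line is to keep the defaults in one dataclass and derive the argument parser from it. The sketch uses a hypothetical `EngineConfig`, not TranscriptionEngine's real signature. Only `backend`, `model` and the --min-chunk-size flag name come from these notes, and the default values here are placeholders.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EngineConfig:
    """Hypothetical settings object; field defaults are placeholders."""

    backend: str = "faster-whisper"
    model: str = "base"
    min_chunk_size: float = 1.0

    @classmethod
    def from_cli(cls, argv: Optional[List[str]] = None) -> "EngineConfig":
        # Derive one --flag per field so the CLI and the constructor share defaults.
        parser = argparse.ArgumentParser()
        for f in fields(cls):
            parser.add_argument(
                f"--{f.name.replace('_', '-')}", type=type(f.default), default=f.default
            )
        args = parser.parse_args(argv)
        return cls(**vars(args))


# Programmatic use, mirroring TranscriptionEngine(backend=..., model=...):
cfg = EngineConfig(backend="faster-whisper", model="small")
# Command-line use (argv=None would read sys.argv instead):
cli_cfg = EngineConfig.from_cli(["--model", "tiny", "--min-chunk-size", "0.5"])
print(cfg)
print(cli_cfg)
```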

Moved

New module whisperlivekit.parse_args for handling command-line argument parsing.
New module whisperlivekit.web.web_interface for serving the web interface HTML.

0.1.7 -> 0.1.8: 993a835

0.1.7

Released 28 May 11:36 by QuentinFuxa

Changelog - v0.1.7

Bug Fixes

Fixed #127: transcription with VAC works again

Enhancements

Backend logs now update the lag value even when no audio is detected
Added explicit error message when ffmpeg is not found
Frontend now indicates when no audio is detected
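An explicit "ffmpeg not found" error is typically a PATH check made before the first subprocess is spawned. Below is a minimal sketch with `shutil.which`. The function names, exception class and message wording are hypothetical, not WhisperLiveKit's.

```python
import shutil
from typing import Optional


class FFmpegNotFoundError(RuntimeError):
    """Hypothetical error raised when no ffmpeg executable is on PATH."""


def find_ffmpeg(path: Optional[str] = None) -> Optional[str]:
    # `path` overrides os.environ["PATH"]; None means search the normal PATH.
    return shutil.which("ffmpeg", path=path)


def require_ffmpeg(path: Optional[str] = None) -> str:
    exe = find_ffmpeg(path)
    if exe is None:
        raise FFmpegNotFoundError(
            "ffmpeg was not found on PATH. Install it (e.g. `apt install ffmpeg` "
            "or `brew install ffmpeg`) and restart the server."
        )
    return exe
```

Running this check at startup turns a confusing subprocess failure later on into one actionable message.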
