Donkey TTS :: Real-Time Text-to-Speech Streaming with XTTS

This repository provides a FastAPI-based service for real-time text-to-speech (TTS) streaming using the XTTS model from Coqui TTS. It allows you to synthesize speech in various languages, cloning the voice of pre-loaded speakers from .wav files.

Features

Real-time Streaming: Delivers audio as an MP3 stream, enabling immediate playback.
Voice Cloning: Uses XTTS to clone voices from provided speaker audio samples.
Multi-language Support: Synthesizes speech in multiple languages supported by XTTS.
Paragraph and Sentence Handling: Splits input text into paragraphs and sentences, generating audio with appropriate pauses.
Speaker Management: Loads speaker audio samples from a designated directory (speakers/) on application startup.
Error Handling and Logging: Provides robust error handling and logging for debugging and monitoring.
FastAPI Integration: Built with FastAPI for high performance and ease of use.
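
The paragraph and sentence handling listed above can be illustrated with a minimal sketch. It is not the code in main.py: the function name `split_text`, the regular expressions, and both pause values are assumptions chosen for illustration.

```python
import re
from typing import Iterator, Tuple

# Hypothetical pause lengths in milliseconds; the service's real values
# are defined in its own code and may differ.
SENTENCE_PAUSE_MS = 250
PARAGRAPH_PAUSE_MS = 600


def split_text(text: str) -> Iterator[Tuple[str, int]]:
    """Yield (sentence, pause_after_ms) pairs for the given text."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for paragraph in paragraphs:
        # Naive sentence split: break on whitespace that follows ., ! or ?
        sentences = [
            s.strip()
            for s in re.split(r"(?<=[.!?])\s+", paragraph)
            if s.strip()
        ]
        for index, sentence in enumerate(sentences):
            is_last = index == len(sentences) - 1
            # The last sentence of a paragraph gets the longer pause.
            pause_ms = PARAGRAPH_PAUSE_MS if is_last else SENTENCE_PAUSE_MS
            yield sentence, pause_ms
```

Each yielded sentence could then be synthesized separately, with silence of the given length appended before the next one. The naive split will break on abbreviations such as "e.g.", so a real implementation may need a more careful tokenizer.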

Prerequisites

Python 3.10
CUDA-enabled GPU (recommended for performance)
PyTorch
Coqui TTS (TTS)
FastAPI
Pydantic
SoundFile
Pydub
Transformers

Installation

Clone the repository:

git clone https://github.com/dirkjanbuter/donkey-tts.git
cd donkey-tts

Create a virtual environment (recommended):

sudo apt install python3.10 python3.10-venv  # On Debian/Ubuntu
python3.10 -m venv venv
source venv/bin/activate  # On Linux/macOS
venv\Scripts\activate  # On Windows

Install dependencies:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # For GPU (CUDA 11.8); run either this line or the CPU line below, not both
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu # For CPU only
pip install -r requirements.txt
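
After installing, it can be useful to check which device PyTorch will use. The helper below is a small sketch, not part of the repository; the name `pick_device` is hypothetical, and it falls back to "cpu" when PyTorch cannot be imported at all.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"PyTorch would run on: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the CPU-only wheel may have been installed, or the CUDA driver may not match the installed PyTorch build.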

Download the XTTS model:

Download the XTTS model from Coqui TTS model releases and place it in the model/ directory.

Prepare speaker audio samples:

Place .wav files of speaker voice samples in the speakers/ directory. The filenames (excluding the .wav extension) will be used as speaker IDs.

Optional: Docker

docker compose build
docker compose up

Usage

Run the FastAPI application:
```
uvicorn main:app --reload
```
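(Replace main with the name of your Python file if it is different. The curl examples below use port 8979, while uvicorn listens on port 8000 by default, so either start the server with --port 8979 or adjust the URLs.)

Send a POST request to the /tts_stream/ endpoint: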

Use a tool like curl or Postman to send a POST request with the following form data:
- text: The text to be synthesized.
- language: The language of the text.
- speaker_id: The ID of the speaker (filename without .wav).

Example curl commands:
```
curl -X POST -F "text=Hello, this is a test." -F "language=en" -F "speaker_id=yvonta" http://127.0.0.1:8979/tts_stream/ > output.mp3
```
This will save the generated audio stream to output.mp3.
```
curl --connect-timeout 30 --max-time 0 -X POST -F "text=Welcome to Donkey TTS!" -F "language=en" -F "speaker_id=yvonta" http://127.0.0.1:8979/tts_stream/ | mpg123 -q -
```
This will stream the audio and play it in real time (requires mpg123).
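
The same request can be sent from Python. The sketch below uses only the standard library. It assumes the endpoint accepts URL-encoded form fields as well as the multipart form that `curl -F` sends; if it does not, a multipart-capable client would be needed. The helper names `build_request` and `save_stream` are hypothetical, and the URL is copied from the curl examples.

```python
import shutil
import urllib.parse
import urllib.request

# Taken from the curl examples; adjust host and port to your setup.
TTS_URL = "http://127.0.0.1:8979/tts_stream/"


def build_request(text: str, language: str, speaker_id: str,
                  url: str = TTS_URL) -> urllib.request.Request:
    """Build a POST request carrying the three form fields."""
    form = urllib.parse.urlencode(
        {"text": text, "language": language, "speaker_id": speaker_id}
    )
    return urllib.request.Request(
        url,
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def save_stream(request: urllib.request.Request, path: str,
                timeout: float = 30.0) -> None:
    """Send the request and copy the MP3 stream to a file as it arrives."""
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

With the server running, `save_stream(build_request("Hello, this is a test.", "en", "yvonta"), "output.mp3")` would write the audio to output.mp3, mirroring the first curl command.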

Speaker Management

 Place your speaker's `.wav` files inside the `speakers/` folder.

 The application loads these speakers into memory on startup.

 The filename (without the `.wav` extension) becomes the speaker ID.
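
The filename-to-ID rule above can be sketched as follows. This only illustrates the naming convention; the function name `discover_speakers` is hypothetical, and the actual service also loads the audio into memory, which is not shown here.

```python
from pathlib import Path
from typing import Dict


def discover_speakers(directory: str = "speakers") -> Dict[str, Path]:
    """Map speaker IDs (filenames without .wav) to their file paths."""
    # Path.stem drops the final extension, so "yvonta.wav" becomes "yvonta".
    return {path.stem: path for path in sorted(Path(directory).glob("*.wav"))}
```

A file saved as speakers/yvonta.wav would therefore be addressed with `speaker_id=yvonta` in a request.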

Notes

 Ensure you have a compatible GPU and CUDA setup for optimal performance.

 The XTTS model requires significant GPU memory.

 Adjust padding and silence durations in the code as needed for your specific use case.

 Ensure that your speaker files are 24000 Hz (24 kHz), mono, and in WAV format.
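
The speaker file requirements can be checked before starting the service. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files; the function name `check_speaker_wav` is hypothetical and is not part of the repository.

```python
import wave
from typing import List

EXPECTED_RATE_HZ = 24000  # sample rate stated in the notes
EXPECTED_CHANNELS = 1     # mono


def check_speaker_wav(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    try:
        with wave.open(path, "rb") as wav_file:
            rate = wav_file.getframerate()
            channels = wav_file.getnchannels()
    except (wave.Error, EOFError) as error:
        return [f"not a readable PCM WAV file: {error}"]

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

Running it over every file in speakers/ and printing any non-empty result gives a quick pre-flight check. Files that fail can be converted with an audio tool of your choice.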

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bug fixes, feature requests, or improvements.
