A Flask-based web application for realistic voice cloning using OpenVoice v1 and v2. Users can either upload audio or record their voice in-browser, input custom text, select expressive voice styles or accents, and generate natural-sounding cloned audio.
- Upload audio files or use your microphone to record directly.
- OpenVoice v1: voice cloning with style options: default, friendly, cheerful, excited, sad, angry, terrified, shouting, whispering
- OpenVoice v2: accent-aware voice cloning: American, British, Indian, Australian, Default
- Real-time audio generation and download/playback support
- Flask backend integrated with a JavaScript front end (recorder.js)
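The style and accent options above form a small per-version catalog that a backend could check before calling a model. The sketch below is a hypothetical illustration: the names OPTIONS_BY_VERSION and validate_option, and the "v1"/"v2" keys, are assumptions and are not taken from services/openvoice_v1.py or services/openvoice_v2.py.

```python
# Hypothetical catalog of the options listed above; the real services may
# name or key them differently.
V1_STYLES = (
    "default", "friendly", "cheerful", "excited", "sad",
    "angry", "terrified", "shouting", "whispering",
)
V2_ACCENTS = ("American", "British", "Indian", "Australian", "Default")

OPTIONS_BY_VERSION = {"v1": V1_STYLES, "v2": V2_ACCENTS}


def validate_option(version: str, option: str) -> str:
    """Return the option if it is valid for the given model version.

    Raises ValueError for an unknown version or for an option that the
    selected version does not support.
    """
    try:
        allowed = OPTIONS_BY_VERSION[version]
    except KeyError:
        raise ValueError(f"unknown model version: {version!r}") from None
    if option not in allowed:
        raise ValueError(
            f"{option!r} is not a valid option for {version}; "
            f"choose one of {', '.join(allowed)}"
        )
    return option
```

For example, validate_option("v1", "whispering") returns "whispering", while validate_option("v2", "cheerful") raises ValueError because cheerful is a v1 style, not a v2 accent.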
.
├── app.py # Flask entry point
├── routes.py # Routing logic
├── requirements.txt # Dependencies
├── style.css # Optional styling overrides
├── templates/
│ └── index.html # Web UI
├── static/
│ ├── recorder.js # Microphone recorder logic
│ ├── uploads/ # Uploaded audio storage
│ └── outputs/ # Generated audio storage
├── services/ # Core voice generation services
│ ├── audio_processing.py
│ ├── openvoice_v1.py
│ └── openvoice_v2.py
├── openvoice/ # Model-related scripts
└── checkpoints/ # Downloaded model weights
  ├── v1/
  │ ├── base_speakers/
  │ └── converter/
  └── v2/
    ├── base_speakers/
    └── converter/
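The static/uploads/ and static/outputs/ folders in the tree above must exist before the app can write audio into them. A minimal sketch of creating them at startup and choosing a safe path for an uploaded file, assuming the relative paths shown in the tree; ensure_runtime_dirs, upload_path_for, and the extension whitelist are hypothetical and not part of the repository.

```python
import uuid
from pathlib import Path

# Assumed locations, taken from the tree above; adjust if app.py uses others.
UPLOAD_DIR = Path("static/uploads")
OUTPUT_DIR = Path("static/outputs")

# Hypothetical whitelist: WAV/MP3 uploads plus WebM/Ogg browser recordings.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".webm", ".ogg"}


def ensure_runtime_dirs(root: Path = Path(".")) -> None:
    """Create the upload and output folders if they do not exist yet."""
    for sub in (UPLOAD_DIR, OUTPUT_DIR):
        (root / sub).mkdir(parents=True, exist_ok=True)


def upload_path_for(original_name: str, root: Path = Path(".")) -> Path:
    """Return a collision-free path in static/uploads for an uploaded file.

    Only the extension of the client-supplied name is kept, so path
    separators or '..' in the name can never escape the upload folder.
    """
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio type: {ext or '(none)'}")
    return root / UPLOAD_DIR / f"{uuid.uuid4().hex}{ext}"
```

Generating the stored name with uuid4 instead of reusing the client filename avoids both name collisions and path traversal; Werkzeug's secure_filename is a common alternative in Flask apps.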
git clone https://github.com/anshulraj10/ai-voice-replication.git
cd ai-voice-replication
python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
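After installation, it can help to confirm that the libraries named in this README import cleanly and that FFmpeg, which is usually installed separately from pip, is on your PATH. This is a hedged sketch: the module list is an assumption based on the README's tech stack, not on the contents of requirements.txt, and missing_dependencies is a hypothetical helper.

```python
import importlib.util
import shutil

# Import names for the libraries mentioned in this README. The exact set
# in requirements.txt may differ; treat this list as an assumption.
PYTHON_MODULES = ("flask", "torch", "torchaudio", "librosa", "dotenv")


def missing_dependencies(modules=PYTHON_MODULES, executables=("ffmpeg",)):
    """Return the names of Python modules and executables that are not found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All dependencies found.")
```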
Download the pretrained OpenVoice checkpoints:
- OpenVoice v1: checkpoints_1226.zip
- OpenVoice v2: checkpoints_v2_0417.zip
Extract them into checkpoints/ as follows:
checkpoints/
├── v1/
│ ├── base_speakers/
│ └── converter/
└── v2/
  ├── base_speakers/
  └── converter/
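Because the app expects this exact layout, a quick check can catch a misnamed folder or an extra nesting level left over from unzipping. A sketch, assuming the four sub-folders shown above are the only requirement; the function name missing_checkpoint_dirs is hypothetical, and it does not inspect the files inside each folder.

```python
from pathlib import Path

# Sub-folders expected under checkpoints/, per the layout shown above.
EXPECTED_LAYOUT = (
    "v1/base_speakers",
    "v1/converter",
    "v2/base_speakers",
    "v2/converter",
)


def missing_checkpoint_dirs(root="checkpoints"):
    """List the expected checkpoint sub-folders that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_LAYOUT if not (base / rel).is_dir()]
```

Calling missing_checkpoint_dirs() from the project root returns an empty list when the layout is complete.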
python app.py
Then open http://localhost:5000 in your browser.
- Enter the text to synthesize.
- Upload or record your voice.
- Choose the model version:
  - V1: pick a style.
  - V2: pick an accent.
- Adjust speech speed if needed.
- Click "Generate Voice" to create the output.
- Listen to or download the generated audio.
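The steps above map naturally onto the fields a form submission would carry. A minimal sketch of turning those fields into a validated request object; the field names (text, version, style, accent, speed), the 0.5 to 2.0 speed range, and GenerationRequest itself are assumptions, not the actual contract of index.html or routes.py.

```python
import math
from dataclasses import dataclass

# Hypothetical field names and speed bounds; the real index.html and
# routes.py may use different keys or limits.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


@dataclass(frozen=True)
class GenerationRequest:
    text: str
    version: str  # "v1" or "v2"
    option: str   # a style for v1, an accent for v2
    speed: float


def parse_generation_form(form) -> GenerationRequest:
    """Turn submitted form fields (any mapping with .get) into a request."""
    text = (form.get("text") or "").strip()
    if not text:
        raise ValueError("text must not be empty")

    version = (form.get("version") or "v1").lower()
    if version not in ("v1", "v2"):
        raise ValueError(f"unknown model version: {version!r}")

    # v1 reads the style field, v2 reads the accent field.
    if version == "v1":
        option = form.get("style") or "default"
    else:
        option = form.get("accent") or "Default"

    raw_speed = form.get("speed")
    try:
        speed = 1.0 if raw_speed in (None, "") else float(raw_speed)
    except (TypeError, ValueError):
        raise ValueError("speed must be a number") from None
    if not math.isfinite(speed):
        raise ValueError("speed must be finite")
    # Clamp rather than reject, so an out-of-range value still produces audio.
    speed = min(max(speed, MIN_SPEED), MAX_SPEED)

    return GenerationRequest(text, version, option, speed)
```

In a Flask view, a call such as parse_generation_form(request.form) would fit this shape, since Flask's request.form supports .get().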
- OpenVoice v1 (GitHub)
  - Supports expressive, style-based cloning
- OpenVoice v2 (GitHub)
  - Adds accent conditioning and improved audio quality
Both models run locally, ensuring privacy and low latency.
- Flask: backend server
- JavaScript (MediaRecorder API): in-browser audio recording
- FFmpeg, Librosa, and Torchaudio: audio processing
- PyTorch: model inference
- dotenv: secret management
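Browser recordings made with the MediaRecorder API usually arrive as WebM or Ogg, so an FFmpeg step typically converts them to WAV before further processing. A sketch of that step, assuming FFmpeg is on PATH; the helper names and the 22050 Hz default sample rate are assumptions rather than values read from services/audio_processing.py.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src, dst, sample_rate=22050):
    """Build an FFmpeg command that converts any input to mono 16-bit WAV.

    The 22050 Hz default is an assumption; match it to what the model expects.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input, e.g. a WebM/Ogg browser recording
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # encode as 16-bit PCM
        str(dst),
    ]


def convert_to_wav(src, dst, sample_rate=22050):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst, sample_rate),
                   check=True, capture_output=True)
    return Path(dst)
```

Building the argument list separately from running it keeps the command easy to inspect and test, and passing a list (not a shell string) avoids quoting problems with user-supplied filenames.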
We welcome contributions! To contribute:
- Fork this repo
- Create a new branch (feature/your-feature)
- Commit your changes with clear messages
- Open a pull request with a description
This project is licensed under the MIT License. See the LICENSE file for details.
Created by Anshul Raj
Voice cloning powered by MyShell OpenVoice