A TypeScript application that converts text to speech using Google's Gemini API with native TTS capabilities.
- 🎤 Convert text files to high-quality speech audio
- 🎵 30 different voice options available
- 🌍 Supports multiple languages (including Sinhala)
- 📁 Easy file-based input
- 🔧 TypeScript for better development experience
- ⚡ Uses the latest Gemini 2.5 TTS models
- Node.js 18 or later
- pnpm (recommended) or npm
- A Google Gemini API key
- Clone this repository or download the files
- Install dependencies:

```bash
pnpm install
```

Or, if you prefer npm:

```bash
npm install
```

- Set up your environment variables:
Create a `.env` file in the root directory:

```
GEMINI_API_KEY=your_gemini_api_key_here
```

Get your API key from: https://aistudio.google.com/app/apikey
- Make sure your text is in the `text.txt` file (it already contains Sinhala text).
- Run the application:
```bash
# Development mode (with ts-node)
pnpm dev

# Or build and run
pnpm build
pnpm start
```

- The audio file will be generated as `sinhala_text_audio.wav`.
The application supports 30 prebuilt voices, grouped here by speaking style:
- Bright: Zephyr, Autonoe
- Upbeat: Puck, Laomedeia
- Firm: Kore, Orus, Alnilam
- Informative: Charon, Rasalgethi
- Excitable: Fenrir
- Youthful: Leda
- Easy-going: Umbriel, Callirrhoe
- Clear: Erinome, Iapetus
- Breezy: Aoede
- Breathy: Enceladus
- Smooth: Algieba, Despina
- Gravelly: Algenib
- Soft: Achernar
- Mature: Gacrux
- Casual: Zubenelgenubi
- Forward: Pulcherrima
- Even: Schedar
- Friendly: Achird
- Lively: Sadachbia
- Knowledgeable: Sadaltager
- Gentle: Vindemiatrix
- Warm: Sulafat
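The voice list above can be encoded as a TypeScript union type. This catches a mistyped voice name before any API call is made. This is a minimal sketch:

- `VOICES`, `isVoiceName`, `parseVoice`, and `speechConfigFor` are hypothetical helpers, not part of `src/index.ts`.
- The exact, case-sensitive match is an assumption about how strictly names should be validated.
- The `Kore` fallback mirrors the voice used in the `generateContent` request shown in this README.

```typescript
// Hypothetical helper: the 30 prebuilt voice names listed above.
const VOICES = [
  "Zephyr", "Autonoe", "Puck", "Laomedeia", "Kore", "Orus", "Alnilam",
  "Charon", "Rasalgethi", "Fenrir", "Leda", "Umbriel", "Callirrhoe",
  "Erinome", "Iapetus", "Aoede", "Enceladus", "Algieba", "Despina",
  "Algenib", "Achernar", "Gacrux", "Zubenelgenubi", "Pulcherrima",
  "Schedar", "Achird", "Sadachbia", "Sadaltager", "Vindemiatrix", "Sulafat",
] as const;

// Union of the literal names, e.g. "Zephyr" | "Autonoe" | ...
type VoiceName = (typeof VOICES)[number];

// Type guard: exact (case-sensitive) membership check against VOICES.
function isVoiceName(name: string): name is VoiceName {
  return (VOICES as readonly string[]).includes(name);
}

// Hypothetical helper: return the fallback when no name is given,
// otherwise validate the name and throw on an unknown voice.
function parseVoice(input: string | undefined, fallback: VoiceName = "Kore"): VoiceName {
  if (input === undefined) return fallback;
  if (!isVoiceName(input)) {
    throw new Error(`Unknown voice "${input}". Expected one of: ${VOICES.join(", ")}`);
  }
  return input;
}

// Hypothetical helper: build the speechConfig fragment used in the request config.
function speechConfigFor(voiceName: VoiceName) {
  return { voiceConfig: { prebuiltVoiceConfig: { voiceName } } };
}
```

For example, `parseVoice(process.env.TTS_VOICE)` would read a voice from a hypothetical `TTS_VOICE` environment variable. It falls back to `Kore` when that variable is unset.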
You can modify the voice and output filename in `src/index.ts`:

```typescript
await tts.convertFileToSpeech('text.txt', {
  voiceName: 'Puck', // Change to any available voice
  outputFile: 'my_custom_audio.wav'
});
```

The Gemini TTS API supports 24 languages, including:
- English (US, India)
- Arabic (Egyptian)
- German, Spanish, French
- Hindi, Indonesian, Italian
- Japanese, Korean
- Portuguese (Brazil)
- Russian, Dutch, Polish
- Thai, Turkish, Vietnamese
- Romanian, Ukrainian
- Bengali, Marathi, Tamil, Telugu
Note: While Sinhala isn't officially listed, the API may auto-detect and process it.
```text
gemini-tts/
├── src/
│   └── index.ts      # Main TypeScript application
├── dist/             # Compiled JavaScript (after build)
├── text.txt          # Input text file (Sinhala content)
├── package.json      # Dependencies and scripts
├── tsconfig.json     # TypeScript configuration
└── README.md         # This file
```
- `pnpm dev` - Run in development mode with ts-node
- `pnpm build` - Compile TypeScript to JavaScript
- `pnpm start` - Run the compiled JavaScript
- `pnpm clean` - Clean the dist directory
The application uses the Gemini 2.5 Flash Preview TTS model:

```typescript
const response = await this.ai.models.generateContent({
  model: "gemini-2.5-flash-preview-tts",
  contents: [{
    role: "user",
    parts: [{ text: `Please read this text aloud: ${text}` }]
  }],
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: {
          voiceName: 'Kore'
        }
      }
    }
  }
});
```

The application includes error handling for:
- Missing API keys
- File reading errors
- API response errors
- Audio file saving errors
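Two of the checks above can be sketched as small guard functions. The sketch makes no assumptions about how `src/index.ts` is structured:

- `requireApiKey` and `extractAudioBase64` are hypothetical names.
- `TtsResponse` is a hand-written structural type covering only the fields read here. It is not the `@google/genai` SDK's own response type.
- The error messages match the ones listed under troubleshooting.

```typescript
// Assumed minimal shape of a generateContent response: only the fields read below.
interface InlineData { data?: string; mimeType?: string }
interface Part { text?: string; inlineData?: InlineData }
interface TtsResponse {
  candidates?: { content?: { parts?: Part[] } }[];
}

// Hypothetical guard: fail fast when the key is missing or blank.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.GEMINI_API_KEY?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is required");
  }
  return key;
}

// Hypothetical guard: return the base64 audio from the first part of the
// first candidate that carries inline data, or throw if there is none.
function extractAudioBase64(response: TtsResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const audio = parts.find((p) => p.inlineData?.data)?.inlineData?.data;
  if (!audio) {
    throw new Error("No audio data received from Gemini API");
  }
  return audio;
}
```

In practice, `requireApiKey(process.env)` would run before the client is constructed. `extractAudioBase64(response)` would run right after `generateContent` returns.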
- TTS models accept text-only inputs
- Context window limit of 32k tokens
- The API returns raw 16-bit mono PCM audio at 24 kHz; the application saves it as a WAV file
- Preview feature - may have usage limits
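The raw PCM audio returned by the API needs a 44-byte RIFF/WAVE header in front of it to become a playable WAV file. The sketch below assumes 24 kHz, 16-bit, mono samples. It builds the header with `DataView`. `pcmToWav` is a hypothetical helper, and the application may use a library for this step instead.

```typescript
// Hypothetical helper: prepend a canonical 44-byte WAV header to raw
// little-endian PCM. Defaults assume 24 kHz, mono, 16-bit samples.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = new ArrayBuffer(44);
  const view = new DataView(header);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.byteLength, true); // RIFF chunk size = file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                 // fmt chunk size for PCM
  view.setUint16(20, 1, true);                  // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.byteLength, true);     // size of the sample data in bytes

  // Header followed by the untouched sample bytes.
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}
```

With Node.js, the base64 string from the response can be decoded with `Buffer.from(data, "base64")` (a `Buffer` is a `Uint8Array`). The result of `pcmToWav` can then be written with `fs.writeFileSync`.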
- "GEMINI_API_KEY is required"
  - Make sure you've created a `.env` file with your API key
  - Verify that the API key is correct
- "Failed to read file text.txt"
  - Ensure the `text.txt` file exists in the root directory
  - Check file permissions
- "No audio data received from Gemini API"
  - Check that your API key has TTS permissions
  - Verify that the text isn't too long (32k token limit)
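An input file may exceed the 32k-token limit. One possible approach, not taken from this application, is to split the text and convert each piece separately.

This sketch uses character count as a rough stand-in for tokens. That is an assumption: tokens per character vary by language and script, so `maxChars` should be chosen conservatively. `chunkText`, `findBreak`, and the 4000-character default are hypothetical. The sketch prefers to break at a blank line, then after a sentence-ending `.`, `?`, or `!`, then at a space.

```typescript
// Hypothetical helper: pick a cut position inside `head` (never 0, so the
// caller always makes progress). Falls back to a hard cut at the end.
function findBreak(head: string): number {
  const paragraph = head.lastIndexOf("\n\n");
  if (paragraph > 0) return paragraph;
  // Break just after sentence-ending punctuation followed by a space.
  const sentence = Math.max(head.lastIndexOf(". "), head.lastIndexOf("? "), head.lastIndexOf("! "));
  if (sentence > 0) return sentence + 1;
  const space = head.lastIndexOf(" ");
  if (space > 0) return space;
  return head.length;
}

// Hypothetical helper: split text into trimmed chunks of at most maxChars characters.
function chunkText(text: string, maxChars = 4000): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const cut = findBreak(rest.slice(0, maxChars));
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```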
Feel free to submit issues and enhancement requests!
MIT License - see LICENSE file for details.