feat: Implement ElevenLabs Text-to-Speech #2364

Closed

Contributor

@apappascs apappascs commented Mar 2, 2025

This commit introduces support for ElevenLabs Text-to-Speech (TTS) service within the Spring AI framework.

Key Changes:

  • New Model Module: Added spring-ai-elevenlabs module for ElevenLabs integration.
  • Core Classes:
    • ElevenLabsTextToSpeechModel: Implements TextToSpeechModel and StreamingTextToSpeechModel for interacting with the ElevenLabs API.
    • ElevenLabsTextToSpeechOptions: Configuration options for the ElevenLabs TTS service.
    • ElevenLabsApi: Low-level client for interacting with the ElevenLabs API.
    • ElevenLabsVoicesApi: Client for the ElevenLabs Voices API.
    • Speech, TextToSpeechMessage, TextToSpeechPrompt, TextToSpeechResponse: Data transfer objects.
  • Auto-configuration:
    • ElevenLabsAutoConfiguration: Spring Boot auto-configuration for easy setup.
    • ElevenLabsConnectionProperties: Configuration properties for ElevenLabs connection.
    • ElevenLabsSpeechProperties: Configuration properties for default TTS settings.
  • API Clients: Provides ElevenLabsApi for direct interaction with the ElevenLabs API. Also provides an ElevenLabsVoicesApi.
  • Tests: Includes comprehensive unit and integration tests.
  • Documentation: Added documentation to the Spring AI reference guide, including examples.

Functionality:

  • Text-to-Speech Conversion: Allows users to convert text input into audio using ElevenLabs' high-quality voices.
  • Streaming Support: Supports real-time audio streaming, enabling immediate playback as audio is generated.
  • Configurable Options: Provides flexible configuration options for voice selection, output format, speed, stability, and more.
  • Spring Boot Starter: Includes a Spring Boot starter (spring-ai-elevenlabs-spring-boot-starter) for simplified dependency management and auto-configuration.

Notes:

  • The classes defined in the tts package will be moved to the core package, along with any refactoring needed to support the OpenAI speech API.

Related Issue
#2371

@apappascs
Contributor Author

resolves #2371

@apappascs apappascs force-pushed the feat/spring-ai-elevenlabs branch 2 times, most recently from 336dad7 to 288796e Compare March 19, 2025 16:50
@apappascs
Contributor Author

Hi @markpollack, I'm following up to ask whether there is any visibility on the review timeline for this ElevenLabs PR, as bandwidth allows.
It would be awesome to have it.

@markpollack markpollack self-assigned this Jun 6, 2025
@markpollack markpollack added this to the 1.1.x milestone Jun 6, 2025
@markpollack
Member

Now that GA is behind us, we can get back to this. Classes such as TextToSpeechModel should now go in the API package, but at first glance this looks great. I will test drive it, but feel free to start moving toward a merge into the current package/module structure.

Signed-off-by: Alexandros Pappas <apappascs@gmail.com>
@apappascs apappascs force-pushed the feat/spring-ai-elevenlabs branch from 288796e to 01507f3 Compare June 10, 2025 10:57
* @param webClientBuilder A builder for the Spring WebClient.
* @param responseErrorHandler A custom error handler for API responses.
*/
public ElevenLabsApi(String baseUrl, ApiKey apiKey, MultiValueMap<String, String> headers,
Member

@markpollack markpollack Jun 11, 2025


This can be private to force use of the builder. We should probably have done the same for OpenAiAudioApi; perhaps we can add @deprecated in the 1.0.x branch.
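
A minimal sketch of the suggestion above, assuming a simplified client: the constructor is private, so a builder is the only way to obtain an instance. `SpeechApiClient`, its two fields, and the placeholder URL are hypothetical and stand in for the real `ElevenLabsApi`, whose constructor takes more arguments.

```java
import java.util.Objects;

// Hypothetical stand-in for an API client; not the actual ElevenLabsApi.
final class SpeechApiClient {

    private final String baseUrl;
    private final String apiKey;

    // Private: callers cannot bypass the builder.
    private SpeechApiClient(String baseUrl, String apiKey) {
        this.baseUrl = Objects.requireNonNull(baseUrl, "baseUrl must not be null");
        this.apiKey = Objects.requireNonNull(apiKey, "apiKey must not be null");
    }

    static Builder builder() {
        return new Builder();
    }

    String baseUrl() {
        return baseUrl;
    }

    String apiKey() {
        return apiKey;
    }

    static final class Builder {

        // Placeholder default, chosen for illustration only.
        private String baseUrl = "https://api.example.invalid";
        private String apiKey;

        private Builder() {
        }

        Builder baseUrl(String baseUrl) {
            this.baseUrl = baseUrl;
            return this;
        }

        Builder apiKey(String apiKey) {
            this.apiKey = apiKey;
            return this;
        }

        SpeechApiClient build() {
            // Validation happens in one place: the private constructor.
            return new SpeechApiClient(baseUrl, apiKey);
        }
    }

    public static void main(String[] args) {
        SpeechApiClient client = SpeechApiClient.builder().apiKey("demo-key").build();
        System.out.println(client.baseUrl());
    }
}
```

With this shape, adding a new optional setting later means adding one builder method rather than another constructor overload.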

setModelId(model);
}

public String getModelId() {
Member


Do we need these getters and setters? Why not just normalize usage to getModel and set the JSON serialization value to 'modelId'? The same can be said for the property getVoice and 'voiceId'.
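
One way to read this suggestion, sketched in plain Java under stated assumptions: the class exposes only the portable accessors (`getModel`, `getVoice`), and the provider-specific names appear only at the serialization boundary. `SpeechOptionsSketch` and `toWireFields` are hypothetical; in the real module the name mapping would more likely be declared with a JSON mapper annotation such as Jackson's `@JsonProperty` than written by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical options class; not the actual ElevenLabsTextToSpeechOptions.
final class SpeechOptionsSketch {

    private String model;
    private String voice;

    String getModel() {
        return model;
    }

    void setModel(String model) {
        this.model = model;
    }

    String getVoice() {
        return voice;
    }

    void setVoice(String voice) {
        this.voice = voice;
    }

    // Hand-written stand-in for a JSON mapper: the portable Java property
    // names are translated to the serialized names only here.
    Map<String, Object> toWireFields() {
        Map<String, Object> fields = new LinkedHashMap<>();
        if (model != null) {
            fields.put("modelId", model);
        }
        if (voice != null) {
            fields.put("voiceId", voice);
        }
        return fields;
    }

    public static void main(String[] args) {
        SpeechOptionsSketch options = new SpeechOptionsSketch();
        options.setModel("example-model");
        options.setVoice("example-voice");
        System.out.println(options.toWireFields());
    }
}
```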


private final ElevenLabsTextToSpeechOptions options = new ElevenLabsTextToSpeechOptions();

public Builder modelId(String modelId) {
Member


The builder should focus on the portable option name 'model' and not 'modelId'. The same goes for preferring 'voice' and not having a builder method for 'voiceId'.

Member


or the opposite... add a 'model' builder method... I think I'll go that route in merging the PR and add javadoc to the similar methods saying they do the same things..
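
The alternative described here (keep `modelId` and add a `model` method that does the same thing) might look roughly like the sketch below. `TtsOptionsSketch` is a hypothetical class for illustration, not the merged code; only the method names `model`, `modelId`, `voice`, and `voiceId` come from the discussion.

```java
// Hypothetical options class with a builder that offers both names.
final class TtsOptionsSketch {

    private String model;
    private String voice;

    String getModel() {
        return model;
    }

    String getVoice() {
        return voice;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {

        private final TtsOptionsSketch options = new TtsOptionsSketch();

        /** Sets the model using the portable option name. */
        Builder model(String model) {
            options.model = model;
            return this;
        }

        /** Same effect as {@link #model(String)}; kept as a provider-specific alias. */
        Builder modelId(String modelId) {
            return model(modelId);
        }

        /** Sets the voice using the portable option name. */
        Builder voice(String voice) {
            options.voice = voice;
            return this;
        }

        /** Same effect as {@link #voice(String)}; kept as a provider-specific alias. */
        Builder voiceId(String voiceId) {
            return voice(voiceId);
        }

        TtsOptionsSketch build() {
            return options;
        }
    }

    public static void main(String[] args) {
        TtsOptionsSketch a = TtsOptionsSketch.builder().model("example-model").build();
        TtsOptionsSketch b = TtsOptionsSketch.builder().modelId("example-model").build();
        System.out.println(a.getModel().equals(b.getModel()));
    }
}
```

Because the alias delegates to the portable method, the two names cannot drift apart if the setter logic changes later.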

@markpollack
Member

I've updated the docs a bit and added a couple of tests. What a great PR! Thanks!

I've also added support in the spring-ai-integration-tests repo to run these ITs.

Committed in 9398850.

Closing now. If there is something you want to change, just re-open and we can discuss, @apappascs.
