Skip to content

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Swap98-Coder/mlx-audio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

MLX Audio Library 🎤

MLX Audio GitHub Release

Welcome to the MLX Audio library! This repository offers a powerful text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) library built on Apple's MLX framework. Designed for Apple Silicon, it provides efficient speech analysis capabilities that leverage the latest advancements in audio processing.

Table of Contents

Features

  • Text-to-Speech (TTS): Convert written text into spoken words with high fidelity.
  • Speech-to-Text (STT): Transcribe spoken language into written text efficiently.
  • Speech-to-Speech (STS): Modify and synthesize speech, allowing for various applications in voice modulation.
  • Optimized for Apple Silicon: Fully utilize the capabilities of Apple’s M1 and M2 chips for superior performance.
  • Multimodal Capabilities: Seamlessly integrate audio processing tasks for a more comprehensive audio experience.
  • Transformer Models: Employ state-of-the-art transformer architectures for improved accuracy in speech recognition and synthesis.

Installation

To get started with MLX Audio, you need to install the library on your Apple Silicon device. Follow these steps:

  1. Clone the Repository:

    git clone https://github.com/Swap98-Coder/mlx-audio.git
  2. Navigate to the Directory:

    cd mlx-audio
  3. Install Dependencies:

    Make sure you have Python 3.8 or higher installed. You can install the required dependencies using pip:

    pip install -r requirements.txt
  4. Download and Execute the Latest Release:

    Visit the Releases section to download the latest version. Follow the instructions in the release notes to execute the library.

Usage

Here’s a quick guide on how to use the MLX Audio library for different functionalities.

Text-to-Speech (TTS)

To convert text to speech, you can use the following code snippet:

from mlx_audio import TextToSpeech

tts = TextToSpeech()
tts.speak("Hello, welcome to the MLX Audio library!")

Speech-to-Text (STT)

For speech recognition, use the STT feature as follows:

from mlx_audio import SpeechToText

stt = SpeechToText()
text = stt.recognize("path/to/audio/file.wav")
print(text)

Speech-to-Speech (STS)

To modify speech, you can utilize the STS feature:

from mlx_audio import SpeechToSpeech

sts = SpeechToSpeech()
modified_audio = sts.transform("path/to/input/audio.wav", effect="robotic")
sts.save(modified_audio, "path/to/output/audio.wav")

Contributing

We welcome contributions to the MLX Audio library! If you would like to help, please follow these steps:

  1. Fork the Repository: Click on the "Fork" button at the top right of the repository page.

  2. Create a Branch: Create a new branch for your feature or bug fix.

    git checkout -b feature/my-feature
  3. Make Changes: Implement your changes and test them thoroughly.

  4. Commit Your Changes:

    git commit -m "Add my feature"
  5. Push to the Branch:

    git push origin feature/my-feature
  6. Open a Pull Request: Go to the original repository and click on "New Pull Request."

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or suggestions, please contact the repository maintainer:

Release Notes

To stay updated with the latest features and fixes, check the Releases section. Download the latest release and follow the instructions to execute the library.

Conclusion

The MLX Audio library provides an efficient and powerful way to handle audio processing tasks on Apple Silicon. Whether you are looking to implement TTS, STT, or STS, this library has you covered. Dive into the code, explore the features, and enhance your applications with advanced speech capabilities.

Thank you for your interest in MLX Audio!

About

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published