An autonomous agent that transcribes M4A voice recordings with OpenAI Whisper and refines the output with LLM post-processing.
- Convert M4A audio files to text transcriptions
- Preprocess audio for improved transcription quality
- Enhance transcriptions using LLM processing
- Autonomous workflow with error handling
- Simple command-line interface
- Optional web interface (with Streamlit)
- Clone the repository:
git clone https://github.com/yourusername/voice-to-text-ai.git
cd voice-to-text-ai
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up API keys:
# Create a .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
Convert an M4A file to text:
python -m src.main /path/to/audio.m4a
With additional options:
python -m src.main /path/to/audio.m4a --enhance-formatting --fix-punctuation --summarize
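The flags above map naturally onto argparse; here is a hedged sketch of how `src/main.py` might wire them up (option names are taken from the command shown, but the parser itself is an assumption, not the project's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the options shown above (sketch, not the real main)."""
    parser = argparse.ArgumentParser(
        description="Convert an M4A voice recording to text")
    parser.add_argument("file_path",
                        help="Path to the .m4a file to transcribe")
    parser.add_argument("--enhance-formatting", action="store_true",
                        help="Let the LLM clean up paragraphs and headings")
    parser.add_argument("--fix-punctuation", action="store_true",
                        help="Let the LLM repair punctuation and casing")
    parser.add_argument("--summarize", action="store_true",
                        help="Append an LLM-generated summary")
    return parser

# Example: parse the command from the usage section above
args = build_parser().parse_args(
    ["/path/to/audio.m4a", "--enhance-formatting", "--fix-punctuation"])
```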
from src.agent.agent import VoiceToTextAgent
# Initialize the agent
agent = VoiceToTextAgent()
# Process a file
result = agent.process_file(
    file_path="/path/to/audio.m4a",
    options={
        "enhance_formatting": True,
        "fix_punctuation": True,
        "summarize": False,
    },
)
# Access the result
print(result.text)
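The exact shape of `result` isn't shown; since the project lists Pydantic for data validation, it is presumably a validated model along these lines. Only the `.text` attribute is confirmed by the usage above; the other fields, and the use of a dataclass instead of a Pydantic model, are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranscriptionResult:
    """Stand-in for the agent's result object.

    The real project likely uses a Pydantic model; only `text` is
    confirmed by the usage example, the rest is hypothetical.
    """
    text: str
    summary: Optional[str] = None      # populated when summarize=True
    duration_seconds: Optional[float] = None

# Example: what accessing a result might look like
result = TranscriptionResult(text="Hello from the meeting recording.")
print(result.text)
```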
Start the Streamlit web interface:
streamlit run src/web/app.py
Then open your browser at http://localhost:8501
- Python 3.9+
- pydub/ffmpeg for audio conversion
- OpenAI Whisper for speech recognition
- OpenAI API for LLM processing
- Pydantic for data validation
- Streamlit (optional) for web UI
voice_to_text_ai/
├── src/
│   ├── audio/          # Audio processing modules
│   ├── transcription/  # Transcription modules
│   ├── llm/            # LLM integration
│   ├── agent/          # Agent logic
│   └── utils/          # Utility functions
├── tests/              # Test directory
└── examples/           # Example usage and sample files
Run tests:
pytest
Run linting:
flake8 src tests
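Tests follow standard pytest conventions (functions named `test_*` in the `tests/` directory are discovered automatically). A self-contained example of the style, using a hypothetical timestamp helper that is not taken from the actual `src/utils`:

```python
def seconds_to_timestamp(seconds: int) -> str:
    """Format a duration as HH:MM:SS (hypothetical utility for illustration)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def test_seconds_to_timestamp():
    # pytest collects and runs test_* functions automatically
    assert seconds_to_timestamp(0) == "00:00:00"
    assert seconds_to_timestamp(65) == "00:01:05"
    assert seconds_to_timestamp(3661) == "01:01:01"
```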
MIT