An autonomous agent that transcribes M4A voice recordings with OpenAI Whisper and refines the output with LLM post-processing.
- Convert M4A audio files to text transcriptions
- Preprocess audio for improved transcription quality
- Enhance transcriptions using LLM processing
- Autonomous workflow with error handling
- Simple command-line interface
- Optional web interface (with Streamlit)
- Clone the repository:
git clone https://github.com/yourusername/voice-to-text-ai.git
cd voice-to-text-ai
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up API keys:
# Create a .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
Convert an M4A file to text:
python -m src.main /path/to/audio.m4a
With additional options:
python -m src.main /path/to/audio.m4a --enhance-formatting --fix-punctuation --summarize
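The flags above map naturally onto argparse; here is a hedged sketch of how `src/main.py` might wire them up (option names are taken from the command shown, but the parser itself is an assumption, not the project's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the options shown above (sketch, not the real main)."""
    parser = argparse.ArgumentParser(
        description="Convert an M4A voice recording to text")
    parser.add_argument("file_path",
                        help="Path to the .m4a file to transcribe")
    parser.add_argument("--enhance-formatting", action="store_true",
                        help="Let the LLM clean up paragraphs and headings")
    parser.add_argument("--fix-punctuation", action="store_true",
                        help="Let the LLM repair punctuation and casing")
    parser.add_argument("--summarize", action="store_true",
                        help="Append an LLM-generated summary")
    return parser

# Example: parse the command from the usage section above
args = build_parser().parse_args(
    ["/path/to/audio.m4a", "--enhance-formatting", "--fix-punctuation"])
```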
from src.agent.agent import VoiceToTextAgent
# Initialize the agent
agent = VoiceToTextAgent()
# Process a file
result = agent.process_file(
    file_path="/path/to/audio.m4a",
    options={
        "enhance_formatting": True,
        "fix_punctuation": True,
        "summarize": False,
    },
)
# Access the result
print(result.text)
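The exact shape of `result` isn't shown; since the project lists Pydantic for data validation, it is presumably a validated model along these lines. Only the `.text` attribute is confirmed by the usage above; the other fields, and the use of a dataclass instead of a Pydantic model, are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranscriptionResult:
    """Stand-in for the agent's result object.

    The real project likely uses a Pydantic model; only `text` is
    confirmed by the usage example, the rest is hypothetical.
    """
    text: str
    summary: Optional[str] = None      # populated when summarize=True
    duration_seconds: Optional[float] = None

# Example: what accessing a result might look like
result = TranscriptionResult(text="Hello from the meeting recording.")
print(result.text)
```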
Start the Streamlit web interface:
streamlit run src/web/app.py
Then open your browser at http://localhost:8501
- Python 3.9+
- pydub/ffmpeg for audio conversion
- OpenAI Whisper for speech recognition
- OpenAI API for LLM processing
- Pydantic for data validation
- Streamlit (optional) for web UI
voice_to_text_ai/
├── src/
│   ├── audio/          # Audio processing modules
│   ├── transcription/  # Transcription modules
│   ├── llm/            # LLM integration
│   ├── agent/          # Agent logic
│   └── utils/          # Utility functions
├── tests/              # Test directory
└── examples/           # Example usage and sample files
Run tests:
pytest
Run linting:
flake8 src tests
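Tests follow standard pytest conventions (functions named `test_*` in the `tests/` directory are discovered automatically). A self-contained example of the style, using a hypothetical timestamp helper that is not taken from the actual `src/utils`:

```python
def seconds_to_timestamp(seconds: int) -> str:
    """Format a duration as HH:MM:SS (hypothetical utility for illustration)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def test_seconds_to_timestamp():
    # pytest collects and runs test_* functions automatically
    assert seconds_to_timestamp(0) == "00:00:00"
    assert seconds_to_timestamp(65) == "00:01:05"
    assert seconds_to_timestamp(3661) == "01:01:01"
```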
MIT