Implement Voice Input Functionality with Multiple STT Engines #29

devin-ai-integration · 2025-06-29T23:40:31Z

Implement Voice Input Functionality with Multiple STT Engines

Summary

This PR adds voice input to the Onyx chat interface: users record a voice message, and it is transcribed to text by one of several Speech-to-Text (STT) engines. The implementation includes:

Frontend Voice Recording Component: React component with MediaRecorder API integration, real-time audio visualization, and comprehensive error handling
Backend Audio Transcription API: FastAPI endpoints supporting multiple STT engines (OpenAI Whisper, Deepgram, Azure Speech, Web Speech API)
Admin Configuration Interface: Settings page for STT engine selection, API key management, and engine testing
Chat Interface Integration: Microphone button added to ChatInputBar with modal voice recording interface

The implementation follows existing Onyx patterns for component architecture, API design, and admin settings management.
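As a rough illustration of how a backend could route a recording to whichever engine is selected, here is a minimal sketch of a registry keyed by engine name. It is not the code in this PR: `STTEngine`, `TranscriptionResult`, `register`, and `transcribe` are hypothetical names, the enum string values are assumptions, and the registered transcriber is a stand-in that does not call any provider. The Web Speech API is left out because it runs in the browser rather than on the server.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class STTEngine(str, Enum):
    # Engine names come from the PR summary; the string values are assumptions.
    OPENAI_WHISPER = "openai_whisper"
    DEEPGRAM = "deepgram"
    AZURE_SPEECH = "azure_speech"


@dataclass
class TranscriptionResult:
    text: str
    engine: STTEngine


# A transcriber takes raw audio bytes and returns the recognized text.
Transcriber = Callable[[bytes], str]

_REGISTRY: Dict[STTEngine, Transcriber] = {}


def register(engine: STTEngine, transcriber: Transcriber) -> None:
    """Make an engine available for dispatch."""
    _REGISTRY[engine] = transcriber


def transcribe(audio: bytes, engine: STTEngine) -> TranscriptionResult:
    """Send the audio to the selected engine, failing loudly if it is not set up."""
    if not audio:
        raise ValueError("audio payload is empty")
    try:
        transcriber = _REGISTRY[engine]
    except KeyError:
        raise LookupError(f"STT engine not configured: {engine.value}") from None
    return TranscriptionResult(text=transcriber(audio), engine=engine)


# Stand-in transcriber for local testing; a real one would call the provider's API.
register(STTEngine.DEEPGRAM, lambda audio: f"<{len(audio)} bytes of audio>")
```

Keeping the engines behind one callable signature means the endpoint handler only needs the configured engine name, and an unconfigured engine surfaces as a distinct error rather than a generic failure.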

Review & Testing Checklist for Human

End-to-end voice input testing: Record voice message in chat interface and verify transcription works correctly
Backend API functionality: Test /api/audio/transcribe endpoint with audio files and verify STT engine configuration endpoints
Cross-browser compatibility: Test voice recording component in Chrome, Firefox, and Safari (MediaRecorder API support varies)
Admin settings validation: Test STT engine configuration forms, API key validation, and engine testing functionality
Error handling and edge cases: Test microphone permission denial, network failures, and invalid audio formats

Note: The backend had startup issues during development due to AWS credentials, so the voice transcription workflow needs thorough testing once the backend is running properly.
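For the backend API item in the checklist, a synthetic audio file avoids depending on a microphone. The sketch below uses only the Python standard library to generate a short WAV tone and to build (but not send) a multipart upload request for /api/audio/transcribe. The form field name `file`, the filename `test.wav`, and the base URL are assumptions, not values confirmed by this PR, and a real deployment will likely also need authentication headers or cookies.

```python
import io
import math
import struct
import uuid
import wave
from urllib.request import Request


def make_test_wav(seconds: float = 1.0, freq_hz: float = 440.0, rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV containing a sine tone."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(frames)
    return buf.getvalue()


def build_transcribe_request(base_url: str, audio: bytes, field: str = "file") -> Request:
    """Build a multipart/form-data POST; the field name is an assumption."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="test.wav"'.encode(),
        b"Content-Type: audio/wav",
        b"",
        audio,
        f"--{boundary}--".encode(),
        b"",
    ])
    return Request(
        f"{base_url}/api/audio/transcribe",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Once the backend is running, the request can be sent with `urllib.request.urlopen(build_transcribe_request("http://localhost:8080", make_test_wav()))`, adjusting the host, port, and field name to match the actual deployment. A pure tone contains no speech, so this only exercises the upload path and error handling, not transcription accuracy.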

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    subgraph "Frontend Components"
        ChatInputBar["web/src/app/chat/input/ChatInputBar.tsx"]:::major-edit
        VoiceRecording["web/src/components/chat/VoiceRecording.tsx"]:::major-edit
        AudioPage["web/src/app/admin/configuration/audio/page.tsx"]:::minor-edit
        AudioConfig["web/src/app/admin/configuration/audio/AudioConfiguration.tsx"]:::major-edit
        ClientLayout["web/src/components/admin/ClientLayout.tsx"]:::minor-edit
    end
    
    subgraph "API Layer"
        AudioAPI["web/src/services/audioApi.ts"]:::minor-edit
    end
    
    subgraph "Backend"
        AudioRouter["backend/onyx/server/audio/api.py"]:::major-edit
        AudioModels["backend/onyx/server/audio/models.py"]:::major-edit
        MainApp["backend/onyx/main.py"]:::minor-edit
    end
    
    subgraph "Config"
        DockerCompose["deployment/docker_compose/docker-compose.dev.yml"]:::context
    end
    
    ChatInputBar --> VoiceRecording
    VoiceRecording --> AudioAPI
    AudioAPI --> AudioRouter
    AudioConfig --> AudioAPI
    ClientLayout --> AudioPage
    AudioPage --> AudioConfig
    AudioRouter --> AudioModels
    MainApp --> AudioRouter
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

Link to Devin run: https://app.devin.ai/sessions/0743a9d6fe034dcf8ff4e491f62d3903
Requested by: @acepgh (Sanjay Akut)
Backend startup issue: The development environment had AWS credentials issues preventing full testing. Added dummy AWS credentials to docker-compose.dev.yml as a temporary fix.
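Browser compatibility: Microphone capture through getUserMedia requires a secure context (HTTPS, or localhost during development), and MediaRecorder support and available recording formats vary across browsers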
STT Engine support: Implemented configuration for OpenAI Whisper, Deepgram, Azure Speech Services, and Web Speech API with engine-specific settings
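To make "engine-specific settings" concrete, the following sketch shows one way an admin configuration could be validated before saving, plus a helper for masking API keys when echoing them back to the settings page. The engine names come from this PR, but the setting keys (`api_key`, `region`), the `STTConfig` shape, and both helper functions are illustrative assumptions rather than the PR's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Required settings per engine. The setting keys are assumptions for
# illustration, not the schema used in backend/onyx/server/audio/models.py.
REQUIRED_SETTINGS: Dict[str, List[str]] = {
    "openai_whisper": ["api_key"],
    "deepgram": ["api_key"],
    "azure_speech": ["api_key", "region"],
    "web_speech": [],  # runs in the browser, so no server-side credentials
}


@dataclass
class STTConfig:
    engine: str
    settings: Dict[str, str] = field(default_factory=dict)


def validate_config(config: STTConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if config.engine not in REQUIRED_SETTINGS:
        return [f"unknown engine: {config.engine}"]
    return [
        f"missing setting: {key}"
        for key in REQUIRED_SETTINGS[config.engine]
        if not config.settings.get(key, "").strip()
    ]


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters before showing a key in the UI."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Returning a list of problems instead of raising on the first one lets a settings form show every missing field at once.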

- Add VoiceRecording component with MediaRecorder API integration
- Modify ChatInputBar to include microphone button and recording state
- Create backend audio API with support for OpenAI, Deepgram, Azure STT
- Add admin configuration page for audio settings
- Update AdminSidebar to include audio settings section
- Create audio API service layer for frontend integration

Co-Authored-By: Sanjay Akut <sanjay@tukatek.com>

devin-ai-integration · 2025-06-29T23:40:34Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

devin-ai-integration · 2025-07-08T14:43:13Z

Closing due to inactivity for more than 7 days.

devin-ai-integration bot and others added 2 commits June 29, 2025 23:29

Fix audio configuration linting issues and add dummy AWS credentials …

b80d80c

…for local development Co-Authored-By: Sanjay Akut <sanjay@tukatek.com>

devin-ai-integration bot closed this Jul 8, 2025
