@devin-ai-integration devin-ai-integration bot commented Jun 29, 2025

Implement Voice Input Functionality with Multiple STT Engines

Summary

This PR implements comprehensive voice input functionality for the Onyx chat interface, allowing users to record voice messages and transcribe them to text using multiple Speech-to-Text (STT) engines. The implementation includes:

  • Frontend Voice Recording Component: React component with MediaRecorder API integration, real-time audio visualization, and comprehensive error handling
  • Backend Audio Transcription API: FastAPI endpoints supporting multiple STT engines (OpenAI Whisper, Deepgram, Azure Speech, Web Speech API)
  • Admin Configuration Interface: Settings page for STT engine selection, API key management, and engine testing
  • Chat Interface Integration: Microphone button added to ChatInputBar with modal voice recording interface

The implementation follows existing Onyx patterns for component architecture, API design, and admin settings management.
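
For reviewers, the multi-engine backend described above can be pictured as a small dispatch table keyed by engine name. The following is only a sketch of that idea, not the code in `backend/onyx/server/audio/api.py`: the names `register_engine`, `transcribe`, and the `"fake"` engine are hypothetical, and the stand-in handler makes no real STT call.

```python
from typing import Callable, Dict

# A transcriber takes raw audio bytes and returns the transcribed text.
Transcriber = Callable[[bytes], str]

# Hypothetical registry; the real router may organise engines differently.
_ENGINES: Dict[str, Transcriber] = {}


def register_engine(name: str) -> Callable[[Transcriber], Transcriber]:
    """Decorator that records a transcriber under an engine name."""
    def decorator(fn: Transcriber) -> Transcriber:
        _ENGINES[name] = fn
        return fn
    return decorator


@register_engine("fake")
def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real engine call (assumption: used for illustration only).
    return f"<{len(audio)} bytes of audio>"


def transcribe(engine: str, audio: bytes) -> str:
    """Look up the configured engine and delegate to it."""
    try:
        handler = _ENGINES[engine]
    except KeyError:
        raise ValueError(f"unsupported STT engine: {engine!r}") from None
    return handler(audio)
```

With this shape, adding an engine means registering one more function, and an unknown engine name fails early with a clear error rather than deep inside a provider call.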

Review & Testing Checklist for Human

  • End-to-end voice input testing: Record voice message in chat interface and verify transcription works correctly
  • Backend API functionality: Test /api/audio/transcribe endpoint with audio files and verify STT engine configuration endpoints
  • Cross-browser compatibility: Test voice recording component in Chrome, Firefox, and Safari (MediaRecorder API support varies)
  • Admin settings validation: Test STT engine configuration forms, API key validation, and engine testing functionality
  • Error handling and edge cases: Test microphone permission denial, network failures, and invalid audio formats
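
When testing the invalid-audio-format case, it may help to know what a server-side check could look like. The sketch below is an assumption about one reasonable approach, not the PR's actual validation: the accepted content types, the 25 MiB cap, and the names `validate_audio_upload` and `AudioValidationError` are all hypothetical.

```python
# Assumed limits; the PR does not state accepted formats or a size cap.
ALLOWED_CONTENT_TYPES = {
    "audio/webm",
    "audio/ogg",
    "audio/wav",
    "audio/mpeg",
    "audio/mp4",
}
MAX_AUDIO_BYTES = 25 * 1024 * 1024


class AudioValidationError(Exception):
    """Raised when an uploaded recording should be rejected."""


def validate_audio_upload(content_type: str, data: bytes) -> None:
    """Reject uploads with an unexpected type, no data, or too much data."""
    # Browsers may append codec parameters, e.g. "audio/webm;codecs=opus".
    base_type = content_type.split(";", 1)[0].strip().lower()
    if base_type not in ALLOWED_CONTENT_TYPES:
        raise AudioValidationError(f"unsupported content type: {base_type!r}")
    if not data:
        raise AudioValidationError("empty audio payload")
    if len(data) > MAX_AUDIO_BYTES:
        raise AudioValidationError("audio payload exceeds size limit")
```

Stripping the codec parameter before comparison matters because recordings produced in the browser often carry one.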

Note: The backend had startup issues during development due to AWS credentials, so the voice transcription workflow needs thorough testing once the backend is running properly.


Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    subgraph "Frontend Components"
        ChatInputBar["web/src/app/chat/input/ChatInputBar.tsx"]:::major-edit
        VoiceRecording["web/src/components/chat/VoiceRecording.tsx"]:::major-edit
        AudioPage["web/src/app/admin/configuration/audio/page.tsx"]:::minor-edit
        AudioConfig["web/src/app/admin/configuration/audio/AudioConfiguration.tsx"]:::major-edit
        ClientLayout["web/src/components/admin/ClientLayout.tsx"]:::minor-edit
    end
    
    subgraph "API Layer"
        AudioAPI["web/src/services/audioApi.ts"]:::minor-edit
    end
    
    subgraph "Backend"
        AudioRouter["backend/onyx/server/audio/api.py"]:::major-edit
        AudioModels["backend/onyx/server/audio/models.py"]:::major-edit
        MainApp["backend/onyx/main.py"]:::minor-edit
    end
    
    subgraph "Config"
        DockerCompose["deployment/docker_compose/docker-compose.dev.yml"]:::context
    end
    
    ChatInputBar --> VoiceRecording
    VoiceRecording --> AudioAPI
    AudioAPI --> AudioRouter
    AudioConfig --> AudioAPI
    ClientLayout --> AudioPage
    AudioPage --> AudioConfig
    AudioRouter --> AudioModels
    MainApp --> AudioRouter
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • Link to Devin run: https://app.devin.ai/sessions/0743a9d6fe034dcf8ff4e491f62d3903
  • Requested by: @acepgh (Sanjay Akut)
  • Backend startup issue: The development environment had AWS credentials issues preventing full testing. Added dummy AWS credentials to docker-compose.dev.yml as a temporary fix.
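  • Browser compatibility: Microphone capture through getUserMedia requires a secure context (HTTPS in production; localhost is allowed for development), and MediaRecorder support and output formats vary across browsers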
  • STT Engine support: Implemented configuration for OpenAI Whisper, Deepgram, Azure Speech Services, and Web Speech API with engine-specific settings
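
As an illustration of what "engine-specific settings" could mean when validating the admin form, here is a minimal sketch. The engine keys and required fields are assumptions made for this example (Azure Speech is shown needing a key and a region; Web Speech API is shown needing none because it runs in the browser); they are not taken from `backend/onyx/server/audio/models.py`.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping of engine name to the settings it needs.
REQUIRED_SETTINGS: Dict[str, Tuple[str, ...]] = {
    "openai_whisper": ("api_key",),
    "deepgram": ("api_key",),
    "azure_speech": ("api_key", "region"),
    "web_speech": (),  # browser-side engine, no server credentials assumed
}


def missing_settings(engine: str, settings: Dict[str, str]) -> List[str]:
    """Return the required setting names that are absent or empty."""
    if engine not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown STT engine: {engine!r}")
    return [key for key in REQUIRED_SETTINGS[engine] if not settings.get(key)]
```

A form handler could call `missing_settings("azure_speech", {"api_key": "k"})` and report the returned names back to the admin before saving.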

devin-ai-integration bot and others added 2 commits June 29, 2025 23:29
- Add VoiceRecording component with MediaRecorder API integration
- Modify ChatInputBar to include microphone button and recording state
- Create backend audio API with support for OpenAI, Deepgram, Azure STT
- Add admin configuration page for audio settings
- Update AdminSidebar to include audio settings section
- Create audio API service layer for frontend integration

Co-Authored-By: Sanjay Akut <sanjay@tukatek.com>
…for local development

Co-Authored-By: Sanjay Akut <sanjay@tukatek.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Closing due to inactivity for more than 7 days.
