Add C API for ten-vad #2379
Walkthrough
Support for the "ten-vad" voice activity detection model was added across the C API, C++ API, and example programs. Workflow tests were updated to include "ten-vad" scenarios. Configuration structs and logic now handle both "silero-vad" and "ten-vad" models, with conditional runtime selection and parameter initialization based on available model files.
Sequence Diagram(s)
sequenceDiagram
    participant User
    participant ExampleApp
    participant FileSystem
    participant SherpaOnnxAPI
    User->>ExampleApp: Run with input WAV and VAD models
    ExampleApp->>FileSystem: Check for silero_vad.onnx
    alt silero-vad exists
        ExampleApp->>SherpaOnnxAPI: Initialize with silero-vad config
    else ten-vad.onnx exists
        ExampleApp->>SherpaOnnxAPI: Initialize with ten-vad config
    else
        ExampleApp->>User: Print error and exit
    end
    ExampleApp->>SherpaOnnxAPI: Process audio using selected VAD
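The selection order in the diagram can be sketched in plain C. This is a minimal illustration, not the code from the PR: `VadKind`, `file_exists`, and `select_vad` are hypothetical names, and the `fopen`-based check stands in for the `SherpaOnnxFileExists` function that the examples call, so the sketch compiles without the library.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch; not part of the sherpa-onnx API. */
typedef enum { VAD_NONE = 0, VAD_SILERO = 1, VAD_TEN = 2 } VadKind;

/* Stand-in for SherpaOnnxFileExists: returns 1 if the file can be opened. */
int file_exists(const char *path) {
  FILE *fp = fopen(path, "rb");
  if (fp == NULL) {
    return 0;
  }
  fclose(fp);
  return 1;
}

/* silero-vad takes priority; ten-vad is used only if silero-vad is absent. */
VadKind select_vad(const char *silero_path, const char *ten_path) {
  if (file_exists(silero_path)) {
    return VAD_SILERO;
  }
  if (file_exists(ten_path)) {
    return VAD_TEN;
  }
  return VAD_NONE; /* caller prints an error and exits */
}
```

A caller would invoke `select_vad("./silero_vad.onnx", "./ten-vad.onnx")` and exit with an error message when the result is `VAD_NONE`, which corresponds to the final `else` branch of the diagram.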
Pull Request Overview
This PR adds comprehensive C API support for the ten-vad voice activity detection model, extending the existing VAD functionality beyond the current silero-vad implementation. The changes enable developers to use ten-vad as an alternative VAD model through both C and C++ APIs.
- Adds TenVadModelConfig structures and configuration handling for ten-vad integration
- Updates example applications to support both silero-vad and ten-vad with automatic model detection
- Extends CI/CD workflows to test ten-vad functionality alongside existing silero-vad tests
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sherpa-onnx/csrc/ten-vad-model.cc | Fixes float literal precision in LogMel function |
| sherpa-onnx/c-api/cxx-api.h | Adds TenVadModelConfig struct and FileExists function declaration |
| sherpa-onnx/c-api/cxx-api.cc | Implements ten-vad configuration mapping and FileExists wrapper |
| sherpa-onnx/c-api/c-api.h | Adds C struct definitions for ten-vad and reorganizes function declarations |
| sherpa-onnx/c-api/c-api.cc | Implements ten-vad configuration parsing with default values |
| c-api-examples/vad-whisper-c-api.c | Updates example to support both VAD models with automatic detection |
| c-api-examples/vad-sense-voice-c-api.c | Updates example to support both VAD models with automatic detection |
| c-api-examples/vad-moonshine-c-api.c | Updates example to support both VAD models with automatic detection |
| .github/workflows/c-api.yaml | Adds CI test jobs for ten-vad integration with all example applications |
Actionable comments posted: 0
🧹 Nitpick comments (1)
c-api-examples/vad-sense-voice-c-api.c (1)
107-121: Consider refactoring common VAD configuration logic.
While the current implementation is correct, the identical VAD configuration logic across all three example files could benefit from a shared utility function to reduce code duplication and improve maintainability.
Consider creating a helper function like:
    void configure_vad_model(SherpaOnnxVadModelConfig* config,
                             const char* vad_filename,
                             int32_t use_silero_vad,
                             int32_t use_ten_vad) {
      if (use_silero_vad) {
        config->silero_vad.model = vad_filename;
        config->silero_vad.threshold = 0.25;
        config->silero_vad.min_silence_duration = 0.5;
        config->silero_vad.min_speech_duration = 0.5;
        config->silero_vad.max_speech_duration = 10;
        config->silero_vad.window_size = 512;
      } else if (use_ten_vad) {
        config->ten_vad.model = vad_filename;
        config->ten_vad.threshold = 0.25;
        config->ten_vad.min_silence_duration = 0.5;
        config->ten_vad.min_speech_duration = 0.5;
        config->ten_vad.max_speech_duration = 10;
        config->ten_vad.window_size = 256;
      }
    }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- .github/workflows/c-api.yaml (5 hunks)
- c-api-examples/vad-moonshine-c-api.c (4 hunks)
- c-api-examples/vad-sense-voice-c-api.c (4 hunks)
- c-api-examples/vad-whisper-c-api.c (4 hunks)
- sherpa-onnx/c-api/c-api.cc (1 hunks)
- sherpa-onnx/c-api/c-api.h (2 hunks)
- sherpa-onnx/c-api/cxx-api.cc (2 hunks)
- sherpa-onnx/c-api/cxx-api.h (2 hunks)
- sherpa-onnx/csrc/ten-vad-model.cc (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
c-api-examples/vad-whisper-c-api.c (2)
- sherpa-onnx/c-api/c-api.h (1)
  - SherpaOnnxFileExists (75-75)
- sherpa-onnx/c-api/c-api.cc (2)
  - SherpaOnnxFileExists (2031-2033)
  - SherpaOnnxFileExists (2031-2031)
c-api-examples/vad-moonshine-c-api.c (2)
- sherpa-onnx/c-api/c-api.h (1)
  - SherpaOnnxFileExists (75-75)
- sherpa-onnx/c-api/c-api.cc (2)
  - SherpaOnnxFileExists (2031-2033)
  - SherpaOnnxFileExists (2031-2031)
c-api-examples/vad-sense-voice-c-api.c (2)
- sherpa-onnx/c-api/c-api.h (1)
  - SherpaOnnxFileExists (75-75)
- sherpa-onnx/c-api/c-api.cc (2)
  - SherpaOnnxFileExists (2031-2033)
  - SherpaOnnxFileExists (2031-2031)
🔇 Additional comments (28)
sherpa-onnx/csrc/ten-vad-model.cc (1)
324-324: LGTM! Good type consistency improvement.
Explicitly marking the literal as `1e-10f` ensures type consistency, since `logf()` takes float arguments and the rest of the expression uses float literals.
sherpa-onnx/c-api/c-api.cc (1)
1036-1049: LGTM! Consistent implementation following established patterns.
The ten-vad configuration initialization properly mirrors the existing silero-vad pattern, using the same macro approach for default value handling. The different `window_size` default (256 vs 512) appears intentional for the ten-vad model characteristics.
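As a rough illustration of the "zero means use the default" pattern this comment refers to, the sketch below falls back to a default when a field was left unset. The macro name `OR_DEFAULT`, the struct `DemoVadConfig`, and the function `effective_window_size` are hypothetical and are not taken from `c-api.cc`; only the ten-vad `window_size` default of 256 comes from the review.

```c
#include <stdint.h>

/* Hypothetical macro for this sketch: use the value if non-zero, else the fallback. */
#define OR_DEFAULT(value, fallback) ((value) ? (value) : (fallback))

/* Hypothetical, trimmed-down config; the real struct has more fields. */
typedef struct {
  const char *model;
  int32_t window_size; /* 0 means "not set by the caller" */
} DemoVadConfig;

/* Returns the caller's window size, or the fallback when it was left at 0. */
int32_t effective_window_size(const DemoVadConfig *config, int32_t fallback) {
  return OR_DEFAULT(config->window_size, fallback);
}
```

With a zero-initialized config, `effective_window_size(&config, 256)` yields 256; a caller who sets `window_size` explicitly keeps that value. One consequence of this pattern is that a caller cannot request a literal zero, which is acceptable for fields such as a window size where zero is never meaningful.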
sherpa-onnx/c-api/cxx-api.h (3)
555-562: LGTM! Well-structured configuration struct.
The `TenVadModelConfig` struct follows the established pattern of `SileroVadModelConfig` with consistent field types, naming, and appropriate default values that align with the C API implementation.
566-566: LGTM! Consistent API extension.
Adding the `ten_vad` member to `VadModelConfig` properly extends the API to support the new VAD model while maintaining consistency with the existing `silero_vad` structure.
655-655: LGTM! Useful utility function addition.
The `FileExists` function declaration provides a clean C++ API wrapper that will be helpful for the runtime model detection functionality described in the AI summary.
c-api-examples/vad-whisper-c-api.c (5)
11-16: LGTM! Clear documentation for VAD model downloads.
The added comments provide clear instructions for downloading both supported VAD models, improving user experience.
34-37: Good addition of input validation.
The file existence check for the input WAV file prevents runtime failures and provides clear error messaging.
39-54: Well-implemented VAD model selection logic.
The runtime detection and selection between silero-vad and ten-vad models is correctly implemented with proper priority (silero-vad first) and clear error handling when neither model is available.
104-118: Correct conditional VAD configuration.
The model-specific parameter configuration is properly implemented with appropriate window sizes (512 for silero-vad, 256 for ten-vad) and consistent threshold/duration settings across both models.
134-135: Appropriate window size selection.
The conditional window size retrieval based on the selected VAD model ensures the processing loop uses the correct parameters for the chosen model.
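To make the role of the window size concrete, here is a minimal sketch of a processing loop that walks a sample buffer in fixed-size windows. The names `pick_window_size`, `for_each_window`, and `WindowFn` are hypothetical; in the actual examples each window would be handed to the VAD through the sherpa-onnx C API rather than to a callback. The sketch assumes that a trailing partial window is simply skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives one window of samples. */
typedef void (*WindowFn)(const float *window, int32_t n, void *user);

/* Window sizes quoted in the review: 512 for silero-vad, 256 for ten-vad. */
int32_t pick_window_size(int use_silero_vad) {
  return use_silero_vad ? 512 : 256;
}

/* Calls fn once per full window and returns the number of windows visited.
   Samples left over after the last full window are not processed here. */
int32_t for_each_window(const float *samples, int32_t num_samples,
                        int32_t window_size, WindowFn fn, void *user) {
  int32_t count = 0;
  if (window_size <= 0) {
    return 0; /* guard against an unset or invalid window size */
  }
  for (int32_t i = 0; i + window_size <= num_samples; i += window_size) {
    if (fn != NULL) {
      fn(samples + i, window_size, user);
    }
    ++count;
  }
  return count;
}
```

Because the window size differs between the two models, the same 1600-sample buffer is split into 3 full windows for silero-vad and 6 for ten-vad, which is why the loop must read the size from the selected model's config rather than hard-coding it.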
sherpa-onnx/c-api/cxx-api.cc (2)
658-663: Correct ten_vad configuration implementation.
The `ten_vad` configuration fields are properly copied from the C++ config to the C struct, following the same pattern as the existing `silero_vad` configuration. All necessary fields (model, threshold, timing parameters, window_size) are included.
768-770: Simple and correct utility function.
The `FileExists` function is properly implemented as a thin wrapper around the C API function, providing a clean C++ interface for file existence checking.
c-api-examples/vad-moonshine-c-api.c (4)
9-14: Consistent documentation across examples.
The VAD model download instructions are identical to the whisper example, maintaining consistency across different example programs.
31-51: Consistent implementation of VAD model selection.
The file existence checks and VAD model selection logic exactly match the whisper example, ensuring uniform behavior across all example programs.
104-118: Identical VAD configuration pattern.
The conditional VAD configuration follows the same pattern as other examples, with appropriate model-specific parameters (window sizes, thresholds) consistently applied.
134-135: Consistent window size handling.
The window size selection logic matches the pattern used in other examples, ensuring uniform behavior across all VAD-enabled example programs.
c-api-examples/vad-sense-voice-c-api.c (3)
9-14: Excellent consistency across all examples.
The VAD model documentation is identical across all three example programs, providing a uniform user experience regardless of which ASR model is being used.
31-51: Perfect implementation consistency.
The VAD model selection logic is identical across all examples, demonstrating excellent code consistency and maintainability.
137-138: Consistent window size implementation.
The final example maintains the same window size selection pattern, completing the consistent implementation across all VAD-enabled examples.
sherpa-onnx/c-api/c-api.h (3)
74-76: LGTM: Well-designed utility function.
The `SherpaOnnxFileExists` function follows established API conventions with clear documentation and appropriate return type semantics. The placement after version functions is logical.
851-874: LGTM: Consistent struct design for ten-vad model.
The `SherpaOnnxTenVadModelConfig` struct appropriately mirrors the existing silero VAD configuration, providing type safety while maintaining consistency in field names and documentation. This duplication is standard practice for C APIs.
882-882: LGTM: Proper extension of VAD model configuration.
The addition of the `ten_vad` member to `SherpaOnnxVadModelConfig` correctly extends the API to support the new VAD model while maintaining backward compatibility.
.github/workflows/c-api.yaml (6)
379-379: LGTM: Improved test naming clarity.
Renaming the test to specify "silero-vad" improves clarity now that multiple VAD models are supported.
406-432: LGTM: Comprehensive test coverage for ten-vad model.
The new test job appropriately mirrors the silero-vad test structure while using the ten-vad model. This ensures consistent test coverage across both VAD implementations.
433-433: LGTM: Consistent test naming improvement.
The test name clarification maintains consistency with other VAD test naming changes.
460-486: LGTM: Consistent test coverage extension.
The ten-vad + Moonshine test maintains the established pattern and provides necessary coverage for the new VAD model with a different ASR backend.
523-523: LGTM: Maintains naming consistency.
The test name update follows the established pattern for clarity across all VAD tests.
562-600: LGTM: Complete test coverage for ten-vad integration.
The final ten-vad test with sense-voice completes comprehensive test coverage across all ASR models. The inclusion of diagnostic commands is helpful for troubleshooting.