
@csukuangfj (Collaborator) commented Jul 12, 2025

Summary by CodeRabbit

  • New Features
    • Added support for the ten-vad voice activity detection model alongside existing silero-vad support in C API examples and configuration.
    • Users can now select between ten-vad and silero-vad models for voice activity detection in example applications.
  • Bug Fixes
    • Improved error handling for missing model files in example applications.
  • Documentation
    • Updated comments and usage instructions to reflect ten-vad support and model download options.
  • Chores
    • Enhanced automated tests to cover ten-vad integration with various speech recognition models.

@csukuangfj requested a review from Copilot July 12, 2025 03:57

coderabbitai bot commented Jul 12, 2025

Walkthrough

Support for the "ten-vad" voice activity detection model was added across the C API, C++ API, and example programs. Workflow tests were updated to include "ten-vad" scenarios. Configuration structs and logic now handle both "silero-vad" and "ten-vad" models, with conditional runtime selection and parameter initialization based on available model files.

Changes

  • .github/workflows/c-api.yaml: Added workflow jobs for "ten-vad" with Whisper, Moonshine, and sense-voice; renamed the existing jobs to specify "silero-vad".
  • c-api-examples/vad-moonshine-c-api.c, c-api-examples/vad-sense-voice-c-api.c, c-api-examples/vad-whisper-c-api.c: Example programs now support both "silero-vad" and "ten-vad" models, with dynamic selection and config updates.
  • sherpa-onnx/c-api/c-api.h: Added the SherpaOnnxTenVadModelConfig struct, updated the main config struct, and declared SherpaOnnxFileExists.
  • sherpa-onnx/c-api/c-api.cc: Initialized the vad_config.ten_vad fields in GetVadModelConfig.
  • sherpa-onnx/c-api/cxx-api.h: Added the TenVadModelConfig struct, updated VadModelConfig, and declared the FileExists function.
  • sherpa-onnx/c-api/cxx-api.cc: Supported the ten_vad config in VoiceActivityDetector::Create; implemented the FileExists utility.
  • sherpa-onnx/csrc/ten-vad-model.cc: Changed a constant from 1e-10 to 1e-10f for float precision in LogMel.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ExampleApp
    participant FileSystem
    participant SherpaOnnxAPI

    User->>ExampleApp: Run with input WAV and VAD models
    ExampleApp->>FileSystem: Check for silero_vad.onnx
    alt silero-vad exists
        ExampleApp->>SherpaOnnxAPI: Initialize with silero-vad config
    else ten-vad.onnx exists
        ExampleApp->>SherpaOnnxAPI: Initialize with ten-vad config
    else
        ExampleApp->>User: Print error and exit
    end
    ExampleApp->>SherpaOnnxAPI: Process audio using selected VAD

Possibly related PRs

  • k2-fsa/sherpa-onnx#2377: Implements support for the "ten-vad" VAD model, including config structs, model selection, and example updates—directly related at the code level.

Poem

In the land of code where models dwell,
A new VAD hops in—ten-vad as well!
Now silero and ten-vad both can play,
Detecting speech in a clever way.
With configs set and tests anew,
This rabbit cheers, “Great job, crew!”
🐇✨

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive C API support for the ten-vad voice activity detection model, extending the existing VAD functionality beyond the current silero-vad implementation. The changes enable developers to use ten-vad as an alternative VAD model through both C and C++ APIs.

  • Adds TenVadModelConfig structures and configuration handling for ten-vad integration
  • Updates example applications to support both silero-vad and ten-vad with automatic model detection
  • Extends CI/CD workflows to test ten-vad functionality alongside existing silero-vad tests

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:
  • sherpa-onnx/csrc/ten-vad-model.cc: Fixes float literal precision in the LogMel function
  • sherpa-onnx/c-api/cxx-api.h: Adds the TenVadModelConfig struct and the FileExists function declaration
  • sherpa-onnx/c-api/cxx-api.cc: Implements ten-vad configuration mapping and the FileExists wrapper
  • sherpa-onnx/c-api/c-api.h: Adds C struct definitions for ten-vad and reorganizes function declarations
  • sherpa-onnx/c-api/c-api.cc: Implements ten-vad configuration parsing with default values
  • c-api-examples/vad-whisper-c-api.c: Updates the example to support both VAD models with automatic detection
  • c-api-examples/vad-sense-voice-c-api.c: Updates the example to support both VAD models with automatic detection
  • c-api-examples/vad-moonshine-c-api.c: Updates the example to support both VAD models with automatic detection
  • .github/workflows/c-api.yaml: Adds CI test jobs for ten-vad integration with all example applications

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
c-api-examples/vad-sense-voice-c-api.c (1)

107-121: Consider refactoring common VAD configuration logic.

While the current implementation is correct, the identical VAD configuration logic across all three example files could benefit from a shared utility function to reduce code duplication and improve maintainability.

Consider creating a helper function like:

```c
#include <stdint.h>

#include "sherpa-onnx/c-api/c-api.h"

/* Fills the VAD sub-config that matches the detected model file.
 * silero-vad uses 512-sample windows; ten-vad uses 256-sample windows. */
void configure_vad_model(SherpaOnnxVadModelConfig *config,
                         const char *vad_filename,
                         int32_t use_silero_vad,
                         int32_t use_ten_vad) {
  if (use_silero_vad) {
    config->silero_vad.model = vad_filename;
    config->silero_vad.threshold = 0.25;
    config->silero_vad.min_silence_duration = 0.5;
    config->silero_vad.min_speech_duration = 0.5;
    config->silero_vad.max_speech_duration = 10;
    config->silero_vad.window_size = 512;
  } else if (use_ten_vad) {
    config->ten_vad.model = vad_filename;
    config->ten_vad.threshold = 0.25;
    config->ten_vad.min_silence_duration = 0.5;
    config->ten_vad.min_speech_duration = 0.5;
    config->ten_vad.max_speech_duration = 10;
    config->ten_vad.window_size = 256;
  }
  /* If neither flag is set, the config is left unchanged. */
}
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da9f303 and e362da4.

📒 Files selected for processing (9)
  • .github/workflows/c-api.yaml (5 hunks)
  • c-api-examples/vad-moonshine-c-api.c (4 hunks)
  • c-api-examples/vad-sense-voice-c-api.c (4 hunks)
  • c-api-examples/vad-whisper-c-api.c (4 hunks)
  • sherpa-onnx/c-api/c-api.cc (1 hunks)
  • sherpa-onnx/c-api/c-api.h (2 hunks)
  • sherpa-onnx/c-api/cxx-api.cc (2 hunks)
  • sherpa-onnx/c-api/cxx-api.h (2 hunks)
  • sherpa-onnx/csrc/ten-vad-model.cc (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
c-api-examples/vad-whisper-c-api.c (2)
sherpa-onnx/c-api/c-api.h (1)
  • SherpaOnnxFileExists (75-75)
sherpa-onnx/c-api/c-api.cc (2)
  • SherpaOnnxFileExists (2031-2033)
  • SherpaOnnxFileExists (2031-2031)
c-api-examples/vad-moonshine-c-api.c (2)
sherpa-onnx/c-api/c-api.h (1)
  • SherpaOnnxFileExists (75-75)
sherpa-onnx/c-api/c-api.cc (2)
  • SherpaOnnxFileExists (2031-2033)
  • SherpaOnnxFileExists (2031-2031)
c-api-examples/vad-sense-voice-c-api.c (2)
sherpa-onnx/c-api/c-api.h (1)
  • SherpaOnnxFileExists (75-75)
sherpa-onnx/c-api/c-api.cc (2)
  • SherpaOnnxFileExists (2031-2033)
  • SherpaOnnxFileExists (2031-2031)
🔇 Additional comments (28)
sherpa-onnx/csrc/ten-vad-model.cc (1)

324-324: LGTM! Good type consistency improvement.

Explicitly marking the literal as 1e-10f ensures type consistency since logf() takes float arguments and the rest of the expression uses float literals.
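A small, self-contained C11 sketch of the type rule behind this change. `floored_power` is a hypothetical stand-in for the epsilon-floor step and is not the actual LogMel code; `_Generic` is used only to inspect expression types at compile time.

```c
#include <assert.h>

/* C11 generic selection: yields 1 when the (unevaluated) expression has the given type. */
#define IS_FLOAT(expr) _Generic((expr), float: 1, default: 0)
#define IS_DOUBLE(expr) _Generic((expr), double: 1, default: 0)

/* Hypothetical floor step of a log-mel computation, kept entirely in float.
 * With 1e-10 (a double literal) the sum would be computed in double and then
 * implicitly narrowed back to float when passed to logf(). */
float floored_power(float power) {
  assert(power >= 0.0f);
  return power + 1e-10f; /* float + float stays float */
}
```

Without the `f` suffix, `power` is promoted to double for the addition. GCC's `-Wdouble-promotion` flags exactly this kind of implicit float-to-double promotion.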

sherpa-onnx/c-api/c-api.cc (1)

1036-1049: LGTM! Consistent implementation following established patterns.

The ten-vad configuration initialization properly mirrors the existing silero-vad pattern, using the same macro approach for default value handling. The different window_size default (256 vs 512) appears intentional for the ten-vad model characteristics.
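The "macro approach" for defaults can be read as: use the caller's value if it is non-zero or non-NULL, otherwise fall back to a default. A self-contained sketch under that assumption follows. `OR_DEFAULT`, `VadSubConfig`, and `apply_defaults` are hypothetical names rather than sherpa-onnx identifiers. Only the window-size defaults (256 for ten-vad, 512 for silero-vad) come from this review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: use x when it is non-zero / non-NULL, otherwise fall back to y. */
#define OR_DEFAULT(x, y) ((x) ? (x) : (y))

/* Hypothetical mirror of a VAD sub-config; not the real sherpa-onnx struct. */
typedef struct {
  const char *model;
  int32_t window_size;
} VadSubConfig;

/* Copies the caller's config, replacing zero/NULL fields with defaults. */
VadSubConfig apply_defaults(VadSubConfig in, int32_t default_window_size) {
  VadSubConfig out;
  out.model = OR_DEFAULT(in.model, "");
  out.window_size = OR_DEFAULT(in.window_size, default_window_size);
  return out;
}
```

A consequence of this pattern is that 0 cannot be passed as an explicit value. Zero-initializing the whole config, for example with `memset`, is how a caller opts into every default.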

sherpa-onnx/c-api/cxx-api.h (3)

555-562: LGTM! Well-structured configuration struct.

The TenVadModelConfig struct follows the established pattern of SileroVadModelConfig with consistent field types, naming, and appropriate default values that align with the C API implementation.


566-566: LGTM! Consistent API extension.

Adding the ten_vad member to VadModelConfig properly extends the API to support the new VAD model while maintaining consistency with the existing silero_vad structure.


655-655: LGTM! Useful utility function addition.

The FileExists function declaration provides a clean C++ API wrapper that will be helpful for the runtime model detection functionality described in the AI summary.

c-api-examples/vad-whisper-c-api.c (5)

11-16: LGTM! Clear documentation for VAD model downloads.

The added comments provide clear instructions for downloading both supported VAD models, improving user experience.


34-37: Good addition of input validation.

The file existence check for the input WAV file prevents runtime failures and provides clear error messaging.


39-54: Well-implemented VAD model selection logic.

The runtime detection and selection between silero-vad and ten-vad models is correctly implemented with proper priority (silero-vad first) and clear error handling when neither model is available.
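The selection order can be sketched as follows, assuming the model files sit in the working directory under the names used in this PR's sequence diagram (`silero_vad.onnx`, `ten-vad.onnx`). `file_exists` is a plain `fopen`-based stand-in for `SherpaOnnxFileExists`. `VadKind`, `choose_vad`, `vad_filename`, and `detect_vad` are hypothetical helpers, and the error messages are illustrative rather than the examples' actual output.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { VAD_NONE = 0, VAD_SILERO, VAD_TEN } VadKind;

static const char *const kSileroVadFile = "silero_vad.onnx";
static const char *const kTenVadFile = "ten-vad.onnx";

/* Stand-in for SherpaOnnxFileExists: 1 if the file can be opened for reading. */
int file_exists(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "rb");
  if (f == NULL) {
    return 0;
  }
  fclose(f);
  return 1;
}

/* Pure decision step: silero-vad wins when both model files are present. */
VadKind choose_vad(int have_silero, int have_ten) {
  if (have_silero) {
    return VAD_SILERO;
  }
  if (have_ten) {
    return VAD_TEN;
  }
  return VAD_NONE;
}

const char *vad_filename(VadKind kind) {
  switch (kind) {
    case VAD_SILERO: return kSileroVadFile;
    case VAD_TEN: return kTenVadFile;
    default: return NULL;
  }
}

/* Checks the input WAV first, then picks a VAD model; returns VAD_NONE on error. */
VadKind detect_vad(const char *wav_filename) {
  if (!file_exists(wav_filename)) {
    fprintf(stderr, "Input file %s does not exist\n", wav_filename);
    return VAD_NONE;
  }
  VadKind kind = choose_vad(file_exists(kSileroVadFile), file_exists(kTenVadFile));
  if (kind == VAD_NONE) {
    fprintf(stderr, "Please provide either %s or %s\n", kSileroVadFile, kTenVadFile);
  }
  return kind;
}
```

Keeping `choose_vad` free of file I/O makes the silero-first priority testable without any model files on disk.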


104-118: Correct conditional VAD configuration.

The model-specific parameter configuration is properly implemented with appropriate window sizes (512 for silero-vad, 256 for ten-vad) and consistent threshold/duration settings across both models.


134-135: Appropriate window size selection.

The conditional window size retrieval based on the selected VAD model ensures the processing loop uses the correct parameters for the chosen model.
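The sketch below shows why the loop's window size has to follow the selected model. It is a self-contained fixed-window feeding loop. `FakeVad`, `fake_vad_accept`, and `feed_in_windows` are hypothetical stand-ins rather than sherpa-onnx calls, and only the 512/256 window sizes come from this review. It also assumes the leftover partial window is handled by flushing the detector at end of input, which may differ from what the examples do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { VAD_SILERO = 1, VAD_TEN = 2 } VadKind;

/* Window sizes used by the examples: 512 samples for silero-vad, 256 for ten-vad. */
int32_t vad_window_size(VadKind kind) {
  return kind == VAD_TEN ? 256 : 512;
}

/* Hypothetical stand-in for the detector; it only records what it was fed. */
typedef struct {
  size_t windows_fed;
  size_t samples_fed;
} FakeVad;

static void fake_vad_accept(FakeVad *vad, const float *samples, int32_t n) {
  (void)samples; /* a real detector would analyse these samples */
  vad->windows_fed += 1;
  vad->samples_fed += (size_t)n;
}

/* Feeds only complete windows and returns the number of samples consumed.
 * The trailing partial window is left over; we assume the caller flushes
 * the detector at end of input instead of feeding a short window. */
size_t feed_in_windows(FakeVad *vad, const float *samples, size_t num_samples,
                       int32_t window_size) {
  assert(window_size > 0);
  size_t i = 0;
  while (i + (size_t)window_size <= num_samples) {
    fake_vad_accept(vad, samples + i, window_size);
    i += (size_t)window_size;
  }
  return i;
}
```

Reading the size from the wrong sub-config would hand the detector windows of a size it was not configured for. Deriving it from the same selection that chose the model file avoids that.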

sherpa-onnx/c-api/cxx-api.cc (2)

658-663: Correct ten_vad configuration implementation.

The ten_vad configuration fields are properly copied from the C++ config to the C struct, following the same pattern as the existing silero_vad configuration. All necessary fields (model, threshold, timing parameters, window_size) are included.


768-770: Simple and correct utility function.

The FileExists function is properly implemented as a thin wrapper around the C API function, providing a clean C++ interface for file existence checking.

c-api-examples/vad-moonshine-c-api.c (4)

9-14: Consistent documentation across examples.

The VAD model download instructions are identical to the whisper example, maintaining consistency across different example programs.


31-51: Consistent implementation of VAD model selection.

The file existence checks and VAD model selection logic exactly match the whisper example, ensuring uniform behavior across all example programs.


104-118: Identical VAD configuration pattern.

The conditional VAD configuration follows the same pattern as other examples, with appropriate model-specific parameters (window sizes, thresholds) consistently applied.


134-135: Consistent window size handling.

The window size selection logic matches the pattern used in other examples, ensuring uniform behavior across all VAD-enabled example programs.

c-api-examples/vad-sense-voice-c-api.c (3)

9-14: Excellent consistency across all examples.

The VAD model documentation is identical across all three example programs, providing a uniform user experience regardless of which ASR model is being used.


31-51: Perfect implementation consistency.

The VAD model selection logic is identical across all examples, demonstrating excellent code consistency and maintainability.


137-138: Consistent window size implementation.

The final example maintains the same window size selection pattern, completing the consistent implementation across all VAD-enabled examples.

sherpa-onnx/c-api/c-api.h (3)

74-76: LGTM: Well-designed utility function.

The SherpaOnnxFileExists function follows established API conventions with clear documentation and appropriate return type semantics. The placement after version functions is logical.


851-874: LGTM: Consistent struct design for ten-vad model.

The SherpaOnnxTenVadModelConfig struct appropriately mirrors the existing silero VAD configuration, providing type safety while maintaining consistency in field names and documentation. This duplication is standard practice for C APIs.


882-882: LGTM: Proper extension of VAD model configuration.

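The addition of the ten_vad member to SherpaOnnxVadModelConfig correctly extends the API to support the new VAD model. Existing callers that zero-initialize the struct remain source-compatible, but the struct's size changes, so binaries built against the old header need to be recompiled.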

.github/workflows/c-api.yaml (6)

379-379: LGTM: Improved test naming clarity.

Renaming the test to specify "silero-vad" improves clarity now that multiple VAD models are supported.


406-432: LGTM: Comprehensive test coverage for ten-vad model.

The new test job appropriately mirrors the silero-vad test structure while using the ten-vad model. This ensures consistent test coverage across both VAD implementations.


433-433: LGTM: Consistent test naming improvement.

The test name clarification maintains consistency with other VAD test naming changes.


460-486: LGTM: Consistent test coverage extension.

The ten-vad + Moonshine test maintains the established pattern and provides necessary coverage for the new VAD model with a different ASR backend.


523-523: LGTM: Maintains naming consistency.

The test name update follows the established pattern for clarity across all VAD tests.


562-600: LGTM: Complete test coverage for ten-vad integration.

The final ten-vad test with sense-voice completes comprehensive test coverage across all ASR models. The inclusion of diagnostic commands is helpful for troubleshooting.

@csukuangfj merged commit ceb1bc5 into k2-fsa:master Jul 12, 2025
118 of 228 checks passed
@csukuangfj deleted the c-api-ten-vad branch July 12, 2025 04:04