Support returning the current speech segment for VAD. #2397
Walkthrough
A new method and property were introduced in the VoiceActivityDetector C++ and Python APIs to provide access to the current speech segment as soon as speech is detected, rather than only after the segment completes. Documentation was updated, and an example script was modified to demonstrate the new capability, though the demonstration code is currently disabled.
Sequence Diagram(s)
sequenceDiagram
participant User
participant VAD (VoiceActivityDetector)
participant AudioBuffer
User->>VAD: AcceptWaveform(audio_chunk)
VAD->>VAD: Process audio, update cur_segment_
User->>VAD: IsSpeechDetected()
alt Speech detected
User->>VAD: CurrentSpeechSegment()
VAD-->>User: Return current speech segment (start, samples)
end
User->>VAD: Front() (when segment completes)
VAD-->>User: Return finalized speech segment
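The flow in the diagram can be sketched in Python as follows. `ToyVad` is a hypothetical, threshold-based stand-in written only so the example runs without a model; it is not the sherpa-onnx implementation. Its method names merely mirror the calls shown in the diagram and in this PR (`accept_waveform`, `is_speech_detected`, `current_segment`, `front`), and the `SpeechSegment` dataclass is an assumption about the shape of the returned object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechSegment:
    start: int = -1  # sample index where speech began; -1 means "no speech"
    samples: List[float] = field(default_factory=list)


class ToyVad:
    """Hypothetical stand-in for a VAD; NOT the sherpa-onnx implementation."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.num_processed = 0
        self.cur = SpeechSegment()
        self.queue: List[SpeechSegment] = []

    def accept_waveform(self, chunk: List[float]) -> None:
        is_speech = any(abs(x) >= self.threshold for x in chunk)
        if is_speech:
            if self.cur.start < 0:
                self.cur.start = self.num_processed
            self.cur.samples.extend(chunk)
        elif self.cur.start >= 0:
            # Speech just ended: finalize the segment and clear the current one.
            self.queue.append(self.cur)
            self.cur = SpeechSegment()
        self.num_processed += len(chunk)

    def is_speech_detected(self) -> bool:
        return self.cur.start >= 0

    @property
    def current_segment(self) -> SpeechSegment:
        return self.cur

    def empty(self) -> bool:
        return not self.queue

    @property
    def front(self) -> SpeechSegment:
        return self.queue[0]

    def pop(self) -> None:
        self.queue.pop(0)


def run(chunks):
    vad = ToyVad()
    partial_lengths, finished = [], []
    for chunk in chunks:
        vad.accept_waveform(chunk)
        if vad.is_speech_detected():
            # Available while the speaker is still talking.
            partial_lengths.append(len(vad.current_segment.samples))
        while not vad.empty():
            # Available only after the segment completes.
            finished.append((vad.front.start, len(vad.front.samples)))
            vad.pop()
    return partial_lengths, finished


silence, speech = [0.0] * 4, [0.9] * 4
partial_lengths, finished = run([silence, speech, speech, silence])
print(partial_lengths, finished)
```

In this sketch the partial segment grows chunk by chunk while speech is ongoing, and the finalized segment only shows up once a non-speech chunk arrives.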
Pull Request Overview
This PR exposes a new `current_segment` property in `VoiceActivityDetector` to allow retrieving the ongoing speech segment before speech ends.
- Declare and implement `CurrentSpeechSegment` in the C++ core
- Bind `current_segment` in the Python extension with documentation
- Add a commented example in the Python script demonstrating real-time usage
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
sherpa-onnx/python/csrc/voice-activity-detector.cc | Added Python binding and docstring for current_segment |
sherpa-onnx/csrc/voice-activity-detector.h | Declared CurrentSpeechSegment and added comments |
sherpa-onnx/csrc/voice-activity-detector.cc | Implemented cur_segment_ updates and CurrentSpeechSegment |
python-api-examples/generate-subtitles.py | Provided example usage guarded by if False |
Comments suppressed due to low confidence (3)
sherpa-onnx/python/csrc/voice-activity-detector.cc:26
- [nitpick] The long raw-string doc comment is hard to read and has inconsistent numbering and grammar. Consider reformatting into a concise, bulleted docstring (or use Doxygen tags) with clear sentences and consistent punctuation.
R"(
sherpa-onnx/csrc/voice-activity-detector.h:44
- [nitpick] The comment for `CurrentSpeechSegment` is vague. Document that it returns an empty segment (e.g. `start = -1`, empty samples) when no speech is detected, and specify the return semantics clearly in Doxygen style.
SpeechSegment CurrentSpeechSegment() const;
sherpa-onnx/csrc/voice-activity-detector.h:44
- There's no existing unit test for the new `CurrentSpeechSegment` API. Please add tests covering both when speech is ongoing and when no speech is detected to validate the returned segment.
SpeechSegment CurrentSpeechSegment() const;
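The "empty segment" contract mentioned in these notes can be checked on the Python side with a small helper. This is a minimal sketch, assuming a segment object with `start` and `samples` attributes as used elsewhere in this PR. `Segment` and `is_empty_segment` are hypothetical names for illustration, and the `start = -1` sentinel is taken from the reset code in this PR rather than from published documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    # Hypothetical mirror of the fields used in this PR.
    start: int = -1
    samples: List[float] = field(default_factory=list)


def is_empty_segment(segment) -> bool:
    """Return True when the segment carries no speech."""
    return segment.start < 0 or len(segment.samples) == 0


# The two cases the review asks tests to cover.
no_speech = Segment()
ongoing = Segment(start=16000, samples=[0.1] * 320)
print(is_empty_segment(no_speech), is_empty_segment(ongoing))
```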
cur_segment_.start = -1;
cur_segment_.samples.clear();
Copilot AI, Jul 15, 2025
Clearing `cur_segment_` is duplicated in multiple methods (`Reset`, `Flush`, and the non-speech branch). Consider extracting a private helper like `ClearCurrentSegment()` to reduce duplication.
Suggested change:
- cur_segment_.start = -1;
- cur_segment_.samples.clear();
+ ClearCurrentSegment();
if False:
    # If you want to process the speech segment as soon as
    # speech is detected, you can use
    current_segment = vad.current_segment
    if len(current_segment.samples) > 0:
        print(
            f"speech starts at {current_segment.start/16000} seconds: ",
            f"duration {len(current_segment.samples)/16000} seconds",
        )
Copilot AI, Jul 15, 2025
[nitpick] The example code is guarded by `if False`, so it never runs. If this is meant for documentation, convert it to a commented block or set the condition to `if True` for a runnable demonstration.
Suggested change:
- if False:
-     # If you want to process the speech segment as soon as
-     # speech is detected, you can use
-     current_segment = vad.current_segment
-     if len(current_segment.samples) > 0:
-         print(
-             f"speech starts at {current_segment.start/16000} seconds: ",
-             f"duration {len(current_segment.samples)/16000} seconds",
-         )
+ # Example: If you want to process the speech segment as soon as
+ # speech is detected, you can use the following code:
+ # current_segment = vad.current_segment
+ # if len(current_segment.samples) > 0:
+ #     print(
+ #         f"speech starts at {current_segment.start/16000} seconds: ",
+ #         f"duration {len(current_segment.samples)/16000} seconds",
+ #     )
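The example hard-codes 16000 as the sample rate when converting sample counts to seconds. A small helper that takes the rate as a parameter might look like the sketch below. `describe_segment` is a hypothetical name, not part of sherpa-onnx, and the 16000 default only mirrors the constant in the example. With a real detector you would pass `current_segment.start` and `len(current_segment.samples)`.

```python
def describe_segment(start: int, num_samples: int, sample_rate: int = 16000) -> str:
    """Format a segment's start time and duration in seconds."""
    start_s = start / sample_rate
    duration_s = num_samples / sample_rate
    return f"speech starts at {start_s:.2f} seconds, duration {duration_s:.2f} seconds"


# 24000 samples into the stream, 8000 samples of speech so far.
message = describe_segment(start=24000, num_samples=8000)
print(message)
```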
Fixes #2396
With this PR, you don't need to wait until the speaker stops speaking to get the current speech segment.
CC @livefantasia