fix: use original segment length for vad_end time mapping#3703
Open
Acelogic wants to merge 1 commit intoggml-org:masterfrom
Open
fix: use original segment length for vad_end time mapping#3703Acelogic wants to merge 1 commit intoggml-org:masterfrom
Acelogic wants to merge 1 commit intoggml-org:masterfrom
Conversation
When VAD segments get overlap samples appended for smoother model transitions, the vad_end timestamp was incorrectly calculated using the overlap-extended segment length. This caused a mismatch between vad_end (which included overlap duration) and orig_end (which did not), resulting in skewed time interpolation and inaccurate timestamps for all content within the segment. Compute original_segment_length before overlap is added and use it for vad_end, so the mapping ratio between processed and original time is correct. The full segment_length (with overlap) is still used for the audio buffer copy and offset advancement. Fixes ggml-org#3683
Member
|
I believe this is was resolved by #3711, but please let us know if that is not the case. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
vad_endwas calculated using the overlap-extendedsegment_length, butorig_endreflects the original VAD boundary without overlap, creating a ratio mismatch in the time interpolationoriginal_segment_lengthbefore overlap is added and use it forvad_end; the fullsegment_length(with overlap) is still used for the audio buffer copy and offset advancementDetails
When
whisper_full_with_stateprocesses VAD segments, non-final segments getoverlap_samplesappended (line 6705 on master) for smoother model transitions. The bug is thatsegment.vad_endwas set tosamples_to_cs(offset + segment_length)wheresegment_lengthincludes this overlap, whilesegment.orig_endis the raw VAD boundary without overlap.This mismatch means the linear interpolation at lines 6740-6743:
int64_t orig_time = segment.orig_start + (vad_elapsed * orig_total) / vad_total;uses a
vad_totalthat is too large relative toorig_total, causing all timestamps within the segment to be shifted earlier than they should be.The fix moves the boundary clamping before overlap addition and computes
original_segment_lengthfrom the un-extended boundaries. Onlyvad_endchanges; thememcpyandoffsetadvancement still use the overlap-inclusivesegment_lengthso the audio buffer is unaffected.Fixes #3683