fix: use original segment length for vad_end time mapping #3703

Open
Acelogic wants to merge 1 commit into ggml-org:master from Acelogic:fix/vad-end-overlap-mapping

@Acelogic

Summary

  • Fixes incorrect VAD timestamp mapping when overlap samples are appended to segments
  • vad_end was calculated using the overlap-extended segment_length, but orig_end reflects the original VAD boundary without overlap, creating a ratio mismatch in the time interpolation
  • Compute original_segment_length before overlap is added and use it for vad_end; the full segment_length (with overlap) is still used for the audio buffer copy and offset advancement

Details

When whisper_full_with_state processes VAD segments, non-final segments get overlap_samples appended (line 6705 on master) for smoother model transitions. The bug is that segment.vad_end was set to samples_to_cs(offset + segment_length) where segment_length includes this overlap, while segment.orig_end is the raw VAD boundary without overlap.

This mismatch means the linear interpolation at lines 6740-6743:

int64_t orig_time = segment.orig_start + (vad_elapsed * orig_total) / vad_total;

uses a vad_total that is too large relative to orig_total, causing all timestamps within the segment to be shifted earlier than they should be.
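The effect of the mismatch can be reproduced in isolation. The following is a minimal sketch, not the whisper.cpp code: the `vad_mapping` struct, `map_to_orig`, and `make_mapping` are hypothetical names, and the numbers (a 200 cs segment with 10 cs of overlap) are chosen purely for illustration. Only the interpolation expression itself is taken from the line quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-segment mapping fields named in this PR.
// All times are in centiseconds (cs).
struct vad_mapping {
    int64_t orig_start; // segment start in the original audio
    int64_t orig_end;   // segment end in the original audio (no overlap)
    int64_t vad_start;  // segment start in the processed (VAD-filtered) audio
    int64_t vad_end;    // segment end in the processed audio
};

// Same linear interpolation as the line quoted above:
// processed time -> original time.
int64_t map_to_orig(const vad_mapping & m, int64_t vad_time) {
    const int64_t vad_total  = m.vad_end  - m.vad_start;
    const int64_t orig_total = m.orig_end - m.orig_start;
    assert(vad_total > 0);
    const int64_t vad_elapsed = vad_time - m.vad_start;
    return m.orig_start + (vad_elapsed * orig_total) / vad_total;
}

// Illustrative segment: 200 cs of speech starting at 1000 cs in the original
// audio, placed at 0 cs in the processed buffer, followed by 10 cs of overlap.
vad_mapping make_mapping(bool include_overlap_in_vad_end) {
    const int64_t segment_cs = 200;
    const int64_t overlap_cs = 10; // assumed value, for illustration only
    vad_mapping m;
    m.orig_start = 1000;
    m.orig_end   = 1200;
    m.vad_start  = 0;
    m.vad_end    = segment_cs + (include_overlap_in_vad_end ? overlap_cs : 0);
    return m;
}
```

With these numbers, a token at 100 cs of processed time maps to 1100 cs when `vad_end` excludes the overlap, but to 1095 cs when `vad_end` includes it, because the ratio `orig_total / vad_total` drops from 200/200 to 200/210. The error grows with distance from the segment start.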

The fix moves the boundary clamping before overlap addition and computes original_segment_length from the un-extended boundaries. Only vad_end changes; the memcpy and offset advancement still use the overlap-inclusive segment_length so the audio buffer is unaffected.
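The reordering described above can be sketched as follows. This is an illustration under stated assumptions rather than the patch itself: `segment_lengths` and `compute_lengths` are hypothetical names, and the exact clamping rules in `whisper_full_with_state` may differ. The point is only the order of operations: clamp, record the original length, then extend by the overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: the two lengths the PR description distinguishes.
struct segment_lengths {
    int64_t original;     // un-extended length, intended for vad_end
    int64_t with_overlap; // extended length, intended for the buffer copy
                          // and offset advancement
};

// All quantities are sample counts. start/end are the raw VAD boundaries.
segment_lengths compute_lengths(int64_t start, int64_t end, int64_t n_samples,
                                int64_t overlap_samples, bool is_last) {
    assert(n_samples >= 0 && overlap_samples >= 0);

    // 1. Clamp the boundaries to the audio first.
    end   = std::min(end, n_samples);
    start = std::min(std::max<int64_t>(start, 0), end);

    // 2. Record the length before any overlap is added.
    const int64_t original = end - start;

    // 3. Only then extend non-final segments by the overlap (clamped again).
    int64_t extended_end = end;
    if (!is_last) {
        extended_end = std::min(end + overlap_samples, n_samples);
    }

    return { original, extended_end - start };
}
```

For example, `compute_lengths(16000, 48000, 160000, 1600, false)` yields an original length of 32000 samples and an overlap-inclusive length of 33600; for the final segment both values are 32000.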

Fixes #3683

When VAD segments get overlap samples appended for smoother model
transitions, the vad_end timestamp was incorrectly calculated using
the overlap-extended segment length. This caused a mismatch between
vad_end (which included overlap duration) and orig_end (which did
not), resulting in skewed time interpolation and inaccurate
timestamps for all content within the segment.

Compute original_segment_length before overlap is added and use it
for vad_end, so the mapping ratio between processed and original
time is correct. The full segment_length (with overlap) is still
used for the audio buffer copy and offset advancement.

Fixes ggml-org#3683
@danbev
Member

danbev commented Mar 17, 2026

I believe this was resolved by #3711, but please let us know if that is not the case.

Development

Successfully merging this pull request may close these issues.

Bug: vad_end mapping incorrectly includes overlap samples, causing timestamp drift and redundant mapping points

2 participants