
Add cross-attention accessor functions for AlignAtt streaming#3660

Open
danielbodart wants to merge 1 commit into ggml-org:master from danielbodart:add-cross-attention-accessors

Conversation

@danielbodart

Summary

  • Adds whisper_decode_with_state_and_aheads() — same as whisper_decode_with_state but saves alignment head cross-attention data during decode
  • Adds whisper_state_get_aheads_cross_qks() — reads the resulting cross-attention tensor from state, copying from GPU/backend to CPU

These enable external callers to implement AlignAtt-style streaming policies by accessing the cross-attention data that is currently only available internally via DTW timestamp computation.
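As a rough sketch, the two accessors described above might be declared as follows. These signatures are inferred from the PR description and from the existing `whisper_decode_with_state()` API; the actual declarations live in `whisper.h` on the PR branch and may differ.

```c
// Hypothetical declarations inferred from the PR description.

// Same as whisper_decode_with_state(), but additionally saves the
// alignment-head cross-attention (QK) data produced during this decode.
WHISPER_API int whisper_decode_with_state_and_aheads(
        struct whisper_context * ctx,
        struct whisper_state   * state,
        const whisper_token    * tokens,
        int                      n_tokens,
        int                      n_past,
        int                      n_threads);

// Copies the saved cross-attention tensor from the backend (e.g. GPU) to
// host memory and returns a pointer to it, with layout
// [n_tokens x n_audio_ctx x n_heads]. The pointer remains valid until the
// next call or whisper_free_state().
WHISPER_API const float * whisper_state_get_aheads_cross_qks(
        struct whisper_state * state);
```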

Motivation

We are building a streaming speech-to-text system that uses cross-attention analysis (AlignAtt) to decide when to stop decoding and emit partial results. The existing API exposes DTW timestamps only as a post-processing step via whisper_full, but streaming requires access to the raw cross-attention data during a manual decode loop.

Details

  • whisper_decode_with_state_and_aheads calls whisper_decode_internal with save_aheads=true, matching how whisper_full internally enables attention head saving for DTW
  • whisper_state_get_aheads_cross_qks returns a float pointer of shape [n_tokens × n_audio_ctx × n_heads], valid until the next call or whisper_free_state
  • Requires dtw_token_timestamps=true in context params (same prerequisite as DTW timestamps)
  • No changes to existing functions or data structures — purely additive
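To illustrate how a caller might combine the two functions, here is a minimal sketch of one step of an AlignAtt-style decode loop. It assumes the context was created with `dtw_token_timestamps = true`, that the encoder has already run on the current audio chunk, and that `n_audio_ctx`/`n_heads` are known to the caller; the `frame_margin` policy and the helper itself are hypothetical, not part of the PR.

```c
#include "whisper.h"

// Sketch only: decode one token while saving alignment-head cross-attention,
// then apply a simple AlignAtt-style stopping policy.
static bool decode_step_with_attention(
        struct whisper_context * ctx,
        struct whisper_state   * state,
        whisper_token            token,
        int                      n_past,
        int                      n_audio_ctx,
        int                      n_heads) {
    // Decode with save_aheads enabled internally.
    if (whisper_decode_with_state_and_aheads(ctx, state, &token, 1, n_past, 4) != 0) {
        return false;
    }

    // Read back the cross-attention tensor: [n_tokens x n_audio_ctx x n_heads].
    // With n_tokens == 1 this indexes as qks[f*n_heads + h].
    const float * qks = whisper_state_get_aheads_cross_qks(state);
    if (qks == NULL) {
        return false;
    }

    // Illustrative policy: find the audio frame each head attends to most;
    // if the most-attended frame is too close to the end of the available
    // audio, stop decoding and wait for more input.
    int max_frame = 0;
    for (int h = 0; h < n_heads; ++h) {
        int best = 0;
        for (int f = 1; f < n_audio_ctx; ++f) {
            if (qks[f*n_heads + h] > qks[best*n_heads + h]) {
                best = f;
            }
        }
        if (best > max_frame) {
            max_frame = best;
        }
    }

    const int frame_margin = 4; // hypothetical threshold
    return max_frame < n_audio_ctx - frame_margin; // true: safe to emit token
}
```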

Add whisper_decode_with_state_and_aheads() which saves alignment head
cross-attention data during decode, and whisper_state_get_aheads_cross_qks()
to read the resulting tensor from state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
