Add tdt duration to APIs #2514

bradmurray-dt · 2025-08-21T02:14:05Z

This is a follow up to:
#2495

This should add support to keep and pass durations through to the APIs from the nemo parakeet tdt models.

Examples are included for C and Python.

This code was only tested to the point that I confirmed my test programs ran and gave reasonable output. More in depth reviews and testing would be appreciated.

Summary by CodeRabbit

New Features
- Offline recognition results now include per-token durations (seconds) alongside timestamps, exposed in C, Python, and Go APIs and in JSON output.
- Decoder now uses model-provided durations for TDT models to produce per-token durations.
Documentation
- Added C and Python examples demonstrating offline transcription with Nemo Parakeet models, printing text, tokens, timestamps, and durations.

coderabbitai · 2025-08-21T02:14:15Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution

Review failed

The pull request is closed.

Walkthrough

Adds per-token duration support end-to-end (core C++ structs, decoder, conversion, JSON), exposes durations in C API, Python and Go bindings, and adds C/Python example programs plus a CMake target for Nemo Parakeet offline decoding.

Changes

Cohort / File(s)	Summary
Core result structs `sherpa-onnx/csrc/offline-stream.h`, `sherpa-onnx/csrc/offline-transducer-decoder.h`	Add `durations` vector members to OfflineRecognitionResult and OfflineTransducerDecoderResult.
Decoder logic (Nemo TDT) `sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc`	Split logits into token and duration logits; compute per-token duration, record durations, and advance frames by duration (variable step).
Result conversion & serialization `sherpa-onnx/csrc/offline-recognizer-transducer-impl.h`, `sherpa-onnx/csrc/offline-stream.cc`	Convert and scale durations (frame -> seconds); include `"durations"` array in JSON output.
C API `sherpa-onnx/c-api/c-api.h`, `sherpa-onnx/c-api/c-api.cc`	Add `float *durations` to `SherpaOnnxOfflineRecognizerResult`; allocate/populate/free durations when present.
Python binding `sherpa-onnx/python/csrc/offline-stream.cc`	Add read-only `durations` property on `OfflineRecognitionResult` Python class.
Go binding `scripts/go/sherpa_onnx.go`	Add `Durations []float32` to `OfflineRecognizerResult`; populate conditionally when backend provides durations.
C example & build `c-api-examples/nemo-parakeet-c-api.c`, `c-api-examples/CMakeLists.txt`	New C example program and CMake target `nemo-parakeet-c-api` linking `sherpa-onnx-c-api`.
Python example `python-api-examples/offline-nemo-parakeet-decode-file.py`	New Python example demonstrating offline Nemo Parakeet decoding and printing per-token timestamps/durations when available.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as Example (C/Python)
  participant Rec as OfflineRecognizer
  participant Dec as Nemo TDT Greedy Decoder
  participant Res as OfflineRecognitionResult

  User->>App: supply model paths + WAV
  App->>Rec: create stream, feed waveform
  Rec->>Dec: decode frames
  Note over Dec: split logits -> token_logits & duration_logits\ny = argmax(token_logits)\nd = argmax(duration_logits) or 1
  Dec-->>Rec: emit tokens, timestamps, durations (variable step t += d)
  Rec->>Res: build result (text, tokens, timestamps, durations)
  App->>Res: read result
  Res-->>App: return (durations may be NULL/absent)
  App-->>User: print text and per-token timestamps/durations

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Add Pascal/Go/C#/Dart API for NeMo Canary ASR models #2367 — touches scripts/go/sherpa_onnx.go; may overlap with Go binding changes (OfflineRecognizerResult fields).

Suggested reviewers

csukuangfj

Poem

Hop-hop, I time each beep and beat,
Tokens march with tiny feet.
Nemo sings, durations fall in line,
Timestamps blink and texts align.
I nibble bugs and toast this feat 🥕⏱️

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa6064c and 9999519.

📒 Files selected for processing (4)

python-api-examples/offline-nemo-parakeet-decode-file.py (1 hunks)
sherpa-onnx/c-api/c-api.h (2 hunks)
sherpa-onnx/csrc/offline-transducer-decoder.h (1 hunks)
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (2 hunks)

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

sherpa-onnx/c-api/c-api.h (1)

609-654: SherpaOnnxOfflineRecognizerResult Struct Change Is an ABI Break

The addition of the durations field before the existing count member in SherpaOnnxOfflineRecognizerResult shifts all subsequent offsets, breaking ABI compatibility with any client binary compiled against the previous layout.

Please address this breaking change by choosing one of the following strategies:

Introduce a new versioned struct and accessor:

Define SherpaOnnxOfflineRecognizerResultV2 (with durations in the desired position).

Add a corresponding factory/getter function (e.g., SherpaOnnxGetOfflineStreamResultV2).

Leave the original SherpaOnnxOfflineRecognizerResult and its API untouched for backward compatibility.

Or, if you’re willing to break existing clients:

Bump the library’s SONAME or major version.

Clearly document the ABI break in the changelog/release notes.

Long-term: adopt a “size/flags at the top” extensibility pattern so future fields can be added without moving offsets of existing members.

Please confirm the intended compatibility guarantees and implement one of the above paths before merging. I can help draft the V2 API if needed.
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (1)
118-119: Initialize skip to 1 and clamp it defensively to avoid t += 0 edge cases.

If anything goes wrong before skip is set (unexpected early returns in future edits, exceptions), t += skip with skip==0 risks an infinite loop. Initialize and clamp.
-  int32_t skip = 0;
+  int32_t skip = 1;  // avoid t += 0 on the first iteration
@@
-    skip = 1;
-    int32_t duration = 1;
+    skip = 1;
+    int32_t duration = 1;
@@
-      skip = duration;
-      if (skip == 0) skip = 1;
+      skip = std::max(duration, 1);
Also applies to: 146-154

🧹 Nitpick comments (17)

sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)
39-39: Duration conversion looks correct; clarify unit contract and guard mismatches.

You scale durations with the same frame_shift_s used for timestamps, which is exactly what we want if src.durations are in post-subsampling frames. However, offline-transducer-decoder.h currently says “frames or seconds,” which could lead to accidental double-scaling if a decoder ever outputs seconds. Recommend tightening the contract (frames-only at decoder layer) and optionally asserting size alignment with tokens/timestamps when durations are present.

Proposed doc tightening (see comment in offline-transducer-decoder.h) and optional alignment check:
   float frame_shift_s = frame_shift_ms / 1000. * subsampling_factor;
   for (auto t : src.timestamps) {
     float time = frame_shift_s * t;
     r.timestamps.push_back(time);
   }
 
-  // Copy durations (if present)
-  for (auto d : src.durations) {
-    r.durations.push_back(d * frame_shift_s);
-  }
+  // Copy durations (if present). Expect src.durations to be in output frames.
+  if (!src.durations.empty()) {
+    // Optional: sanity check to avoid subtle mismatches
+    // (keep as debug-only if you prefer).
+    // assert(src.durations.size() == src.timestamps.size());
+    for (auto d : src.durations) {
+      r.durations.push_back(d * frame_shift_s);  // seconds
+    }
+  }
Also applies to: 70-73
sherpa-onnx/csrc/offline-transducer-decoder.h (1)
23-24: Make durations unit unambiguous (post-subsampling frames).

To avoid accidental double conversion, define durations at the decoder layer in frames (same scale as timestamps). Convert to seconds only in higher layers (Convert()).
-  /// durations[i] contains the duration (in frames or seconds, as appropriate) for tokens[i] (TDT models only)
+  /// durations[i] contains the duration for tokens[i] in output frames
+  /// (post-subsampling). It is converted to seconds by higher layers
+  /// (e.g., Convert() in offline-recognizer-transducer-impl.h).
   std::vector<float> durations;
sherpa-onnx/csrc/offline-stream.h (1)
41-43: Add size invariant to docs for durations.

Durations are optional, but when present they should align 1:1 with tokens (and timestamps). Document this to help downstream consumers.
-  /// durations[i] contains the duration (in seconds) for tokens[i] (TDT models only)
+  /// durations.size() == tokens.size() when non-empty.
+  /// durations[i] contains the duration (in seconds) for tokens[i] (TDT models only).
   std::vector<float> durations;
sherpa-onnx/csrc/offline-stream.cc (1)
399-410: Consider emitting “durations” only when non-empty for stricter backward compatibility.

Always including an empty “durations”: [] is likely fine, but gating on non-empty mirrors how optional fields are usually surfaced and reduces risk for strict consumers comparing schemas.
-  os << "\""
-     << "durations"
-     << "\""
-     << ": ";
-  os << "[";
-  sep = "";
-  for (auto d : durations) {
-    os << sep << std::fixed << std::setprecision(2) << d;
-    sep = ", ";
-  }
-  os << "], ";
+  if (!durations.empty()) {
+    os << "\"durations\": ";
+    os << "[";
+    sep = "";
+    for (auto d : durations) {
+      os << sep << std::fixed << std::setprecision(2) << d;
+      sep = ", ";
+    }
+    os << "], ";
+  }
sherpa-onnx/c-api/c-api.cc (1)

692-699: Durations propagation mirrors timestamps logic correctly.

Size guard against r->count is correct and prevents mismatched copies.

Allocation + std::copy usage is sound, and the destructor covers deallocation.

Minor optional: when count == 0 (Lines 702-706), durations remains nullptr due to the memset earlier; you could also set r->durations = nullptr there for symmetry with timestamps.
sherpa-onnx/c-api/c-api.h (2)
621-623: Doc nit: count comment should cover durations too.

The comment currently says “number of entries in timestamps”. It should mention tokens/timestamps/durations are all length = count.

Apply:
-  // number of entries in timestamps
+  // Number of entries in tokens/timestamps/durations
631-644: JSON doc is missing the new “durations” field.

The comment block documenting the JSON payload doesn’t list durations. Propose adding it:
    *     "tokens": [x, x, x],
-   *     "timestamps": [x, x, x],
+   *     "timestamps": [x, x, x],
+   *     "durations": [x, x, x],
sherpa-onnx/python/csrc/offline-stream.cc (1)

45-49: Python binding for durations – LGTM.

Read-only exposure of durations aligns with timestamps and returns std::vector as a Python list. Optional: add a short property doc (unit = seconds) for timestamps and durations to reduce ambiguity in downstream usage.
python-api-examples/offline-nemo-parakeet-decode-file.py (3)
6-6: Remove duplicate import.

import os is present twice (Lines 4 and 6). Drop one to satisfy linters.
-import os
 import sys
17-19: Validate model file paths too (not just the WAV).

If any of encoder/decoder/joiner/tokens is missing, the example will fail later with a less helpful error. Add a quick check.
 if not os.path.exists(wav_filename):
     print(f"File not found: {wav_filename}")
     sys.exit(1)
+
+for p in (encoder, decoder, joiner, tokens):
+    if not os.path.exists(p):
+        print(f"File not found: {p}")
+        sys.exit(1)
43-48: Use length checks instead of hasattr to decide whether to print tables.

The properties always exist; they may be empty. Using hasattr will always be True. Prefer truthiness/length checks to avoid printing an empty header.
-if hasattr(result, "tokens") and hasattr(result, "timestamps") and hasattr(result, "durations"):
-    print("Token\tTimestamp\tDuration")
-    for token, ts, dur in zip(result.tokens, result.timestamps, result.durations):
-        print(f"{token}\t{ts:.2f}\t{dur:.2f}")
+if result.tokens and result.timestamps:
+    if result.durations:
+        print("Token\tTimestamp\tDuration")
+        for token, ts, dur in zip(result.tokens, result.timestamps, result.durations):
+            print(f"{token}\t{ts:.2f}\t{dur:.2f}")
+    else:
+        print("Token\tTimestamp")
+        for token, ts in zip(result.tokens, result.timestamps):
+            print(f"{token}\t{ts:.2f}")
 else:
     print("Timestamps or durations not available.")
c-api-examples/nemo-parakeet-c-api.c (3)
24-27: Preflight-check all model/token files, not just the WAV.

You only verify the WAV exists. If any ONNX or tokens path is wrong, the example fails later with a generic message. Add explicit checks to fail fast with actionable errors.

Apply this diff:
   const char *provider = "cpu";
 
-  if (!SherpaOnnxFileExists(wav_filename)) {
-    fprintf(stderr, "File not found: %s\n", wav_filename);
-    return -1;
-  }
+  // Preflight checks for all inputs
+  if (!SherpaOnnxFileExists(wav_filename)) {
+    fprintf(stderr, "File not found: %s\n", wav_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(encoder_filename)) {
+    fprintf(stderr, "File not found: %s\n", encoder_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(decoder_filename)) {
+    fprintf(stderr, "File not found: %s\n", decoder_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(joiner_filename)) {
+    fprintf(stderr, "File not found: %s\n", joiner_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(tokens_filename)) {
+    fprintf(stderr, "File not found: %s\n", tokens_filename);
+    return -1;
+  }
Also applies to: 14-21

28-33: Tighten the WAV parse error message or validate format.

ReadWave returns floats; the failure isn’t necessarily “not a valid mono 16‑bit.” Either make the message generic or actually validate channels/bit-depth before printing a specific claim.
-    fprintf(stderr, "Failed to read or parse %s (not a valid mono 16-bit WAVE file)\n", wav_filename);
+    fprintf(stderr, "Failed to read or parse %s (unsupported/invalid WAV)\n", wav_filename);
74-81: Print whatever timing info is available instead of all-or-nothing.

If tokens exist but only timestamps or durations are present, you currently print “not available” and hide useful data. Print available columns dynamically.
-  if (result->tokens_arr && result->timestamps && result->durations) {
-    printf("Token\tTimestamp\tDuration\n");
-    for (int32_t i = 0; i < result->count; ++i) {
-      printf("%s\t%.2f\t%.2f\n", result->tokens_arr[i], result->timestamps[i], result->durations[i]);
-    }
-  } else {
-    printf("Timestamps or durations not available.\n");
-  }
+  if (result->tokens_arr) {
+    // Header
+    printf("Token");
+    if (result->timestamps) printf("\tTimestamp");
+    if (result->durations) printf("\tDuration");
+    printf("\n");
+    // Rows
+    for (int32_t i = 0; i < result->count; ++i) {
+      printf("%s", result->tokens_arr[i]);
+      if (result->timestamps) printf("\t%.2f", result->timestamps[i]);
+      if (result->durations) printf("\t%.2f", result->durations[i]);
+      printf("\n");
+    }
+  } else {
+    printf("Tokens not available.\n");
+  }
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (3)
159-160: Make the duration type explicit.

durations is a vector; casting avoids implicit narrowing surprises during overload resolution or template deduction.
-      ans.durations.push_back(duration); // Use the index as duration, matching Python
+      ans.durations.push_back(static_cast<float>(duration));  // index-as-duration, matching Python
104-105: Remove unused variable.

token_durations is never read. Drop it to keep the decoder tight.
-  std::vector<float> token_durations;
142-154: Duration semantics verified consistent across all bindings and JSON

The greedy‐search decoder in sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc records raw frame‐count indices in ans.durations (0..num_durations–1) and uses a separate skip clamp to advance at least one frame.

In sherpa-onnx/csrc/offline-recognizer-transducer-impl.h, each index is converted to seconds via r.durations.push_back(d * frame_shift_s), producing a durations vector in seconds.

The Python binding (offline-stream.cc), C API (c-api.cc) and the JSON serializer (offline-stream.cc) all expose the post-conversion durations in seconds, as documented in offline-stream.h and c-api.h.

There is no need for any additional “+1” or offset: model-output classes map directly to frame counts starting at zero, then clamped at runtime for skip.

Optional: consider clarifying the inline comment at line 159 of the greedy‐search decoder to note that these values are raw frame-count indices (converted downstream), to avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ba13109 and fa6064c.

📒 Files selected for processing (12)

c-api-examples/CMakeLists.txt (1 hunks)
c-api-examples/nemo-parakeet-c-api.c (1 hunks)
python-api-examples/offline-nemo-parakeet-decode-file.py (1 hunks)
scripts/go/sherpa_onnx.go (2 hunks)
sherpa-onnx/c-api/c-api.cc (2 hunks)
sherpa-onnx/c-api/c-api.h (1 hunks)
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (2 hunks)
sherpa-onnx/csrc/offline-stream.cc (1 hunks)
sherpa-onnx/csrc/offline-stream.h (1 hunks)
sherpa-onnx/csrc/offline-transducer-decoder.h (1 hunks)
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (3 hunks)
sherpa-onnx/python/csrc/offline-stream.cc (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (4)

python-api-examples/offline-nemo-parakeet-decode-file.py (3)

sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)

sherpa_onnx (31-192)

sherpa-onnx/csrc/offline-stream.h (1)

sherpa_onnx (17-107)

sherpa-onnx/csrc/offline-transducer-decoder.h (1)

sherpa_onnx (13-44)

sherpa-onnx/c-api/c-api.cc (1)

sherpa-onnx/csrc/offline-stream.cc (2)

r (250-250)

r (250-250)

c-api-examples/nemo-parakeet-c-api.c (1)

sherpa-onnx/c-api/c-api.cc (22)

SherpaOnnxFileExists (2055-2057)

SherpaOnnxFileExists (2055-2055)

SherpaOnnxReadWave (1439-1456)

SherpaOnnxReadWave (1439-1439)

SherpaOnnxCreateOfflineRecognizer (549-565)

SherpaOnnxCreateOfflineRecognizer (549-550)

SherpaOnnxFreeWave (1480-1485)

SherpaOnnxFreeWave (1480-1480)

SherpaOnnxCreateOfflineStream (580-585)

SherpaOnnxCreateOfflineStream (580-581)

SherpaOnnxDestroyOfflineRecognizer (575-578)

SherpaOnnxDestroyOfflineRecognizer (575-576)

SherpaOnnxAcceptWaveformOffline (598-602)

SherpaOnnxAcceptWaveformOffline (598-600)

SherpaOnnxDecodeOfflineStream (604-608)

SherpaOnnxDecodeOfflineStream (604-606)

SherpaOnnxGetOfflineStreamResult (620-709)

SherpaOnnxGetOfflineStreamResult (620-621)

SherpaOnnxDestroyOfflineRecognizerResult (711-725)

SherpaOnnxDestroyOfflineRecognizerResult (711-712)

SherpaOnnxDestroyOfflineStream (594-596)

SherpaOnnxDestroyOfflineStream (594-594)

sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)

sherpa-onnx/csrc/offline-stream.cc (2)

r (250-250)

r (250-250)

🪛 Ruff (0.12.2)

python-api-examples/offline-nemo-parakeet-decode-file.py

6-6: Redefinition of unused os from line 4

Remove definition: os

(F811)

🪛 Flake8 (7.2.0)

python-api-examples/offline-nemo-parakeet-decode-file.py

[error] 6-6: redefinition of unused 'os' from line 4

(F811)

🔇 Additional comments (8)

c-api-examples/CMakeLists.txt (1)

62-64: Example doesn’t use cargs; no extra linkage needed

The nemo-parakeet-c-api.c file contains no #include <cargs.h> directive nor any calls to cargs_* functions. Therefore, linking against the cargs library is unnecessary and the existing:
add_executable(nemo-parakeet-c-api nemo-parakeet-c-api.c)
target_link_libraries(nemo-parakeet-c-api sherpa-onnx-c-api)
is correct.
sherpa-onnx/c-api/c-api.cc (1)

711-717: Destructor now frees durations – LGTM.

delete[] r->durations; is placed alongside other fields, so no leak.
sherpa-onnx/c-api/c-api.h (1)

662-672: Fix offline result doc comment function name

The documentation for freeing an offline recognizer result incorrectly references the online destroy function. Update the comment in sherpa-onnx/c-api/c-api.h:

File: sherpa-onnx/c-api/c-api.h

Line: 663
- ///         SherpaOnnxDestroyOnlineRecognizerResult() to free the returned
+ ///         SherpaOnnxDestroyOfflineRecognizerResult() to free the returned
scripts/go/sherpa_onnx.go (2)

523-530: Go result type gains Durations – LGTM.

Public API is extended in a backward-compatible manner for Go users (new field, no breaking changes). Make sure release notes call out the addition.

876-889: Safe conditional population of Timestamps and Durations – LGTM.

Guards against nil C pointers before slicing avoid panics.

Loops copy with proper float32 casts.

Durations population is independent of timestamps, which is correct.

No changes requested.

c-api-examples/nemo-parakeet-c-api.c (2)

83-87: Good cleanup ordering and completeness.

Result, stream, recognizer, then wave are destroyed in the right order. No leaks.

66-71: No overflow risk on num_samples conversion

The SherpaOnnxWave struct defines num_samples as an int32_t, which matches the int32_t n parameter in SherpaOnnxAcceptWaveformOffline. There is no wider type being implicitly converted, so the overflow concern does not apply. Feel free to disregard the suggested cast/check.

Likely an incorrect or invalid review comment.

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (1)

138-146: Clean split between token and duration logits looks correct.

Using the first vocab_size as token logits and the tail as duration logits aligns with the TDT output layout. Argmax over the token slice to get y is standard greedy decoding.

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc

csukuangfj

Thanks!
Left some minor comments.
Otherwise, it looks great!

Copilot

Pull Request Overview

This PR adds duration support to offline recognition results for TDT (Token-Duration-Time) models, specifically targeting Nemo Parakeet models. The feature provides per-token duration information alongside existing timestamps.

Extends the recognition result structure to include duration information across all APIs
Updates the TDT decoder to extract and utilize duration logits from model output
Adds comprehensive examples demonstrating duration usage in both C and Python

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc	Core decoder logic to extract duration from TDT model output
sherpa-onnx/csrc/offline-transducer-decoder.h	Added durations field to decoder result structure
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h	Duration conversion from frames to seconds
sherpa-onnx/csrc/offline-stream.h	Added durations field to recognition result
sherpa-onnx/csrc/offline-stream.cc	JSON serialization support for durations
sherpa-onnx/python/csrc/offline-stream.cc	Python binding for durations property
sherpa-onnx/c-api/c-api.h	C API structure extension for durations
sherpa-onnx/c-api/c-api.cc	C API implementation for duration handling
scripts/go/sherpa_onnx.go	Go API support for durations
python-api-examples/offline-nemo-parakeet-decode-file.py	Python example demonstrating duration usage
c-api-examples/nemo-parakeet-c-api.c	C example demonstrating duration usage
c-api-examples/CMakeLists.txt	Build configuration for new C example

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-08-21T02:55:20Z

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc

+        token_logits,
+        std::max_element(token_logits, token_logits + vocab_size)));
+
+    skip = 1;


The variable skip is unconditionally set to 1 on line 144, but then conditionally overridden on line 150. This creates confusion as the initial assignment appears unnecessary. Consider moving the skip = 1 assignment inside the else clause or after the duration calculation.

Suggested change

skip = 1;

Copilot · 2025-08-21T02:55:20Z

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc

+          duration_logits,
+          std::max_element(duration_logits, duration_logits + num_durations)));
+      skip = duration;
+      if (skip == 0) skip = 1;


The default duration value of 1 should be documented or made into a named constant. This magic number affects the behavior when no duration information is available from the model.

Suggested change

if (skip == 0) skip = 1;

skip = kDefaultDuration;

int32_t duration = kDefaultDuration;

if (num_durations > 0) {

duration = static_cast<int32_t>(std::distance(

duration_logits,

std::max_element(duration_logits, duration_logits + num_durations)));

skip = duration;

if (skip == 0) skip = kDefaultDuration;

Add tdt duration to APIs

fa6064c

coderabbitai bot reviewed Aug 21, 2025

View reviewed changes

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc Outdated Show resolved Hide resolved

csukuangfj requested a review from Copilot August 21, 2025 02:26

This comment was marked as outdated.

Sign in to view

csukuangfj approved these changes Aug 21, 2025

View reviewed changes

bradmurray-dt added 3 commits August 20, 2025 22:34

Remove duplicate import

9c775fd

Remove duplicate declaration

be31b98

Cleanup comments and unused variable

9999519

bradmurray-dt requested a review from Copilot August 21, 2025 02:54

csukuangfj merged commit 06ae4a7 into k2-fsa:master Aug 21, 2025
1 check was pending

Copilot AI reviewed Aug 21, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Aug 25, 2025

Release v1.12.10 #2523

Merged

csukuangfj mentioned this pull request Sep 1, 2025

Fix c api #2545

Merged

coderabbitai bot mentioned this pull request Sep 1, 2025

Fix linking #2546

Merged

-      if (skip == 0) skip = 1;
+    skip = kDefaultDuration;
+    int32_t duration = kDefaultDuration;
+    if (num_durations > 0) {
+      duration = static_cast<int32_t>(std::distance(
+          duration_logits,
+          std::max_element(duration_logits, duration_logits + num_durations)));
+      skip = duration;
+      if (skip == 0) skip = kDefaultDuration;

Add tdt duration to APIs #2514

Add tdt duration to APIs #2514

Uh oh!

Conversation

bradmurray-dt commented Aug 21, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

csukuangfj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bradmurray-dt commented Aug 21, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 21, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)