Skip to content

Conversation

@bradmurray-dt
Copy link
Contributor

@bradmurray-dt bradmurray-dt commented Aug 21, 2025

This is a follow up to:
#2495

This should add support to keep and pass durations through to the APIs from the nemo parakeet tdt models.

Examples are included for C and Python.

This code was only tested to the point that I confirmed my test programs ran and gave reasonable output. More in depth reviews and testing would be appreciated.

Summary by CodeRabbit

  • New Features

    • Offline recognition results now include per-token durations (seconds) alongside timestamps, exposed in C, Python, and Go APIs and in JSON output.
    • Decoder now uses model-provided durations for TDT models to produce per-token durations.
  • Documentation

    • Added C and Python examples demonstrating offline transcription with Nemo Parakeet models, printing text, tokens, timestamps, and durations.

@coderabbitai
Copy link

coderabbitai bot commented Aug 21, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Caution

Review failed

The pull request is closed.

Walkthrough

Adds per-token duration support end-to-end (core C++ structs, decoder, conversion, JSON), exposes durations in C API, Python and Go bindings, and adds C/Python example programs plus a CMake target for Nemo Parakeet offline decoding.

Changes

Cohort / File(s) Summary
Core result structs
sherpa-onnx/csrc/offline-stream.h, sherpa-onnx/csrc/offline-transducer-decoder.h
Add durations vector members to OfflineRecognitionResult and OfflineTransducerDecoderResult.
Decoder logic (Nemo TDT)
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc
Split logits into token and duration logits; compute per-token duration, record durations, and advance frames by duration (variable step).
Result conversion & serialization
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h, sherpa-onnx/csrc/offline-stream.cc
Convert and scale durations (frame -> seconds); include "durations" array in JSON output.
C API
sherpa-onnx/c-api/c-api.h, sherpa-onnx/c-api/c-api.cc
Add float *durations to SherpaOnnxOfflineRecognizerResult; allocate/populate/free durations when present.
Python binding
sherpa-onnx/python/csrc/offline-stream.cc
Add read-only durations property on OfflineRecognitionResult Python class.
Go binding
scripts/go/sherpa_onnx.go
Add Durations []float32 to OfflineRecognizerResult; populate conditionally when backend provides durations.
C example & build
c-api-examples/nemo-parakeet-c-api.c, c-api-examples/CMakeLists.txt
New C example program and CMake target nemo-parakeet-c-api linking sherpa-onnx-c-api.
Python example
python-api-examples/offline-nemo-parakeet-decode-file.py
New Python example demonstrating offline Nemo Parakeet decoding and printing per-token timestamps/durations when available.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as Example (C/Python)
  participant Rec as OfflineRecognizer
  participant Dec as Nemo TDT Greedy Decoder
  participant Res as OfflineRecognitionResult

  User->>App: supply model paths + WAV
  App->>Rec: create stream, feed waveform
  Rec->>Dec: decode frames
  Note over Dec: split logits -> token_logits & duration_logits\ny = argmax(token_logits)\nd = argmax(duration_logits) or 1
  Dec-->>Rec: emit tokens, timestamps, durations (variable step t += d)
  Rec->>Res: build result (text, tokens, timestamps, durations)
  App->>Res: read result
  Res-->>App: return (durations may be NULL/absent)
  App-->>User: print text and per-token timestamps/durations
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • csukuangfj

Poem

Hop-hop, I time each beep and beat,
Tokens march with tiny feet.
Nemo sings, durations fall in line,
Timestamps blink and texts align.
I nibble bugs and toast this feat 🥕⏱️

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fa6064c and 9999519.

📒 Files selected for processing (4)
  • python-api-examples/offline-nemo-parakeet-decode-file.py (1 hunks)
  • sherpa-onnx/c-api/c-api.h (2 hunks)
  • sherpa-onnx/csrc/offline-transducer-decoder.h (1 hunks)
  • sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (2 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
sherpa-onnx/c-api/c-api.h (1)

609-654: SherpaOnnxOfflineRecognizerResult Struct Change Is an ABI Break

The addition of the durations field before the existing count member in SherpaOnnxOfflineRecognizerResult shifts all subsequent offsets, breaking ABI compatibility with any client binary compiled against the previous layout.

Please address this breaking change by choosing one of the following strategies:

  • Introduce a new versioned struct and accessor:
    • Define SherpaOnnxOfflineRecognizerResultV2 (with durations in the desired position).
    • Add a corresponding factory/getter function (e.g., SherpaOnnxGetOfflineStreamResultV2).
    • Leave the original SherpaOnnxOfflineRecognizerResult and its API untouched for backward compatibility.
  • Or, if you’re willing to break existing clients:
    • Bump the library’s SONAME or major version.
    • Clearly document the ABI break in the changelog/release notes.
  • Long-term: adopt a “size/flags at the top” extensibility pattern so future fields can be added without moving offsets of existing members.

Please confirm the intended compatibility guarantees and implement one of the above paths before merging. I can help draft the V2 API if needed.

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (1)

118-119: Initialize skip to 1 and clamp it defensively to avoid t += 0 edge cases.

If anything goes wrong before skip is set (unexpected early returns in future edits, exceptions), t += skip with skip==0 risks an infinite loop. Initialize and clamp.

-  int32_t skip = 0;
+  int32_t skip = 1;  // avoid t += 0 on the first iteration
@@
-    skip = 1;
-    int32_t duration = 1;
+    skip = 1;
+    int32_t duration = 1;
@@
-      skip = duration;
-      if (skip == 0) skip = 1;
+      skip = std::max(duration, 1);

Also applies to: 146-154

🧹 Nitpick comments (17)
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)

39-39: Duration conversion looks correct; clarify unit contract and guard mismatches.

You scale durations with the same frame_shift_s used for timestamps, which is exactly what we want if src.durations are in post-subsampling frames. However, offline-transducer-decoder.h currently says “frames or seconds,” which could lead to accidental double-scaling if a decoder ever outputs seconds. Recommend tightening the contract (frames-only at decoder layer) and optionally asserting size alignment with tokens/timestamps when durations are present.

Proposed doc tightening (see comment in offline-transducer-decoder.h) and optional alignment check:

   float frame_shift_s = frame_shift_ms / 1000. * subsampling_factor;
   for (auto t : src.timestamps) {
     float time = frame_shift_s * t;
     r.timestamps.push_back(time);
   }
 
-  // Copy durations (if present)
-  for (auto d : src.durations) {
-    r.durations.push_back(d * frame_shift_s);
-  }
+  // Copy durations (if present). Expect src.durations to be in output frames.
+  if (!src.durations.empty()) {
+    // Optional: sanity check to avoid subtle mismatches
+    // (keep as debug-only if you prefer).
+    // assert(src.durations.size() == src.timestamps.size());
+    for (auto d : src.durations) {
+      r.durations.push_back(d * frame_shift_s);  // seconds
+    }
+  }

Also applies to: 70-73

sherpa-onnx/csrc/offline-transducer-decoder.h (1)

23-24: Make durations unit unambiguous (post-subsampling frames).

To avoid accidental double conversion, define durations at the decoder layer in frames (same scale as timestamps). Convert to seconds only in higher layers (Convert()).

-  /// durations[i] contains the duration (in frames or seconds, as appropriate) for tokens[i] (TDT models only)
+  /// durations[i] contains the duration for tokens[i] in output frames
+  /// (post-subsampling). It is converted to seconds by higher layers
+  /// (e.g., Convert() in offline-recognizer-transducer-impl.h).
   std::vector<float> durations;
sherpa-onnx/csrc/offline-stream.h (1)

41-43: Add size invariant to docs for durations.

Durations are optional, but when present they should align 1:1 with tokens (and timestamps). Document this to help downstream consumers.

-  /// durations[i] contains the duration (in seconds) for tokens[i] (TDT models only)
+  /// durations.size() == tokens.size() when non-empty.
+  /// durations[i] contains the duration (in seconds) for tokens[i] (TDT models only).
   std::vector<float> durations;
sherpa-onnx/csrc/offline-stream.cc (1)

399-410: Consider emitting “durations” only when non-empty for stricter backward compatibility.

Always including an empty “durations”: [] is likely fine, but gating on non-empty mirrors how optional fields are usually surfaced and reduces risk for strict consumers comparing schemas.

-  os << "\""
-     << "durations"
-     << "\""
-     << ": ";
-  os << "[";
-  sep = "";
-  for (auto d : durations) {
-    os << sep << std::fixed << std::setprecision(2) << d;
-    sep = ", ";
-  }
-  os << "], ";
+  if (!durations.empty()) {
+    os << "\"durations\": ";
+    os << "[";
+    sep = "";
+    for (auto d : durations) {
+      os << sep << std::fixed << std::setprecision(2) << d;
+      sep = ", ";
+    }
+    os << "], ";
+  }
sherpa-onnx/c-api/c-api.cc (1)

692-699: Durations propagation mirrors timestamps logic correctly.

  • Size guard against r->count is correct and prevents mismatched copies.
  • Allocation + std::copy usage is sound, and the destructor covers deallocation.

Minor optional: when count == 0 (Lines 702-706), durations remains nullptr due to the memset earlier; you could also set r->durations = nullptr there for symmetry with timestamps.

sherpa-onnx/c-api/c-api.h (2)

621-623: Doc nit: count comment should cover durations too.

The comment currently says “number of entries in timestamps”. It should mention tokens/timestamps/durations are all length = count.

Apply:

-  // number of entries in timestamps
+  // Number of entries in tokens/timestamps/durations

631-644: JSON doc is missing the new “durations” field.

The comment block documenting the JSON payload doesn’t list durations. Propose adding it:

    *     "tokens": [x, x, x],
-   *     "timestamps": [x, x, x],
+   *     "timestamps": [x, x, x],
+   *     "durations": [x, x, x],
sherpa-onnx/python/csrc/offline-stream.cc (1)

45-49: Python binding for durations – LGTM.

Read-only exposure of durations aligns with timestamps and returns std::vector as a Python list. Optional: add a short property doc (unit = seconds) for timestamps and durations to reduce ambiguity in downstream usage.

python-api-examples/offline-nemo-parakeet-decode-file.py (3)

6-6: Remove duplicate import.

import os is present twice (Lines 4 and 6). Drop one to satisfy linters.

-import os
 import sys

17-19: Validate model file paths too (not just the WAV).

If any of encoder/decoder/joiner/tokens is missing, the example will fail later with a less helpful error. Add a quick check.

 if not os.path.exists(wav_filename):
     print(f"File not found: {wav_filename}")
     sys.exit(1)
+
+for p in (encoder, decoder, joiner, tokens):
+    if not os.path.exists(p):
+        print(f"File not found: {p}")
+        sys.exit(1)

43-48: Use length checks instead of hasattr to decide whether to print tables.

The properties always exist; they may be empty. Using hasattr will always be True. Prefer truthiness/length checks to avoid printing an empty header.

-if hasattr(result, "tokens") and hasattr(result, "timestamps") and hasattr(result, "durations"):
-    print("Token\tTimestamp\tDuration")
-    for token, ts, dur in zip(result.tokens, result.timestamps, result.durations):
-        print(f"{token}\t{ts:.2f}\t{dur:.2f}")
+if result.tokens and result.timestamps:
+    if result.durations:
+        print("Token\tTimestamp\tDuration")
+        for token, ts, dur in zip(result.tokens, result.timestamps, result.durations):
+            print(f"{token}\t{ts:.2f}\t{dur:.2f}")
+    else:
+        print("Token\tTimestamp")
+        for token, ts in zip(result.tokens, result.timestamps):
+            print(f"{token}\t{ts:.2f}")
 else:
     print("Timestamps or durations not available.")
c-api-examples/nemo-parakeet-c-api.c (3)

24-27: Preflight-check all model/token files, not just the WAV.

You only verify the WAV exists. If any ONNX or tokens path is wrong, the example fails later with a generic message. Add explicit checks to fail fast with actionable errors.

Apply this diff:

   const char *provider = "cpu";
 
-  if (!SherpaOnnxFileExists(wav_filename)) {
-    fprintf(stderr, "File not found: %s\n", wav_filename);
-    return -1;
-  }
+  // Preflight checks for all inputs
+  if (!SherpaOnnxFileExists(wav_filename)) {
+    fprintf(stderr, "File not found: %s\n", wav_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(encoder_filename)) {
+    fprintf(stderr, "File not found: %s\n", encoder_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(decoder_filename)) {
+    fprintf(stderr, "File not found: %s\n", decoder_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(joiner_filename)) {
+    fprintf(stderr, "File not found: %s\n", joiner_filename);
+    return -1;
+  }
+  if (!SherpaOnnxFileExists(tokens_filename)) {
+    fprintf(stderr, "File not found: %s\n", tokens_filename);
+    return -1;
+  }

Also applies to: 14-21


28-33: Tighten the WAV parse error message or validate format.

ReadWave returns floats; the failure isn’t necessarily “not a valid mono 16‑bit.” Either make the message generic or actually validate channels/bit-depth before printing a specific claim.

-    fprintf(stderr, "Failed to read or parse %s (not a valid mono 16-bit WAVE file)\n", wav_filename);
+    fprintf(stderr, "Failed to read or parse %s (unsupported/invalid WAV)\n", wav_filename);

74-81: Print whatever timing info is available instead of all-or-nothing.

If tokens exist but only timestamps or durations are present, you currently print “not available” and hide useful data. Print available columns dynamically.

-  if (result->tokens_arr && result->timestamps && result->durations) {
-    printf("Token\tTimestamp\tDuration\n");
-    for (int32_t i = 0; i < result->count; ++i) {
-      printf("%s\t%.2f\t%.2f\n", result->tokens_arr[i], result->timestamps[i], result->durations[i]);
-    }
-  } else {
-    printf("Timestamps or durations not available.\n");
-  }
+  if (result->tokens_arr) {
+    // Header
+    printf("Token");
+    if (result->timestamps) printf("\tTimestamp");
+    if (result->durations) printf("\tDuration");
+    printf("\n");
+    // Rows
+    for (int32_t i = 0; i < result->count; ++i) {
+      printf("%s", result->tokens_arr[i]);
+      if (result->timestamps) printf("\t%.2f", result->timestamps[i]);
+      if (result->durations) printf("\t%.2f", result->durations[i]);
+      printf("\n");
+    }
+  } else {
+    printf("Tokens not available.\n");
+  }
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (3)

159-160: Make the duration type explicit.

durations is a vector; casting avoids implicit narrowing surprises during overload resolution or template deduction.

-      ans.durations.push_back(duration); // Use the index as duration, matching Python
+      ans.durations.push_back(static_cast<float>(duration));  // index-as-duration, matching Python

104-105: Remove unused variable.

token_durations is never read. Drop it to keep the decoder tight.

-  std::vector<float> token_durations;

142-154: Duration semantics verified consistent across all bindings and JSON

  • The greedy‐search decoder in sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc records raw frame‐count indices in ans.durations (0..num_durations–1) and uses a separate skip clamp to advance at least one frame.
  • In sherpa-onnx/csrc/offline-recognizer-transducer-impl.h, each index is converted to seconds via r.durations.push_back(d * frame_shift_s), producing a durations vector in seconds.
  • The Python binding (offline-stream.cc), C API (c-api.cc) and the JSON serializer (offline-stream.cc) all expose the post-conversion durations in seconds, as documented in offline-stream.h and c-api.h.
  • There is no need for any additional “+1” or offset: model-output classes map directly to frame counts starting at zero, then clamped at runtime for skip.

Optional: consider clarifying the inline comment at line 159 of the greedy‐search decoder to note that these values are raw frame-count indices (converted downstream), to avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ba13109 and fa6064c.

📒 Files selected for processing (12)
  • c-api-examples/CMakeLists.txt (1 hunks)
  • c-api-examples/nemo-parakeet-c-api.c (1 hunks)
  • python-api-examples/offline-nemo-parakeet-decode-file.py (1 hunks)
  • scripts/go/sherpa_onnx.go (2 hunks)
  • sherpa-onnx/c-api/c-api.cc (2 hunks)
  • sherpa-onnx/c-api/c-api.h (1 hunks)
  • sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (2 hunks)
  • sherpa-onnx/csrc/offline-stream.cc (1 hunks)
  • sherpa-onnx/csrc/offline-stream.h (1 hunks)
  • sherpa-onnx/csrc/offline-transducer-decoder.h (1 hunks)
  • sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (3 hunks)
  • sherpa-onnx/python/csrc/offline-stream.cc (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
python-api-examples/offline-nemo-parakeet-decode-file.py (3)
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)
  • sherpa_onnx (31-192)
sherpa-onnx/csrc/offline-stream.h (1)
  • sherpa_onnx (17-107)
sherpa-onnx/csrc/offline-transducer-decoder.h (1)
  • sherpa_onnx (13-44)
sherpa-onnx/c-api/c-api.cc (1)
sherpa-onnx/csrc/offline-stream.cc (2)
  • r (250-250)
  • r (250-250)
c-api-examples/nemo-parakeet-c-api.c (1)
sherpa-onnx/c-api/c-api.cc (22)
  • SherpaOnnxFileExists (2055-2057)
  • SherpaOnnxFileExists (2055-2055)
  • SherpaOnnxReadWave (1439-1456)
  • SherpaOnnxReadWave (1439-1439)
  • SherpaOnnxCreateOfflineRecognizer (549-565)
  • SherpaOnnxCreateOfflineRecognizer (549-550)
  • SherpaOnnxFreeWave (1480-1485)
  • SherpaOnnxFreeWave (1480-1480)
  • SherpaOnnxCreateOfflineStream (580-585)
  • SherpaOnnxCreateOfflineStream (580-581)
  • SherpaOnnxDestroyOfflineRecognizer (575-578)
  • SherpaOnnxDestroyOfflineRecognizer (575-576)
  • SherpaOnnxAcceptWaveformOffline (598-602)
  • SherpaOnnxAcceptWaveformOffline (598-600)
  • SherpaOnnxDecodeOfflineStream (604-608)
  • SherpaOnnxDecodeOfflineStream (604-606)
  • SherpaOnnxGetOfflineStreamResult (620-709)
  • SherpaOnnxGetOfflineStreamResult (620-621)
  • SherpaOnnxDestroyOfflineRecognizerResult (711-725)
  • SherpaOnnxDestroyOfflineRecognizerResult (711-712)
  • SherpaOnnxDestroyOfflineStream (594-596)
  • SherpaOnnxDestroyOfflineStream (594-594)
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h (1)
sherpa-onnx/csrc/offline-stream.cc (2)
  • r (250-250)
  • r (250-250)
🪛 Ruff (0.12.2)
python-api-examples/offline-nemo-parakeet-decode-file.py

6-6: Redefinition of unused os from line 4

Remove definition: os

(F811)

🪛 Flake8 (7.2.0)
python-api-examples/offline-nemo-parakeet-decode-file.py

[error] 6-6: redefinition of unused 'os' from line 4

(F811)

🔇 Additional comments (8)
c-api-examples/CMakeLists.txt (1)

62-64: Example doesn’t use cargs; no extra linkage needed

The nemo-parakeet-c-api.c file contains no #include <cargs.h> directive nor any calls to cargs_* functions. Therefore, linking against the cargs library is unnecessary and the existing:

add_executable(nemo-parakeet-c-api nemo-parakeet-c-api.c)
target_link_libraries(nemo-parakeet-c-api sherpa-onnx-c-api)

is correct.

sherpa-onnx/c-api/c-api.cc (1)

711-717: Destructor now frees durations – LGTM.

delete[] r->durations; is placed alongside other fields, so no leak.

sherpa-onnx/c-api/c-api.h (1)

662-672: Fix offline result doc comment function name

The documentation for freeing an offline recognizer result incorrectly references the online destroy function. Update the comment in sherpa-onnx/c-api/c-api.h:

  • File: sherpa-onnx/c-api/c-api.h
  • Line: 663
- ///         SherpaOnnxDestroyOnlineRecognizerResult() to free the returned
+ ///         SherpaOnnxDestroyOfflineRecognizerResult() to free the returned
scripts/go/sherpa_onnx.go (2)

523-530: Go result type gains Durations – LGTM.

Public API is extended in a backward-compatible manner for Go users (new field, no breaking changes). Make sure release notes call out the addition.


876-889: Safe conditional population of Timestamps and Durations – LGTM.

  • Guards against nil C pointers before slicing avoid panics.
  • Loops copy with proper float32 casts.
  • Durations population is independent of timestamps, which is correct.

No changes requested.

c-api-examples/nemo-parakeet-c-api.c (2)

83-87: Good cleanup ordering and completeness.

Result, stream, recognizer, then wave are destroyed in the right order. No leaks.


66-71: No overflow risk on num_samples conversion

The SherpaOnnxWave struct defines num_samples as an int32_t, which matches the int32_t n parameter in SherpaOnnxAcceptWaveformOffline. There is no wider type being implicitly converted, so the overflow concern does not apply. Feel free to disregard the suggested cast/check.

Likely an incorrect or invalid review comment.

sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc (1)

138-146: Clean split between token and duration logits looks correct.

Using the first vocab_size as token logits and the tail as duration logits aligns with the TDT output layout. Argmax over the token slice to get y is standard greedy decoding.

@csukuangfj csukuangfj requested a review from Copilot August 21, 2025 02:26

This comment was marked as outdated.

Copy link
Collaborator

@csukuangfj csukuangfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Left some minor comments.
Otherwise, it looks great!

@bradmurray-dt bradmurray-dt requested a review from Copilot August 21, 2025 02:54
@csukuangfj csukuangfj merged commit 06ae4a7 into k2-fsa:master Aug 21, 2025
1 check was pending
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds duration support to offline recognition results for TDT (Token-Duration-Time) models, specifically targeting Nemo Parakeet models. The feature provides per-token duration information alongside existing timestamps.

  • Extends the recognition result structure to include duration information across all APIs
  • Updates the TDT decoder to extract and utilize duration logits from model output
  • Adds comprehensive examples demonstrating duration usage in both C and Python

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.cc Core decoder logic to extract duration from TDT model output
sherpa-onnx/csrc/offline-transducer-decoder.h Added durations field to decoder result structure
sherpa-onnx/csrc/offline-recognizer-transducer-impl.h Duration conversion from frames to seconds
sherpa-onnx/csrc/offline-stream.h Added durations field to recognition result
sherpa-onnx/csrc/offline-stream.cc JSON serialization support for durations
sherpa-onnx/python/csrc/offline-stream.cc Python binding for durations property
sherpa-onnx/c-api/c-api.h C API structure extension for durations
sherpa-onnx/c-api/c-api.cc C API implementation for duration handling
scripts/go/sherpa_onnx.go Go API support for durations
python-api-examples/offline-nemo-parakeet-decode-file.py Python example demonstrating duration usage
c-api-examples/nemo-parakeet-c-api.c C example demonstrating duration usage
c-api-examples/CMakeLists.txt Build configuration for new C example

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

token_logits,
std::max_element(token_logits, token_logits + vocab_size)));

skip = 1;
Copy link

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable skip is unconditionally set to 1 on line 144, but then conditionally overridden on line 150. This creates confusion as the initial assignment appears unnecessary. Consider moving the skip = 1 assignment inside the else clause or after the duration calculation.

Suggested change
skip = 1;

Copilot uses AI. Check for mistakes.
duration_logits,
std::max_element(duration_logits, duration_logits + num_durations)));
skip = duration;
if (skip == 0) skip = 1;
Copy link

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default duration value of 1 should be documented or made into a named constant. This magic number affects the behavior when no duration information is available from the model.

Suggested change
if (skip == 0) skip = 1;
skip = kDefaultDuration;
int32_t duration = kDefaultDuration;
if (num_durations > 0) {
duration = static_cast<int32_t>(std::distance(
duration_logits,
std::max_element(duration_logits, duration_logits + num_durations)));
skip = duration;
if (skip == 0) skip = kDefaultDuration;

Copilot uses AI. Check for mistakes.
@coderabbitai coderabbitai bot mentioned this pull request Aug 25, 2025
@csukuangfj csukuangfj mentioned this pull request Sep 1, 2025
@coderabbitai coderabbitai bot mentioned this pull request Sep 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants