@ricofurtado (Contributor) commented Feb 9, 2026

Introduce two new options, merge_peers and always_emit_headings, to enhance the functionality of the ChunkDoclingDocumentComponent. These options allow for merging undersized chunks with shared metadata and emitting headings for empty sections, respectively.

Summary by CodeRabbit

  • New Features
    • Added "Merge peers" option to merge undersized chunks with shared metadata (enabled by default).
    • Added "Always emit headings" option to emit headings for empty sections (disabled by default).

…gDocumentComponent `pragma: allowlist secret`
@github-actions github-actions bot added the community Pull Request from an external contributor label Feb 9, 2026
@coderabbitai bot (Contributor) commented Feb 9, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Two new boolean input parameters (merge_peers and always_emit_headings) are added to the ChunkDoclingDocumentComponent with visibility logic that displays them only when HybridChunker is the active chunker. These parameters are passed to HybridChunker initialization during document processing.
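
The visibility rule described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: the `chunker` field name, the `show` key, and the plain-dict config layout are assumptions made for the example; only `merge_peers`, `always_emit_headings`, and `HybridChunker` come from the PR description.

```python
from typing import Optional

# Inputs that should only be visible while HybridChunker is selected.
HYBRID_ONLY_FIELDS = ("merge_peers", "always_emit_headings")


def update_build_config(
    build_config: dict, field_value: str, field_name: Optional[str] = None
) -> dict:
    """Toggle the HybridChunker-only inputs when the chunker selection changes.

    The "chunker" field name and the "show" key are assumptions for this sketch.
    """
    if field_name == "chunker":
        show = field_value == "HybridChunker"
        for name in HYBRID_ONLY_FIELDS:
            build_config[name]["show"] = show
    return build_config


# Defaults follow the PR summary: merge_peers on, always_emit_headings off.
config = {
    "chunker": {"value": "HybridChunker"},
    "merge_peers": {"value": True, "show": True},
    "always_emit_headings": {"value": False, "show": True},
}

# Switching to another chunker hides both options.
update_build_config(config, "HierarchicalChunker", "chunker")
```

Keeping the gated field names in one tuple means a future HybridChunker-only option needs a single-line change rather than another branch.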

Changes

| Cohort / File(s) | Summary |
|---|---|
| New ChunkDoclingDocumentComponent inputs: `src/lfx/src/lfx/_assets/component_index.json`, `src/lfx/src/lfx/components/docling/chunk_docling_document.py` | Added two new boolean inputs (merge_peers, always_emit_headings) with display names, descriptions, and default values. Extended build-config logic to show/hide these inputs when HybridChunker is active. Updated HybridChunker instantiation to receive these parameters from component state. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Coverage For New Implementations | ❌ Error | PR introduces new functionality (merge_peers and always_emit_headings parameters) without any corresponding test coverage. | Add unit tests for parameter validation and integration tests verifying HybridChunker receives correct parameters. Address API incompatibility of always_emit_headings parameter. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Quality And Coverage | ⚠️ Warning | PR adds two new boolean parameters to ChunkDoclingDocumentComponent but includes zero test files or test modifications. | Add unit tests validating parameter storage, visibility toggle logic, and HybridChunker instantiation; add integration tests for full chunk_documents() workflow. |
| Test File Naming And Structure | ⚠️ Warning | The pull request adds two new component options (merge_peers and always_emit_headings) but includes no test files following standard naming conventions (`test_*.py` or `*.test.ts`). | Add test_chunk_docling_document.py with tests validating the new options are properly exposed, passed to HybridChunker, and handle edge cases including unsupported parameters. |
| Excessive Mock Usage Warning | ❓ Inconclusive | No test files for ChunkDoclingDocumentComponent were found in the pull request to assess mock usage patterns. | Provide test file paths for ChunkDoclingDocumentComponent or clarify if this PR includes test coverage. |
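
As a rough sketch of the kind of unit test the checks above ask for, the following verifies that both flags reach the chunker constructor. Everything here is a stand-in: `FakeHybridChunker`, `FakeComponent`, and `build_chunker` are hypothetical names invented for the example, and a real test would exercise ChunkDoclingDocumentComponent itself. The defaults mirror the PR summary (merge_peers enabled, always_emit_headings disabled).

```python
class FakeHybridChunker:
    """Stand-in chunker that records the keyword arguments it was built with."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class FakeComponent:
    """Hypothetical reduction of the component: two flags plus chunker construction."""

    def __init__(self, merge_peers=True, always_emit_headings=False):
        self.merge_peers = merge_peers
        self.always_emit_headings = always_emit_headings

    def build_chunker(self, chunker_cls, tokenizer):
        # Mirrors the call shape quoted in the review comments on this PR.
        return chunker_cls(
            tokenizer=tokenizer,
            merge_peers=bool(self.merge_peers),
            always_emit_headings=bool(self.always_emit_headings),
        )


def test_defaults_are_forwarded():
    chunker = FakeComponent().build_chunker(FakeHybridChunker, tokenizer="tok")
    assert chunker.kwargs == {
        "tokenizer": "tok",
        "merge_peers": True,
        "always_emit_headings": False,
    }


def test_overrides_are_forwarded_as_bools():
    # Truthy/falsy values should be coerced to real booleans before forwarding.
    component = FakeComponent(merge_peers=0, always_emit_headings=1)
    chunker = component.build_chunker(FakeHybridChunker, tokenizer="tok")
    assert chunker.kwargs["merge_peers"] is False
    assert chunker.kwargs["always_emit_headings"] is True
```

Both functions follow the `test_*` naming convention, so a runner such as pytest would collect them if they were placed in a file like the suggested test_chunk_docling_document.py.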
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and accurately summarizes the main change: adding two new options (merge_peers and always_emit_headings) to the ChunkDoclingDocumentComponent. |

@github-actions bot (Contributor) commented Feb 9, 2026

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 18% | 18.41% (5919/32150) | 12.03% (3017/25069) | 12.36% (855/6913) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2288 | 0 💤 | 0 ❌ | 0 🔥 | 31.642s ⏱️ |

@codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 35.22%. Comparing base (e10cdbf) to head (ffe701a).
⚠️ Report is 1 commit behind head on main.

❌ Your project check has failed because the head coverage (42.12%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11684      +/-   ##
==========================================
+ Coverage   35.20%   35.22%   +0.01%     
==========================================
  Files        1521     1521              
  Lines       72922    72923       +1     
  Branches    10936    10936              
==========================================
+ Hits        25674    25686      +12     
+ Misses      45853    45843      -10     
+ Partials     1395     1394       -1     
| Flag | Coverage Δ | |
|---|---|---|
| backend | 55.70% <ø> (+0.04%) | ⬆️ |
| frontend | 16.65% <ø> (-0.01%) | ⬇️ |
| lfx | 42.12% <ø> (+0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown.
9 files have indirect coverage changes.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/lfx/src/lfx/_assets/component_index.json`:
- Line 64090: Summary: remove the unsupported always_emit_headings parameter and
input. Fix: in ChunkDoclingDocumentComponent remove the Message/Bool Input
definition for "always_emit_headings" from the inputs list and remove any
build_config toggles referencing "always_emit_headings" in update_build_config;
also remove the argument always_emit_headings=bool(self.always_emit_headings)
passed into the HybridChunker() instantiation inside chunk_documents (and any
uses of self.always_emit_headings). References to change: the inputs list entry
named "always_emit_headings", the update_build_config branch that sets
build_config["always_emit_headings"][...] and the HybridChunker(...) call in
chunk_documents.

In `@src/lfx/src/lfx/components/docling/chunk_docling_document.py`:
- Around line 183-187: The instantiation of HybridChunker is passing an
unsupported parameter always_emit_headings which will raise a TypeError; remove
the always_emit_headings argument from the HybridChunker(...) call (leave
tokenizer=tokenizer and merge_peers=bool(self.merge_peers)), or if you intend to
control heading inclusion, replace it with the supported parameter
include_heading_hierarchy and pass the appropriate boolean (e.g.,
include_heading_hierarchy=bool(self.include_heading_hierarchy)) so the
HybridChunker call uses only valid kwargs.
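
One defensive alternative to removing the argument outright is to forward only the keyword arguments that the installed chunker class accepts. The sketch below is an illustration, not something proposed in the review: `supported_kwargs`, `OlderChunker`, and `NewerChunker` are hypothetical names, and the stand-in classes are plain Python. Whether the real HybridChunker exposes its accepted arguments through `inspect.signature` in the same way is an assumption to verify against the installed docling version.

```python
import inspect


def supported_kwargs(cls, **candidates):
    """Return only the candidate kwargs that cls can be constructed with by name."""
    params = inspect.signature(cls).parameters
    # A **kwargs catch-all accepts any name, so nothing needs filtering.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {name: value for name, value in candidates.items() if name in params}


class OlderChunker:
    """Stand-in for a chunker version without always_emit_headings."""

    def __init__(self, tokenizer, merge_peers=True):
        self.tokenizer = tokenizer
        self.merge_peers = merge_peers


class NewerChunker:
    """Stand-in for a chunker version that accepts always_emit_headings."""

    def __init__(self, tokenizer, merge_peers=True, always_emit_headings=False):
        self.tokenizer = tokenizer
        self.merge_peers = merge_peers
        self.always_emit_headings = always_emit_headings


requested = {"tokenizer": "tok", "merge_peers": True, "always_emit_headings": True}

# The unsupported flag is dropped instead of raising TypeError.
older = OlderChunker(**supported_kwargs(OlderChunker, **requested))

# The supported flag is forwarded unchanged.
newer = NewerChunker(**supported_kwargs(NewerChunker, **requested))
```

Silently dropping an option can surprise users who enabled it, so a real implementation would probably log a warning or hide the input when the flag is unsupported.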
🧹 Nitpick comments (1)
src/lfx/src/lfx/_assets/component_index.json (1)

72454-72454: Unrelated dependency version bumps included in this PR.

Hunks 5–14 update google to 2.5.0 and vlmrun to 0.5.4 across multiple components. These changes are unrelated to the stated PR objective (adding merge_peers and always_emit_headings). Consider whether these should be in a separate PR for cleaner change tracking, or confirm they were intentionally bundled (e.g., via an index regeneration script).
