Refactor recon intermediate persistence #2230

Merged
m-abulazm merged 22 commits into main from refactor/improve-recon-persistence on Jan 30, 2026

Conversation

@m-abulazm
Contributor

@m-abulazm m-abulazm commented Jan 15, 2026

Changes

What does this PR do?

  • Clean checkpoint volume after reconcile runs
  • Remove usage of overwrite write mode
  • Use Delta volumes instead of Parquet when running on Databricks
  • Refactor interface to hide implementation details and be more generic
  • Add a TODO marking where we need to persist to Delta instead of hitting the source system
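
For illustration only, the first two items (clean up after the run, never rely on overwrite) could be sketched as below. This is not the code from this PR: `run_scoped_dir` is a hypothetical helper, and the sketch assumes the checkpoint volume is reachable as a regular filesystem path.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def run_scoped_dir(base: Path) -> Iterator[Path]:
    """Yield a fresh directory for one reconcile run and delete it afterwards.

    Hypothetical helper: a unique sub-directory per run means nothing ever
    has to be overwritten, and the `finally` block removes the intermediate
    data even if the run fails.
    """
    run_dir = base / f"recon_{uuid.uuid4().hex}"
    run_dir.mkdir(parents=True)
    try:
        yield run_dir
    finally:
        # Best-effort cleanup: a failed delete must not fail the run itself.
        shutil.rmtree(run_dir, ignore_errors=True)
```

Intermediate files would be written under the yielded directory inside a `with run_scoped_dir(base) as run_dir:` block, and the directory is gone once the block exits.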

Caveats/things to watch out for when reviewing:

On Serverless, cache/persist is not available, so Delta writes act as materialization boundaries instead.
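
A minimal sketch of what "write as a materialization boundary" means here, assuming PySpark-style `DataFrame.write` and `SparkSession.read` objects. The function name `materialize`, its parameters, and the path layout are hypothetical and not taken from the PR.

```python
import uuid
from typing import Any, Tuple


def materialize(df: Any, spark: Any, base_path: str, *, on_databricks: bool) -> Tuple[Any, str]:
    """Write `df` to a unique path and read it back.

    The returned DataFrame is backed by the written files rather than by the
    original query plan, so later actions do not re-run the upstream query.
    This stands in for cache()/persist() where those are unavailable.
    """
    # Assumption for this sketch: Delta on Databricks, Parquet elsewhere.
    fmt = "delta" if on_databricks else "parquet"
    # A unique path per call; the default save mode errors if the path exists,
    # so no overwrite mode is needed.
    path = f"{base_path.rstrip('/')}/{uuid.uuid4().hex}"
    df.write.format(fmt).save(path)
    return spark.read.format(fmt).load(path), path
```

The returned path is handed back so the caller can delete it once the run is finished.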

Linked issues

Resolves #1056
Advances #1905, #1438

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

@codecov

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 0% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.82%. Comparing base (cff1b55) to head (829d53b).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
...abricks/labs/lakebridge/reconcile/recon_capture.py | 0.00% | 43 Missing ⚠️
...ridge/reconcile/trigger_recon_aggregate_service.py | 0.00% | 25 Missing ⚠️
...labs/lakebridge/reconcile/trigger_recon_service.py | 0.00% | 10 Missing ⚠️
...rc/databricks/labs/lakebridge/reconcile/compare.py | 0.00% | 4 Missing ⚠️
...bricks/labs/lakebridge/reconcile/reconciliation.py | 0.00% | 3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2230      +/-   ##
==========================================
- Coverage   63.95%   63.82%   -0.13%     
==========================================
  Files          99       99              
  Lines        8644     8661      +17     
  Branches      890      890              
==========================================
  Hits         5528     5528              
- Misses       2944     2961      +17     
  Partials      172      172              

@github-actions

github-actions bot commented Jan 15, 2026

✅ 130/130 passed, 9 flaky, 5 skipped, 30m37s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (21.792s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (29.594s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (17.51s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (17.451s)
  • 🤪 test_transpiles_informatica_to_sparksql (18.495s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (21.975s)
  • 🤪 test_transpile_teradata_sql (6.64s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.896s)
  • 🤪 test_recon_databricks_job_succeeds (20m46.509s)

Running from acceptance #3546

@m-abulazm m-abulazm force-pushed the refactor/improve-recon-persistence branch from f112109 to fb290c2 Compare January 15, 2026 15:23
@m-abulazm m-abulazm marked this pull request as ready for review January 21, 2026 12:25
@m-abulazm m-abulazm requested a review from a team as a code owner January 21, 2026 12:25
@m-abulazm m-abulazm added the feat/recon (making sure that remorphed query produces the same results as original) and internal (technical pr's not end user facing) labels Jan 21, 2026
@m-abulazm m-abulazm requested a review from BesikiML January 23, 2026 10:38
Contributor

@BesikiML BesikiML left a comment

LGTM

@m-abulazm
Contributor Author

Follow-up issue created to tackle the TODOs: #2257

Collaborator

@sundarshankar89 sundarshankar89 left a comment

LGTM. I will add a FAQ about retaining the governance model when reconciling PII information, so that any failure to clean up the UC volume shouldn't result in unintended consequences.

@m-abulazm m-abulazm added this pull request to the merge queue Jan 30, 2026
@sundarshankar89 sundarshankar89 removed this pull request from the merge queue due to a manual request Jan 30, 2026
@m-abulazm m-abulazm added this pull request to the merge queue Jan 30, 2026
Merged via the queue into main with commit 8ea3f39 Jan 30, 2026
6 checks passed
@m-abulazm m-abulazm deleted the refactor/improve-recon-persistence branch January 30, 2026 09:44

Labels

feat/recon: making sure that remorphed query produces the same results as original
internal: technical pr's not end user facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TODO] for now we are overwriting the intermediate cache path. We should delete the volume in future

3 participants