Conversation

@asnare asnare commented Jan 6, 2026

Changes

What does this PR do?

This PR makes the following changes related to logging within the project:

Relevant implementation details

The base_install.py entry point for installation has been moved into install.py: this is where it was logging from, and where it's normally located in blueprint-based applications.

Caveats/things to watch out for when reviewing:

Linked issues

Resolves: #2211
Relates to: #2167

Functionality

Modified existing commands:

  • databricks labs lakebridge * (all subcommands)
  • databricks labs install lakebridge
  • databricks labs uninstall lakebridge

Tests

  • manually tested
  • existing unit tests
  • existing integration tests

Manual Tests

  • databricks labs install lakebridge
  • databricks labs install lakebridge@consistent-log-initialization
  • rm -fr ~/.databricks/labs
  • databricks labs install lakebridge@consistent-log-initialization
  • databricks labs lakebridge install-transpile --debug
  • databricks labs install lakebridge@consistent-log-initialization
  • databricks labs uninstall lakebridge

@asnare asnare self-assigned this Jan 6, 2026
@asnare asnare added bug Something isn't working tech debt design flaws and other cascading effects labels Jan 6, 2026
@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 22.80702% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.96%. Comparing base (fa47bd7) to head (8db65a2).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...s/assessments/synapse/dedicated_sqlpool_extract.py 0.00% 9 Missing ⚠️
.../assessments/synapse/monitoring_metrics_extract.py 0.00% 7 Missing ⚠️
...resources/assessments/synapse/workspace_extract.py 0.00% 7 Missing ⚠️
src/databricks/labs/lakebridge/cli.py 33.33% 6 Missing ⚠️
.../assessments/synapse/serverless_sqlpool_extract.py 0.00% 5 Missing ⚠️
...rc/databricks/labs/lakebridge/reconcile/execute.py 0.00% 4 Missing ⚠️
src/databricks/labs/lakebridge/__init__.py 60.00% 2 Missing ⚠️
...urces/assessments/synapse/common/duckdb_helpers.py 0.00% 2 Missing ⚠️
.../labs/lakebridge/assessments/dashboards/execute.py 0.00% 1 Missing ⚠️
.../resources/assessments/synapse/common/functions.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2216      +/-   ##
==========================================
- Coverage   64.05%   63.96%   -0.09%     
==========================================
  Files         100       99       -1     
  Lines        8625     8626       +1     
  Branches      889      888       -1     
==========================================
- Hits         5525     5518       -7     
- Misses       2928     2936       +8     
  Partials      172      172              


@github-actions

github-actions bot commented Jan 6, 2026

✅ 129/129 passed, 8 flaky, 5 skipped, 19m42s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (19.92s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (22.68s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (19.889s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (19.998s)
  • 🤪 test_transpiles_informatica_to_sparksql (21.446s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (21.202s)
  • 🤪 test_transpile_teradata_sql (21.16s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.639s)

Running from acceptance #3446

@asnare asnare marked this pull request as ready for review January 6, 2026 17:16
@asnare asnare requested a review from a team as a code owner January 6, 2026 17:16
        self._logger_instance = self._logger
        self._logger_instance.setLevel(logging.INFO)
        return self._logger_instance

    def _log_level(self, raw: str) -> int:
Contributor

are we doing this change here because it is faster than releasing blueprint with that change?

Contributor Author

Yes, and I'm hoping that it's temporary. (My plan is to implement this workaround in blueprint next week if I don't have a better idea about when it will be fixed upstream.)


# Ensure that anything that imports this (or lower) submodules triggers setup of the blueprint
# logging.
install_logger()
Contributor

we should move this to the entry points similar to bladebridge

Contributor Author

Hmm… why?

Contributor

because this means we keep reinitializing the logger over and over again instead of just once. It feels odd and will break some handlers, e.g. file handlers. I know we don't use those yet, but we will be blocked from using them in the future if we want to.

Contributor Author

@m-abulazm: That's not how imports work in Python: modules are effectively imported once and cached. The way import works is:

  1. Python looks inside sys.modules for the module. If it's there, import just uses the already-initialised module.
  2. If not, locate the module via sys.path and initialise it, which "runs" the module. (This is where the above install_logger() call happens.)
  3. Add the initialised module to sys.modules, after which step 1 will always find it.

(Without this approach, things like using local imports to break cycles would not work.)

This means that the install_logger() above is only invoked once, not over and over again.
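The import-caching behaviour described above can be sketched directly. This is an illustrative demo, not project code: demo_mod is a made-up module name, and the counter stands in for a side effect like the install_logger() call in a module body.

```python
import sys
import types

# Simulate a module whose body has a side effect (like calling install_logger()).
# The body bumps a counter each time it actually *runs*.
mod = types.ModuleType("demo_mod")
exec("counter = globals().get('counter', 0) + 1", mod.__dict__)  # "run" the body once

# Step 3 of the description above: cache the initialised module.
sys.modules["demo_mod"] = mod

# Step 1: later imports find the cached entry and do NOT re-run the body.
import demo_mod
import demo_mod as again

print(demo_mod.counter)    # 1 — the module body ran exactly once
print(demo_mod is again)   # True — every import yields the same cached object
```

This is why local imports can break cycles: the second importer just gets the already-cached module object.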

Contributor

Ok, got it. I also googled a bit more, and it looks like import mypkg vs import src.mypkg will cause a module to run twice. Ideally this should not happen; otherwise it would be a bug.

Comment on lines 446 to 447
databricks_log_level = logging.DEBUG if is_in_debug() else logging.INFO
logging.getLogger("databricks").setLevel(databricks_log_level)
Contributor

Suggested change
databricks_log_level = logging.DEBUG if is_in_debug() else logging.INFO
logging.getLogger("databricks").setLevel(databricks_log_level)
log_level = logging.DEBUG if is_in_debug() else logging.INFO
install_logger(log_level)

Contributor Author

Can you elaborate on this a bit more? I don't understand why you're suggesting this change.

Here we're setting the filtering log-level on the databricks logger (which the databricks.* loggers delegate to by default).

This is different to the log-level that we pass to install_logger(): that one configures the minimum level that will be written out to the console. It defaults to DEBUG so that anything that reaches it will be written out, which is the conventional way to configure handlers.
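The delegation mentioned here is standard logging-hierarchy behaviour and can be sketched as follows (the child logger name is illustrative):

```python
import logging

# Set a level only on the parent "databricks" logger.
logging.getLogger("databricks").setLevel(logging.DEBUG)

# A child logger with no explicit level inherits its effective level
# from the nearest configured ancestor.
child = logging.getLogger("databricks.labs.lakebridge")
print(child.level)                        # 0 (NOTSET): nothing set on the child itself
print(child.getEffectiveLevel())          # 10 (DEBUG): delegated from "databricks"
print(child.isEnabledFor(logging.DEBUG))  # True
```

So one setLevel() call on the "databricks" logger controls filtering for the whole databricks.* subtree.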

Contributor

blueprint configures the databricks logger depending on the log level requested by the user. Why do we need to set it again?

We should also configure the root logger with the right log level. If we only set the databricks logger and not the root logger, the user might get unwanted DEBUG logs when running with INFO, since the root logger allows DEBUG.

Contributor Author

@m-abulazm: Blueprint does not configure the logger level; it only configures the level of the handler it attaches. Handlers and loggers have their own (independent) levels. This is described in the docstring as follows:

    The root logger will be modified:

     - Its logging level will be left as-is.
     - All existing handlers will be removed.
     - A new handler will be installed with our custom formatter. It will be configured to emit logs at the given level
       (default: DEBUG) or higher, to the specified stream (default: sys.stderr).

Note that the level is set on the handler; we leave the logging level as it was.

    we should also configure the root logger with the right log level. if we do not set it and only set databricks logger, the user might get the wrong DEBUG logs if they run with info as root logger allows DEBUG

As documented here, Python's logging system defaults to WARNING, and we don't touch that. When the user requests --debug, the intent is that this only applies to the databricks.* modules/loggers.
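A minimal sketch of the handler-versus-logger distinction being discussed here. The handler setup mirrors what the docstring above describes (a DEBUG-level handler on the root logger), but this is illustrative, not blueprint's actual code:

```python
import logging
import sys

root = logging.getLogger()
print(root.getEffectiveLevel())  # 30 (WARNING): Python's default for the root logger

# A handler set to DEBUG passes on everything it receives...
handler = logging.StreamHandler(sys.stderr)
handler.setLevel(logging.DEBUG)
root.addHandler(handler)

# ...but the logger's own level still filters first: with root at WARNING,
# an INFO record is dropped before any handler sees it.
print(root.isEnabledFor(logging.INFO))  # False

# Raising only the "databricks" logger enables debug output there
# without loosening the root logger for everything else.
logging.getLogger("databricks").setLevel(logging.DEBUG)
print(logging.getLogger("databricks").isEnabledFor(logging.DEBUG))  # True
print(root.isEnabledFor(logging.DEBUG))                             # still False
```

The handler level is a floor on what gets emitted; the logger (effective) level decides whether a record is created at all.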

Contributor

App#__call__(), which gets called in cli.py, configures the databricks logger

Contributor

meaning that we could leverage that instead of adding our own method

Collaborator

@sundarshankar89 sundarshankar89 left a comment

I will run e2e tests with this branch before we can merge. I'm unsure about the log level behaviour.

Collaborator

@sundarshankar89 sundarshankar89 left a comment

LGTM, now that tests pass.

Collaborator

@gueniai gueniai left a comment

LGTM

Contributor

@m-abulazm m-abulazm left a comment

LGTM

@m-abulazm m-abulazm added this pull request to the merge queue Jan 21, 2026
Merged via the queue into main with commit a678bfe Jan 21, 2026
6 checks passed
@m-abulazm m-abulazm deleted the consistent-log-initialization branch January 21, 2026 10:07

Labels

bug Something isn't working tech debt design flaws and other cascading effects

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: CLI mishandles --log-level= and --debug

4 participants