Survey and fix logger initialization #2216
Conversation
Tests should not do this.
The `.level` property is undocumented and doesn't need to be set: when unset, the logger level is inherited. Using `.getEffectiveLevel()` is the way this should be done.
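(A minimal sketch of the distinction being made here, using stand-in logger names: `.level` stays `NOTSET` on a child logger, while `.getEffectiveLevel()` walks up the hierarchy.)

```python
import logging

parent = logging.getLogger("databricks")
parent.setLevel(logging.INFO)

child = logging.getLogger("databricks.labs.lakebridge")
print(child.level)                # 0 (logging.NOTSET): nothing was set on the child itself
print(child.getEffectiveLevel())  # 20 (logging.INFO): inherited from the "databricks" parent
```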
…mport. Entry points are responsible for configuring logging, not the code that does the logging.
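(For context, a generic sketch of the convention this comment refers to, not this project's actual code: library modules only create loggers, while the entry point configures logging once at startup; `do_work` is a made-up stand-in.)

```python
import logging

logger = logging.getLogger(__name__)  # library module: create a logger, attach nothing


def do_work() -> None:
    logger.info("library code only emits records; it never configures handlers")


if __name__ == "__main__":
    # The entry point, and only the entry point, configures logging at startup.
    logging.basicConfig(level=logging.INFO)
    do_work()
```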
Codecov Report

❌ Patch coverage is …

```
@@            Coverage Diff             @@
##             main    #2216      +/-   ##
==========================================
- Coverage   64.05%   63.96%   -0.09%
==========================================
  Files         100       99       -1
  Lines        8625     8626       +1
  Branches      889      888       -1
==========================================
- Hits         5525     5518       -7
- Misses       2928     2936       +8
  Partials      172      172
```
✅ 129/129 passed, 8 flaky, 5 skipped, 19m42s total

Flaky tests: …

Running from acceptance #3446
```python
        self._logger_instance = self._logger
        self._logger_instance.setLevel(logging.INFO)
        return self._logger_instance

    def _log_level(self, raw: str) -> int:
```
Are we doing this change here because it is faster than releasing blueprint with that change?
Yes, and I'm hoping that it's temporary. (My plan is to implement this workaround in blueprint next week if I don't have a better idea about when it will be fixed upstream.)
```python
# Ensure that anything that imports this (or a lower) submodule triggers setup of the blueprint
# logging.
install_logger()
```
We should move this to the entry points, similar to bladebridge.
Hmm… why?
Because this means we keep reinitializing the logger over and over again instead of just once. It feels odd and will break some handlers, e.g. file handlers. I know we don't use those, but we'll be blocked from using them in the future if we want to.
@m-abulazm: That's not how imports work in Python: modules are effectively imported once and cached. The way import works is:

1. Python looks inside `sys.modules` for the module. If it's there, import just uses the already-initialised module.
2. If not, locate it via `sys.path` and initialise it, which "runs" the module. (This is where the above `install_logger()` call happens.)
3. Add the initialised module to `sys.modules`, after which step 1 will always find it.

(Without this approach things like using local imports to break cycles would not work.)

This means that the `install_logger()` above is only invoked once, not over and over again.
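(A quick, runnable illustration of that caching behaviour, using a stdlib module in place of the project's own:)

```python
import sys

import json           # first import: the module's top-level code executes (if not already cached)
first = sys.modules["json"]

import json           # second import: a cache hit in sys.modules; nothing re-runs
assert json is first  # the very same module object is reused
```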
OK, got it. I also googled a bit more, and it looks like `import mypkg` vs `import src.mypkg` will cause a module to run twice. Ideally this should not happen; otherwise it would be a bug.
```python
databricks_log_level = logging.DEBUG if is_in_debug() else logging.INFO
logging.getLogger("databricks").setLevel(databricks_log_level)
```
Suggested change:

```diff
-databricks_log_level = logging.DEBUG if is_in_debug() else logging.INFO
-logging.getLogger("databricks").setLevel(databricks_log_level)
+log_level = logging.DEBUG if is_in_debug() else logging.INFO
+install_logger(log_level)
```
Can you elaborate on this a bit more? I don't understand why you're suggesting this change.

Here we're setting the filtering log-level on the `databricks` logger (which the `databricks.*` loggers delegate to by default).

This is different to the log-level that we pass to `install_logger()`: that one configures the minimum level that will be written out to the console. It defaults to DEBUG so that anything that reaches it will be written out, which is the conventional way to configure the handlers. (See the sketch below.)
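(To illustrate the two independent levels, a minimal sketch rather than the project's actual setup: the handler is wide open at DEBUG, while the `databricks` logger does the filtering.)

```python
import logging
import sys

# Handler level: the minimum level the handler will actually write out.
handler = logging.StreamHandler(sys.stderr)
handler.setLevel(logging.DEBUG)          # emit anything that reaches the handler
logging.getLogger().addHandler(handler)  # attach to the root logger

# Logger level: what the "databricks" subtree filters before records reach any handler.
logging.getLogger("databricks").setLevel(logging.INFO)

logging.getLogger("databricks.labs").debug("dropped: filtered out by the logger level")
logging.getLogger("databricks.labs").info("written: passes the logger; the handler lets it through")
```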
Blueprint configures the databricks logger depending on the log level requested by the user. Why do we need to set it again?

We should also configure the root logger with the right log level. If we do not set it and only set the databricks logger, the user might get unwanted DEBUG logs when they run with INFO, since the root logger allows DEBUG.
@m-abulazm: Blueprint does not configure the logger level; it only configures the level of the handler it attaches. Handlers and loggers have their own (independent) levels. This is described in the docstring as such:
> The root logger will be modified:
> - Its logging level will be left as-is.
> - All existing handlers will be removed.
> - A new handler will be installed with our custom formatter. It will be configured to emit logs at the given level (default: DEBUG) or higher, to the specified stream (default: sys.stderr).
Note that the level is set on the handler; we leave the logging level as it was.
> We should also configure the root logger with the right log level. If we do not set it and only set the databricks logger, the user might get unwanted DEBUG logs when they run with INFO, since the root logger allows DEBUG.
As documented here, Python's logging system defaults to `WARNING` and we don't touch that. When the user requests `--debug`, the intent is that this only applies to the `databricks.*` modules/loggers.
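(A small sketch of that default and the intended scoping; `some.other.library` is a made-up name:)

```python
import logging

# In a fresh interpreter the root logger defaults to WARNING...
assert logging.getLogger().getEffectiveLevel() == logging.WARNING

# ...so setting DEBUG only on the "databricks" logger scopes --debug to that subtree.
logging.getLogger("databricks").setLevel(logging.DEBUG)
assert logging.getLogger("databricks.labs").getEffectiveLevel() == logging.DEBUG
assert logging.getLogger("some.other.library").getEffectiveLevel() == logging.WARNING
```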
`App#__call__()`, which gets called in `cli.py`, configures the databricks logger.
Meaning that we could leverage that instead of adding our own method.
sundarshankar89 left a comment:
I will run e2e tests with this branch before we can merge. I'm unsure about the log level behaviour.
src/databricks/labs/lakebridge/resources/assessments/synapse/common/functions.py
sundarshankar89 left a comment:
LGTM, now that tests pass.
gueniai left a comment:
LGTM
m-abulazm left a comment:
LGTM
Changes
What does this PR do?
This PR makes the following changes related to logging within the project:
- … handling of `--log-level=warn` (#2167; resolves [BUG]: CLI mishandles `--log-level=` and `--debug` #2211) for the CLI is introduced so that we can once again get `--debug` logs if we need them.

Relevant implementation details
The `base_install.py` entry point for installation has been moved into `install.py`: this is where it was logging from, and where it's normally located in blueprint-based applications.

Caveats/things to watch out for when reviewing:
Linked issues
Resolves: #2211
Relates to: #2167
Functionality
Modified existing commands:
- `databricks labs lakebridge *` (all subcommands)
- `databricks labs install lakebridge`
- `databricks labs uninstall lakebridge`

Tests
Manual Tests
- `databricks labs install lakebridge`
- `databricks labs install lakebridge@consistent-log-initialization`
- `rm -fr ~/.databricks/labs`
- `databricks labs install lakebridge@consistent-log-initialization`
- `databricks labs lakebridge install-transpile --debug`
- `databricks labs install lakebridge@consistent-log-initialization`
- `databricks labs uninstall lakebridge`