@adavoudi

Problem

  • dbt-athena is missing several Lake Formation capabilities that exist in the dbt-glue adapter (database-level tags, better data cell filter controls, more flexible behavior for tag/filter removal).
  • Current LF tag behavior is effectively “authoritative” and can break inherited tags and external tag management.
  • Data cell filters don’t support all_rows, column include/exclude patterns, or a non-destructive mode.
  • There are existing unit-test warnings around catalog filtering in AthenaAdapter.
  • Sample mode tests using naive datetimes cause an integration test failure in dbt-athena/tests/functional/adapter/test_sample_mode.py:40.
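The naive-datetime failure in the last bullet can be reproduced with the standard library alone. The sketch below is illustrative and does not come from the test file; `event_time`, `can_compare`, and the variable names are assumptions made for this example.

```python
import datetime

# A timezone-aware timestamp, such as one attached to sampled rows (illustrative value).
event_time = datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc)

naive_now = datetime.datetime.now()                       # no tzinfo attached
aware_now = datetime.datetime.now(datetime.timezone.utc)  # tzinfo is UTC


def can_compare(a, b):
    """Return True if the two datetimes can be ordered, False on a naive/aware mix."""
    try:
        a < b
        return True
    except TypeError:
        # Python refuses to order offset-naive against offset-aware datetimes.
        return False


mixed_ok = can_compare(naive_now, event_time)
aware_ok = can_compare(aware_now, event_time)
```

Here `mixed_ok` is `False` and `aware_ok` is `True`, which is why switching the tests to an aware `now` removes the error.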

Many of the concepts and shapes in this PR are borrowed from, or closely aligned with, the implementation and docs in aws-samples/dbt-glue.

Solution

  • Lake Formation tags

    • Extend lf_tags_config with:
      • drop_existing (default false) to choose between additive vs authoritative tag management.
      • tags_database for database-level tags.
      • inherited_tags (clarified) to protect inherited keys from removal when drop_existing: true.
    • Update tag processing to:
      • Optionally remove tags on database/table/columns when drop_existing: true (respecting inherited_tags).
      • Only add/update tags when drop_existing: false.
    • Update README to document the new options, clarify database vs table vs column behavior, and note that:
      • dbt manages LF tag associations only.
      • All LF tags must exist in Lake Formation ahead of time.
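To make the additive vs authoritative distinction concrete, here is a minimal sketch of how tag changes for a single resource might be planned. `plan_tag_changes` is a hypothetical helper written for this description, not the adapter's actual function, and it assumes that `inherited_tags` is a list of tag keys and that tags are represented as a mapping from key to list of values.

```python
def plan_tag_changes(desired, existing, drop_existing=False, inherited_tags=()):
    """Split LF tag work into (to_add, to_remove) for one resource.

    desired / existing map tag key -> list of values.
    Hypothetical helper for illustration only.
    """
    # Add or update every configured key whose values differ from what is attached.
    to_add = {
        key: values
        for key, values in desired.items()
        if sorted(existing.get(key, [])) != sorted(values)
    }
    to_remove = {}
    if drop_existing:
        protected = set(inherited_tags)
        # Authoritative mode: drop attached keys that are absent from config,
        # except keys listed as inherited.
        to_remove = {
            key: values
            for key, values in existing.items()
            if key not in desired and key not in protected
        }
    return to_add, to_remove


existing = {"domain": ["sales"], "pii": ["true"], "env": ["prod"]}
desired = {"domain": ["finance"]}

additive = plan_tag_changes(desired, existing)
authoritative = plan_tag_changes(
    desired, existing, drop_existing=True, inherited_tags=["env"]
)
```

With `drop_existing` left at its default, nothing is removed. With `drop_existing=True`, only `pii` is scheduled for removal because `env` is protected as an inherited key.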
  • Data cell filters (lf_grants)

    • Extend data_cell_filters with:
      • drop_existing (default false) to control whether filters missing from config are deleted.
      • Per-filter options:
        • row_filter or all_rows: true.
        • column_names or excluded_column_names (omit both for all columns).
    • Add validation so invalid row_filter / all_rows combinations fail fast.
    • Improve update logic to only update filters when row/column semantics actually change.
    • Document the new options and provide a fuller example (row predicate + column scoping).
  • Adapter / tests

    • dbt-athena/src/dbt/adapters/athena/impl.py:
      • Add _CATALOG_TEXT_COLUMNS and _catalog_filter_table using _catalog_filter_schemas to address unit-test warnings around catalog filtering and text-only columns.
    • Sample mode tests:
      • Update now to datetime.datetime.now(datetime.timezone.utc) in:
        • dbt-athena/tests/functional/adapter/test_sample_mode.py
        • dbt-bigquery/tests/functional/adapter/test_sample_mode.py
      • This fixes the integration test error in dbt-athena/tests/functional/adapter/test_sample_mode.py:40 caused by naive vs timezone-aware datetime comparisons.
    • Extend Lake Formation unit tests to cover:
      • The adjusted tag response shape.
      • FilterConfig validation, API representation, and update detection.
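The data cell filter rules listed under "Data cell filters (lf_grants)" above can be sketched as a small dataclass. `FilterSketch` is a hypothetical stand-in for the PR's `FilterConfig`, not its actual code. The field names mirror the config keys, the output mirrors the row and column parts of a Lake Formation data cells filter, and the rule that `column_names` and `excluded_column_names` are mutually exclusive is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FilterSketch:
    """Hypothetical stand-in for FilterConfig; field names follow the config keys."""

    name: str
    row_filter: Optional[str] = None
    all_rows: bool = False
    column_names: List[str] = field(default_factory=list)
    excluded_column_names: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fail fast: exactly one of row_filter / all_rows must be provided.
        if bool(self.row_filter) == self.all_rows:
            raise ValueError(f"{self.name}: set exactly one of row_filter or all_rows: true")
        # Assumed rule: include and exclude lists are mutually exclusive.
        if self.column_names and self.excluded_column_names:
            raise ValueError(f"{self.name}: use column_names or excluded_column_names, not both")

    def to_api_repr(self) -> Dict[str, Any]:
        """Build the row/column part of a Lake Formation data cells filter."""
        row = {"AllRowsWildcard": {}} if self.all_rows else {"FilterExpression": self.row_filter}
        body: Dict[str, Any] = {"Name": self.name, "RowFilter": row}
        if self.column_names:
            body["ColumnNames"] = list(self.column_names)
        else:
            # An empty exclusion list means "all columns".
            body["ColumnWildcard"] = {"ExcludedColumnNames": list(self.excluded_column_names)}
        return body

    def differs_from(self, existing: Dict[str, Any]) -> bool:
        """Report whether row or column semantics differ from an existing filter."""
        mine = self.to_api_repr()
        keys = ("RowFilter", "ColumnNames", "ColumnWildcard")
        return any(mine.get(k) != existing.get(k) for k in keys)


eu_filter = FilterSketch(
    name="eu_only", row_filter="region = 'EU'", excluded_column_names=["ssn"]
)
api_repr = eu_filter.to_api_repr()
```

Comparing only the row and column keys in `differs_from` is one way to skip updates when nothing meaningful changed. This simplified comparison is order-sensitive for `ColumnNames`, so a real implementation may want to normalize ordering first.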

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@adavoudi adavoudi requested a review from a team as a code owner November 15, 2025 14:55
@cla-bot

cla-bot bot commented Nov 15, 2025

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @adavoudi

@adavoudi
Author

I just signed the CLA

@adavoudi adavoudi closed this Nov 15, 2025
@adavoudi adavoudi reopened this Nov 15, 2025
* add drop_existing toggles plus separate database tags so we only delete LF tags when explicitly requested and can tag DB and tables independently

* reshape _parse_and_log_lf_response and tag removal helpers to work with the new config and skip inherited tags, including conditional column cleanup

* enhance data cell filters with validation, all_rows/column wildcard support, and drop-existing control to avoid accidental deletions

Signed-off-by: Alireza Davoudi <davoudialireza@gmail.com>
…nges

Signed-off-by: Alireza Davoudi <davoudialireza@gmail.com>
@cla-bot cla-bot bot added the cla:yes The PR author has signed the CLA label Nov 15, 2025