Don't drop tables in incremental append runs #365
Conversation
I signed the CLA, by the way. I don't know how long it takes to update.
Hi @stephen-up, Thank you for your contribution! I appreciate the effort you've put into this. I'm hesitant to approve this, as it could leave a hanging table behind. While I understand that it would be dropped in the next run, it's still something to consider. Additionally, this solution seems to address a specific edge case rather than providing a more comprehensive approach. A broader solution for handling parallel runs would be preferable. I'll review the suggested alternative, though it currently applies only to temporary tables and not to relations created via make_intermediate_relation. Perhaps we could override make_intermediate_relation to include the invocation_id, similar to what was done in #150 and #353? We could also introduce a feature flag for this macro, defaulting to false (disabled). What do you think?
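A minimal sketch of what such an override might look like, purely for illustration: the `unique_intermediate_suffix` var, the macro name, and the dash-stripping are assumptions of mine, not code from this PR or from #150/#353.

```jinja
{#- Hypothetical sketch: suffix intermediate relation names with dbt's
    invocation_id so concurrent runs don't collide, behind a feature
    flag that defaults to false (disabled). Names are illustrative. -#}
{% macro clickhouse__make_intermediate_relation(base_relation, suffix='__dbt_tmp') %}
  {% if var('unique_intermediate_suffix', false) %}
    {#- invocation_id is a UUID; replace dashes so the suffix stays a valid identifier -#}
    {% set suffix = suffix ~ '_' ~ (invocation_id | replace('-', '_')) %}
  {% endif %}
  {{ return(default__make_intermediate_relation(base_relation, suffix)) }}
{% endmacro %}
```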
Hi Bentsi. Leaving hanging tables does sound like something we want to avoid. Putting random IDs in the names, as #150 and #353 do, would ensure better concurrency and support multiple concurrent runs, but it increases the risk and cost of hanging tables. Simply not dropping the shadow tables that aren't even used during an incremental+append run seems simpler and more desirable to me. I'm happy to add a feature flag for it.
I've added the feature flag. Is there a way to set the feature flag for every model in a dbt project? The way I've implemented it, it needs to be set in every model, like:

```jinja
{{ config(
    order_by='project_id, igid',
    engine='ReplacingMergeTree(version)',
    materialized='incremental',
    incremental_strategy='append',
    incremental_append_skip_drop_table='true'
) }}
```
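For what it's worth, dbt can apply model configs project-wide from dbt_project.yml, so something like the sketch below should set the flag for every model. The project name is a placeholder, and whether the value should be the boolean true or the string 'true' depends on how the macro reads the config.

```yaml
# dbt_project.yml -- hypothetical project-wide default for the new flag;
# the '+' prefix marks it as a config applied to all models under this key.
models:
  my_project:
    +incremental_append_skip_drop_table: true
```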
The feature flag I referred to in my previous reply was for the suggested make_intermediate_relation override.
Hi @BentsiLeviav, I've added some comments to #420. When thinking about race conditions, it gets tricky: do you want to support concurrent full refreshes, or a pair of a full refresh and an incremental run? If you just want the latter, like me, I think this PR is perfect. The incremental + full-refresh combination seems like a common pattern. We've set up a pipeline with an incremental run every half hour, and we occasionally run full refreshes that take hours, which easily leads to a clash.
Hi, when dbt runs an incremental+append model, it always drops the temporary tables that are used for full refreshes, even when it isn't going to use them.
In my system, scheduled incremental dbt runs frequently update incremental+append tables. Whenever I run a full refresh concurrently, a scheduled run drops the full refresh's tables, causing it to crash.
This PR stops that by making incremental+append runs not drop those tables.
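Purely illustrative — a minimal sketch of the kind of guard this describes, not the actual diff; backup_relation and intermediate_relation are placeholder names, and drop_relation_if_exists is dbt's built-in macro.

```jinja
{#- Hypothetical sketch: only drop the full-refresh shadow tables when this
    run could actually use them, i.e. not on an incremental append run. -#}
{% if not (is_incremental() and config.get('incremental_strategy') == 'append') %}
  {{ drop_relation_if_exists(backup_relation) }}
  {{ drop_relation_if_exists(intermediate_relation) }}
{% endif %}
```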
I see there have been a few other issues similar to this, but this seems like an easy, minimal way to fix the problem.
Thanks