[BUG] ma_sat_v0 macro is not working properly #331
Comments
Could you take a look?
Hi @JulioGarciaVW, and sorry for the late reply. This is what's causing the loading error on your side: your stage does not seem to be configured for multi-activity, and hence does not calculate the multi-active block hashdiff. To do so, see Example 3 on the staging docs page. You basically have to add this config to your stage YAML:
Let me know if the delta loading process works now!
Understood. Are you working on issue 319? If you add a flag to keep spaces, we can switch to using the staging macros.
I have also seen that there may be a problem with the SHA256 algorithm in Redshift. It looks like the hash function is replaced by SHA256, but that function doesn't exist in Redshift; the correct one has to be SHA2.
This is the macro used, because there is none specific to Redshift. I guess that hash_alg would have to be 'SHA2' for Redshift.
Okay, understood. This is an existing feature request for which active development is not currently planned. However, I described a workaround in this issue, which uses derived columns in the staging model to replace these whitespaces with alternative characters. If you implement it like that, you can use the block hashdiff and the multi-active satellite should work fine. If you do not want to do this, you would need to modify your current hashing script to use LISTAGG / STRINGAGG for the hash calculation. More details can be found here under the keyword "delta check on".
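To make the block hashdiff idea concrete, here is a minimal Python sketch of what a LISTAGG-style hash calculation does: all rows of one parent hashkey are ordered by the multi-active key, concatenated, and hashed once, so every row of the group shares the same hashdiff. This is not the datavault4dbt implementation; the function name `block_hashdiffs`, the separators, and the column names `hk`, `nation`, and `phone` are assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict


def block_hashdiffs(rows, parent_key, ma_key, payload_cols,
                    col_sep="||", row_sep=";"):
    """Return one hashdiff per parent key, computed over all of its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[parent_key]].append(row)

    result = {}
    for parent, members in groups.items():
        # Sort by the multi-active key so the hash does not depend on input order.
        ordered = sorted(members, key=lambda r: str(r[ma_key]))
        # Concatenate each row's columns, then join the rows (the LISTAGG step).
        parts = [col_sep.join(str(r[c]) for c in (ma_key, *payload_cols))
                 for r in ordered]
        block = row_sep.join(parts)
        result[parent] = hashlib.sha256(block.encode("utf-8")).hexdigest()
    return result


# Hypothetical staging rows; the spaces in " 123 " are kept as-is, not trimmed.
stage = [
    {"hk": "hk1", "nation": "DE", "phone": " 123 "},
    {"hk": "hk1", "nation": "FR", "phone": "456"},
]
first = block_hashdiffs(stage, "hk", "nation", ["phone"])
reordered = block_hashdiffs(list(reversed(stage)), "hk", "nation", ["phone"])
```

Because the hash covers the whole group, a change in any single row changes the hashdiff of the block, and the delta check can then compare one value per parent hashkey.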
OK, thanks for your quick answer @tkirschke! We will try the second option, and when you implement the feature request about leading and trailing spaces, we may refactor the solution to use that approach. Thanks for opening the Redshift issue as well!
Great, let me know if the satellite works as intended then. If so, I would appreciate it if you closed this bug! :)
Solved!
Describe the bug
We have found a bug in the ma_sat_v0 macro.
The first run works as expected. The bug appears in incremental runs.
In our case, the staging data does not contain the creation date of the record, so we use a sysdate function (as DV recommends when the date is not available). As a result, each day we check whether any hashdiff has changed, because the dv_load_date is always newer.
The ma_sat primary key is parent_hashkey + src_ma_key, but the macro source code checks only the parent_hashkey.
So, in our case we have two records per parent_hashkey, but the query above keeps only one of them, because it checks by parent_hashkey and not by parent_hashkey + src_ma_key.
If you combine this query with the query below, every day one of the two records is inserted (even if nothing has changed), and you never have the real latest records: each day a different one of them ends up as the latest.
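The behaviour described above can be reproduced with a small Python simulation. It is a simplified model of the delta check, not the macro's actual SQL: the helper names `latest_per_key` and `rows_to_insert`, the column names, and the sample values are hypothetical.

```python
def latest_per_key(rows, key_cols):
    """Keep only the most recent row per key (later load dates overwrite earlier ones)."""
    latest = {}
    for row in sorted(rows, key=lambda r: r["ldts"]):
        latest[tuple(row[c] for c in key_cols)] = row
    return list(latest.values())


def rows_to_insert(stage, target, key_cols):
    """Return stage rows whose (key, hashdiff) is not among the latest target rows."""
    known = {
        tuple(r[c] for c in key_cols) + (r["hashdiff"],)
        for r in latest_per_key(target, key_cols)
    }
    return [
        s for s in stage
        if tuple(s[c] for c in key_cols) + (s["hashdiff"],) not in known
    ]


# Day 1: two multi-active rows for the same parent hashkey are already loaded.
target = [
    {"hk": "hk1", "nation": "DE", "hashdiff": "hd_a", "ldts": "2024-01-01"},
    {"hk": "hk1", "nation": "FR", "hashdiff": "hd_b", "ldts": "2024-01-01"},
]
# Day 2: the same, unchanged rows arrive again with a newer load date.
stage = [
    {"hk": "hk1", "nation": "DE", "hashdiff": "hd_a", "ldts": "2024-01-02"},
    {"hk": "hk1", "nation": "FR", "hashdiff": "hd_b", "ldts": "2024-01-02"},
]

by_parent_only = rows_to_insert(stage, target, ["hk"])           # behaviour reported here
by_parent_and_ma = rows_to_insert(stage, target, ["hk", "nation"])  # expected behaviour
```

With the key limited to the parent hashkey, only one target row survives as "latest", so the other unchanged stage row looks new and is inserted again. With the multi-active key included, nothing is inserted.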
Environment
To Reproduce
SQL example_model:
ma_sat_model:
Expected behavior
Each parent_hashkey + src_ma_key combination is treated separately.
Screenshots
In this case I have filtered by a single parent_hashkey (first column).
The second column is the hashdiff. You can see that it does not change within the same primary key (parent_hashkey + nation).
The fourth column (nation) is the src_ma_key.