[BUG] ma_sat_v0 macro is not working properly #331


Closed
JulioGarciaVW opened this issue Mar 13, 2025 · 8 comments
Labels
bug Something isn't working

Comments


JulioGarciaVW commented Mar 13, 2025

Describe the bug

We have found a bug in the ma_sat_v0 macro.

The first run works as expected; the bug appears on incremental runs.

In our case, the staging data does not contain the record's creation date, so we use a sysdate function (as Data Vault recommends when no date is available). As a result, every day some hashdiffs appear to have changed, because dv_load_date is always newer.
The ma_sat primary key is parent_hashkey + src_ma_key, but the macro source code checks only parent_hashkey.

{%- if is_incremental() %}
latest_entries_in_sat AS (

    SELECT
        {{ parent_hashkey }},
        {{ ns.hdiff_alias }}
    FROM 
        {{ this }}
    QUALIFY ROW_NUMBER() OVER(PARTITION BY {{ parent_hashkey|lower }} ORDER BY {{ src_ldts }} DESC) = 1  
),
{%- endif %}

So, in our case we have two records per parent_hashkey, but the query above keeps only one of them, because it partitions by parent_hashkey alone rather than by parent_hashkey + src_ma_key.

Combined with the query below, this means that every day one of the two records gets re-inserted (even when nothing has changed), and the table never holds the true latest version of both: each day a different one shows up as the latest.

records_to_insert AS (

    SELECT
        deduped_rows.{{ parent_hashkey }},
        deduped_rows.{{ ns.hdiff_alias }},
        {{ datavault4dbt.alias_all(columns=source_cols, prefix='deduped_rows') }}
    FROM deduped_rows
    {%- if is_incremental() %}
    WHERE NOT EXISTS (
        SELECT 1
        FROM latest_entries_in_sat
        WHERE {{ datavault4dbt.multikey(parent_hashkey, prefix=['latest_entries_in_sat', 'deduped_rows'], condition='=') }}
            AND {{ datavault4dbt.multikey(ns.hdiff_alias, prefix=['latest_entries_in_sat', 'deduped_rows'], condition='=') }} 
            )
    {%- endif %}

    )
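For illustration, here is a sketch of the dedup CTE we would expect, partitioning by the full multi-active key (column names taken from the reproduction below; this is a sketch of the expected behavior, not the macro's actual code):

```sql
latest_entries_in_sat AS (
    SELECT
        parent_hashkey,
        src_ma_attribute,
        src_hashdiff
    FROM ma_sat_v0
    -- Partition by the full multi-active primary key, not just the hashkey,
    -- so the latest hashdiff is tracked per (parent_hashkey, src_ma_key) pair.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY parent_hashkey, src_ma_attribute
        ORDER BY dv_load_date DESC
    ) = 1
)
```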

Environment

  • dbt version: core, 1.9.0
  • datavault4dbt version: 1.9.0
  • Database/Platform: Redshift (AWS)

To Reproduce

SQL example_model:

select
  1 as parent_hashkey, 
  'a' as src_ma_attribute,
  'comment' as payload_field,
  'abc' as src_hashdiff,
  sysdate as dv_load_date,
  'source1' as dv_source_system
union all
select
  1 as parent_hashkey, 
  'b' as src_ma_attribute,
  'comment2' as payload_field,
  'bcd' as src_hashdiff,
  sysdate as dv_load_date,
  'source1' as dv_source_system

ma_sat_model:

{{ config(materialized='incremental') }}

{%- set yaml_metadata -%} 
source_model: 'example_model'
parent_hashkey: 'parent_hashkey'
src_hashdiff: 'src_hashdiff'
src_ma_key: 'src_ma_attribute'
src_payload: 
    - payload_field
{%- endset -%}

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{{ datavault4dbt.ma_sat_v0(source_model=metadata_dict.get('source_model'),
                        parent_hashkey=metadata_dict.get('parent_hashkey'),
                        src_hashdiff=metadata_dict.get('src_hashdiff'),
                        src_ma_key=metadata_dict.get('src_ma_key'),
                        src_payload=metadata_dict.get('src_payload')) }}
  1. Run the example_model.
  2. Run ma_sat_v0.
  3. Run the example_model again.
  4. Run ma_sat_v0 again.
  5. Check the ma_sat_v0 results.

Expected behavior

Each parent_hashkey + src_ma_key are treated separately.

Screenshots

[Screenshot: ma_sat_v0 query results]

In this screenshot I have filtered on a single parent_hashkey (first column).
The second column is the hashdiff; you can see it does not change within the same primary key (parent_hashkey + nation).
The fourth column (nation) is the src_ma_key.


@JulioGarciaVW JulioGarciaVW added the bug Something isn't working label Mar 13, 2025
@JulioGarciaVW (Author)

Could you take a look?

@tkirschke (Member)

Hi @JulioGarciaVW, and sorry for the late reply.

The latest_entries_in_sat CTE exists only to get the latest hashdiff value for each parent_hashkey. In theory, the hashdiff for all members of one multi-active group (several rows sharing one hashkey + ldts with different ma_key values) should be identical, because a block hashdiff is used in the stage model.

This is what's causing the loading error on your side: your stage does not seem to be configured for multi-activity, so it does not calculate this multi-active block hashdiff. To do so, see Example 3 on the staging docs page.

You basically have to add this config to your stage yaml:

multi_active_config:
    multi_active_key: 'src_ma_attribute'
    main_hashkey_column: 'parent_hashkey'

Let me know if the delta loading process now works!

@JulioGarciaVW (Author)

Understood.
Our problem is that we are not using the staging macros, because we have the same problem as posted here:
we have data with leading and trailing spaces that are genuinely significant, and the staging macro trims everything.

Are you working on issue 319? If you add a flag to keep the spaces, we can switch to the staging macros.

@JulioGarciaVW (Author)

I have also seen that there can be a problem with the SHA256 algorithm on Redshift.

It looks like the function name is set to SHA256, but that function does not exist in Redshift; the correct one is SHA2.

{%- macro default__hash_default_values(hash_function, hash_datatype) -%}

    {%- set dict_result = {} -%}
    {%- set hash_alg = '' -%}
    {%- set unknown_key = '' -%}
    {%- set error_key = '' -%}

    {%- if hash_function == 'MD5' -%}
        {%- set hash_alg = 'MD5' -%}
        {%- set unknown_key = '!00000000000000000000000000000000' -%}
        {%- set error_key = '!ffffffffffffffffffffffffffffffff' -%}
    {%- elif hash_function == 'SHA' or hash_function == 'SHA1' -%}
        {%- set hash_alg = 'SHA1' -%}
        {%- set unknown_key = '!0000000000000000000000000000000000000000' -%}
        {%- set error_key = '!ffffffffffffffffffffffffffffffffffffffff' -%}
    {%- elif hash_function == 'SHA2' or hash_function == 'SHA256' -%}
        {%- set hash_alg = 'SHA256' -%}
        {%- set unknown_key = '!0000000000000000000000000000000000000000000000000000000000000000' -%}
        {%- set error_key = '!ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff' -%}
    {%- endif -%}

    {%- do dict_result.update({"hash_alg": hash_alg, "unknown_key": unknown_key, "error_key": error_key }) -%}

    {{ return(dict_result | tojson ) }}

{%- endmacro -%}

This is the macro that applies, because there is no Redshift-specific one. I guess hash_alg would have to be 'SHA2' on Redshift.
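For illustration, an adapter-specific override following dbt's dispatch naming convention might look like this. The macro name prefix and the assumption that only hash_alg needs to change are mine, not the library's; note also that Redshift's function is SHA2(expression, bits), so the calling code would additionally need to pass the bit length:

```sql
{# Hypothetical Redshift override, following dbt's redshift__ dispatch prefix.
   Only the SHA-256 branch is sketched here; the other branches would mirror
   the default macro shown above. Sketch only, not the library's code. #}
{%- macro redshift__hash_default_values(hash_function, hash_datatype) -%}

    {%- set dict_result = {} -%}

    {%- if hash_function == 'SHA2' or hash_function == 'SHA256' -%}
        {# Redshift exposes SHA-256 as SHA2(expr, 256), not SHA256(expr). #}
        {%- set hash_alg = 'SHA2' -%}
        {%- set unknown_key = '!0000000000000000000000000000000000000000000000000000000000000000' -%}
        {%- set error_key = '!ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff' -%}
    {%- endif -%}

    {%- do dict_result.update({"hash_alg": hash_alg, "unknown_key": unknown_key, "error_key": error_key }) -%}

    {{ return(dict_result | tojson ) }}

{%- endmacro -%}
```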

@tkirschke (Member)

> We have data with leading and trailing spaces that are genuinely significant, and the staging macro trims everything.
>
> Are you working on issue 319? If you add a flag to keep the spaces, we can switch to the staging macros.

Okay, understood. This is an existing feature request, for which active development is not currently planned.

However, I described a workaround in this issue, which uses derived columns in the staging model to replace those whitespace characters with alternative characters. If you implement it like that, you can use the block hashdiff, and the multi-active satellite should work fine.

If you do not want to do that, you would need to modify your current hashing script to use a LISTAGG / STRING_AGG for the hash calculation. More details can be found here under the keyword "delta check on".
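As an illustration of that approach (column names are taken from the reproduction above; the delimiter, aggregation order, and hash function are assumptions, not the library's actual implementation):

```sql
-- Sketch: compute one block hashdiff per parent_hashkey by aggregating all
-- rows of the multi-active group into a single string before hashing.
-- A deterministic ORDER BY inside LISTAGG is essential so that the same
-- group contents always produce the same hash.
SELECT
    parent_hashkey,
    SHA2(
        LISTAGG(src_ma_attribute || '|' || payload_field, '||')
            WITHIN GROUP (ORDER BY src_ma_attribute),
        256
    ) AS block_hashdiff
FROM example_model
GROUP BY parent_hashkey
```

With this, every row of a multi-active group carries the same hashdiff, so deduplicating on parent_hashkey alone (as the macro does) is sufficient for the delta check.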

@JulioGarciaVW (Author)

OK, thanks for your quick answer @tkirschke!

We will try the second option, and once you add the feature request about leading and trailing spaces, we may refactor the solution to that approach.

Thanks for opening the Redshift issue as well!

@tkirschke (Member)

> OK, thanks for your quick answer @tkirschke!
>
> We will try the second option, and once you add the feature request about leading and trailing spaces, we may refactor the solution to that approach.
>
> Thanks for opening the Redshift issue as well!

Great, let me know whether the satellite works as intended then. If so, I would appreciate it if you close this bug! :)

@JulioGarciaVW (Author)

Solved!
