Skip to content

DISTINCT_AVG / DISTINCT_SUM broken in v6 — wrong argument type generated #39223

@jorickdefraine

Description

@jorickdefraine

Bug description

After upgrading from Superset v5 to v6, metrics using DISTINCT_AVG and DISTINCT_SUM (defined via custom metric expressions on virtual datasets) are broken. The SQL generated by v6 passes a record tuple to these functions, which PostgreSQL cannot resolve, causing the following error:

PostgreSQL Error
postgresql error: function distinct_avg(record) does not exist
LINE 2:   DISTINCT_AVG(
          ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

This may be triggered by:
Issue 1002 - The database returned an unexpected error.

Steps to Reproduce

  1. Create a virtual dataset with a column like time_to_accept and a unique key report_id
  2. Define a custom metric using DISTINCT_AVG(DISTINCT report_id, (time_to_accept)/86400)
  3. Use this metric in any chart
  4. Observe the error in v6

SQL generated by v5 (working)

SELECT DISTINCT_AVG(DISTINCT report_id, (time_to_accept)/86400) AS avg_time_to_accept
FROM (...) AS virtual_table
LIMIT 50000

SQL generated by v6 (broken)

SELECT
  DISTINCT_AVG(
    DISTINCT CASE
      WHEN report_id IS NULL THEN NULL
      WHEN (time_to_accept / 86400) IS NULL THEN NULL
      ELSE (report_id, CAST((time_to_accept) AS DOUBLE PRECISION) / 86400)
    END
  ) AS avg_time_to_accept
FROM (...) AS virtual_table
LIMIT 50000

Root Cause

In v6, Superset wraps the arguments into a record tuple (report_id, value) inside a CASE expression before passing it to DISTINCT_AVG. PostgreSQL does not recognize distinct_avg(record) as a valid function signature, hence the error. The v5 behavior of passing two separate arguments was correct.

Expected Behavior

The generated SQL should either:

  • Preserve the v5 two-argument calling convention: DISTINCT_AVG(DISTINCT report_id, value), or
  • Provide a documented migration path / replacement for DISTINCT_AVG and DISTINCT_SUM in v6

Environment

  • Superset version (broken): v6.x
  • Superset version (working): v5.x
  • Database: PostgreSQL

Additional Notes

The same issue affects DISTINCT_SUM. Example broken expression:

DISTINCT_SUM(DISTINCT report_id, total_bounty_reward_amount)

This is a regression with no documented migration path in the changelog or release notes.

Screenshots/recordings

No response

Superset version

6.0.0

Python version

3.9

Node version

16

Browser

Firefox

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Metadata

Metadata

Assignees

No one assigned

    Labels

    #bug:regressionBugs that are identified as regessionsexplore:metricsRelated to metrics of Explore

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions