Bug description
After upgrading from Superset v5 to v6, metrics using DISTINCT_AVG and DISTINCT_SUM (defined via custom metric expressions on virtual datasets) are broken. The SQL generated by v6 passes a record tuple to these functions, which PostgreSQL cannot resolve, causing the following error:
PostgreSQL Error
postgresql error: function distinct_avg(record) does not exist
LINE 2: DISTINCT_AVG(
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
This may be triggered by:
Issue 1002 - The database returned an unexpected error.
Steps to Reproduce
- Create a virtual dataset with a column like
time_to_accept and a unique key report_id
- Define a custom metric using
DISTINCT_AVG(DISTINCT report_id, (time_to_accept)/86400)
- Use this metric in any chart
- Observe the error in v6
SQL generated by v5 (working)
SELECT DISTINCT_AVG(DISTINCT report_id, (time_to_accept)/86400) AS avg_time_to_accept
FROM (...) AS virtual_table
LIMIT 50000
SQL generated by v6 (broken)
SELECT
DISTINCT_AVG(
DISTINCT CASE
WHEN report_id IS NULL THEN NULL
WHEN (time_to_accept / 86400) IS NULL THEN NULL
ELSE (report_id, CAST((time_to_accept) AS DOUBLE PRECISION) / 86400)
END
) AS avg_time_to_accept
FROM (...) AS virtual_table
LIMIT 50000
Root Cause
In v6, Superset wraps the arguments into a record tuple (report_id, value) inside a CASE expression before passing it to DISTINCT_AVG. PostgreSQL does not recognize distinct_avg(record) as a valid function signature, hence the error. The v5 behavior of passing two separate arguments was correct.
Expected Behavior
The generated SQL should either:
- Preserve the v5 two-argument calling convention:
DISTINCT_AVG(DISTINCT report_id, value), or
- Provide a documented migration path / replacement for
DISTINCT_AVG and DISTINCT_SUM in v6
Environment
- Superset version (broken): v6.x
- Superset version (working): v5.x
- Database: PostgreSQL
Additional Notes
The same issue affects DISTINCT_SUM. Example broken expression:
DISTINCT_SUM(DISTINCT report_id, total_bounty_reward_amount)
This is a regression with no documented migration path in the changelog or release notes.
Screenshots/recordings
No response
Superset version
6.0.0
Python version
3.9
Node version
16
Browser
Firefox
Additional context
No response
Checklist
Bug description
After upgrading from Superset v5 to v6, metrics using
DISTINCT_AVGandDISTINCT_SUM(defined via custom metric expressions on virtual datasets) are broken. The SQL generated by v6 passes arecordtuple to these functions, which PostgreSQL cannot resolve, causing the following error:Steps to Reproduce
time_to_acceptand a unique keyreport_idDISTINCT_AVG(DISTINCT report_id, (time_to_accept)/86400)SQL generated by v5 (working)
SQL generated by v6 (broken)
Root Cause
In v6, Superset wraps the arguments into a
recordtuple(report_id, value)inside aCASEexpression before passing it toDISTINCT_AVG. PostgreSQL does not recognizedistinct_avg(record)as a valid function signature, hence the error. The v5 behavior of passing two separate arguments was correct.Expected Behavior
The generated SQL should either:
DISTINCT_AVG(DISTINCT report_id, value), orDISTINCT_AVGandDISTINCT_SUMin v6Environment
Additional Notes
The same issue affects
DISTINCT_SUM. Example broken expression:This is a regression with no documented migration path in the changelog or release notes.
Screenshots/recordings
No response
Superset version
6.0.0
Python version
3.9
Node version
16
Browser
Firefox
Additional context
No response
Checklist