Skip to content

Pre-Aggregation workaround on "Average" measure breaks with external dimension #2408

@scuilla

Description

@scuilla

Problem

I'm unable to apply the workaround (https://stackoverflow.com/questions/60100720/) for the non-additive measure on my engagement rate calculation. This seems to break with the inclusion of an external cube's dimension as part of the measure. This results in the count("..._id") being stripped and applied as just a {sum_of_rows} / {id}, instead of the count.

Error: operator does not exist: numeric / character varying

Background/Setup:

  • Cube A (Organizations) has an id set via queryTransformer to filter all incoming requests to an Organization ID
  • Joins occur in the following pattern: Organizations >< OrgInfJunction >< Influencers < Channels - Twitter Channel (extends: Channels) < Tweets - Posts (extends: Tweets) - Original Posts (extends: Posts)
  • The Posts cube includes a measure that contains a dimension reference from a joined cube, Latest Twitter Snapshots. Dividing the total interactions count by the account's current (latest) follower count.

Notes:

  • I've excluded the averageEngagementEstimate from the preAggregation because it fails with the same exact error but hides the SQL behind the rollup.

Related Cube.js schema

cube(`Posts`, {

  extends: Tweets,
  sql: `
    SELECT
      *
    FROM
      ${Tweets.sql()} as ${Tweets}
    WHERE 
      "authorId" = ANY (
        SELECT
          DISTINCT("accountId")
        FROM
          ${TwitterChannels.sql()} as ${TwitterChannels}
      )
  `,

  joins: {
    TwitterChannels: {
      relationship: `belongsTo`,
      sql: `${CUBE.authorId} = ${TwitterChannels.accountId}`
    },
    LatestTwitterSnapshots: {
      relationship: `hasOne`,
      sql: `${CUBE.authorId} = ${LatestTwitterSnapshots.accountId}`
    },
  },

  measures: {
    totalInteractions: {
      sql: interactionsCountEstimate,
      type: `sum`,
      meta: {
        chartLabel: 'Total Interactions',
        metricLabel: 'Interactions',
        iconFamily: 'material-icons',
        iconName: 'TouchAppRounded'
      }
    },
    averageInteractions: {
      // Workaround PreAgg on Avg: https://stackoverflow.com/questions/60100720/
      sql: `(${totalInteractions} / ${count})`,
      type: `number`,
      meta: {
        chartLabel: 'Avg Interactions'
      }
    },
    totalEngagements: {
      sql: engagementEstimate,
      type: `sum`,
      shown: false
    },
    averageEngagementEstimate: {
      // Workaround PreAgg on Avg: https://stackoverflow.com/questions/60100720/
      sql: `(${totalEngagements} / ${count})`,
      type: `number`,
      format: `percent`,
      meta: {
        chartLabel: 'Avg Engagement'
      }
    },
  },

  dimensions: {
    currentFollowerCount: {
      sql: `${LatestTwitterSnapshots.channelFollowerCount}`,
      type: `number`,
    },
    interactionsCountEstimate: {
      sql: `(COALESCE(${favoriteCount}, 0)
            + COALESCE(${repliesCount}, 0)
            + COALESCE(${retweetCount}, 0))`,
      type: `number`,
    },
    engagementEstimate: {
      sql: `
        round(
          cast(${interactionsCountEstimate} as decimal)
          / CASE WHEN ${currentFollowerCount}<>0 THEN ${currentFollowerCount} ELSE 1 END
          * 100
          , 1
        )
      `,
      type: `number`,
      format: `percent`,
    },
  },

  preAggregations: {
    main: {
      type: `originalSql`,
    },
    byAuthorDaily: {
      type: `rollup`,
      external: false,
      measureReferences: [
        count,
        totalInteractions,
        averageInteractions,
      ],
      dimensionReferences: [
        authorId,
        TwitterChannels.accountId,
        TwitterChannels.influencerId,
        Influencers.id,
        OrgInfs.influencerId,
        OrgInfs.organizationId,
        Organizations.id
      ],
      timeDimensionReference: createdAt,
      granularity: `day`,
      partitionGranularity: `month`,
      refreshKey: {
        every: `1 hour`,
        incremental: true,
        updateWindow: `7 day`,
      },
    }
  },

});

Failing: Generated SQL with (sum(...) / count) workaround:

SELECT
  q_0."original_posts__created_at_day",
  "original_posts__average_engagement_estimate" "original_posts__average_engagement_estimate"
FROM
  (
    SELECT
      "keys"."original_posts__created_at_day",
      "original_posts__average_engagement_estimate" "original_posts__average_engagement_estimate"
    FROM
      (
        SELECT
          DISTINCT date_trunc(
            'day',
            (
              "original_posts_key___original_posts"."createdAt" :: timestamptz AT TIME ZONE 'UTC'
            )
          ) "original_posts__created_at_day",
          "original_posts_key___original_posts"."id" "original_posts__id"
        FROM
          dev_pre_aggregations.original_posts_main AS "original_posts_key___original_posts"
          LEFT JOIN dev_pre_aggregations.latest_twitter_snapshots_main AS "original_posts_key___latest_twitter_snapshots" ON "original_posts_key___original_posts"."authorId" = "original_posts_key___latest_twitter_snapshots"."accountId"
          LEFT JOIN dev_pre_aggregations.twitter_channels_main AS "original_posts_key___twitter_channels" ON "original_posts_key___original_posts"."authorId" = "original_posts_key___twitter_channels"."accountId"
          LEFT JOIN dev_pre_aggregations.influencers_main AS "original_posts_key___influencers" ON "original_posts_key___twitter_channels"."influencerId" = "original_posts_key___influencers"."id"
          LEFT JOIN dev_pre_aggregations.org_infs_main AS "original_posts_key___org_infs" ON "original_posts_key___org_infs"."influencerId" = "original_posts_key___influencers"."id"
          LEFT JOIN dev_pre_aggregations.organizations_main AS "original_posts_key___organizations" ON "original_posts_key___org_infs"."organizationId" = "original_posts_key___organizations"."id"
        WHERE
          (
            "original_posts_key___original_posts"."createdAt" >= $ 1 :: timestamptz
            AND "original_posts_key___original_posts"."createdAt" <= $ 2 :: timestamptz
          )
          AND ("original_posts_key___organizations"."id" = $ 3)
      ) AS "keys"
      LEFT JOIN (
        SELECT
          "original_posts_measure_join___original_posts"."id" "original_posts__id",
          (
            round(
              cast(
                (
                  COALESCE(
                    "original_posts_measure_join___original_posts"."favoriteCount",
                    0
                  ) + COALESCE(
                    "original_posts_measure_join___original_posts"."repliesCount",
                    0
                  ) + COALESCE(
                    "original_posts_measure_join___original_posts"."retweetCount",
                    0
                  )
                ) as decimal
              ) / CASE
                WHEN "original_posts_measure_join___latest_twitter_snapshots"."channelFollowerCount" <> 0 THEN "original_posts_measure_join___latest_twitter_snapshots"."channelFollowerCount"
                ELSE 1
              END * 100,
              1
            ) / "original_posts_measure_join___original_posts"."id"
          ) "original_posts__average_engagement_estimate"
        FROM
          dev_pre_aggregations.original_posts_main AS "original_posts_measure_join___original_posts"
          LEFT JOIN dev_pre_aggregations.latest_twitter_snapshots_main AS "original_posts_measure_join___latest_twitter_snapshots" ON "original_posts_measure_join___original_posts"."authorId" = "original_posts_measure_join___latest_twitter_snapshots"."accountId"
      ) AS "original_posts_key___original_posts" ON "keys"."original_posts__id" = "original_posts_key___original_posts"."original_posts__id"
    GROUP BY
      1
  ) as q_0
ORDER BY
  1 ASC
LIMIT
  10000

Working Measure (doesn't allow preAgg though):

cube(`Posts`, {
  ...
  measures: {
    averageEngagementEstimate: {
      sql: engagementEstimate,
      type: `avg`,
      format: `percent`,
      meta: {
        chartLabel: 'Avg Engagement'
      }
    },
  }
});    

Working: Generated SQL using AVG:

SELECT
  q_0."original_posts__created_at_day",
  "original_posts__average_engagement_estimate" "original_posts__average_engagement_estimate"
FROM
  (
    SELECT
      "keys"."original_posts__created_at_day",
      avg("original_posts__average_engagement_estimate") "original_posts__average_engagement_estimate"
    FROM
      (
        SELECT
          DISTINCT date_trunc(
            'day',
            (
              "original_posts_key___original_posts"."createdAt" :: timestamptz AT TIME ZONE 'UTC'
            )
          ) "original_posts__created_at_day",
          "original_posts_key___original_posts"."id" "original_posts__id"
        FROM
          dev_pre_aggregations.original_posts_main AS "original_posts_key___original_posts"
          LEFT JOIN dev_pre_aggregations.latest_twitter_snapshots_main AS "original_posts_key___latest_twitter_snapshots" ON "original_posts_key___original_posts"."authorId" = "original_posts_key___latest_twitter_snapshots"."accountId"
          LEFT JOIN dev_pre_aggregations.twitter_channels_main AS "original_posts_key___twitter_channels" ON "original_posts_key___original_posts"."authorId" = "original_posts_key___twitter_channels"."accountId"
          LEFT JOIN dev_pre_aggregations.influencers_main AS "original_posts_key___influencers" ON "original_posts_key___twitter_channels"."influencerId" = "original_posts_key___influencers"."id"
          LEFT JOIN dev_pre_aggregations.org_infs_main AS "original_posts_key___org_infs" ON "original_posts_key___org_infs"."influencerId" = "original_posts_key___influencers"."id"
          LEFT JOIN dev_pre_aggregations.organizations_main AS "original_posts_key___organizations" ON "original_posts_key___org_infs"."organizationId" = "original_posts_key___organizations"."id"
        WHERE
          (
            "original_posts_key___original_posts"."createdAt" >= $ 1 :: timestamptz
            AND "original_posts_key___original_posts"."createdAt" <= $ 2 :: timestamptz
          )
          AND ("original_posts_key___organizations"."id" = $ 3)
      ) AS "keys"
      LEFT JOIN (
        SELECT
          "original_posts_measure_join___original_posts"."id" "original_posts__id",
          round(
            cast(
              (
                COALESCE(
                  "original_posts_measure_join___original_posts"."favoriteCount",
                  0
                ) + COALESCE(
                  "original_posts_measure_join___original_posts"."repliesCount",
                  0
                ) + COALESCE(
                  "original_posts_measure_join___original_posts"."retweetCount",
                  0
                )
              ) as decimal
            ) / CASE
              WHEN "original_posts_measure_join___latest_twitter_snapshots"."channelFollowerCount" <> 0 THEN "original_posts_measure_join___latest_twitter_snapshots"."channelFollowerCount"
              ELSE 1
            END * 100,
            1
          ) "original_posts__average_engagement_estimate"
        FROM
          dev_pre_aggregations.original_posts_main AS "original_posts_measure_join___original_posts"
          LEFT JOIN dev_pre_aggregations.latest_twitter_snapshots_main AS "original_posts_measure_join___latest_twitter_snapshots" ON "original_posts_measure_join___original_posts"."authorId" = "original_posts_measure_join___latest_twitter_snapshots"."accountId"
      ) AS "original_posts_key___original_posts" ON "keys"."original_posts__id" = "original_posts_key___original_posts"."original_posts__id"
    GROUP BY
      1
  ) as q_0
ORDER BY
  1 ASC
LIMIT
  10000

Metadata

Metadata

Labels

questionThe issue is a question. Please use Stack Overflow for questions.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions