Error counts for grouped exception notifications are updated inconsistently, causing duplicate alerts #534

@maivy

Description

Steps to reproduce

  1. Use Redis as the cache for grouping notifications and deploy the app to Heroku on the Heroku-20 stack. (This bug does not appear during local development.)
  2. Trigger an exception 8 times within the default grouping period of 5 minutes. See that notifications are triggered appropriately.
  3. Wait for the default grouping period of 5 minutes to pass.
  4. Do step 2 again. See that notifications are not triggered appropriately. The second alert is duplicated.

[Screenshot, step 2: before-exception-notifications]
[Screenshot, step 4: after-exception-notifications]
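
The report does not include the notifier configuration. For context, a setup matching step 1 might look roughly like the fragment below. This is a sketch under assumptions: the `REDIS_URL` environment variable and the choice of `ActiveSupport::Cache::RedisCacheStore` are guesses about the app, and the notifier options (email, Slack, and so on) are omitted.

```ruby
# config/initializers/exception_notification.rb
# Assumed configuration, not taken from the affected app.
Rails.application.config.middleware.use ExceptionNotification::Rack,
  error_grouping: true,
  error_grouping_period: 5.minutes, # the default grouping period mentioned above
  # Assumption: the grouping cache is a Redis-backed ActiveSupport cache store.
  error_grouping_cache: ActiveSupport::Cache::RedisCacheStore.new(url: ENV["REDIS_URL"])
  # Notifier options (for example email: { ... }) would also be passed here.
```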

Expected behavior

Notifications should always be triggered appropriately and not duplicated.

Actual behavior

Notifications are sometimes duplicated. Inspecting the Redis cache suggests that ExceptionNotifier intermittently stops reading and incrementing the error count stored under the message-based key and uses the count stored under the backtrace-based key instead, even though it is supposed to use the message-based key consistently.

For example, in step 4, alerts are triggered in the following way:

  1. Trigger first exception
    message-based key has error count of 1
    backtrace-based key has error count of 1
    First alert is triggered
  2. Trigger second exception
    message-based key has error count of 2
    backtrace-based key has error count of 1
    Second alert is triggered with "(2 times)" in title
  3. Trigger third exception
    message-based key has error count of 3
    backtrace-based key has error count of 1
    No alert is triggered
  4. Trigger fourth exception
    message-based key has error count of 3
    backtrace-based key has error count of 2
    Third alert is triggered with "(2 times)" in title
  5. Trigger fifth exception
    message-based key has error count of 3
    backtrace-based key has error count of 3
    No alert is triggered
  6. Trigger sixth exception
    message-based key has error count of 4
    backtrace-based key has error count of 3
    Fourth alert is triggered with "(4 times)" in title
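
The sequence above can be reproduced with a small simulation. It is only a sketch of one possible explanation, not the gem's actual code, and it rests on three assumptions inferred from the observed counts: the message-based key is read first, a missed read falls back to the backtrace-based key, and an alert fires when the count used is a power of two. `GroupingSim`, `record`, `alert?`, and `miss_on` are hypothetical names introduced for illustration.

```ruby
# Hypothetical simulation; GroupingSim and its methods are illustrative names,
# not part of ExceptionNotifier.
class GroupingSim
  attr_reader :store

  # miss_on: 1-based exception numbers whose message-key read is forced to miss
  def initialize(miss_on: [])
    @store = {} # stands in for the Redis cache
    @miss_on = miss_on
    @seen = 0
  end

  # Records one exception and returns the count used for the alert decision.
  def record
    @seen += 1
    message_count = @miss_on.include?(@seen) ? nil : @store[:message]

    if message_count
      # Normal path: increment the message-based count.
      @store[:message] = message_count + 1
    elsif @store[:backtrace]
      # Assumed fallback: the message-key read missed, so use the backtrace-based count.
      @store[:backtrace] += 1
    else
      # Neither key exists yet: initialise both counts to 1.
      @store[:message] = 1
      @store[:backtrace] = 1
    end
  end

  # Assumed trigger rule: alert when the count is a power of two (1, 2, 4, 8, ...).
  def alert?(count)
    (count & (count - 1)).zero?
  end
end

sim = GroupingSim.new(miss_on: [4, 5])
6.times do |i|
  count = sim.record
  puts "exception #{i + 1}: message=#{sim.store[:message]} " \
       "backtrace=#{sim.store[:backtrace]} count=#{count} alert=#{sim.alert?(count)}"
end
```

With the message-key read forced to miss on the fourth and fifth exceptions, the simulated key counts and alerts match the six steps listed above, including the duplicated "(2 times)" alert. If this model is right, the question becomes why reads of the message-based key intermittently return nothing on Heroku.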

System configuration

Rails version: 6.1.7.3

Ruby version: 3.0.5

Other configurations:
Sidekiq: 6.5.9
Redis: >= 4.5.0, < 5 (we get Redis from importing Sidekiq)
