Skip to content

Add lumping_interval Feature for Semian Circuit Breakers #606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 23, 2025

Conversation

abishanan-shopify
Copy link
Contributor

@abishanan-shopify abishanan-shopify commented May 20, 2025

Changes:

  • Adjust the push_time, create_circuit_breaker, and initialize functions to incorporate lumping intervals
  • Implement tests for respective changes

Rationale:

When it comes to real-world applications, errors can sometimes occur in short-term bursts due to insignificant network disruptions, outages, or other small "blips" in the network that rapidly self-resolve.

Under the current implementation, each error, no matter how close together in time, counts individually. This means that a short burst of errors, no matter how far apart they are in time, will affect the circuit breaker by accumulating the error count. This has a few issues:

  • This makes the circuit breaker more sensitive to high-frequency error accumulations
  • It makes it harder to identify larger sustained problems beyond small instances of noise
  • Small noise has the ability to break the circuit entirely
  • The system is impacted at a larger scale, since the circuit breaker can open and close for insignificant issues - decreasing resiliency

Implementation:

  • Added a "lump interval" which groups multiple errors that occur within a set timeframe (e.g., 1 second)
  • The "lump interval" is not based on fixed time windows but rather based on the timestamps of the latest error (see graphic below)
  • This makes error counter less sensitive to error bursts and improves the stability and user experience of those using services involving circuit breakers
lumping__interval

@abishanan-shopify abishanan-shopify force-pushed the lumping-interval branch 9 times, most recently from 77226a1 to 079c0d0 Compare May 21, 2025 19:49
@@ -26,6 +27,11 @@ def initialize(name, exceptions:, success_threshold:, error_threshold:,
@error_timeout = error_timeout
@exceptions = exceptions
@half_open_resource_timeout = half_open_resource_timeout
@lumping_interval = lumping_interval

if @lumping_interval >= @error_threshold_timeout && @lumping_interval > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the && here? I think what would make more sense is to prevent negative values for all parameters instead

if error_threshold_timeout_enabled
@errors.reject! { |err_time| err_time + @error_threshold_timeout < time }
end

@errors << time
if @lumping_interval == 0 || @errors.empty? || @errors.last < time - @lumping_interval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the first condition necessary here? 🤔

fix notation for index for SlidingWindow and edge case for empty array
@conrad-mo conrad-mo marked this pull request as ready for review May 22, 2025 14:34
@abishanan-shopify abishanan-shopify changed the title add lumping_interval feature in Semian Circuit Breaker Add lumping_interval Feature for Semian Circuit Breakers May 22, 2025
Copy link
Contributor

@AbdulRahmanAlHamali AbdulRahmanAlHamali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@AbdulRahmanAlHamali AbdulRahmanAlHamali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can we add the new feature to the README?

@AbdulRahmanAlHamali
Copy link
Contributor

Regarding the visual, I love it! But I also feel like it's sending the wrong message

image

The feature is helping us not open the circuit too soon on a network blip. So I feel like the visuals should show the errors happening within the first second, followed by a period of calmness. And how the lumping_interval helped us not open the circuit in that situation

Copy link
Contributor

@AbdulRahmanAlHamali AbdulRahmanAlHamali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but suggested a clarification in the README

README.md Outdated
the circuit is half-open (supported for MySQL, Net::HTTP and Redis).
- **lumping_interval**. Interval for timeframe of errors to be lumped and recorded as one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **lumping_interval**. Interval for timeframe of errors to be lumped and recorded as one.
- **lumping_interval**. If provided, errors within this interval will be lumped and recorded as one.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would also be good to mention that it's in seconds.

@abishanan-shopify
Copy link
Contributor Author

Regarding the visual, I love it! But I also feel like it's sending the wrong message

image The feature is helping us not open the circuit too soon on a network blip. So I feel like the visuals should show the errors happening within the first second, followed by a period of calmness. And how the lumping_interval helped us not open the circuit in that situation

Updated with a more relevant visual! 👍

Copy link

@Aguasvivas22 Aguasvivas22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AbdulRahmanAlHamali AbdulRahmanAlHamali merged commit ec34870 into main May 23, 2025
32 checks passed
@AbdulRahmanAlHamali AbdulRahmanAlHamali deleted the lumping-interval branch May 23, 2025 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants