-
Notifications
You must be signed in to change notification settings - Fork 256
fix: Recover periodic metric readers after forking #1823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Recover periodic metric readers after forking #1823
Conversation
Add a little bit of guarding to help surface errors in future should problems cause the suite to fail
312d1df
to
213d166
Compare
213d166
to
857b773
Compare
Thanks @chrisholmes , just curious if it's possible to do something like reset_on_fork instead of prepend |
Hi @xuan-cao-swi, AIUI, the BSP implementation works well because it is able to recover its thread when it receives a new span.
A consequence of this would be forks would not publish any metrics until a value is recorded. This would probably be fine for most circumstances. As an example, it might be an issue with gauges that you would expect to publish continuously. Though from my own experience, I can't think of a use for a metrics derived from a parent process. What's you view given this? |
Hi @chrisholmes , thanks for the explanation. My thinking is that as long as the thread (in child) keep running and export data for every period of time, then there is no need to recover it. Please correct me if I am misunderstand something or wrong. |
hi @xuan-cao-swi , I'll try to explain my understanding: Recovery is required because when we fork only the thread that performed the fork is copied from the parent onto the child process. This means that any background thread behaviour, such as PeriodicMetricReader, is stalled until the thread is re-created. We have a few options for recovery:
For options 1 & 2, there would be immediate recovery. For option 3, there would be a delay in recovery until the triggering event. One scenario, where immediate recovery is desired, is if a developer creates gauges prior to forking. For option 3, these gauges will not be reported by the child process until the thread is recovered by the trigger, whose immediacy is not guaranteed. Whereas, a user of the metrics would probably expect metrics to be reported continuously without gaps. Does this scenario help? |
Hi @chrisholmes, thanks for the explanation—you're right that there's no recovery point like BSP's Option 3 would delay metric export, but could the recorded timestamps still help correlate metrics on the representation side? As you mentioned, recovery from the The main concern with Option 1 is the monkey patching of the |
This PR provides a fix to
PeriodicMetricReader
so that it continues working after a Ruby process is forked. This problem currently surfaces itself on applications that use forking webservers.When forking a Ruby process only the thread creating the child process is coped across from the parent process, which means that background threads need to be created for monitoring tools to continue working. This means that, before this fix,
PeriodicMetricReader
would not work on the child process as the background thread is not copied.The fix is implemented by overriding the
Process._fork
method provided in Ruby versions 3.1+. Overriding this method allows the authors of monitoring tools to introduce callbacks on fork events.An alternative approach, required prior to 3.1, would require clients to call hooks directly from their forking webserver's configuration such as use
on_worker_boot
inpuma.rb
. OverridingProcess._fork
removes the need for this boilerplate.When resetting the
PeriodicMetricReader
, theafter_fork
hook will first read and discard metrics that would have been published by the parent process. This is in order to avoid duplicate publishing of metrics that will be handled by the parent process.The parent process'
PeriodicMetricReader
is left running after forking as it is assumed that clients will want to publish metrics from it still.