
fix: Recover periodic metric readers after forking #1823

Open · wants to merge 5 commits into main from recover-periodic-metric-readers-from-forking

Conversation

chrisholmes
Contributor

@chrisholmes commented Mar 6, 2025

This PR provides a fix to PeriodicMetricReader so that it continues working after a Ruby process is forked. This problem currently surfaces in applications that use forking web servers.

When a Ruby process forks, only the thread that performed the fork is copied from the parent into the child process, so background threads must be re-created for monitoring tools to keep working. Before this fix, PeriodicMetricReader therefore did not work in the child process, because its background export thread is not copied.

The fix is implemented by overriding the Process._fork method available in Ruby 3.1 and later. Overriding this method lets authors of monitoring tools run callbacks on fork events.
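As a rough illustration of that mechanism, here is a minimal sketch of the Process._fork override pattern; the module and callback names are assumptions for the example, not necessarily what this PR uses.

```ruby
# Minimal sketch of the Process._fork override pattern available on Ruby 3.1+.
# Module and method names are illustrative, not the ones used by this PR.
module ForkHooks
  def self.after_fork_callbacks
    @after_fork_callbacks ||= []
  end

  def _fork
    pid = super
    if pid.zero?
      # We are in the child process: re-create background work here,
      # e.g. restart the PeriodicMetricReader's export thread.
      ForkHooks.after_fork_callbacks.each(&:call)
    end
    pid
  end
end

Process.singleton_class.prepend(ForkHooks)
```

A PeriodicMetricReader (or the SDK's configuration step) would then register a callback that restarts the reader's export thread in the child.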

An alternative approach, which would be required prior to Ruby 3.1, is for clients to call hooks directly from their forking web server's configuration, such as in on_worker_boot in puma.rb (see the sketch below). Overriding Process._fork removes the need for this boilerplate.
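For comparison, that pre-3.1 boilerplate would look roughly like this in puma.rb; the reader reference and the restart_after_fork method are hypothetical names used purely for illustration.

```ruby
# puma.rb -- the per-application boilerplate that overriding Process._fork avoids.
# Assumptions: `$periodic_metric_reader` was saved when the SDK was configured,
# and `restart_after_fork` is a hypothetical reset hook on the reader.
on_worker_boot do
  reader = $periodic_metric_reader
  reader.restart_after_fork if reader.respond_to?(:restart_after_fork)
end
```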

When resetting the PeriodicMetricReader, the after_fork hook first reads and discards the metrics that would have been published by the parent process. This avoids publishing duplicates of metrics that the parent process will handle.
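A sketch of the intended shape of that reset (the method names are placeholders; only the ordering of the two steps matters here):

```ruby
# Inside PeriodicMetricReader (illustrative only; names are placeholders).
def after_fork
  # Read and discard anything aggregated before the fork; the parent
  # process will export that data itself, so the child must not re-send it.
  collect

  # Re-create the periodic export thread, which was not copied into the child.
  start_export_thread
end
```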

The parent process's PeriodicMetricReader is left running after forking, as it is assumed that clients will still want to publish metrics from it.

@chrisholmes changed the title from "Recover periodic metric readers from forking" to "fix: Recover periodic metric readers from forking" on Mar 6, 2025
@chrisholmes force-pushed the recover-periodic-metric-readers-from-forking branch 2 times, most recently from 312d1df to 213d166 on March 7, 2025 at 10:42
@chrisholmes force-pushed the recover-periodic-metric-readers-from-forking branch from 213d166 to 857b773 on March 7, 2025 at 10:55
@chrisholmes changed the title from "fix: Recover periodic metric readers from forking" to "fix: Recover periodic metric readers after forking" on Mar 7, 2025
@chrisholmes marked this pull request as ready for review on March 7, 2025 at 12:22
@xuan-cao-swi
Contributor

Thanks @chrisholmes, just curious whether it's possible to do something like reset_on_fork instead of prepending to Process.

@chrisholmes
Contributor Author

> Thanks @chrisholmes, just curious whether it's possible to do something like reset_on_fork instead of prepending to Process.

Hi @xuan-cao-swi, as I understand it, the BSP (BatchSpanProcessor) implementation works well because it is able to recover its thread when it receives a new span.

PeriodicMetricReader itself doesn't receive any signals, so it wouldn't be able to recover itself this way. One possibility would be to use the recording of metrics to recover the PeriodicMetricReader, perhaps by adding a patch that resets the metric readers in MetricStream [here](https://github.com/open-telemetry/opentelemetry-ruby/blob/main/metrics_sdk/lib/opentelemetry/sdk/metrics/state/metric_stream.rb#L55). This would add a bit of coupling, but maybe that is preferable to the prepend approach.
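A sketch of that trigger-based alternative (the hook placement in MetricStream#update, the metric_readers accessor, and the restart_after_fork method are all assumptions made for illustration, not real SDK API):

```ruby
# Illustrative patch: lazily restart readers when a value is recorded in a
# forked child. Accessor and hook names below are assumptions.
module MetricStreamForkRecovery
  def update(value, attributes)
    @recorded_pid ||= Process.pid
    if @recorded_pid != Process.pid
      # First recording after a fork: restart the readers' export threads.
      @recorded_pid = Process.pid
      OpenTelemetry.meter_provider.metric_readers.each do |reader|
        reader.restart_after_fork if reader.respond_to?(:restart_after_fork)
      end
    end
    super
  end
end

OpenTelemetry::SDK::Metrics::State::MetricStream.prepend(MetricStreamForkRecovery)
```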

A consequence of this would be that forked processes would not publish any metrics until a value is recorded. That would probably be fine in most circumstances, but as an example it might be an issue with gauges that you would expect to be published continuously. From my own experience, though, I can't think of a use for metrics derived from a parent process.

What's your view, given this?

@xuan-cao-swi
Contributor

Hi @chrisholmes, thanks for the explanation. My thinking is that as long as the thread (in the child) keeps running and exports data every period, there is no need to recover it. Please correct me if I am misunderstanding something.

@chrisholmes
Contributor Author

> Hi @chrisholmes, thanks for the explanation. My thinking is that as long as the thread (in the child) keeps running and exports data every period, there is no need to recover it. Please correct me if I am misunderstanding something.

Hi @xuan-cao-swi, I'll try to explain my understanding:

Recovery is required because, when we fork, only the thread that performed the fork is copied from the parent into the child process. This means that any background-thread behaviour, such as the PeriodicMetricReader's export loop, is stalled until the thread is re-created.
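A quick plain-Ruby illustration of that behaviour (CRuby semantics):

```ruby
# Only the thread that calls fork survives into the child process.
bg = Thread.new { sleep } # stands in for PeriodicMetricReader's export thread

pid = fork do
  puts Thread.list.size # => 1 (just the thread that forked)
  puts bg.alive?        # => false: the background thread was not copied
end
Process.wait(pid)
```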

We have a few options for recovery:

  1. Override Process._fork, which was introduced for this purpose
  2. Ask developers to add a recovery hook into their codebase (such as in on_worker_boot in puma)
  3. Use a trigger, such as the creation of a metric or the recording of a metric value, to recreate the thread

For options 1 & 2, there would be immediate recovery. For option 3, there would be a delay in recovery until the triggering event.

One scenario where immediate recovery is desired is when a developer creates gauges prior to forking. With option 3, these gauges will not be reported by the child process until the thread is recovered by the trigger, whose timing is not guaranteed, whereas a user of the metrics would probably expect them to be reported continuously, without gaps.

Does this scenario help?

@xuan-cao-swi
Contributor

Hi @chrisholmes, thanks for the explanation—you're right that there's no recovery point like BSP's on_finish for spans.

Option 3 would delay metric export, but could the recorded timestamps still help correlate metrics on the representation side? As you mentioned, recovery from the metric_stream record could work. That said, I’m not a fan of checking whether the forked periodic reader needs to start its thread on every record—though BSP does something similar per span finish.

The main concern with Option 1 is the monkey patching of the Process module. Option 2 seems more reasonable, but we’re aiming to minimize direct user interaction with the SDK. We’ll bring this up in the SIG meeting next week.
