Add :clean_dir option and clean_pid method to DirectFileStore #173

stefansundin · 2019-12-28T02:08:32Z

Hi again,

As hinted in my other PR (#172), here's another one.

Currently, it is a bit of a problem that the gem doesn't help with keeping the directory clean. I'm sure everyone who attempts to use DirectFileStore will soon realize that metrics from old processes hang around when collecting the metrics. So I wouldn't be surprised if many people have worked around this by simply cleaning the directory first.

So the first change, adding a new :clean_dir option, will let me remove this piece of code from my app: stefansundin/github-activity@936d4d6#diff-d21913acbe4d1528a62c64cdec6d43d7R11-R14

# Clean up old metric files
Dir["#{app_path}/tmp/prometheus/*.bin"].each do |file_path|
  File.unlink(file_path)
end

I was even thinking of making this the default, but I'm afraid to accidentally delete someone's files.. even though it should be unlikely. Is there any use case of possibly initializing two DirectFileStore's that use the same directory? I feel like that may be a bad idea for other reasons..

The second piece, the new clean_pid method, will let me simplify this part of my app: stefansundin/github-activity@936d4d6#diff-3f0c524806c4374c99a509efaffd3e8aR29-R32

So I can change this:

on_worker_shutdown do |index|
  # Delete stale metric files on worker shutdown
  Dir["#{app_path}/tmp/prometheus/*___#{Process.pid}.bin"].each do |file_path|
    File.unlink(file_path)
  end
end

to:

on_worker_shutdown do |index|
  # Delete stale metric files on worker shutdown
  Prometheus::Client.config.data_store.clean_pid(Process.pid)
end

Thanks!

stefansundin · 2019-12-28T02:11:21Z

lib/prometheus/client/data_stores/direct_file_store.rb

@@ -68,6 +80,7 @@ def validate_metric_settings(metric_settings)
        end

        class MetricStore
+          FILENAME_PREFIX = "prometheus_#{ Prometheus::Client::VERSION }"


I changed this from metric to prometheus to reduce the chance of deleting unrelated files.. thoughts?

stefansundin · 2019-12-28T02:11:44Z

lib/prometheus/client/data_stores/direct_file_store.rb

          @store_settings = { dir: dir }
          FileUtils.mkdir_p(dir)
+
+          if clean_dir
+            Dir.glob(File.join(dir, "#{ MetricStore::FILENAME_PREFIX }_*___*.bin")).each do |file_path|


This won't delete files from old versions of the gem.. thoughts?

coveralls · 2019-12-28T02:13:07Z

Coverage remained the same at 100.0% when pulling 049948b on stefansundin:DirectFileStore-unlink into 7129453 on prometheus:master.

coveralls · 2019-12-28T02:13:07Z

Coverage remained the same at 100.0% when pulling 049948b on stefansundin:DirectFileStore-unlink into 7129453 on prometheus:master.

coveralls · 2019-12-28T02:13:07Z

Coverage remained the same at 100.0% when pulling 8537f5a on stefansundin:DirectFileStore-unlink into 7129453 on prometheus:master.

Signed-off-by: Stefan Sundin <stefan@stefansundin.com>

dmagliola · 2020-01-07T14:46:17Z

Hi @stefansundin, thank you for this PR!

This looks quite good, and it seems to solve Issue #109. (@Sinjo RTYI)
I have one main concern, however, which is multiple processes running this cleanup code on app start, and racing to delete each other's files...

You mention you are replacing some currently existing code in your app that deletes these files...
Where is that code in your app, currently?
Does it run before Unicorn forks?
If not, does it run on all the client processes?
I find that last idea a bit scary, in case the first processes to start have already created metrics files by the time the following ones start deleting them...

So i'm not sure whether this should be an option in the constructor, and instead it should be a static method that you call in your Unicorn startup script, before forking.

We should also definitely add some documentation on this, things like your on_worker_shutdown example.

Thoughts?

stefansundin · 2020-01-07T16:47:47Z

You mention you are replacing some currently existing code in your app that deletes these files...
Where is that code in your app, currently?

The relevant samples are linked in the PR description.

Does it run before Unicorn forks?

My puma config is using preload_app!, so my initializers run before puma forks. If I remove preload_app! then prometheus is initialized multiple times and may cause undefined behavior. I haven't really tried to see what problems may arise. Here is my puma config file: https://github.com/stefansundin/github-activity/blob/17ab554695411e702f664feacf1cc2d8494f0880/config/puma.rb

So i'm not sure whether this should be an option in the constructor, and instead it should be a static method that you call in your Unicorn startup script, before forking.

I don't think it matters that much. You can also conditionally set this new argument the same way you would conditionally call a new method.

I actually think we should consider making this the default behavior. As I noted above, I think everyone with forking processes will encounter this issue. As I wrote above, my hesitation is:

I was even thinking of making this the default, but I'm afraid to accidentally delete someone's files.. even though it should be unlikely. Is there any use case of possibly initializing two DirectFileStore's that use the same directory? I feel like that may be a bad idea for other reasons..

So if there is no use case to do this, current or future, then I would at least consider making it the default. You guys are the experts so I will let you make the call. :)

dmagliola · 2020-01-28T17:05:25Z

@stefansundin After discussing this quite a bit, I don't think we should have this change.
The reason is that it's quite dangerous / quite easy for someone to shoot themselves in the foot if they don't make sure that this has definitely ran before forking.
As such, it's a bit of an "advanced" usage mode.

Having that flag in the constructor also feels a bit awkward as an interface.
I'd prefer to have a method you call on the store to clean the directory, that people can call from some hook that definitely runs before forking.

Having looked into that option, since we can't do it automatically because you may be calling post-fork, and thus since you still have to call a method manually on startup, we don't seem to be saving users much work. There's not that much difference in my opinion between having to call Store.clean_dir and the 3-line snippet of glob + unlink.

For this same reason, we can't make this the default, as much as i'd like to. It's too easy to inadvertently run after forking.

What I think we should do instead is document this in the README. Let's have a section where we propose "this is some code you can use to make sure you start with an empty dir, and make sure you run that before forking!".

Would you prefer if I close this PR, and we tackle the documentation when we have time, or would you like to write that documentation section?

As for your worker shutdown code, there might be a pretty big issue with it.
I may be wrong, because I don't know Puma well enough to understand when exactly that gets called.
If a certain worker gets recycled (like Unicorn allows you to do after a certain number of requests, for example), will that hook run?
Or does it only run when the whole server is being shutdown?

If that runs when a worker ends but the whole server keep running... That code can be quite problematic. Unless all of your metrics are tagged with their pid (in which case you get the result you want, the time series disappears), you will see counters going down if you delete the files for some processes but not others. This works in your current code because, if i'm reading that correctly, you only have one gauge. But I really don't think we should encourage that behaviour, it's very easy to get wrong.

If the hook only runs on server shutdown, however, that's safe, but i'm not sure I understand what's the use case for each worker to clean after itself, if the whole thing gets wiped after the server restarts.

stefansundin · 2020-02-03T03:38:27Z

@dmagliola Ok. I think it would be better if you would update the README and other documentation. If you do it then there's less need to go back-and-forth between the two of us.

As for the worker shutdown code, yes there are some gotchas. In my opinion, people already have to deal with most of these gotchas in a multi-server environment, where servers come and go very frequently.

Feel free to close this after you've updated the docs.

dmagliola · 2020-02-24T17:51:45Z

Hi Stefan,
I'm sorry for my slow reply.

I agree with you. I'm not sure what's proper etiquette in terms of taking over other people's PRs, so I was suggesting some changes.
I'll close this and make those changes in another branch, which is easier.
Thank you for bringing this up!
Daniel

@stefansundin

Add an example on how to solve the "files are already there on startup" problem, based on @stefansundin 's proposal in PR #173

@stefansundin

Add an example on how to solve the "files are already there on startup" problem, based on @stefansundin 's proposal in PR #173 Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>

stefansundin force-pushed the DirectFileStore-unlink branch from 831dda8 to 049948b Compare December 28, 2019 02:09

stefansundin commented Dec 28, 2019

View reviewed changes

Add :clean_dir option and clean_pid method to DirectFileStore.

8537f5a

Signed-off-by: Stefan Sundin <stefan@stefansundin.com>

stefansundin force-pushed the DirectFileStore-unlink branch from 049948b to 8537f5a Compare December 28, 2019 02:25

stefansundin mentioned this pull request Dec 28, 2019

Add :most_recent aggregation to DirectFileStore #172

Merged

Envek mentioned this pull request Jan 29, 2020

Support store_settings option yabeda-rb/yabeda-prometheus#10

Closed

dmagliola closed this Feb 24, 2020

dmagliola pushed a commit that referenced this pull request Feb 24, 2020

Document how to clean up work dir for DirectFileStore on startup

025b390

Add an example on how to solve the "files are already there on startup" problem, based on @stefansundin 's proposal in PR #173

dmagliola mentioned this pull request Feb 24, 2020

Document how to clean up work dir for DirectFileStore on startup #183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add :clean_dir option and clean_pid method to DirectFileStore #173

Add :clean_dir option and clean_pid method to DirectFileStore #173

stefansundin commented Dec 28, 2019

stefansundin Dec 28, 2019

stefansundin Dec 28, 2019

coveralls commented Dec 28, 2019

coveralls commented Dec 28, 2019

coveralls commented Dec 28, 2019 •

edited

Loading

dmagliola commented Jan 7, 2020

stefansundin commented Jan 7, 2020

dmagliola commented Jan 28, 2020

stefansundin commented Feb 3, 2020

dmagliola commented Feb 24, 2020

Add :clean_dir option and clean_pid method to DirectFileStore #173

Add :clean_dir option and clean_pid method to DirectFileStore #173

Conversation

stefansundin commented Dec 28, 2019

stefansundin Dec 28, 2019

Choose a reason for hiding this comment

stefansundin Dec 28, 2019

Choose a reason for hiding this comment

coveralls commented Dec 28, 2019

coveralls commented Dec 28, 2019

coveralls commented Dec 28, 2019 • edited Loading

dmagliola commented Jan 7, 2020

stefansundin commented Jan 7, 2020

dmagliola commented Jan 28, 2020

stefansundin commented Feb 3, 2020

dmagliola commented Feb 24, 2020

coveralls commented Dec 28, 2019 •

edited

Loading