
Commit 367cbff

Authored by Daniel Magliola
Merge pull request #167 from prometheus/improved_documentation
Improve documentation for DirectFileStore
2 parents fd26a66 + 81222d3 commit 367cbff


2 files changed

+55
-31
lines changed


README.md

Lines changed: 47 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -271,7 +271,7 @@ is stored in a global Data Store object, rather than in the metric objects thems
271271
(This "storage" is ephemeral, generally in-memory, it's not "long-term storage")
272272

273273
The main reason to do this is that different applications may have different requirements
274-
for their metrics storage. Application running in pre-fork servers (like Unicorn, for
274+
for their metrics storage. Applications running in pre-fork servers (like Unicorn, for
275275
example), require a shared store between all the processes, to be able to report coherent
276276
numbers. At the same time, other applications may not have this requirement but be very
277277
sensitive to performance, and would prefer instead a simpler, faster store.
@@ -311,7 +311,7 @@ whether you want to report the `SUM`, `MAX` or `MIN` value observed across all p
311311
For almost all other cases, you'd leave the default (`SUM`). More on this in the
312312
*Aggregation* section below.
313313

314-
Other custom stores may also accept extra parameters besides `:aggregation`. See the
314+
Custom stores may also accept extra parameters besides `:aggregation`. See the
315315
documentation of each store for more details.
316316

317317
### Built-in stores
@@ -326,26 +326,46 @@ There are 3 built-in stores, with different trade-offs:
326326
it's absolutely not thread safe.
327327
- **DirectFileStore**: Stores data in binary files, one file per process and per metric.
328328
This is generally the recommended store to use with pre-fork servers and other
329-
"multi-process" scenarios.
330-
331-
Each metric gets a file for each process, and manages its contents by storing keys and
332-
binary floats next to them, and updating the offsets of those Floats directly. When
333-
exporting metrics, it will find all the files that apply to each metric, read them,
334-
and aggregate them.
335-
336-
In order to do this, each Metric needs an `:aggregation` setting, specifying how
337-
to aggregate the multiple possible values we can get for each labelset. By default,
338-
they are `SUM`med, which is what most use-cases call for (counters and histograms,
339-
for example). However, for Gauges, it's possible to set `MAX` or `MIN` as aggregation,
340-
to get the highest/lowest value of all the processes / threads.
341-
342-
Even though this store saves data on disk, it's still much faster than would probably be
343-
expected, because the files are never actually `fsync`ed, so the store never blocks
344-
while waiting for disk. The kernel's page cache is incredibly efficient in this regard.
345-
346-
If in doubt, check the benchmark scripts described in the documentation for creating
347-
your own stores and run them in your particular runtime environment to make sure this
348-
provides adequate performance.
329+
"multi-process" scenarios. There are some important caveats to using this store, so
330+
please read the section below.
331+
332+
### `DirectFileStore` caveats and things to keep in mind
333+
334+
Each metric gets a file for each process, and manages its contents by storing keys and
335+
binary floats next to them, and updating the offsets of those Floats directly. When
336+
exporting metrics, it will find all the files that apply to each metric, read them,
337+
and aggregate them.
338+
339+
**Aggregation of metrics**: Since there will be several files per metric (one per process),
340+
these need to be aggregated to present a coherent view to Prometheus. Depending on your
341+
use case, you may need to control how this works. When using this store,
342+
each Metric allows you to specify an `:aggregation` setting, defining how
343+
to aggregate the multiple possible values we can get for each labelset. By default,
344+
Counters, Histograms and Summaries are `SUM`med, and Gauges report all their values (one
345+
for each process), tagged with a `pid` label. You can also select `SUM`, `MAX` or `MIN`
346+
for your gauges, depending on your use case.
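
As a rough illustration of the aggregation modes described above, the following sketch combines hypothetical per-process values for a single labelset. The `aggregate` helper, the mode symbols and the sample pids are assumptions made for this example; they are not the store's actual implementation or API.

```ruby
# Illustrative helper (not part of the library): combine the values that
# several processes reported for one labelset, according to a mode.
def aggregate(values_by_pid, mode)
  values = values_by_pid.values
  case mode
  when :sum then values.sum
  when :max then values.max
  when :min then values.min
  when :all
    # One sample per process, each tagged with a `pid` label.
    values_by_pid.map { |pid, value| [{ pid: pid }, value] }
  else
    raise ArgumentError, "unknown aggregation mode: #{mode.inspect}"
  end
end

# Hypothetical gauge values reported by three worker processes.
values_by_pid = { "1201" => 3.0, "1202" => 7.0, "1203" => 5.0 }

p aggregate(values_by_pid, :sum)
p aggregate(values_by_pid, :max)
p aggregate(values_by_pid, :all)
```

Summing suits values that accumulate across processes, while `:max`, `:min` or per-process reporting tend to fit gauges whose sum would be meaningless.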
347+
348+
**Memory Usage**: When scraped by Prometheus, this store will read all these files, get all
349+
the values and aggregate them. We have noticed this can have a noticeable effect on memory
350+
usage for your app. We recommend you test this in a realistic usage scenario to make sure
351+
you won't hit any memory limits your app may have.
352+
353+
**Resetting your metrics on each run**: You should also make sure that the directory where
354+
you store your metric files (specified when initializing the `DirectFileStore`) is emptied
355+
when your app starts. Otherwise, each app run will continue exporting the metrics from the
356+
previous run.
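
One way to empty the directory at startup is sketched below. The `reset_metrics_dir` helper, the example path and the `*.bin` file pattern are assumptions for illustration; check how your version of the store names its files before relying on the pattern.

```ruby
require "fileutils"
require "tmpdir"

# Illustrative helper (not part of the library): delete leftover metric files
# in `dir`, creating the directory first if it does not exist yet.
def reset_metrics_dir(dir)
  FileUtils.mkdir_p(dir)
  # Assumption: metric files use a `.bin` extension.
  Dir.glob(File.join(dir, "*.bin")).each { |path| File.delete(path) }
end

# Hypothetical location; use the same directory you pass to the store.
reset_metrics_dir(File.join(Dir.tmpdir, "prometheus_metrics_example"))
```

A helper like this would typically run once, early in boot and before any worker processes are forked, so that files are not removed while a running process is still writing to them.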
357+
358+
**Large numbers of files**: Because there is an individual file per metric and per process
359+
(which is done to optimize for observation performance), you may end up with a large number
360+
of files. We don't currently have a solution for this problem, but we're working on it.
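
To get a feel for the scale, a back-of-the-envelope estimate can help. The helper below is hypothetical and assumes the worst case, where every process ends up writing to every metric.

```ruby
# Illustrative helper (not part of the library): upper bound on the number
# of metric files, assuming one file per metric and per process.
def max_metric_files(metric_count, process_count)
  metric_count * process_count
end

# Hypothetical app: 50 metrics, 8 worker processes.
puts max_metric_files(50, 8) # prints 400
```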
361+
362+
**Performance**: Even though this store saves data on disk, it's still much faster than
363+
would probably be expected, because the files are never actually `fsync`ed, so the store
364+
never blocks while waiting for disk. The kernel's page cache is incredibly efficient in
365+
this regard. If in doubt, check the benchmark scripts described in the documentation for
366+
creating your own stores and run them in your particular runtime environment to make sure
367+
this provides adequate performance.
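
For a rough sense of why skipping `fsync` matters, a standalone sketch like the following compares repeated in-place writes of a binary float with and without `fsync`. It is not one of the repository's benchmark scripts, and `time_writes` is a hypothetical helper; results depend heavily on your disk and filesystem.

```ruby
require "benchmark"
require "tempfile"

# Illustrative helper (not part of the library): time `count` overwrites of
# an 8-byte float at the start of a temporary file, optionally forcing each
# write to disk with fsync. Returns the elapsed time in seconds.
def time_writes(count, sync:)
  Tempfile.create("metric") do |file|
    file.binmode
    Benchmark.realtime do
      count.times do |i|
        file.seek(0)
        file.write([i.to_f].pack("E")) # little-endian double
        file.flush                     # hand the bytes to the kernel
        file.fsync if sync             # additionally wait for the disk
      end
    end
  end
end

puts format("without fsync: %.4fs", time_writes(200, sync: false))
puts format("with fsync:    %.4fs", time_writes(200, sync: true))
```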
368+
349369

350370
### Building your own store, and stores other than the built-in ones.
351371

@@ -364,16 +384,16 @@ If you are in a multi-process environment (such as pre-fork servers like Unicorn
364384
process will probably keep their own counters, which need to be aggregated when receiving
365385
a Prometheus scrape, to report coherent total numbers.
366386

367-
For Counters and Histograms (and quantile-less Summaries), this is simply a matter of
387+
For Counters, Histograms and quantile-less Summaries this is simply a matter of
368388
summing the values of each process.
369389

370390
For Gauges, however, this may not be the right thing to do, depending on what they're
371391
measuring. You might want to take the maximum or minimum value observed in any process,
372-
rather than the sum of all of them. You may also want to export each process's individual
373-
value.
392+
rather than the sum of all of them. By default, we export each process's individual
393+
value, with a `pid` label identifying each one.
374394

375-
In those cases, you should use the `store_settings` parameter when registering the
376-
metric, to specify an `:aggregation` setting.
395+
If these defaults don't work for your use case, you should use the `store_settings`
396+
parameter when registering the metric, to specify an `:aggregation` setting.
377397

378398
```ruby
379399
free_disk_space = registry.gauge(:free_disk_space_bytes,

lib/prometheus/client/data_stores/direct_file_store.rb

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,14 @@ module DataStores
1818
#
1919
# In order to do this, each Metric needs an `:aggregation` setting, specifying how
2020
# to aggregate the multiple possible values we can get for each labelset. By default,
21-
# they are `SUM`med, which is what most use cases call for (counters and histograms,
22-
# for example).
23-
# However, for Gauges, it's possible to set `MAX` or `MIN` as aggregation, to get
24-
# the highest value of all the processes / threads.
21+
# Counters, Histograms and Summaries get `SUM`med, and Gauges will report `ALL`
22+
# values, tagging each one with a `pid` label.
23+
# For Gauges, it's also possible to set `SUM`, `MAX` or `MIN` as aggregation, to get
24+
# the sum, the highest or the lowest value across all the processes / threads.
25+
#
26+
# Before using this Store, please read the "`DirectFileStore` caveats and things to
27+
# keep in mind" section of the main README in this repository. It includes a number
28+
# of important things to keep in mind.
2529

2630
class DirectFileStore
2731
class InvalidStoreSettingsError < StandardError; end
