-
Notifications
You must be signed in to change notification settings - Fork 62
Description
Is your feature request related to a problem? Please describe.
Yes. When running pgwatch in a distributed setup — with multiple agents collecting metrics from different VMs and pushing data into a shared PostgreSQL sink — the maintainUniqueSources() function puts noticeable load on the sink database.
The recursive query that runs every 24h causes locking and contention, especially when there are tens of thousands of monitored sources.
We discussed a related topic in #786, but this one focuses specifically on optimizing maintainUniqueSources().
Describe the solution you'd like
It would be great to have the option to:
Run maintenance tasks like maintainUniqueSources() and deleteOldPartitions() only on one designated node, e.g. via
--admin-mode or --skip-maintenance-tasks
Alternatively, make these tasks callable via subcommands, for example:
pgwatch maintain or pgwatch cleanup,
so they can be triggered manually or by a cronjob from a central controller node.
(Optional) Add a command like
pgwatch cleanup --stale-sources --max-age=7d
to remove sources that haven’t sent data recently, keeping the sink database clean.
Describe alternatives you've considered
We temporarily patched our internal build to reduce the frequency and impact of maintainUniqueSources().
It works for now but isn’t ideal — long-term, having this built into pgwatch would make distributed deployments much cleaner and more efficient.
Internal references:
Additional context
We currently manage over 52,000 unique dbname definitions, so running a single centralized pgwatch instance isn’t feasible due to metric lag.
Distributing the agents solved that issue but shifted maintenance overhead to the sink cluster.
Being able to delegate or externalize maintenance tasks would significantly improve scalability and reduce contention in large deployments.