Downtimes on a removed object are never closed. #910
Comments
Here is my proposed solution: whenever an object gets removed, all of its currently active downtimes get closed as well.
I am willing to implement the change myself, but I'd like to coordinate with you first so that my proposed solution is the right approach. Since the downtimes are dropped afterwards from the …
There is no such thing as a “deactivate downtime” event when a new version of the configuration is deployed via Icinga Director. When the host the downtimes belong to does not exist in the newly deployed configuration, the downtimes become dangling objects that Icinga 2 cannot map to their respective host/service, and they will not even survive the config validation. However, since they are created with the …
If you don't mind wasting time on something that can't be fixed, then go ahead, but bear in mind that this is simply impossible to fix right now. Once the corresponding downtime host/service object is gone, the downtime object itself becomes pretty much useless and is no longer even a valid object. If you don't want such strange history views, I suggest manually clearing the downtimes before removing the host/service object via Icinga Director.
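As an illustration of that manual cleanup, here is a minimal sketch of how the downtimes of a host could be cleared via the Icinga 2 API's `remove-downtime` action before deleting the object in the Director. The endpoint, the API credentials, and the host name `H1` are placeholders for your environment.

```go
// Minimal sketch, not an official tool: clear all downtimes of a host via
// the Icinga 2 API "remove-downtime" action before deleting the host in the
// Director. Endpoint, credentials and the host name "H1" are placeholders.
package main

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

func main() {
	payload := []byte(`{"type": "Host", "filter": "host.name==\"H1\""}`)

	req, err := http.NewRequest(http.MethodPost,
		"https://localhost:5665/v1/actions/remove-downtime", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/json")
	req.SetBasicAuth("root", "icinga") // placeholder API user

	// The Icinga 2 API usually runs with its own CA; verify the certificate
	// properly in production instead of skipping verification.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```

The same request with `"type": "Service"` and a `host.name` filter would clear the downtimes on that host's services as well.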
I already wasted that time and I already implemented my solution. It seems to work for mariadb/mysql, but I need to test it for pgsql and icingadb as well. But I'll probably have to do a second take to make it completely correct.
I am well aware of that. That's the problem we are currently facing. It happens often, but randomly enough that cleaning it up manually for all objects that may be affected is not feasible. Mostly we notice it once the SLA uptime report is generated and a host is completely out of bounds because the downtime was not handled correctly. If we trigger an OnDowntimeRemoved before it gets erased from disk, that solution already works for us.
The logic I am thinking of is this: …
Let me know if I have any holes in my understanding here, but from what I can observe right now, this is what's happening.
I doubt this, as (IIRC) Icinga DB syncs the correct state every time.
Doing a quick test, it does not look like it closed the downtime correctly. I added a downtime of a few minutes to both objects and deleted the object on the left before the downtime could run out. As you can see, the object on the right has the end of the downtime in its history, while the object on the left does not. The linked PR above resolves that issue by handling the ending of the downtime during the object's deactivation.
@Al2Klimov Hi, are there any updates on this issue?
Hi @w1ll-i-code, we had an internal discussion about how we can fix this and came to the conclusion that this can only be fixed in Icinga DB (Go), as there is currently no way to fix it in Icinga 2, as noted in Icinga/icinga2#10311 (comment). We will try to fake the corresponding end event or do something else when removing the downtime configuration from the database, so I will move this to the Icinga DB repo and close Icinga/icinga2#10311 if you don't mind.
I'm sorry, I really did not understand what you were talking about in that comment. But as long as it gets fixed, it's fine by me.
You didn't give any reaction to that comment, so I automatically assumed that you had understood why it's not possible to fix this on the Icinga 2 side. But generally, if I haven't explained something well enough, just say something and I'll be happy to explain it in more detail.

Correct me if I'm wrong, but the problem you're experiencing looks like this: you have a host object named H1 created via the Icinga Director, which essentially uses the …

Now, let's go back to your issue: if your host … Instead, once it has taken over and the old process is terminated, it will dump its freshly loaded configuration into Redis, which will be processed by the Icinga DB (Go) daemon. The Icinga DB (Go) daemon then compares the configuration received from Redis with that in the database and removes any objects that are no longer part of the objects read from Redis. This means it won't receive any events for H1 or its downtimes from Redis, so it removes them from the database.

While performing these checks, we can hook in there: if a downtime configuration is about to be deleted from the database, we can try to fake the corresponding end/cancelled events and insert them into the history tables with #913. This will ensure that your history view in Icinga DB Web won't show any unclosed downtimes when you recreate the exact same host …

I hope it's now clear what I was talking about and what this is all about.
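To make that hook more concrete, here is a hedged sketch of the idea; it is not the actual Icinga DB (Go) code, and all types and function names below are hypothetical stand-ins for its internals.

```go
// Hedged sketch of the idea described above; this is NOT the actual
// Icinga DB (Go) implementation and all types below are hypothetical
// stand-ins for its internals. When the config sync finds downtimes in the
// database that are no longer present in the Redis dump, it writes a
// synthetic "downtime ended/cancelled" history entry before deleting the
// config row, so the history and SLA reports are not left with a
// never-ending downtime.
package downtimesync

import (
	"context"
	"time"
)

// Downtime is a simplified stand-in for a downtime config row.
type Downtime struct {
	ID []byte
}

// HistoryWriter abstracts inserting rows into the downtime history tables.
type HistoryWriter interface {
	InsertDowntimeEnd(ctx context.Context, downtimeID []byte, end time.Time, cancelled bool) error
}

// ConfigStore abstracts deleting downtime config rows.
type ConfigStore interface {
	DeleteDowntime(ctx context.Context, downtimeID []byte) error
}

// cleanupDanglingDowntimes fakes the end event for each dangling downtime
// and then removes its configuration from the database.
func cleanupDanglingDowntimes(ctx context.Context, dangling []Downtime, hw HistoryWriter, cs ConfigStore) error {
	now := time.Now()
	for _, d := range dangling {
		if err := hw.InsertDowntimeEnd(ctx, d.ID, now, true); err != nil {
			return err
		}
		if err := cs.DeleteDowntime(ctx, d.ID); err != nil {
			return err
		}
	}
	return nil
}
```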
Yeah, sorry about that. I was planning to dig deeper into that and then it fell completely off my radar, because I forgot to put it into my todos. I think I understand now, thanks. |
Describe the bug
If an object with a downtime gets disabled (even just temporarily), the end of the associated downtime is never written out to the IDO / Icinga DB.
To Reproduce
4. Redeploy the new version
Expected behavior
I would expect the downtime to be terminated once the object is deactivated (the actual_end_time set to the current time). But since the downtime is dropped without this field ever being set, the object looks in the reports as if it were in a constant downtime, which does not correspond to the internal state of Icinga 2.
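For illustration, here is a hedged sketch of how such never-closed rows could be spotted in the IDO. The table and column names follow the standard IDO schema, but how an unset actual_end_time is stored (NULL versus a zero timestamp) may vary by schema version, and the DSN is a placeholder.

```go
// Hedged sketch: list IDO downtime history rows whose end was never recorded.
// Table and column names are assumed to match the standard IDO schema; both
// NULL and a zero timestamp are checked for an unset actual_end_time.
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "icinga:icinga@tcp(localhost:3306)/icinga") // placeholder DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	rows, err := db.Query(`
		SELECT downtimehistory_id, object_id, scheduled_end_time
		FROM icinga_downtimehistory
		WHERE actual_end_time IS NULL OR actual_end_time = '1970-01-01 00:00:00'`)
	if err != nil {
		panic(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id, objectID int64
		var scheduledEnd sql.NullString
		if err := rows.Scan(&id, &objectID, &scheduledEnd); err != nil {
			panic(err)
		}
		fmt.Println(id, objectID, scheduledEnd.String)
	}
	if err := rows.Err(); err != nil {
		panic(err)
	}
}
```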
Screenshots