Downtimes on a removed object are never closed. #910


Open
w1ll-i-code opened this issue Jan 15, 2025 · 12 comments · May be fixed by #913

Comments

@w1ll-i-code

Describe the bug

If an object with a Downtime gets disabled (even just temporarily), the end of the associated Downtime is never written out to the IDO / Icinga DB.

To Reproduce

  1. Create a host in the Director and deploy it.
  2. Create a downtime on the host.
  3. Use the Director to roll back to an older version.
  4. Redeploy the new version.

Expected behavior

I would expect the Downtime to be terminated once the object is deactivated (The actual_end_time set to the current time). But since the downtime is dropped without this field ever being set, the object looks in the reports as if it were in a constant downtime, which does not correspond to the internal state of Icinga 2.

Screenshots

(screenshot attached)

@w1ll-i-code
Author

Here is my proposed solution: Whenever an object gets removed, all the currently active downtimes get closed as well.

@w1ll-i-code
Author

I am willing to implement the change myself, but I'd like to coordinate with you first to make sure my proposed solution is the right approach. Since the downtimes are dropped from the icinga2.state file afterwards, this seems like the most reasonable solution to me. I'd prefer it if the downtimes persisted through deploys, but that would be a more invasive change that I don't feel comfortable implementing myself.

@yhabteab
Member

I would expect the Downtime to be terminated once the object is deactivated (The actual_end_time set to the current time)

There is no such thing as deactivating a downtime when a new version of the configuration is deployed via Icinga Director. When the host the downtimes belong to does not exist in the newly deployed configuration, the downtimes become dangling objects that Icinga 2 cannot map to their respective host/service, and they will not even survive config validation. However, since they are created with the ignore_on_error flag, they will not stop Icinga 2 from loading the rest of the configuration, and once Icinga 2 is done loading/validating it, it will simply erase them from disk.

Here is my proposed solution: Whenever an object gets removed, all the currently active downtimes get closed as well.

If you don't mind wasting time on something that can't be fixed, then go ahead, but bear in mind that this is simply impossible to fix right now. Once the corresponding downtime host/service object is gone, the downtime object itself becomes pretty much useless and is not even a valid object anymore. If you don't want such strange history views, I suggest manually clearing the downtimes before removing the host/service object via Icinga Director.
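
For what it's worth, such a manual cleanup can be scripted against the /v1/actions/remove-downtime API action before triggering the Director deployment. A minimal Go sketch, assuming a local instance with a self-signed certificate, an API user root/icinga, and a hypothetical host named H1 (adjust the filter, credentials and TLS handling to your setup):

```go
package main

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	// Remove the host downtimes of the (hypothetical) host "H1" before deleting
	// the host in Icinga Director, using the remove-downtime API action.
	payload := []byte(`{"type": "Host", "filter": "host.name == \"H1\""}`)

	req, err := http.NewRequest(http.MethodPost,
		"https://localhost:5665/v1/actions/remove-downtime", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("root", "icinga")           // assumed API user credentials
	req.Header.Set("Accept", "application/json") // the Icinga 2 API requires this header

	// Self-signed certificate in a lab setup; don't skip verification in production.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println("remove-downtime returned", resp.Status)
}
```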

@w1ll-i-code
Author

If you don't mind wasting time on something that can't be fixed, then go ahead, but bear in mind that this is simply impossible to fix right now.

I already wasted that time and implemented my solution. It seems to work for MariaDB/MySQL, but I still need to test it with PostgreSQL and Icinga DB. I'll probably have to do a second take to make it completely correct.

it will simply erase them from disk.

I am well aware of that. That's the problem we are currently facing. It happens often, but randomly enough that cleaning it up manually for all objects that may be affected is not feasible. Mostly we notice it once the SLA uptime report is generated and a host is completely out of bounds because the downtime was not handled correctly. If we trigger an OnDowntimeRemoved before the downtime gets erased from disk, that solution already works for us.

@w1ll-i-code
Author

The logic I am thinking of is this:

  1. The configuration for the object gets removed; it is no longer active.
  2. The object still exists in the icinga2.state file together with the downtime.
  3. The config gets loaded and the object gets set to inactive.
  4. The inactive object gets synced to the IDO.
    1. Here I propose to also trigger the OnDowntimeRemoved hook for each downtime associated with the host.
  5. The host and downtime are now inactive and will no longer get synced to the icinga2.state file on disk. (Or just the host, I'm not sure, but the effect is the same.)

Let me know if there are any holes in my understanding here, but from what I can observe right now, this is what's happening. A rough illustration of step 4.1 is sketched below.
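
This is not actual Icinga 2 code (that part of the code base is C++), just a small, self-contained Go illustration of what step 4.1 is meant to do; every type and function name here is made up for the example:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical types for illustration only; they do not mirror Icinga 2's internals.
type Downtime struct {
	Name          string
	ActualEndTime time.Time // zero value means the downtime was never closed
}

type Host struct {
	Name      string
	Active    bool
	Downtimes []*Downtime
}

// onDowntimeRemoved stands in for the event that the IDO / Icinga DB feature
// would turn into a history entry closing the downtime.
func onDowntimeRemoved(d *Downtime) {
	fmt.Printf("downtime %q ended at %s\n", d.Name, d.ActualEndTime.Format(time.RFC3339))
}

// Deactivate is the proposed hook from step 4.1: before the host and its
// downtimes are dropped from icinga2.state, close every downtime that is
// still open so the end event reaches the history.
func (h *Host) Deactivate() {
	h.Active = false
	for _, d := range h.Downtimes {
		if d.ActualEndTime.IsZero() {
			d.ActualEndTime = time.Now()
			onDowntimeRemoved(d)
		}
	}
}

func main() {
	h := &Host{Name: "example-host", Active: true, Downtimes: []*Downtime{{Name: "maintenance"}}}
	h.Deactivate() // the config reload removed the host; its open downtime gets closed first
}
```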

@Al2Klimov
Member

If an object with a Downtime gets disabled (even just temporarily), the end of the associated Downtime is never written out to the IDO / Icinga DB.

I doubt this, as (IIRC) Icinga DB syncs the correct state every time.

@w1ll-i-code
Author

(screenshot attached)

Doing a quick test, it does not look like it closed it correctly. I added a downtime of a few minutes to both objects and deleted the object on the left before the downtime could run out. As you can see, the object on the right has the end of the downtime in its history, while the object on the left does not. The linked PR above resolves that issue by handling the ending of the downtime during the object's deactivation.

@w1ll-i-code
Author

w1ll-i-code commented Feb 10, 2025

@Al2Klimov Hi, are there any updates on this issue?

@yhabteab
Member

Hi @w1ll-i-code, we had an internal discussion about how we can fix this and came to the conclusion that it can only be fixed in Icinga DB (Go), as there is no way in Icinga 2 as of today to fix it, as noted in Icinga/icinga2#10311 (comment). We will try to fake the corresponding end event or do something else when removing the downtime configuration from the database, so I will move this to the Icinga DB repo and close Icinga/icinga2#10311 if you don't mind.

yhabteab transferred this issue from Icinga/icinga2 on Mar 27, 2025
@w1ll-i-code
Author

I'm sorry, I really did not understand what you were talking about in that comment. But as long as it gets fixed, it's fine by me.

@yhabteab
Member

I'm sorry, I really did not understand what you were talking about in that comment.

You didn't give any reaction to that comment, so I assumed that you had understood why it's not possible to fix this on the Icinga 2 side. But generally, if I haven't explained something well enough, just say so and I'll be happy to explain it in more detail.

Correct me if I'm wrong, but the problem you're experiencing looks like this:

You have a host object named H1 created via Icinga Director, which essentially uses the /v1/config/packages endpoint internally, meaning that every time you trigger an Icinga Director deployment, it dumps all the Icinga 2 config from the Director DB into Icinga 2 via that API endpoint. As you will see from the linked documentation, a request to this endpoint automatically triggers an Icinga 2 daemon reload unless otherwise specified via the reload: false parameter, which Icinga Director doesn't use. What does that mean for Icinga 2? Every time Icinga 2 is reloaded, it starts a new process with its own config, which might not be the same as the one used by the old process.

Now, let's go back to your issue: if your host H1 was in downtime before you deleted it from Icinga Director and triggered a deployment, Icinga Director will dump your config without H1 and its downtimes, because they have just been deleted. The new process that takes over after the reload has no knowledge that this host and its downtimes ever existed, so it can't trigger the events for these objects that would allow them to be properly deleted by IDO and Icinga DB.

Instead, once it has taken over and the old process is terminated, it dumps its freshly loaded configuration into Redis, which is then processed by the Icinga DB (Go) daemon. The Icinga DB (Go) daemon inspects the configuration received from Redis and the one in the database, and removes any objects that are no longer part of the objects read from Redis. This means it won't receive any events for H1 or its downtimes from Redis, so it removes them from the database. While performing these checks, we can hook in there and easily check whether the object about to be deleted from the database is a downtime configuration; if it is, we can fake the corresponding end/cancelled events and insert them into the history tables, which is what #913 does. This ensures that your history view in Icinga DB Web won't show any unclosed downtimes when you recreate the exact same host H1 again. However, this will only resolve the issue for Icinga DB Web; there's nothing we can do for IDO.
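
To make that concrete, here is a rough, purely illustrative Go sketch of such a hook. The types and the pruning function are invented for this example and do not reflect the actual Icinga DB code, schema, or the implementation in #913:

```go
// Hypothetical sketch of the config-delta hook; not the actual Icinga DB code or schema.
package main

import (
	"fmt"
	"time"
)

type DowntimeRow struct {
	ID      string
	HostID  string
	EndTime *time.Time // nil while the downtime is still considered active
}

type HistoryEvent struct {
	DowntimeID string
	EventType  string // e.g. "downtime_end" or "downtime_cancelled"
	EventTime  time.Time
}

// pruneDeletedDowntimes mimics the config sync: every downtime present in the
// database but missing from the Redis dump is removed, and a faked end event
// is written to the history first so the downtime doesn't stay open forever.
func pruneDeletedDowntimes(inDB []DowntimeRow, inRedis map[string]bool, history *[]HistoryEvent) []DowntimeRow {
	var kept []DowntimeRow
	for _, d := range inDB {
		if inRedis[d.ID] {
			kept = append(kept, d)
			continue
		}
		if d.EndTime == nil {
			*history = append(*history, HistoryEvent{
				DowntimeID: d.ID,
				EventType:  "downtime_cancelled",
				EventTime:  time.Now(),
			})
		}
		// the downtime config row itself is dropped, exactly as before
	}
	return kept
}

func main() {
	var history []HistoryEvent
	db := []DowntimeRow{{ID: "dt-1", HostID: "H1"}}
	redis := map[string]bool{} // H1 and its downtime are gone after the deploy

	db = pruneDeletedDowntimes(db, redis, &history)
	fmt.Println(len(db), "downtimes kept,", len(history), "history events faked")
}
```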

I hope it's now clear what I was talking about and what this is all about.

@w1ll-i-code
Author

Yeah, sorry about that. I was planning to dig deeper into it, and then it fell completely off my radar because I forgot to put it in my todos. I think I understand now, thanks.
