-
Notifications
You must be signed in to change notification settings - Fork 585
*#Stop(): wait for own Checkable#ProcessCheckResult() to finish🏁 #10397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Well, that's indeed necessary.
Interesting that I've said something about a comment I haven't even seen until just now. |
ce0d035
to
5b1c4b5
Compare
@julianbrost Very funny. Fact is, on Thursday you
So I've done that. So far:
Finally, CheckResultProducer::Ptr just has to be passed through. |
a0ad767
to
a06b01a
Compare
ea36975
to
a06b01a
Compare
72a171d
to
4873a34
Compare
@Al2Klimov please don't be mad at me :)! I took the initiative to expand the PR description slightly based on the analysis made together with @jschmidt-icinga, so everyone can follow here without having to browse through the changed files. Though, you can gladly merge all the commits suffixed with the |
I think in principle this solves the problem well, but I agree with @yhabteab, kindly squash it down to a single commit describing the functional change (the PR title is fine), as none of the individual changes make sense without all the others. That would also make it much easier to review on GH. |
4873a34
to
f0de2aa
Compare
Are you sure? I've squashed the low-hanging fruits into 170e7cf – and see how big this one commit became! |
Yes, I'm sure. There's big commits and there's commits with trivial one-line changes in lots of files. You could split off adding the new Classes, maybe adding the final check, but at least all the function signature changes should be in a single commit. What's more, commits should reflect individual consistent states of the code, i.e., each of them, when checked out, should at least still compile and the program should still work. Granted this is less important when on a branch that gets merged, but its still a good idea to maintain some discipline. |
Personally, I'd maybe also use the Then again,
Personally, I think it's reasonable to wrap them in a lambda and add the overload for Alternatively you could do something like this: using ExternalCommandCallback =
std::variant<void (*)(const CheckResultProducer::Ptr &, double, const std::vector<String> &),
void (*)(double, const std::vector<String> &)>; and call it in std::visit([&producer, &time, &realArguments](auto && cb){
using T = std::decay_t<decltype(cb)>;
if constexpr (std::is_same_v<T, void (*)(const CheckResultProducer::Ptr &, double, const std::vector<String> &)>) {
cb(producer, time, realArguments);
} else if constexpr (std::is_same_v<T, void (*)(double, const std::vector<String> &)>) {
cb(time, realArguments);
} else {
static_assert(false, "Must be one or the other");
}
}, eci.Callback); Then you don't need any overloads or Edit: Fixed |
f0de2aa
to
e56731f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your commits still don't compile atomically. I suggest the following commit structure:
- The entire implementation of
CheckResultProducerComponent
(currently this is split between two commits for some reason) - Make classes inherit from this component (which I'd like to discuss in the first place) and call
Start()
andStop()
as appropriate - all function signature and call changes (including unit tests)
- Add the final check in
ProcessCheckResult()
That way each commit builds upon the previous one. Personally I'd squash it down even more ultimately, but that's as atomic as I'd make it for easy review.
But to start, see below for a first batch of suggestions. I'm happy to discuss all of these and learn more about your reasoning behind your choices.
e56731f
to
a4d52af
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This allows .ti classes multiple inheritance from Object.
Namely: Checkable#ProcessCheckResult() ClusterCheckTask::ScriptFunc() ClusterZoneCheckTask::ScriptFunc() DummyCheckTask::ScriptFunc() ExceptionCheckTask::ScriptFunc() IcingaCheckTask::ScriptFunc() IfwApiCheckTask::ScriptFunc() NullCheckTask::ScriptFunc() PluginCheckTask::ScriptFunc() RandomCheckTask::ScriptFunc() SleepCheckTask::ScriptFunc() IdoCheckTask::ScriptFunc() IcingadbCheck::ScriptFunc() CheckCommand#Execute() Checkable#ExecuteCheck() ClusterEvents::ExecuteCheckFromQueue() ExternalCommandProcessor::Process*CheckResult() ExternalCommandCallback ExternalCommandProcessor::Execute() ExternalCommandProcessor::ExecuteFromFile() ExternalCommandProcessor::ProcessFile() LivestatusQuery#ExecuteCommandHelper() LivestatusQuery#Execute()
a4d52af
to
45851bc
Compare
I've had a discussion with @julianbrost and @yhabteab with the following results:
|
Benefits of my version (ordered descending by importance):
... especially when you talk about Go, remember how context works:
This restriction would vanish here if I got your idea correctly:
(Given #10397 (comment), strange that you wanna give up such a restriction.) Let me know whether my concern is legit or I shall refactor anyway.
Before the component is activated? But why? |
This PR is the first step in the process of making the shutdown order of our components less unpredictable. Specifically, the goal is to ensure that components that produce any kind of events (e.g. notifications) or check results are stopped in a determined order, and they don't cause any post-stop events to be triggered. The problem this PR specifically addresses is that e.g. stopping the
CheckerComponent
does prevent new checks from being scheduled, but it does not wait for already started (so can't be cancelled) checks to finish. This can lead to a situation where the e.g.IcingaDB
component is stopped before such checks complete and causing all the resulted events to be lost, becauseIcingaDB
is not able to process them anymore. How does this PR solve this?It introduces a new
CheckResultProducer
interface that is implemented only by types that actually produce check results, which as of now areCheckerComponent
(for local checks) andApiListener
(for remote checks). Each of such instance maintains a simple counter of the number of ongoing checks results it produces. However, they don't just increment the counter after e.g.Checkable::ExecuteCheck()
is called, but only after the check execution produces a result and enters theCheckable::ProcessCheckResult()
method, and decremented when leaving it. This way, we don't have to worry about the fact that check execution time can be very long and block the shutdown of the component indefinitely (credit goes to @Al2Klimov). Ok, but how does this prevent triggering post-factum events, you ask? Well, theCheckable::ProcessCheckResult
requires now theCheckResultProducer
be passed in as an argument, and is only allowed to actually process that result after locking the producer, otherwise thecr
is discarded.But what does locking the producer mean? It doesn't mean acquiring the
ObjectLock
on that object, but rather acquiring a shared lock1 on it. However, here's the catch: the producer is only shared lockable if itsStop()
method hasn't been already called. Meaning, wheneverCheckerComponent::Stop()
is called, any thread that tries to acquire a shared lock on it inCheckable::ProcessCheckResult()
will ultimately fail and discard thecr
without ever processing it. On the other hand, if the checker was stopped after some threads have successfully acquired a shared lock on it (remember that when a thread acquires a shared lock, the producer automatically increments its counter), it will wait for all of them to finish while preventing any new threads from acquiring the lock until its counter goes back to0
.The only remaining problem that this PR doesn't solve is that
ApiListener::Stop()
implementation doesn't gracefully disconnect all active connections, which means that we might still lose some important check results received from remote endpoints. However, since this is a completely different issue, it will be addressed in a separate PR.Edit (yhabteab) end:
This is the very first and hardest step of the #10179 (comment) plan.
Checkable#ProcessCheckResult() now takes a CheckResultProducer::Ptr:
It tries to shared-lock the CheckResultProducer:
This implies that every component producing checks needs to implement CheckResultProducer:
Finally, the CheckResultProducer::Ptr has "just" to be passed through from the CheckResultProducer itself to Checkable#ProcessCheckResult():
Footnotes
The new
CheckResultProducer
interface satisfies thestd::SharedLockable
requirements. ↩