BrianJKoopman
Member

Just a draft for now. There are a couple of implementation things to maybe work out, and I need to write tests.

This PR adds a check for streams that are still running when an operation is run. The check happens after the operation has failed because the task could not grab the ocs TimeoutLock within the agent (the stream process was left running), and before we drop the client due to this error.

We check if a stream is still running, and if it is, try to stop it. If we can stop it, great, it's now stopped and we drop that client. It'll get picked back up in the schedule at the next initialize(). If we can't stop it (the .wait() times out after 2 minutes), then we raise an exception, crashing the schedule.
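For discussion, the flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeStreamClient`, `StreamStopTimeout`, `stop_leftover_stream`, and the `is_streaming()`/`stop()`/`wait()` methods are hypothetical names standing in for the real agent client, and only the 2 minute timeout comes from the description.

```python
class StreamStopTimeout(RuntimeError):
    """Hypothetical exception: the stream did not stop within the timeout."""


class FakeStreamClient:
    """Hypothetical stand-in for an agent client. This is NOT the real ocs API."""

    def __init__(self, running, stoppable=True):
        self.running = running
        self.stoppable = stoppable

    def is_streaming(self):
        return self.running

    def stop(self):
        # A stream that cannot be stopped simply keeps running.
        if self.stoppable:
            self.running = False

    def wait(self, timeout):
        # Return True if the stream has exited, False if the wait "timed out".
        # The fake does not actually sleep for `timeout` seconds.
        return not self.running


def stop_leftover_stream(client, timeout=120):
    """Stop a stream that was left running.

    Returns True if a running stream was found and stopped, False if no
    stream was running. Raises StreamStopTimeout if the stream is still
    running after `timeout` seconds, which would crash the schedule.
    """
    if not client.is_streaming():
        return False
    client.stop()
    if not client.wait(timeout=timeout):
        raise StreamStopTimeout(
            f"Stream did not stop within {timeout} s. Streams may still be running."
        )
    # The caller would drop this client here; it is picked up again at the
    # next initialize().
    return True
```

In this sketch the caller drops the client whenever the function returns, and lets `StreamStopTimeout` propagate so the schedule stops rather than continuing with a stream in an unknown state.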

The one sticking point to discuss is what to do on ClientControlError, which is likely when there's a network issue. This was the culprit in recent streams being left on, as described in #229. As written right now, if we can't check the status of a stream, we crash the schedule. Other similar errors when trying to stop or wait for the streams do the same. This likely leaves the streams running if they were running already. The only indication of this is the "Streams may still be running" text in the error message.
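To make the current behavior concrete for the discussion, here is a rough sketch of the error path. The `ClientControlError` class below is a local stand-in defined only so the example runs on its own, and `check_stream_status` and `flaky_status` are hypothetical names, not functions from this PR.

```python
class ClientControlError(Exception):
    """Local stand-in for the client error seen on network issues."""


def check_stream_status(get_status):
    """Call `get_status` and crash the schedule if the client errors out.

    `get_status` is a hypothetical zero-argument callable that returns
    True if the stream is running.
    """
    try:
        return get_status()
    except ClientControlError as err:
        # We could not reach the agent, so the stream state is unknown.
        # The message is the only hint that streams may have been left on.
        raise RuntimeError(
            "Unable to check stream status. Streams may still be running."
        ) from err


def flaky_status():
    # Simulates a network problem while querying the agent.
    raise ClientControlError("connection refused")
```

An alternative worth weighing in #229 is retrying the status check a few times before giving up, since a transient network issue would otherwise crash the schedule and leave the streams on.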

I'll aim to discuss this last point in the #229 issue. Just wanted to get the example code for this solution up in draft form.
