Description
We have been seeing a problem when users accidentally leave the SMuRF streams on (whether through user error, or through not realizing an interrupt has left them on; see #166). All subsequent SMuRF commands, e.g. `take_bgmap`, then fail: the operations cannot run simultaneously, so the command can't grab the ocs `TimeoutLock`.
In `sorunlib` this causes us to stop commanding the controller. In practice, `initialize()` is called again within a few hours, but then another `take_bgmap` is run, which also fails for the same reason. The result is that `stream` is never turned off until an observant user notices that this is why the controller is being dropped and manually stops it.
The proposed fix is to check the SMuRF `stream` state after an operation fails. If `stream` is on, turn it off and then drop the slot as usual. If we can't turn it off (within some timeout period), raise an exception. If `stream` is off, just drop the slot as normal. We'll still raise an exception if the threshold for number of running slots is crossed.
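
A rough sketch of that flow, to make the proposal concrete. This is not the `sorunlib` implementation: `handle_failed_operation`, `StreamStillOnError`, and the three callables (`is_streaming`, `stop_stream`, `drop_slot`) are hypothetical stand-ins for whatever the real client calls turn out to be, and the timeout values are placeholders.

```python
import time


class StreamStillOnError(RuntimeError):
    """Hypothetical error: the stream could not be turned off in time."""


def handle_failed_operation(slot, is_streaming, stop_stream, drop_slot,
                            timeout=60.0, poll_interval=1.0,
                            clock=time.monotonic, sleep=time.sleep):
    """Clean up one slot after a SMuRF operation on it has failed.

    All three callables are assumptions, not real sorunlib/ocs APIs:
      is_streaming(slot) -> bool  reports whether the stream is on
      stop_stream(slot)           requests that the stream be stopped
      drop_slot(slot)             the existing "drop the slot" path, which
                                  is assumed to keep raising if too few
                                  slots remain
    """
    if is_streaming(slot):
        stop_stream(slot)
        deadline = clock() + timeout
        # Poll until the stream reports off, giving up at the deadline.
        while is_streaming(slot):
            if clock() >= deadline:
                raise StreamStillOnError(
                    f"stream on slot {slot} still on after {timeout} s")
            sleep(poll_interval)
    # Stream is off (or was never on): drop the slot as usual.
    drop_slot(slot)
```

Passing the stream checks in as callables keeps the sketch independent of the actual client interface, and the injectable `clock` and `sleep` make the timeout path easy to exercise in a unit test.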