A significant upgrade to `pysmurf` that will remove EPICS as a communication layer in SMuRF has been in the works for a while. It may alleviate some of the lock-up issues that have been a regular occurrence on site (although I don't think their cause has been decisively identified yet), but beyond that it should noticeably improve the performance of register operations. This will probably speed up most operations as-is, but we think we can refactor some of the code to take better advantage of the new capabilities. In particular, tuning resonators had been identified as a bottleneck that could be made much faster by introducing parallelisation that is not possible with the current EPICS communication.
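As a rough illustration of what that could look like, here is a minimal sketch; `setup_notches_for_band` and the band count are hypothetical stand-ins, not the actual pysmurf API:

```python
from concurrent.futures import ThreadPoolExecutor

def setup_notches_for_band(band: int) -> dict:
    # Placeholder for the per-band tuning work (resonator fits, eta scans);
    # not a real pysmurf call.
    return {"band": band}

def tune_all_bands(bands=tuple(range(8))):
    # Under EPICS, register traffic effectively serialises these calls;
    # with the new communication layer they could in principle run
    # concurrently, one worker per band.
    with ThreadPoolExecutor(max_workers=len(bands)) as pool:
        return list(pool.map(setup_notches_for_band, bands))
```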
However, in a recent call it came up that tuning is actually not thought of as a bottleneck for site operations, but relock or taking IVs possibly are. I don't think this should make us reconsider improving tuning performance, but we could also focus on improving the performance of these other operations if that is useful. Note that this doesn't have to be conditional on the `pysmurf` overhaul mentioned above, as it may or may not be possible to speed up these operations in the current code.
I don't have much insight into which operations are bottlenecks, how frequently they need to be run, or constraints from the scheduler -- in general, any requirements on their performance -- so specific requests would be helpful.
I tried looking at `nextline` scripts and `pysmurf` logs to get an idea of what is being run. Although I found a few variations, the detector calibration that is run periodically looks something like this:
```
2025-03-24T21:14:03+0000 stream:378 Finished streaming data
2025-03-24T21:14:21+0000 start called for take_bias_steps
2025-03-24T21:15:25+0000 take_bias_steps:379 Status is now "done".
2025-03-24T21:16:30+0000 start called for take_bgmap
2025-03-24T21:17:30+0000 take_bgmap:380 Status is now "done".
2025-03-24T21:17:31+0000 start called for take_noise
2025-03-24T21:18:31+0000 take_noise:381 Status is now "done".
2025-03-24T21:18:31+0000 start called for take_iv
2025-03-24T21:26:24+0000 take_iv:382 Status is now "done".
2025-03-24T21:26:25+0000 start called for bias_dets
2025-03-24T21:26:29+0000 bias_dets:383 Status is now "done".
2025-03-24T21:29:31+0000 start called for take_bias_steps
2025-03-24T21:30:35+0000 take_bias_steps:384 Status is now "done".
2025-03-24T21:30:35+0000 start called for take_noise
2025-03-24T21:31:37+0000 take_noise:385 Status is now "done".
2025-03-24T21:36:03+0000 start called for stream
```
This takes about 15 min to run, half of which is `take_iv`, which in turn spends 5 min in "cool wait".
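For reference, the step durations quoted here can be read straight off the log timestamps. A minimal sketch, assuming the `start called for` / `Status is now "done"` line format shown above:

```python
import re
from datetime import datetime

TS = "%Y-%m-%dT%H:%M:%S%z"
START = re.compile(r"^(\S+) start called for (\w+)")
DONE = re.compile(r'^(\S+) (\w+):\d+ Status is now "done"\.')

def step_durations(lines):
    """Yield (operation, duration) pairs from paired start/done log lines."""
    starts = {}
    for line in lines:
        if m := START.match(line):
            starts[m.group(2)] = datetime.strptime(m.group(1), TS)
        elif (m := DONE.match(line)) and m.group(2) in starts:
            done = datetime.strptime(m.group(1), TS)
            yield m.group(2), done - starts.pop(m.group(2))

# e.g. for op, dt in step_durations(open("pysmurf.log")): print(op, dt)
```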
The other step I looked at is `uxm_relock`, which appears to be run daily:
```
2025-03-23T22:31:37+0000 start called for zero_biases
2025-03-23T22:31:37+0000 zero_biases:200 Status is now "done".
2025-03-23T22:33:38+0000 start called for take_noise
2025-03-23T22:34:41+0000 take_noise:202 Finished taking noise
2025-03-23T22:34:46+0000 start called for uxm_relock
2025-03-23T22:40:19+0000 uxm_relock:203 Finished UXM Relock
```
This takes only about 5 min. I didn't find an instance of tuning (`uxm_setup`) in the logs, but I know it takes very long, around 45 min.
What I'd like to know is:
- Tuning, relock, and calibration as I've defined them above -- do these span all the SMuRF operations on site that we could consider optimising?
- Which of these are currently most constraining for scheduling?
- At what cadence do we need to run them? (if this is known)
- Would we run them more frequently if they were faster? Which steps specifically?
People who may be interested in this topic: @msilvafe @kmharrington @tpsatt @swh76