jackhammer hangs until controller agents restarted #463

@BrianJKoopman

Description

Filing this issue based on reports from @msilvafe and @aashrita because I have a proposed fix.

Since we split the OCS agent docker containers into a separate compose file (following the implementation of #451), jackhammer no longer interacts with the OCS agents. As a result, it seems that a jackhammer hammer call hangs until the user manually restarts the pysmurf-controller agents using Host Manager.

This was first reported by @msilvafe on June 5th on satp1. @aashrita provided some output from the same issue occurring on satp3 on June 8th. When running the following command, it hung after the last line shown until the agents were restarted via Host Manager:

cryo@smurf-so10-satp3:~/docker/pcie/v2.1.1$ jackhammer hammer 4
You are hard-resetting slots [4]. Are you sure (y/n)? y

Dumping docker logs to /data/logs/17494/1749402654
Saving 'docker ps' to /data/logs/17494/1749402654/docker_state.log
Saving ocs-det-controller-c2s5 logs to /data/logs/17494/1749402654/ocs-det-controller-c2s5.log
Saving ocs-det-controller-c2s4 logs to /data/logs/17494/1749402654/ocs-det-controller-c2s4.log
Saving ocs-det-crate-2 logs to /data/logs/17494/1749402654/ocs-det-crate-2.log
Saving ocs-daq-sync-smurf-so10 logs to /data/logs/17494/1749402654/ocs-daq-sync-smurf-so10.log
Saving ocs-det-monitor-so10 logs to /data/logs/17494/1749402654/ocs-det-monitor-so10.log

Proposed Solution

I think the best solution here would be to have jackhammer go back to restarting any required OCS agents, but through the Host Manager rather than by calling docker compose directly. This may require adding the Host Manager instance-id to sys_config.yaml, or maybe just assuming there is only one Host Manager (which there should be now, in all cases on site).
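A minimal sketch of what that restart step might look like, not the actual jackhammer implementation. It assumes the Host Manager is reached through an OCSClient-like object whose update task accepts a requests list of (instance-id, target) pairs with target 'up' or 'down'; that should be checked against the OCS Host Manager documentation. The function name, the FakeHostManager stand-in, and the instance-id used in the example are hypothetical.

```python
import time


def restart_agents_via_host_manager(hm_client, instance_ids, settle_time=5.0):
    """Ask a Host Manager to bring the given agents down, then back up.

    hm_client is assumed to be an OCSClient-like object for the Host Manager,
    with an update task taking requests=[(instance_id, target), ...].
    instance_ids lists the agents to restart (hypothetical values here).
    """
    down = [(iid, "down") for iid in instance_ids]
    up = [(iid, "up") for iid in instance_ids]

    hm_client.update(requests=down)
    # Give the agents a moment to stop before requesting them again.
    time.sleep(settle_time)
    hm_client.update(requests=up)
    return down, up


class FakeHostManager:
    """Stand-in client that only records requests; for illustration only."""

    def __init__(self):
        self.calls = []

    def update(self, requests):
        self.calls.append(list(requests))


if __name__ == "__main__":
    hm = FakeHostManager()
    # "det-controller-c2s4" is a made-up instance-id for this example.
    restart_agents_via_host_manager(hm, ["det-controller-c2s4"], settle_time=0)
    print(hm.calls)
```

In a real setup the fake client would be replaced by a client built from the Host Manager instance-id, read from sys_config.yaml or discovered on the assumption that only one Host Manager exists.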

Metadata

Labels

bug: Something isn't working
