-
Notifications
You must be signed in to change notification settings - Fork 21
Description
Description
Cause not known, but it would appear a message is sent before closure of the websocket could complete or known about:
Task exception was never retrieved
future: <Task finished name='Task-10268090' coro=<TornadoSubscriptionServer.on_start() done, defined at /opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py:129> exception=WebSocketClosedError()>
Traceback (most recent call last):
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 148, in on_start
await self.send_execution_result(
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 173, in send_execution_result
await BaseSubscriptionServer.send_execution_result(
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/graphql_ws/base_async.py", line 181, in send_message
return await connection_context.send(message)
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 49, in send
await self.ws.write_message(data)
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/tornado/websocket.py", line 360, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 151, in on_start
await self.send_error(connection_context, op_id, e)
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/graphql_ws/base_async.py", line 181, in send_message
return await connection_context.send(message)
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 49, in send
await self.ws.write_message(data)
File "/opt/niwa/share/cylc/8.4.3/lib/python3.9/site-packages/tornado/websocket.py", line 360, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
Note: This code has changed location content since 1.6.x/8.4.x, however the send mechanism is likely similar so a fix should be against 1.7.0 once understood
Also, this was found in NIWA operations, and there are indications this may have a large contribution to memory accumulation of the UI Server...
Servers usually start with about ~3-4Gb memory used on their designated VMs and currently, after a similar load and run duration, the oper
and test
systems differ in their memory used:
Oper:
$ free -h
total used free shared buff/cache available
Mem: 31G 16G 13G 57M 664M 13G
Swap: 0B 0B 0B
Test:
total used free shared buff/cache available
Mem: 15G 3.8G 592M 61M 11G 10G
Swap: 0B 0B 0B
The only difference found was oper
has 36 of these errors in the log (test none):
[ecox_oper@w-ec-admin02 ~]$ grep -c 'During handling of the above exception' nohup.out
36
where oper
UIS appears to have a compounding memory build..
Perhaps a coincidence, however, if subscriptions aren't cleaned up properly their queues could be accumulating deltas.. (which would explain the apparent compounding effect as the error count increases)
Reproducible Example
Unknown.
Expected Behaviour
Send attempt on closed web-socket is handled gracefully, ending all associated subscriptions (etc...) like on_close