Skip to content

Error resolving broadcast hostname being swallowed #716

@keyz182

Description

@keyz182

Describe the bug

We had a deployment of opal-server that we'd typoed the broadcast URI. We were seeing websockets disconnect errors in the clients, but no errors server side, so no clues it was the broadcast URI.

After turning debug logging on for the server, I spotted the following output:

2024-12-04T15:27:01.018085+0000| fastapi_websocket_rpc.websocket_rpc_e...| INFO  | Client connected
2024-12-04T15:27:01.018500+0000| websockets.legacy.server                | INFO  | connection open
2024-12-04T15:27:01.022749+0000| fastapi_websocket_rpc.rpc_channel       |DEBUG  | Handling RPC request - {'request': RpcRequest(method='_ping_', arguments={}, call_id='5d8d421e3dc94751a035b202644c8e8a'), 'channel': '72c2a547ffc64741bd095079bc778d7d'}
2024-12-04T15:27:01.023420+0000| fastapi_websocket_pubsub.pub_sub_server |DEBUG  | task is done: {<Task finished name='Task-3053' coro=<EventBroadcaster.__read_notifications__() done, defined at /usr/local/lib/python3.10/site-packages/fastapi_websocket_pubsub/event_broadcaster.py:245> exception=gaierror(-2, 'Name or service not known')>}
2024-12-04T15:27:01.023565+0000| fastapi_websocket_pubsub.event_broadc...| INFO  | Cancelling broadcast listen task
2024-12-04T15:27:01.023677+0000| fastapi_websocket_pubsub.event_broadc...|DEBUG  | Unsubscribing from ALL TOPICS
2024-12-04T15:27:01.023790+0000| fastapi_websocket_pubsub.event_notifier |DEBUG  | Removing Subscription of topic='__EventNotifier_ALL_TOPICS__' for subscriber=cc43be7c3ebb41e1b4869ace10d213db
2024-12-04T15:27:01.023948+0000| fastapi_websocket_rpc.websocket_rpc_e...| INFO  | Client connection failed - 42723 :: 72c2a547ffc64741bd095079bc778d7d
2024-12-04T15:27:01.024165+0000| fastapi_websocket_pubsub.pub_sub_server |DEBUG  | Leaving endpoint's main loop
2024-12-04T15:27:01.026785+0000| websockets.legacy.server                | INFO  | connection closed

Based on the 4th line, I went to look at fastapi_websocket_pubsub/event_broadcaster.py:245 and saw it was referencing the broadcast URI, at which point I double checked ours and saw the typo.

What I believe to be a bug is that the error is being swallowed, and should probably be elevated to an ERROR level log for visibility.

To Reproduce
Set the broadcast URI to an invalid value - in our case, a postgres URI with a hostname that didn't resolve.

Logs: se616-opal-opal-server-545c454db-hx2nc.log

Expected behavior
A clear error level log to indicate that the broadcast URI could not be resolved

OPAL version

  • Version: Client - 0.7.15, Server - 0.7.8

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions