Client reconnects but fails to register with a namespace #1475
First of all, big thanks to the developer(s) behind python-socketio and flask-socketio. I have been using both very successfully for a few years now. I have encountered a situation where a server-client pair does not behave as I would expect. I am not sure whether this is a bug or intended behaviour, and I would appreciate some guidance. Here are minimal steps to recreate it.

Server:

    import threading

    import flask
    import flask_socketio

    app = flask.Flask('test')
    sio = flask_socketio.SocketIO(
        app,
        ping_timeout=10,
        ping_interval=10,
        logger=True,
        engineio_logger=True,
    )


    @app.route("/simulate_timeout")
    def timeout():
        '''This route will simulate a network outage.

        It makes sure the main server thread is blocked long enough
        for clients to disconnect.
        '''
        print('simulating timeout')
        threading.Event().wait(timeout=30)
        return 'simulating timeout'


    @sio.on('connect', namespace='/clients')
    def client_connected(message):
        print(f'Client connected. {message=}')


    @sio.on('disconnect', namespace='/clients')
    def client_disconnected():
        print('Client disconnected')


    @sio.on('msg', namespace='/clients')
    def msg(message):
        print(f'/clients received {message=}')


    sio.run(app, port=8081, host='0.0.0.0')

Client setup (I run this from an interactive shell to make it simpler to explore objects after the setup is complete):

    import socketio

    sio = socketio.Client(
        engineio_logger=True,
        logger=True,
    )
    sio.connect(
        url='http://127.0.0.1:8081/socketio',
        transports=['websocket'],
        namespaces=['/clients'],
        retry=True,
    )

Sequence of steps to reproduce.
client:
server:
client:
server:
... so far so good. Now I open Here is what happens: client:
server:
If I then try to examine the client, it appears to be in a weird state. It is connected and continues to exchange PING/PONG with the server, but it is not registered with any namespace. Attempt to emit to the namespace: client:
Is this intended behavior? Am I using the client incorrectly? I expect the client either to reconnect to the original namespace or to fail to reconnect entirely. Any help is appreciated.
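The "connected but not registered with any namespace" state described above can be detected from the client side. The following is only a minimal sketch, not a confirmed fix: the helper names `missing_namespaces` and `needs_full_reconnect` are hypothetical, and it assumes the client object exposes a boolean `connected` attribute and a `namespaces` container of currently registered namespace names (believed to be a dict in python-socketio 5.x, but treat that as an assumption). Stand-in objects are used so the sketch runs without a server.

```python
from types import SimpleNamespace


def missing_namespaces(client, expected):
    """Return the expected namespaces the client is not registered with.

    Assumption: `client.connected` is a bool and `client.namespaces` is a
    container (dict or list) holding the names of connected namespaces.
    """
    if not getattr(client, "connected", False):
        # No transport at all: every expected namespace is missing.
        return list(expected)
    registered = getattr(client, "namespaces", None) or {}
    return [ns for ns in expected if ns not in registered]


def needs_full_reconnect(client, expected):
    """True when the transport is up but an expected namespace is missing."""
    transport_up = bool(getattr(client, "connected", False))
    return transport_up and bool(missing_namespaces(client, expected))


# Stand-ins for a real socketio.Client, for illustration only.
healthy = SimpleNamespace(connected=True, namespaces={"/clients": "abc123"})
stuck = SimpleNamespace(connected=True, namespaces={})
offline = SimpleNamespace(connected=False, namespaces={})

print(needs_full_reconnect(healthy, ["/clients"]))
print(needs_full_reconnect(stuck, ["/clients"]))
print(needs_full_reconnect(offline, ["/clients"]))
```

With a real client, one possible workaround when `needs_full_reconnect(sio, ['/clients'])` is true would be to call `sio.disconnect()` and then `sio.connect(...)` again with the original arguments. Whether that is the right remedy here is an assumption, not something the library documentation is known to prescribe.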
Replies: 4 comments 3 replies
I'm not sure I understand this:
Why would waiting on an event block the server? If this is what happens for you, then you have a problem. Like maybe you are using gevent or eventlet for the server, but your
Hi @miguelgrinberg. The code inside http://127.0.0.1:8081/simulate_timeout is not what I am using in production. In fact, I agree with you that code like that does not belong in a threaded server. I only use it to simulate a very rare condition that occurs in production on k8s. I still have not figured out how it occurs, but it may have to do with the infrastructure rather than with my code or your library. From the logs I gathered, I can see that when the situation occurs the main thread gets hard-blocked long enough for clients to begin disconnecting. That is fine, since it happens very rarely. What worries me is that when clients attempt to reconnect, they end up in that weird state where the client considers itself connected but is not registered with a namespace. This leaves the whole client-server pair in an unusable state. I am open to any ideas.
I am using: I run flask-socketio.SocketIO with async_mode='eventlet', monkeypatched. In production my intervals are set to: I modified them in the example above to make it easier to reproduce the situation locally. I am almost certain that the blocking in production occurs due to infrastructure. So perhaps your statement "Really if you have a server that can block completely, I feel there is no point in looking at downstream problems." is the answer. Wouldn't the same situation occur due to a loss of comms between server and client?
You are right, I don't use both gevent and eventlet simultaneously. The async mode is eventlet in my case. I will take your feedback seriously and consider moving to gevent. I was under the impression that eventlet is the preferred mode, as it is the first in the list of default options selected by flask-socketio as outlined here. The situation I am trying to recreate is:
What I expect to see:
What I actually see:
I admit that the method I used to simulate that is rather crude. Hard-blocking the server was the first thing that came to my mind, but it appears to cause the same symptoms that I have observed in the logs of the remote production server. Symptoms are:
Can you recommend a better method to simulate the same? It is also worth mentioning that the code I shared is an absolutely minimal example that demonstrates the problem I see; it is not my production code.
Right, and what I'm saying is that if the server gets blocked long enough that clients start to drop then all bets are off pretty much. This isn't a condition that is expected to happen during normal use.
I can't say for sure that there is no bug in the reconnection logic, maybe there is. But if the only way for this bug to appear is to block the server for more than 25 seconds (using default timeouts) then really the problem is the blocking of the server, which should never be allowed to happen. The whole point of using an asynchronous server is that everything is non-blocking.
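To illustrate why a blocking call matters more than its duration suggests, here is a small sketch using the standard library's asyncio as a stand-in for eventlet (a deliberate substitution; the thread itself is about eventlet). A periodic "heartbeat" task plays the role of the PING/PONG exchange, and a handler either blocks the thread or yields cooperatively. The function names, intervals and stall length are all illustrative assumptions.

```python
import asyncio
import time


async def heartbeat(gaps, interval=0.05, beats=6):
    """Record the real elapsed time between consecutive beats."""
    last = time.monotonic()
    for _ in range(beats):
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def run_with(handler_kind, stall=0.4):
    """Run the heartbeat next to one handler; return the largest gap seen."""
    gaps = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.06)  # let the first beat happen
    if handler_kind == "blocking":
        time.sleep(stall)  # holds the only thread, so the heartbeat cannot run
    else:
        await asyncio.sleep(stall)  # yields, so the heartbeat keeps its schedule
    await task
    return max(gaps)


def largest_gap(handler_kind):
    """Synchronous wrapper so the two cases are easy to compare."""
    return asyncio.run(run_with(handler_kind))


if __name__ == "__main__":
    print("blocking handler, largest gap:", round(largest_gap("blocking"), 2))
    print("cooperative handler, largest gap:", round(largest_gap("cooperative"), 2))
```

In the blocking case the heartbeat misses its schedule for roughly the whole stall, which is the same mechanism that makes clients time out when a green-thread server is hard-blocked. In the cooperative case the gaps stay close to the configured interval.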
I would recommend that you address the blocking problem in your server first. Then if the reconnection issues p…