Example is getting caught in a loop #27
Hi @jcalfee, thanks for the report. At first glance this message "request entries done" originates from the server part of the raft engine. This code does not take part in maintaining inter-node raft communication or cluster integrity; rather, it handles communication with clients. The last time I was extensively testing zmq-raft was with node v16. I'm still using it with node v18, but in a limited fashion: I'm using an in-process raft node only, so there are no external entry requests. Perhaps you are right and something changed in the node stream API, and now the code might be doing something wrong. I'll take a look into it and let you know my findings. In the meantime, perhaps try to repeat your exercise but with all the raft clients disconnected from the cluster and let me know the outcome.
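To make the distinction above concrete, here is a minimal hypothetical sketch (all names invented; this is not zmq-raft's actual code) of a client-facing "request entries" handler: the server streams log entries to a client via a `'data'` listener, reports "request entries done" when the stream ends, and returns a `cancel` that detaches the listener, much like the `cancel`/`ondata` pair discussed further down.

```js
// Hypothetical sketch only -- NOT zmq-raft's actual implementation.
// Illustrates a client-facing "request entries" flow: the server
// streams log entries to a client, separate from the inter-node
// raft protocol, and logs when the stream is done.
const { Readable } = require('stream');

function handleRequestEntries(logStore, client, firstIndex, lastIndex) {
  // createEntriesStream is an invented stand-in for whatever
  // produces entries between two log indexes.
  const entries = logStore.createEntriesStream(firstIndex, lastIndex);

  const ondata = (entry) => client.send(entry);

  entries.on('data', ondata);
  entries.on('end', () => console.log('request entries done'));

  // cancel() detaches ondata, e.g. when the client disconnects.
  return function cancel() {
    entries.removeListener('data', ondata);
  };
}

// Stand-in wiring so the sketch runs on its own:
const logStore = {
  createEntriesStream: (first, last) =>
    Readable.from([{ index: first }, { index: last }])
};
const client = { send: (e) => console.log('sent entry', e.index) };
handleRequestEntries(logStore, client, 399, 400);
```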
I'm not sure how to run the 3+1 example with clients that are not connected to a cluster. I tried v14 with a newly installed node_modules and no ./tmp, with the same results. Test data was not needed. Just to reiterate, this is in a docker container and I don't fully understand the ramifications; I can see that the container is otherwise very stable, as I do all my development work all day long within and interacting with this container. If I rebooted, who knows, something may change or get fixed within the docker container. I have not rebooted because everything else is generally stable, so there is usually something to learn, and using node-zmq-raft in production would likely involve containerd or dockerd anyway.
Have you come across this situation?
It hangs when adding a 4th node:

```
bin/zr-config.js -c config/example.hjson -a tcp://127.0.0.1:8347/4
```
The info and www interfaces usually report that the 4th node was added. But more often than not, when I retry and remove or add the 4th node again, one of the other 3 running nodes gets caught in a loop:

That starts the long-running loop you see above (presumably forever).

I can press ctrl+c to stop the looping node, and the issue moves to the next Leader. If I bring the cluster down to 2 nodes the looping stops; when I bring the 3rd node back up, the looping continues in whatever node is the new Leader. It does not matter which node I start and stop; the loop just continues in the new Leader.
The `lastIndex` is `400` (the same number of messages created in the test data before I issued `.stop`). The queue has one item in the array, a buffer of size 400....
It is clear it is calling `cancel` and removing the `ondata` listener. It must be calling `ondata` again to get that data back into the queue.
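Since the suspicion is that the node stream API changed between versions, a standalone probe like the following (my own sketch, not zmq-raft code) may help: it removes the stream's only `'data'` listener mid-flow, the way `cancel` removes `ondata` above, then re-attaches it. Running it under v14, v16, v18, and v20 and comparing which chunks arrive would show whether data emitted during the gap is dropped, buffered, or replayed.

```js
// Standalone probe (not zmq-raft code): what happens to a flowing
// stream when its only 'data' listener is removed and re-attached?
const { Readable } = require('stream');

const source = Readable.from(['a', 'b', 'c', 'd', 'e']);

function ondata(chunk) {
  console.log('ondata:', chunk);
  if (chunk === 'b') {
    // Mimic cancel(): detach the listener mid-stream.
    source.removeListener('data', ondata);
    console.log('listener removed; isPaused =', source.isPaused());
    // Re-attach shortly afterwards, like a re-issued entries request.
    setTimeout(() => {
      console.log('listener re-attached');
      source.on('data', ondata);
    }, 50);
  }
}

source.on('data', ondata);
```

If this probe prints different output under different node versions, that would point at the stream semantics rather than at zmq-raft itself.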
I don't have any changes, and I can re-create the issue almost every time, even after removing the `tmp` directory. I think it goes back to the `-a` add command hanging; also, why does that keep affecting the Leader?

This appeared on node v18.19.0, but I see the same issue with v20 as well, which means I did have to remove node_modules and re-install (`pnpm i`) so `zeromq` would work. The only other thing: I run codium in Docker (x11docker, if you're interested) and start all my processes from codium (everything in docker). I doubt this is the cause though; I just wanted to help check the node version, because this may have something to do with streams and timing.