This repository was archived by the owner on Jan 9, 2024. It is now read-only.

Possible connection leak in ClusterBlockingConnectionPool #458

Open
@FranGM


Hey there!

I've been tracking down an issue that I believe could have been caused (or at least made worse) by a possible connection leak in ClusterBlockingConnectionPool. Let me try to explain my theory to see if it makes sense.

In get_connection_by_node we pull connections from the queue one by one until we find either a connection already established to this node or a None (a free slot, meaning we may create a new connection).

My concern is with the logic that follows if we run out of items to pull from the queue:

            # queue is full of connections to other nodes
            if len(connections_to_other_nodes) == self.max_connections:
                # is the earliest released / longest un-used connection
                connection_to_clear = connections_to_other_nodes.pop()
                self._connections.remove(connection_to_clear)
                connection_to_clear.disconnect()
                connection = None  # get a new connection
            else:
                # Note that this is not caught by the redis cluster client and will be
                # raised unless handled by application code.
                # ``ConnectionError`` is raised when timeout is hit on the queue.
                raise ConnectionError("No connection available")

If we have pulled exactly max_connections connections from the pool and all of them are for other nodes, then we repurpose one of them; otherwise we raise an exception. Note that when raising this exception we don't put any of the connections we've pulled back into the pool, so they are lost.

I think the check on
if len(connections_to_other_nodes) == self.max_connections:
is a bug: it assumes that all connections are sitting in the queue at all times. But connections that are currently in use won't be in the pool, so there are cases (most of the time, really) where the queue holds fewer than max_connections elements, and we will raise an exception even though we have connections we could repurpose.

I'm currently testing a fix that replaces that line with if len(connections_to_other_nodes) > 0: but wanted to hear your opinion on this hypothesis before submitting a PR.
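To make the failure mode concrete, here is a minimal self-contained sketch of the logic (FakeConnection, make_pool, and the function body are my approximations, not the library's actual code). With max_connections = 3 and two connections checked out by other callers, the current == max_connections check raises even though a repurposable connection to another node is sitting in the queue, and that connection is lost because it is never put back:

```python
import queue

class FakeConnection:
    """Hypothetical stand-in for a pooled connection pinned to one node."""
    def __init__(self, node):
        self.node = node
    def disconnect(self):
        pass

def get_connection_by_node(pool_queue, max_connections, node, fixed=False):
    """Simplified model of the pool logic quoted above; not the library's
    exact code. fixed=False uses the current ``== max_connections`` check,
    fixed=True uses the proposed ``> 0`` check."""
    connections_to_other_nodes = []
    connection = None
    while connection is None:
        try:
            item = pool_queue.get_nowait()
        except queue.Empty:
            break
        if item is None:
            connection = FakeConnection(node)   # free slot: make a new connection
        elif item.node == node:
            connection = item                   # reuse an existing connection
        else:
            connections_to_other_nodes.append(item)
    if connection is None:
        if fixed:
            can_repurpose = len(connections_to_other_nodes) > 0
        else:
            can_repurpose = len(connections_to_other_nodes) == max_connections
        if not can_repurpose:
            # NOTE: connections_to_other_nodes are NOT put back here,
            # which is the leak described above.
            raise ConnectionError("No connection available")
        connections_to_other_nodes.pop().disconnect()
        connection = FakeConnection(node)       # replace it with a fresh one
    for other in connections_to_other_nodes:
        pool_queue.put_nowait(other)            # return unused connections
    return connection

# Scenario: max_connections = 3, two connections are checked out by other
# callers (so not in the queue), and the queue holds one connection to "B".
def make_pool():
    q = queue.Queue()
    q.put_nowait(FakeConnection("B"))
    return q

pool = make_pool()
try:
    get_connection_by_node(pool, 3, "A")        # current check
    raised = False
except ConnectionError:
    raised = True
# raised is True, and the "B" connection was silently dropped:
leaked = pool.qsize() == 0

pool = make_pool()
conn = get_connection_by_node(pool, 3, "A", fixed=True)
# with the proposed check, the "B" connection is repurposed for node "A"
```

Under these assumptions the current check both refuses a request it could have served and drops the pulled connection, while the `> 0` variant repurposes it.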
