
Pool::close does not always wait for all connections to close #3217

@madadam


Bug Description

Despite what the documentation says, calling Pool::close does not always wait for all connections to close before returning. This is due to bugs in PoolInner::close, one of which is a race condition.

This issue is hard to detect because it has almost no observable effects; SQLite is probably the only database engine where it shows up. We use SQLite in WAL mode, which means there are two additional files next to the main database file: one whose name ends in -wal and another whose name ends in -shm. According to the SQLite documentation, these should be deleted when the last connection closes (unless that connection is read-only).

Our setup has a pool of read-only connections and a separate, single read-write connection. We first close the read-only pool using Pool::close, and only after it returns do we close the read-write connection. The expectation is that the read-write connection is the last open connection at that point, so it should run the WAL checkpoint and delete the two auxiliary files. We noticed this was not always the case. We eventually traced the problem to two bugs in PoolInner::close that, in some cases, let it return before all connections were closed. Some read-only connections were therefore still open when we closed the read-write one, which prevented it from running the checkpoint and deleting the auxiliary files.
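The shutdown ordering described above can be captured in a small toy model. This is a hypothetical sketch, not SQLite's or SQLx's code: `Mode`, `Db` and `shutdown` are made-up names, and the rule that only a read-write connection closing last removes the -wal and -shm files is taken from the description above rather than from SQLite's internals.

```rust
/// Hypothetical toy model (not SQLite's implementation): the -wal and -shm
/// files are removed only when the last open connection closes and that
/// connection is read-write.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    ReadOnly,
    ReadWrite,
}

struct Db {
    open: Vec<Mode>,         // modes of the currently open connections
    aux_files_present: bool, // stands in for the -wal and -shm files
}

impl Db {
    fn new() -> Self {
        Db { open: Vec::new(), aux_files_present: true }
    }

    fn open(&mut self, mode: Mode) {
        self.open.push(mode);
    }

    fn close(&mut self, mode: Mode) {
        if let Some(i) = self.open.iter().position(|m| *m == mode) {
            self.open.remove(i);
        }
        // Only the last connection, and only if it can write, checkpoints
        // and deletes the auxiliary files.
        if self.open.is_empty() && mode == Mode::ReadWrite {
            self.aux_files_present = false;
        }
    }
}

/// Runs the shutdown sequence from the issue and reports whether the
/// auxiliary files are left behind. `pool_leaks` simulates Pool::close
/// returning while one read-only connection is still open.
fn shutdown(pool_leaks: bool) -> bool {
    let mut db = Db::new();
    db.open(Mode::ReadOnly);
    db.open(Mode::ReadOnly);
    db.open(Mode::ReadWrite);

    // Close the read-only pool first.
    db.close(Mode::ReadOnly);
    if !pool_leaks {
        db.close(Mode::ReadOnly);
    }

    // Then close the single read-write connection.
    db.close(Mode::ReadWrite);

    // A leaked read-only connection is closed later (e.g. when the pool is dropped).
    if pool_leaks {
        db.close(Mode::ReadOnly);
    }
    db.aux_files_present
}
```

`shutdown(false)` returns `false` (files removed), while `shutdown(true)` returns `true`: the read-write connection is no longer the last one to close, and the read-only connection that closes afterwards cannot remove the files.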

The first bug is a race condition in which idle_conns is not always empty by the time the loop finishes. Any connections still in idle_conns are then not closed until the pool itself is dropped.
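One interleaving that would leave a connection behind can be sketched deterministically. The sketch assumes a simplified pool (`IdlePool` and `close_with_late_release` are hypothetical names, not SQLx internals) in which close drains the idle queue, then waits for permits, and a checked-out connection is returned to the idle queue between those two steps. Draining the queue once more after the final wait is shown only as one possible remedy, not necessarily the fix adopted upstream.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified pool used only to illustrate the interleaving.
struct IdlePool {
    idle: VecDeque<u32>,   // ids of idle connections
    checked_out: Vec<u32>, // ids of connections currently in use
    closed: Vec<u32>,      // ids of connections closed by `close`
}

impl IdlePool {
    /// A user returns a connection: it goes back into the idle queue.
    fn release(&mut self, id: u32) {
        self.checked_out.retain(|c| *c != id);
        self.idle.push_back(id);
    }

    /// Close every connection currently in the idle queue.
    fn drain_idle(&mut self) {
        while let Some(id) = self.idle.pop_front() {
            self.closed.push(id);
        }
    }
}

/// Runs one fixed interleaving of `close` against a concurrent release.
/// With `drain_after_wait == false` the loop ends right after the wait
/// (the ordering described in the issue); with `true` the idle queue is
/// drained once more before returning.
fn close_with_late_release(drain_after_wait: bool) -> IdlePool {
    let mut pool = IdlePool {
        idle: VecDeque::from(vec![1]),
        checked_out: vec![2],
        closed: Vec::new(),
    };

    // Step 1: close drains the idle queue (closes connection 1).
    pool.drain_idle();

    // Step 2: concurrently, connection 2 is returned and lands in the idle queue.
    pool.release(2);

    // Step 3: the wait for permits completes, since nothing is checked out now.
    assert!(pool.checked_out.is_empty());

    if drain_after_wait {
        pool.drain_idle();
    }
    pool
}
```

In the first case connection 2 is still sitting in the idle queue when close returns and, per the description above, would only be closed once the pool is dropped.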

The second bug is that when idle_conns is drained, the idle connections are temporarily treated as live, so a semaphore permit is created for each of them out of thin air. When such a permit is released, the semaphore ends up with more available permits than it had initially, which causes the semaphore acquire on the last iteration of the loop to complete prematurely.
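The permit arithmetic behind the second bug can be illustrated with a counting semaphore reduced to a plain counter. `ToySemaphore` and `final_acquire_completes` are hypothetical and do not reproduce SQLx's types; the scenario assumes, as the description above implies, that idle connections hold no permit while checked-out connections hold one each.

```rust
/// Hypothetical counting semaphore: no blocking, just the permit arithmetic.
struct ToySemaphore {
    available: usize,
}

impl ToySemaphore {
    fn try_acquire(&mut self, n: usize) -> bool {
        if self.available >= n {
            self.available -= n;
            true
        } else {
            false
        }
    }

    fn release(&mut self, n: usize) {
        self.available += n;
    }
}

/// Max 3 connections: one is still checked out (holding a permit) and two
/// are idle (holding none). Returns whether a wait for all 3 permits would
/// succeed right after the idle connections are closed.
fn final_acquire_completes(release_permit_for_idle: bool) -> bool {
    const MAX: usize = 3;
    let mut sem = ToySemaphore { available: MAX };

    // Open three connections; each takes a permit.
    assert!(sem.try_acquire(MAX));
    // Two are returned to the idle queue, giving their permits back.
    sem.release(2);

    // Close the two idle connections.
    for _ in 0..2 {
        if release_permit_for_idle {
            // Buggy accounting: the idle connection is treated as live, so
            // closing it releases a permit it never acquired.
            sem.release(1);
        }
    }

    // One connection is still checked out, so this should NOT succeed.
    sem.try_acquire(MAX)
}
```

With the extra releases the available count reaches 4 against a limit of 3, so the wait for all 3 permits succeeds even though one connection is still checked out; with correct accounting only 2 permits are available and the wait would keep blocking.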

Info

  • SQLx version: 0.7.4
  • SQLx features enabled: runtime-tokio, sqlite
  • Database server and version: sqlite 3.45.3
  • Operating system: Linux
  • rustc --version: rustc 1.77.1 (7cf61ebde 2024-03-27)
