Description
In high-concurrency scenarios where the connection pool is exhausted, the go-redis library currently uses the incoming request's context (reqCtx) for the entire connection creation process (dialing, TLS handshake, AUTH, etc.) in ConnPool.Get. This creates a critical issue that is at odds with established best practices implemented in the Go standard library.
If the request times out while the connection is being established, the dialing operation is canceled, often when the connection is only milliseconds away from completion. This leads to several detrimental effects, which are exacerbated by common conditions.
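To make the failure mode concrete, here is a minimal sketch (hypothetical code, not the actual go-redis source) of what dialing under the request's context amounts to:

```go
import (
	"context"
	"net"
)

// dialWithRequestContext illustrates the current pattern: the request context,
// already shortened by upstream deadlines and by the wait for a pool slot,
// governs the entire dial (and, in the real client, the TLS handshake and AUTH).
func dialWithRequestContext(reqCtx context.Context, addr string) (net.Conn, error) {
	var d net.Dialer
	// If reqCtx expires mid-handshake, the dial is aborted even though the
	// connection might have completed a few milliseconds later.
	return d.DialContext(reqCtx, "tcp", addr)
}
```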
Three key factors compound this problem:
- PoolTimeout Compression: The configured PoolTimeout period is consumed while waiting for an available connection. By the time a new connection creation is initiated, a significant portion of the request's original timeout budget may already be exhausted, leaving an insufficient window for the dial to complete successfully. This makes connection establishment far more likely to fail (a sketch after the outcomes list below quantifies this compression).
- Asymmetric Timeout Configurations: For latency-sensitive services, the timeout on the request context (dictated by upstream service-level objectives) is often significantly shorter than a reasonable DialTimeout (which must account for network latency and the TCP/TLS handshakes). This mismatch means the request is likely to time out well before the underlying dial has had a fair chance to succeed.
- Propagated Context Timeout Decay: In distributed systems, the timeout on a request's context is typically set by the initial upstream client and decays as it propagates through the service chain. By the time it reaches a downstream dependency like Redis, the remaining time is frequently very short, often just milliseconds. This leaves virtually no time to establish a new connection from scratch, making failures almost certain under pool contention.
These factors lead directly to the following negative outcomes:
- Unnecessary connection establishment failures: Connection attempts that would have succeeded are aborted by short client-side timeouts.
- Connection pool starvation: A storm of timed-out requests can prevent the pool from ever being refilled, causing a cascade of failures for subsequent requests.
- Diagnostic ambiguity and operational overhead: It becomes extremely difficult to distinguish between genuine server-side/network connectivity issues and client-side timeout pressure. Observing a high rate of connection timeouts could indicate a problem with the Redis server, or it could simply be a symptom of the client's own aggressive timeouts cancelling viable connections. This ambiguity makes alerting, debugging, and root cause analysis significantly more complex and time-consuming.
- Resource waste: Server-side resources (TCP slots, CPU for TLS handshakes) are used without any benefit.
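A small, self-contained example with made-up numbers (a 100ms request deadline, 80ms of it spent waiting for a pool slot) shows how little budget is left for the dial by the time connection creation starts:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Hypothetical request deadline set by an upstream caller.
	reqCtx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	// Stand-in for the time spent waiting on PoolTimeout for a free slot.
	time.Sleep(80 * time.Millisecond)

	// Roughly 20ms remain for TCP connect + TLS handshake + AUTH, even though a
	// realistic DialTimeout would be on the order of seconds.
	deadline, _ := reqCtx.Deadline()
	fmt.Printf("budget left for the dial: %v\n", time.Until(deadline))
}
```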
Prior Art in Go Stdlib and the Robust Solution
This is a well-understood problem in connection pool design. The Go standard library's database/sql and net/http packages have already implemented robust solutions that prevent request timeouts from destabilizing the connection pool.
1. database/sql.DB
The sql.DB pool decouples connection creation from request lifecycles. It uses a long-lived, dedicated goroutine (connectionOpener) to serialize connection creation. Crucially, it uses a background context for the connection establishment itself.
- Source Code References:
  - The connectionOpener goroutine, launched in sql.OpenDB
  - The openNewConnection method, which uses the opener's context: (*DB).openNewConnection
  - The mysql driver's useConnect method, which creates a new context under the opener's context using connectionTimeout
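For reference, the shape of that design can be paraphrased as follows (a simplified sketch, not the stdlib source; the names are illustrative):

```go
import (
	"context"
	"net"
)

// opener owns connection creation for the pool. Its context is tied to the
// pool's lifetime, so a canceled request cannot abort an in-flight open.
type opener struct {
	openerCtx context.Context // canceled only when the pool itself is closed
	requests  chan struct{}   // each send means "please open one more connection"
}

func (o *opener) run(connect func(context.Context) (net.Conn, error), put func(net.Conn, error)) {
	for range o.requests {
		// The dial uses the opener's long-lived context rather than the
		// requesting caller's context; the result is handed back to the pool.
		cn, err := connect(o.openerCtx)
		put(cn, err)
	}
}
```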
2. net/http.Transport
The implementation in http.Transport takes much the same approach. A dialing operation has its own timeout (net.Dialer.Timeout), but it also listens for cancellation from the request context to avoid wasted work.
- Source Code References:
  - The getConn method creating a new dial context: (*Transport).getConn
  - The canonical example of context propagation: the select statement with <-ctx.Done() and dialCancel() inside getConn
  - The dialConn method being invoked with a context that is detached from the request context's cancellation signal: (*Transport).dialConnFor
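The waiting side of that pattern can be paraphrased like this (a condensed sketch, not the actual net/http source):

```go
import (
	"context"
	"net"
)

type dialResult struct {
	cn  net.Conn
	err error
}

// waitForDial mirrors the idea in (*Transport).getConn: the caller waits on
// either its own request context or the dial result, while the dial itself
// runs in a separate goroutine under its own timeout.
func waitForDial(reqCtx context.Context, result <-chan dialResult) (net.Conn, error) {
	select {
	case <-reqCtx.Done():
		// The request gave up; the in-flight dial is not torn down here. When it
		// completes, its connection can be handed to the pool for other callers
		// (or the dial can be canceled if nobody wants it).
		return nil, reqCtx.Err()
	case r := <-result:
		return r.cn, r.err
	}
}
```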
Expected Behavior
The library should adopt the same context strategy used by the standard library. The creation of a new connection, which is a shared resource, should be resilient to the cancellation of the individual request that triggered it.
The expected behavior is:
- Connection creation should primarily be governed by an independent, reasonable timeout (e.g., the configured DialTimeout).
- It should also listen for cancellation from the original request's context so resources can be released promptly when no caller is waiting.
- Most importantly, if connection establishment succeeds after the original request has been canceled, that healthy connection should still be placed into the pool for use by other requests.
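Concretely, the dial context could be derived roughly like this (a minimal sketch; the function name is illustrative, and context.WithoutCancel requires Go 1.21+):

```go
import (
	"context"
	"time"
)

// buildDialContext keeps the request's values but drops its cancellation
// signal, and bounds the dial by DialTimeout instead of whatever is left of
// the request's own deadline.
func buildDialContext(reqCtx context.Context, dialTimeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(reqCtx), dialTimeout)
}
```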
Willingness to Contribute
The solution involves modifying the connection establishment logic (in internal/pool) to implement this pattern. The following sketch imitates the standard library's approach:
```go
// dialsInProgress is assumed to be a new buffered channel field on ConnPool
// that limits the number of concurrent dials.

type wantConn struct {
	mu        sync.Mutex          // protects ctx, done and sending of the result
	ctx       context.Context     // context for dial, cleared after delivered or canceled
	cancelCtx context.CancelFunc
	done      bool                // true after delivered or canceled
	result    chan wantConnResult // channel to deliver connection or error
}

// tryDeliver hands the dial result to the waiting caller. It reports false if
// the caller has already given up, in which case nothing is sent.
func (w *wantConn) tryDeliver(cn *Conn, err error) bool {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.done {
		return false
	}
	w.done = true
	w.ctx = nil
	w.result <- wantConnResult{cn: cn, err: err}
	close(w.result)
	return true
}

// cancel marks the waiter as done. If a connection was already delivered but
// never consumed, it is returned to the pool rather than leaked.
func (w *wantConn) cancel(ctx context.Context, p *ConnPool) {
	w.mu.Lock()
	var cn *Conn
	if w.done {
		select {
		case result := <-w.result:
			cn = result.cn
		default:
		}
	} else {
		close(w.result)
	}
	w.done = true
	w.ctx = nil
	w.mu.Unlock()

	if cn != nil {
		p.Put(ctx, cn)
	}
}

type wantConnResult struct {
	cn  *Conn
	err error
}

func (p *ConnPool) asyncNewConn(ctx context.Context) (*Conn, error) {
	// First try to acquire permission to create a connection.
	select {
	case p.dialsInProgress <- struct{}{}:
		// Got permission, proceed to create the connection.
	case <-ctx.Done():
		p.freeTurn()
		return nil, ctx.Err()
	}

	// The dial is bounded by DialTimeout and keeps the request's values, but it
	// is detached from the request's cancellation signal.
	dialCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), p.cfg.DialTimeout)
	w := &wantConn{
		ctx:       dialCtx,
		cancelCtx: cancel,
		result:    make(chan wantConnResult, 1),
	}

	var err error
	defer func() {
		if err != nil {
			w.cancel(ctx, p)
		}
	}()

	go func(w *wantConn) {
		defer w.cancelCtx()
		defer func() { <-p.dialsInProgress }() // release connection creation permission

		// Use dialCtx directly: w.ctx may be cleared concurrently by
		// tryDeliver/cancel once the caller stops waiting.
		cn, cnErr := p.newConn(dialCtx, true)
		delivered := w.tryDeliver(cn, cnErr)
		if cnErr == nil && delivered {
			return
		} else if cnErr == nil && !delivered {
			// The requester gave up; keep the healthy connection for other callers.
			p.Put(dialCtx, cn)
		} else {
			// The dial failed: free the turn held by the requester.
			p.freeTurn()
		}
	}(w)

	select {
	case <-ctx.Done():
		err = ctx.Err()
		return nil, err
	case result := <-w.result:
		err = result.err
		return result.cn, err
	}
}
```
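As a rough idea of the integration (entirely hypothetical; ConnPool.Get's real structure differs, and popIdleConn below is a stand-in rather than an existing helper):

```go
// getConnForRequest stands in for the branch of ConnPool.Get that runs after a
// turn is acquired: reuse an idle connection if one is available, otherwise
// create a new one via the cancellation-resilient path instead of
// p.newConn(ctx, true).
func (p *ConnPool) getConnForRequest(ctx context.Context) (*Conn, error) {
	if cn := p.popIdleConn(); cn != nil { // popIdleConn is a hypothetical stand-in
		return cn, nil
	}
	return p.asyncNewConn(ctx)
}
```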
I am willing to work on a Pull Request to implement this change and would appreciate guidance from the maintainers on the preferred approach.