Skip to content

[BUG]: Data race in gRPC client interceptor when using grpc.Peer() CallOption #3895

@lihao-figma

Description

@lihao-figma

Tracer Version(s)

1.65.1

Go Version(s)

1.22

Bug Report

gRPC version: v1.73.0

  Race Detector Output

  WARNING: DATA RACE
  Read at 0x00c0007e9920 by goroutine 380:
    gopkg.in/DataDog/dd-trace-go.v1/contrib/google.golang.org/grpc.doClientRequest()
        contrib/google.golang.org/grpc/client.go:197 +0x506

  Previous write at 0x00c0007e9920 by goroutine 763:
    google.golang.org/grpc.PeerCallOption.after()
        google.golang.org/grpc@v1.73.0/rpc_util.go:271 +0xbe

This happens upon stream grpc call with fast context cancellation after start.

The issue occurs in doClientRequest (client.go):

  // line 191-197
  var p peer.Peer
  opts = append(opts, grpc.Peer(&p))

  handlerCtx := injectSpanIntoContext(ctx)
  err := handler(handlerCtx, opts)

  setSpanTargetFromPeer(span, p)  // RACE: Reading p here

in grpc library
google.golang.org/grpc@v1.73.0/stream.go line 391

	if desc != unaryStreamDesc {
		// Listen on cc and stream contexts to cleanup when the user closes the
		// ClientConn or cancels the stream context.  In all other cases, an error
		// should already be injected into the recv buffer by the transport, which
		// the client will eventually receive, and then we will cancel the stream's
		// context in clientStream.finish.
		go func() {
			select {
			case <-cc.ctx.Done():
				cs.finish(ErrClientConnClosing)
			case <-ctx.Done():
				cs.finish(toRPCErr(ctx.Err()))
			}
		}()
	}

The race happens because:

  1. A local peer.Peer variable is created and passed to gRPC via grpc.Peer(&p)
  2. The handler (which creates the gRPC stream/call) returns immediately
  3. DataDog immediately reads from p via setSpanTargetFromPeer()
  4. Meanwhile, gRPC populates the peer asynchronously (if context cancellation signal received) when the call completes, calling PeerCallOption.after() which writes to p

The grpc.Peer() CallOption is designed to be populated after the RPC completes, not immediately after stream creation. The peer information is only available once the connection is established and the call finishes. DataDog's code incorrectly assumes the peer is populated synchronously when the handler returns.

We can consider change fetching peer information from stream context?

  if p, ok := peer.FromContext(stream.Context()); ok {
      setSpanTargetFromPeer(span, *p)
  }

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugunintended behavior that has to be fixedneeds-triageNew issues that have not yet been triaged

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions