Skip to content

Avoid blocking stubs hanging indefinitely when client interceptor listeners throw uncaught exceptions #12348

@nicholashagen

Description

@nicholashagen

Is your feature request related to a problem?

There are situations in which uncaught exceptions (mainly Error such as NoSuchMethodError) can occur within interceptors and those can lead to hanging RPCs. Typically, applications catch Exception rather than Error or Throwable. When the latter occurs, it results in hanging RPCs. Those particular errors are obviously bad and the application is unhealthy. However, these paths may only trigger in certain situations and resulting in a hanging RPC fully exasperates the issue worse. Allowing gRPC to properly close RPCs when these occur would at least avoid making things worse for the app.

Describe the solution you'd like

Ideally, gRPC would catch any Throwable within the call chain including interceptors and result in a failed RPC and properly release any resources attached.

Describe alternatives you've considered

The alternative here is to have each app, each interceptor, etc catch Throwable themselves to protect against these cases and abide by the ClientCall and Listener specs. This can easily be missed or forgotten.

Additional context

The following is sample blocking code and interceptor which produces the hanging effect.

                 int port = 12000;

		NettyServerBuilder.forPort(port)
				.addService(ServerInterceptors.intercept(new TestServiceGrpcImpl("testservice")))
				.build()
				.start();

		ManagedChannel channel = NettyChannelBuilder.forAddress("localhost", port)
				.usePlaintext()
				.intercept(new ClientInterceptor() {
					@Override
					public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
						return new ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
							private ReqT request;

							@Override
							public void start(Listener<RespT> responseListener, Metadata headers) {
								super.start(new ForwardingClientCallListener.SimpleForwardingClientCallListener<RespT>(responseListener) {
									@Override
									public void onClose(Status status, Metadata trailers) {
										//normal flow is doing some work and calling super.onClose
										//super.onClose(status, trailers);

										// bad flow is an Error happens such as NoSuchMethodError
										throw new NoSuchMethodError();
									}
								}, headers);
							}
						};
					}
				})
				.build();

		TestResponse response = TestServiceGrpc.newBlockingStub(channel).echoMessage(TestRequest.getDefaultInstance());
		Assert.assertEquals("testservice", response.getName());

The successful path using super.onClose will invoke UnaryStreamToFuture.onClose as the parent delegate and blocking stub. This will mark the GrpcFuture as completed with a value or exception. That closes the future and results in ClientCalls.blockingUnaryCall finishing.

The failing path skips that and results in waitAndDrain on the ThreadlessExecutor. That calls that onClose listener as expected, but runQuietly catches Throwable, logs it, swallows it, and returns. This results in the future never being closed and results in the blocking call hanging and never completing.

This is by design as onClose states nothing should throw. However, Throwable seems excessive to avoid and protecting against hanging RPCs would be beneficial. I'm not sure precisely where that would even occur to ensure the future is canceled or failed appropriately, so marking this as a feature request more than a bug.

For now, we are protecting by updating various interceptors to catch Throwable in specific cases.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions