Implement RPC request interruption on timeout to prevent resource exhaustion #9116

@jflo

Problem Statement

Currently, when RPC requests time out, they continue executing in the background, consuming CPU and memory. Because the underlying operations are never interrupted, many requests timing out at once can degrade the whole system.

Proposed Solution

Implement request interruption for RPC methods that time out, particularly for expensive operations such as transaction tracing. Follow the pattern established by InterruptibleOperationTracer.java, which is used in block building.
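To make the proposed pattern concrete, here is a minimal sketch of a tracer wrapper that checks the thread's interrupt flag before each traced step. `StepTracer`, `InterruptibleStepTracer`, and `TraceInterruptedException` are hypothetical stand-ins invented for this illustration; they are not the real tracer interface or the actual InterruptibleOperationTracer, whose hooks and exception types may differ.

```java
/** Hypothetical stand-in for an operation tracer; not the real tracer interface. */
interface StepTracer {
  void tracePreExecution(int step);
}

/** Hypothetical unchecked exception used to unwind execution once interrupted. */
final class TraceInterruptedException extends RuntimeException {
  TraceInterruptedException(String message) {
    super(message);
  }
}

/** Wraps a delegate tracer and aborts once the current thread has been interrupted. */
final class InterruptibleStepTracer implements StepTracer {
  private final StepTracer delegate;

  InterruptibleStepTracer(StepTracer delegate) {
    this.delegate = delegate;
  }

  @Override
  public void tracePreExecution(int step) {
    checkInterrupt();
    delegate.tracePreExecution(step);
  }

  private static void checkInterrupt() {
    // isInterrupted() leaves the flag set, so callers further up the stack
    // can still observe the interruption after the exception propagates.
    if (Thread.currentThread().isInterrupted()) {
      throw new TraceInterruptedException("trace interrupted at caller's request");
    }
  }
}
```

The same check could be repeated in the other hooks listed under the unit tests (post-execution, precompile, context enter/exit) so that interruption is noticed at every point where the tracer is called.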

Acceptance Criteria

  • RPC requests that time out are properly interrupted
  • Background resource consumption stops when a timeout occurs
  • System recovers quickly from multiple simultaneous timeouts
  • Proper error responses are returned to clients on timeout
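One way the first and last criteria could be met together is sketched below: the request runs on an executor, and when the deadline passes the handler cancels the future with interruption and reports the timeout to the caller. `TimeoutRunner` and `RpcTimeoutException` are hypothetical names for this sketch, and how the exception maps to a JSON-RPC error response is left open. Only standard `java.util.concurrent` calls are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical checked exception that a handler would translate into an error response. */
final class RpcTimeoutException extends Exception {
  RpcTimeoutException(String message) {
    super(message);
  }
}

/** Hypothetical helper: runs a task with a deadline and interrupts it on timeout. */
final class TimeoutRunner {
  private TimeoutRunner() {}

  static <T> T callWithTimeout(ExecutorService executor, Callable<T> task, long timeoutMillis)
      throws RpcTimeoutException, ExecutionException, InterruptedException {
    Future<T> future = executor.submit(task);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // cancel(true) interrupts the worker thread if the task is already running;
      // the task itself must cooperate by checking the interrupt flag or by
      // blocking in an interruptible call.
      future.cancel(true);
      throw new RpcTimeoutException("request exceeded " + timeoutMillis + " ms");
    }
  }
}
```

Cancellation alone does not stop CPU-bound work. It only sets the interrupt flag, so the executing code still needs a cooperative check for the interruption to take effect.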

Testing Requirements

Unit Tests

  • InterruptibleOperationTracer Extensions
    • Thread interruption detection
    • Exception propagation and wrapping
    • Interruption at various execution points (pre/post execution, precompile, context enter/exit)

Integration Tests

  • RPC Timeout Handling
    • Timeout triggers thread interruption
    • Resource cleanup after interruption
    • Partial execution handling
    • Concurrent timeout request handling

RPC Method-Specific Tests

  • DebugTraceTransaction, DebugTraceBlockByNumber, DebugTraceCall Timeout Tests
    • Immediate timeout (before execution)
    • Mid-execution timeout
    • Late timeout (near completion)
    • Error response format validation
    • State consistency after interruption

Performance and Load Tests

  • Resource Management
    • Memory leak prevention with many timeouts
    • Thread leak prevention
    • Throughput impact measurement
    • Cascading timeout handling
    • System recovery time after batch timeouts

Edge Cases and Error Scenarios

  • Complex Scenarios
    • Nested tracer interruption
    • Non-interruptible operation handling

Backward Compatibility Tests

  • Legacy Support
    • Existing non-interruptible tracers continue working
    • Configuration migration from old timeout settings

Mock and Stub Tests

  • Simulated Scenarios
    • Mock long-running operations
    • Mock interrupted system calls
    • Controllable timeout scenarios
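As an illustration of a controllable timeout scenario, the sketch below shows a test double whose "work" never finishes on its own, so a test can decide exactly when to interrupt it and then assert on the outcome without relying on timing. `ControllableOperation` is a hypothetical class invented for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical test double: a long-running operation that only an interrupt can end. */
final class ControllableOperation implements Runnable {
  final CountDownLatch started = new CountDownLatch(1);
  final CountDownLatch finished = new CountDownLatch(1);
  final AtomicBoolean sawInterrupt = new AtomicBoolean(false);

  @Override
  public void run() {
    started.countDown(); // lets the test know the operation is mid-execution
    try {
      // Blocks forever unless interrupted: this latch is never counted down.
      new CountDownLatch(1).await();
    } catch (InterruptedException e) {
      sawInterrupt.set(true);
      Thread.currentThread().interrupt(); // restore the flag for the caller
    } finally {
      finished.countDown(); // cleanup point the test can wait on
    }
  }
}
```

A test would start the operation on a worker thread, wait on `started`, interrupt the worker, and then wait on `finished` before checking `sawInterrupt`.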

End-to-End Tests

  • Full Request Lifecycle
    • Complete flow from RPC request to timeout to error response
    • Real blockchain data with expensive traces
    • Metrics and monitoring verification
    • Client retry behavior validation

Implementation Notes

  • Extend the InterruptibleOperationTracer pattern to RPC methods
  • Focus initially on tracer-based methods: debug_traceTransaction, debug_traceCall, debug_traceBlockByNumber
  • Consider a generic solution that can be applied to all RPC methods
  • Ensure thread safety and proper cleanup

Related Issues/PRs

  • Builds upon the pattern established for block building timeout handling
  • Related to overall RPC performance and stability improvements

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com
