Skip to content

fix(rivetkit): use unbounded actor channels#5003

Merged
NathanFlurry merged 1 commit intomainfrom
rivetkit-channels/use-unbounded-channels
May 9, 2026
Merged

fix(rivetkit): use unbounded actor channels#5003
NathanFlurry merged 1 commit intomainfrom
rivetkit-channels/use-unbounded-channels

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

@NathanFlurry NathanFlurry commented May 9, 2026

Summary

  • Switch RivetKit actor, lifecycle event, serverless response, and Rust client websocket channels to unbounded Tokio channels.
  • Remove the obsolete actor inbox capacity fields from ActorConfig. This is an intentional Rust API break because those fields are now meaningless.
  • Keep inbox depth Prometheus gauges and add a regression test for their rendered output.

Testing

  • cargo check -p rivetkit-core
  • cargo test -p rivetkit-core --no-run
  • cargo test -p rivetkit-core actor_inbox_depth_metrics_render
  • git diff --check

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 9, 2026

🚅 Deployed to the rivet-pr-5003 environment in rivet-frontend

Service Status Web Updated (UTC)
kitchen-sink ❌ Build Failed (View Logs) Web May 9, 2026 at 7:33 am
website 😴 Sleeping (View Logs) Web May 9, 2026 at 2:52 am
frontend-inspector ❌ Build Failed (View Logs) Web May 9, 2026 at 2:43 am
ladle ❌ Build Failed (View Logs) Web May 9, 2026 at 2:43 am
frontend-cloud ❌ Build Failed (View Logs) Web May 9, 2026 at 2:42 am
mcp-hub ✅ Success (View Logs) Web May 9, 2026 at 2:42 am

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 9, 2026

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

PR Review: fix(rivetkit): use unbounded actor channels (#5003)

Summary: This PR replaces all bounded mpsc::channel(N) with mpsc::unbounded_channel() across the actor lifecycle/dispatch/event inboxes and the Rust client WebSocket driver. It also removes the capacity config fields from ActorConfig, the overload metric counters, and the actor_channel_overloaded_error helper. A new regression test for inbox depth metric rendering is added.


Design Decision: Is Removing Back-pressure the Right Call?

This is the core question the PR needs to justify. The previous design was documented as an intentional choice: bounded queues gave explicit back-pressure and produced actor.overloaded errors with try_reserve so the system could reject excess work fast, emit per-actor + global warning diagnostics, and track overload counts in Prometheus. All of that is gone.

Concerns:

  1. CLAUDE.md constraint is now violated. The current CLAUDE.md states:

    Actor-owned lifecycle / dispatch / lifecycle-event inbox producers use try_reserve helpers and return actor.overloaded. Do not await bounded mpsc::Sender::send.

    This PR removes try_reserve and the actor.overloaded path entirely. The CLAUDE.md rule needs to be updated in this same commit to reflect the new design, or the rule signals the constraint still applies and this PR is wrong. As-is, the docs and code are out of sync.

  2. Memory growth is now unbounded. With bounded channels, a slow/stuck actor would back-pressure callers. With unbounded channels, a stuck actor accumulates commands indefinitely. The PR description does not explain why the change is safe.

  3. Overloaded error variant in error.rs is now dead code for inbox paths. The only remaining caller is sqlite.rs. The variant still has channel and capacity fields that no longer apply to non-sqlite uses.


Positives

  1. println! replaced with debug!. rivetkit-rust/packages/client/src/drivers/ws.rs had a raw println!("Receiver dropped") now correctly using debug!. Good catch, consistent with CLAUDE.md logging rules.

  2. #[cfg(wasm32)] dead branches removed cleanly. Removing bounded channels eliminates the separate wasm32/non-wasm32 try_reserve error handling paths.

  3. .await calls removed from UnboundedSender::send. Test helpers now call the infallible sync send correctly without .await.

  4. actor_id field removed from ActorMetrics. Only used by the now-deleted actor_channel_overloaded_error. Cleaning it up reduces struct size.

  5. New test actor_inbox_depth_metrics_render. Confirms the three depth gauges still render correctly after removing the overload counters.


Issues and Suggestions

Critical

CLAUDE.md must be updated in this commit.
The invariant still says to use try_reserve and return actor.overloaded. This creates confusion for future contributors. Update the rule to reflect the new design (unbounded channels, no back-pressure at the queue level, overload signaled only at the SQLite layer), or add a justification for why unbounded is safe here.


Moderate

Error semantics change is subtle for callers (state.rs).
request_save_with_revision now maps a send error to ActorRuntime::NotConfigured (component: lifecycle events). With unbounded channels, a send error means the receiver is dropped, which happens when the actor is stopping, but the message Actor capability is not configured is misleading. Consider a dedicated error code (e.g. ActorLifecycle::Stopping) or at minimum a comment explaining that send-failure means the receiver was dropped, not that the actor is misconfigured.

try_send_lifecycle_command / try_send_dispatch_command now map receiver-dropped to NotReady (task.rs).
After an actor has been started and then stopped, returning NotReady to a caller that already observed the actor as ready is confusing. Something like ActorLifecycle::Stopping would be more accurate for the post-shutdown case.


Minor / Style

record_inbox_depths is called on every loop iteration (task.rs).
There is already a // TODO: Sample inbox depths periodically instead of on every loop iteration. comment. A follow-up issue or .agent/todo/ entry would help this not be forgotten.

ActorConfig breaking change not mentioned in commit message.
The PR removes lifecycle_command_inbox_capacity, dispatch_command_inbox_capacity, and lifecycle_event_inbox_capacity from ActorConfig. If the crate is versioned publicly, a BREAKING CHANGE footer or semver bump entry would be appropriate.


Test Coverage

The regression test actor_inbox_depth_metrics_render is a good addition. All channel creation sites appear to be correctly updated.

Missing test: There is no test exercising the new error path in try_send_lifecycle_command and try_send_dispatch_command when the receiver is dropped (returning NotReady on a closed channel). The old path had explicit tests around actor.overloaded. If that error code matters for clients, a test confirming the behavior would be valuable.


Summary

The mechanical changes are correct: all bounded channels replaced, .await dropped from sync sends, config fields removed, tests updated. The main gaps are:

  1. Update CLAUDE.md to reflect the new no-back-pressure design (the current rule directly contradicts this PR).
  2. Improve error code accuracy for the receiver-dropped case in request_save_with_revision and try_send_*_command -- NotConfigured and NotReady are both misleading when the actor has been stopped.
  3. Add justification in the PR description or a code comment explaining why unbounded queues are safe given Pegboard's concurrency model.

@NathanFlurry NathanFlurry force-pushed the rivetkit-channels/use-unbounded-channels branch from 6e295e4 to 3388aec Compare May 9, 2026 07:33
@NathanFlurry NathanFlurry force-pushed the chore/envoy-client-envoy-key-logging branch from 283da30 to 5fddabc Compare May 9, 2026 07:33
@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-5003 May 9, 2026 07:33 Destroyed
Base automatically changed from chore/envoy-client-envoy-key-logging to main May 9, 2026 07:50
@NathanFlurry NathanFlurry merged commit 3388aec into main May 9, 2026
11 of 18 checks passed
@NathanFlurry NathanFlurry deleted the rivetkit-channels/use-unbounded-channels branch May 9, 2026 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant