Skip to content

Commit 43aa564

fix: Make dev server resilient to dependency re-optimization (#832)
This addresses two distinct but related sources of instability in the Vite dev server, both of which are triggered by Vite's dependency re-optimization process.

## Problem 1: Module-Level State is Discarded on Re-optimization

The framework's runtime relies on long-lived, module-level state for critical features like request context tracking via `AsyncLocalStorage`. However, Vite's dependency re-optimization process, designed for browser-based hot reloading, is fundamentally incompatible with this. When a new dependency is discovered, Vite discards and re-instantiates the entire module graph. This wipes out our module-level state, leading to unpredictable runtime errors and application crashes.

### Solution: A Virtual State Module

A virtual state module, `rwsdk/__state`, is introduced to act as a centralized, persistent store for framework-level state.

- A new Vite plugin (`statePlugin`) marks this virtual module as `external` to Vite's dependency optimizer for the worker environment. This insulates it from the re-optimization and reload process.
- The plugin resolves `rwsdk/__state` to a physical module (the built output in `dist/` for `sdk/src/runtime/state.ts`) that contains the state container and management APIs. In other words, the dep optimizer is bypassed for this specific module, giving it a stable path outside the dep-optimization bundles.
- Framework code is refactored to use this module (e.g., `defineRwState(...)`), making the state resilient to reloads.

This solves the state-loss problem and centralizes state management within the framework.

## Problem 2: Race Conditions Cause "Stale Pre-bundle" Errors

In a standard Vite setup, handling stale dependencies is routine. When a re-optimization occurs, the browser might request a module with an old version hash. The Vite server correctly throws a "stale pre-bundle" error, which is caught by Vite's client-side script in the browser. This script then automatically retries the request or performs a full page reload, seamlessly recovering from the transient error.

However, our architecture introduces several layers of complexity that make this standard recovery model insufficient. The "client" making these requests is not a browser, but the Cloudflare `CustomModuleRunner` executing server-side within Miniflare. Furthermore, our **SSR Bridge** architecture means this runner interacts with a virtual module subgraph. When it needs to render an SSR component, it requests a virtual module which, via our plugin, triggers a server-to-server `fetchModule` call from the `worker` environment to the isolated `ssr` environment.

This unique, cross-environment request pattern for virtual modules is at the heart of the instability. When a re-optimization happens in the `ssr` environment, the standard recovery mechanisms are not equipped to handle the resulting state desynchronization. The failure manifests as a perfect storm of three deeper, interconnected issues:

1. **Stale Resolution:** After an SSR re-optimization, Vite's internal resolver would continue to use a stale "ghost node" from its module graph to resolve our virtual `ssr_bridge` module, leading to a request for a dependency with an old, invalid version hash.
2. **Desynchronized Environments:** The `full-reload` HMR event triggered by the SSR optimizer was not being propagated to the worker environment. This meant the worker's own caches (especially the `CustomModuleRunner`'s execution cache) were never cleared and continued to use stale modules, creating an infinite error loop.
3. **Premature Re-import:** Even with synchronized invalidation, the worker's module runner re-imports its entry points immediately after clearing its cache. This happens too quickly, hitting the Vite server while its own internal state is still being updated, re-triggering the stale dependency error.

### Solution: A Multi-Layered Approach to Synchronization and Stability

A combination of fixes addresses this race condition:

1. **Manual Hash Resolution:** The `ssrBridgePlugin` no longer relies on Vite's internal, faulty resolution for virtual modules. It now manually resolves the correct, up-to-date version hash for any optimized dependency from the SSR optimizer's metadata before fetching it. This bypasses the "ghost node" problem.
2. **HMR Propagation:** The `ssrBridgePlugin` now intercepts the `full-reload` HMR event from the SSR environment and propagates it to the worker environment. This ensures the worker's module runner and module graph are correctly invalidated when the SSR environment changes.
3. **Debounced Stability Plugin (`staleDepRetryPlugin`):** A new error-handling middleware catches the inevitable "stale pre-bundle" error from the runner's premature re-import. It does not immediately retry; instead, it waits for the server to become "stable" by monitoring the `transform` hook. Once a quiet period with no module transformation activity is detected, it signals the client to perform a full reload and gracefully redirects the failed request.
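To make the state-module idea concrete, here is a minimal TypeScript sketch of what a persistent state container behind `rwsdk/__state` could look like. The `defineRwState` signature, the internal `Map`, and the `requestContextStorage` example are assumptions for illustration only; the real implementation lives in `sdk/src/runtime/state.ts` and may differ.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

// Illustrative sketch only: a keyed store for framework state. Because the
// module behind rwsdk/__state is kept external to Vite's dep optimizer, it is
// not discarded on re-optimization, so this map survives reloads.
const store = new Map<string, unknown>();

// Modules that *are* reloaded re-run their defineRwState() calls; the keyed
// lookup returns the existing value instead of creating a fresh one.
export function defineRwState<T>(key: string, init: () => T): T {
  if (!store.has(key)) {
    store.set(key, init());
  }
  return store.get(key) as T;
}

// Hypothetical consumer: request-context tracking keeps the same
// AsyncLocalStorage instance across re-optimizations.
export const requestContextStorage = defineRwState(
  "requestContextStorage",
  () => new AsyncLocalStorage<Map<string, unknown>>(),
);
```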
1 parent 4b9a483 commit 43aa564


53 files changed: +12799 −1107 lines

.github/workflows/playground-e2e-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ on:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
-  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request_target' }}

env:
  CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}

.github/workflows/smoke-test.yml

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ on:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
-  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request_target' }}

env:
  CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

.rwsync.lock

+.tmp/
+
logs
*.log
npm-debug.log*

.notes/justin/worklogs/2025-10-13-resilient-module-state-in-dev.md

Lines changed: 1425 additions & 0 deletions
Large diffs are not rendered by default.

CONTRIBUTING.md

Lines changed: 11 additions & 0 deletions
@@ -187,6 +187,17 @@ You can also specify a package manager or enable debug logging using environment
PACKAGE_MANAGER="yarn" DEBUG='rwsdk:e2e:environment' pnpm test:e2e hello-world/__tests__/e2e.test.mts
```

+#### Local Development Performance
+
+To speed up the local test-and-debug cycle, the E2E test harness uses a caching mechanism that is **enabled by default** for local runs.
+
+- **How it Works**: The harness creates a persistent test environment in your system's temporary directory for each playground project. On the first run, it installs all dependencies. On subsequent runs, it reuses this environment, skipping the lengthy installation step. The cache is automatically disabled in CI environments.
+- **Disabling the Cache**: If you need to force a clean install, you can disable the cache by setting the `RWSDK_E2E_CACHE` environment variable to `0`:
+  ```sh
+  RWSDK_E2E_CACHE=0 pnpm test:e2e
+  ```
+- **Cache Invalidation**: If you change a playground's `package.json`, you will need to manually clear the cache for that playground to force a re-installation. The cache directories are located in your system's temporary folder (e.g., `/tmp/rwsdk-e2e-cache` on Linux).
+
#### Skipping Tests

You can skip dev server or deployment tests using environment variables. This is useful for focusing on a specific part of the test suite.
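For orientation, the caching rule described in the diff above could be expressed roughly as follows. The function name and directory layout are assumptions for illustration, not the harness's actual code.

```ts
import os from "node:os";
import path from "node:path";

// Sketch of the cache decision: reuse a per-playground environment under the
// OS temp dir, unless caching is disabled explicitly (RWSDK_E2E_CACHE=0) or
// implicitly (running in CI).
export function resolveE2eCacheDir(playgroundName: string): string | null {
  const disabled = process.env.RWSDK_E2E_CACHE === "0" || !!process.env.CI;
  if (disabled) {
    return null; // force a clean install on every run
  }
  return path.join(os.tmpdir(), "rwsdk-e2e-cache", playgroundName);
}
```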

docs/architecture/devServerDependencyOptimization.md

Lines changed: 18 additions & 8 deletions
@@ -24,27 +24,31 @@ The sequence of events was as follows:

This happened because the initial optimization pass was only aware of third-party `node_modules` dependencies; it had no knowledge of the application's internal dependency graph.

-## The Solution: A Unified, Proactive Scan
+## The Solutions

-The solution is a unified strategy that proactively scans the *entire* dependency graph—both third-party and application code—and feeds this complete picture to Vite at startup. This solves both problems at once by ensuring all dependencies are discovered before they are needed, eliminating both request waterfalls and re-optimization triggers.
+The solution is a two-pronged strategy. First, a proactive dependency scan solves the performance problem and reduces the frequency of re-optimizations. Second, a virtual state module provides true resilience against the state loss that occurs when a re-optimization is unavoidable.

-This strategy has three main parts.
+### Solution 1: Proactive Scanning to Prevent Waterfalls and Reduce Re-Optimizations

-### 1. A Standalone `esbuild` Scan
+The first part of the solution is a unified strategy that proactively scans the *entire* dependency graph—both third-party and application code—and feeds this complete picture to Vite at startup. This solves the performance problem by ensuring all dependencies are discovered before they are needed, eliminating request waterfalls. It also mitigates the stability problem by making re-optimizations much less frequent, as the optimizer has a more complete picture of the graph from the outset.

-The core of the solution is our own, separate `esbuild` scan that runs before Vite's `optimizeDeps` process begins. This scan traverses the application's entire dependency graph to create a definitive list of all modules.
+However, this proactive scan cannot account for dependencies that are truly new, such as when a developer adds an import to a new package or module mid-session. When this happens, a re-optimization is still triggered, which leads to the second part of the solution.
+
+#### 1. A Standalone `esbuild` Scan
+
+The core of this strategy is our own, separate `esbuild` scan that runs before Vite's `optimizeDeps` process begins. This scan traverses the application's entire dependency graph to create a definitive list of all modules.

The scanner's most critical feature is its custom, Vite-aware module resolver, which ensures its dependency traversal perfectly mimics the application's actual runtime behavior, correctly handling complex project configurations like TypeScript path aliases.

For a detailed explanation of the scanner's implementation and the rationale behind its design, see the [Directive Scanning and Module Resolution](./directiveScanningAndResolution.md) documentation.

-### 2. The "Barrel File" Strategy to Inform the Optimizer
+#### 2. The "Barrel File" Strategy to Inform the Optimizer

Instead of feeding hundreds of individual files to `optimizeDeps`, we consolidate them into **"barrel files."** We create separate barrels for third-party dependencies (which we refer to as **vendor barrels**) and for the application's own source code.

This approach works *with* the bundler's expectations. By providing a small, consolidated list of entry points (the barrel files), we signal a complete and interconnected dependency graph. This allows `esbuild` to perform an efficient, comprehensive optimization pass that avoids both excessive chunking and the need for later re-optimization.

-### 3. Synchronized Execution and Assertive Resolution
+#### 3. Synchronized Execution and Assertive Resolution

A final challenge is the timing and execution of this process within Vite's lifecycle. Vite starts many processes in parallel, creating potential race conditions. Furthermore, Vite's dependency scanner is designed to treat application code as "external" by default, meaning it won't scan it for dependencies.

@@ -56,4 +60,10 @@ We solve this with a hybrid blocking and resolution strategy:

3. **Assertive Resolution:** The same `esbuild` plugin intercepts resolution requests for the application's own source files. It then explicitly returns a resolution result, claiming the file and signaling that it is *internal* code that must be scanned for dependencies. This preempts Vite's default behavior and ensures the entire application graph is traversed.

-This approach provides a stable and performant development environment by ensuring Vite has a complete dependency graph from the outset, balancing perceived startup performance with the correctness required to prevent disruptive re-optimizations.
+### Solution 2: A Virtual State Module for Resilient State
+
+To solve the module state loss problem definitively, the framework introduces a centralized, virtual state module that is insulated from Vite's re-optimization process. This module, identified by the specifier `rwsdk/__state`, acts as the single, persistent source of truth for all critical framework-level state.
+
+A dedicated Vite plugin is responsible for managing this module. Its primary job is to mark `rwsdk/__state` as "external" to Vite's dependency optimizer for the `worker` environment. This simple but critical step prevents the state module from being included in the dependency graph that Vite reloads. When a re-optimization occurs, all other application and framework modules are re-instantiated, but the virtual state module remains untouched, preserving its state across the reload.
+
+This approach directly solves the state-loss problem, making features that rely on module-level state (like `AsyncLocalStorage` for request context) resilient to dependency changes during development. It also encourages a more organized approach to state management within the framework by providing a central, explicit location for all shared state.
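As a rough illustration of the plugin described in Solution 2 of this doc, the sketch below excludes `rwsdk/__state` from the worker environment's dep optimizer and resolves it to a stable path on disk. The plugin factory, the option shape, and the `dist` path are assumptions; the actual `statePlugin` may be wired differently.

```ts
import path from "node:path";
import type { Plugin } from "vite";

// Illustrative sketch, not the real statePlugin.
export function statePlugin(opts: { sdkDistDir: string }): Plugin {
  const STATE_SPECIFIER = "rwsdk/__state";
  // Assumed location of the built output for sdk/src/runtime/state.ts.
  const statePath = path.join(opts.sdkDistDir, "runtime", "state.js");

  return {
    name: "rwsdk:state",
    config() {
      return {
        environments: {
          worker: {
            optimizeDeps: {
              // Keep the state module out of the optimizer's bundles so it is
              // not re-instantiated when those bundles are discarded.
              exclude: [STATE_SPECIFIER],
            },
          },
        },
      };
    },
    resolveId(source) {
      if (source === STATE_SPECIFIER) {
        // Resolve to a stable physical module outside the optimized deps.
        return statePath;
      }
    },
  };
}
```

Framework modules then import their state through `rwsdk/__state` (e.g., via `defineRwState`), so the values they hold outlive re-optimization reloads.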
docs/architecture/devServerStability.md

Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
+# Architecture: Dev Server Stability
+
+## The Challenge: Unstable Server Renders on Re-optimization
+
+Vite's dependency re-optimization is a core feature of its development server. When a new, previously undiscovered import is added to the codebase, Vite automatically pre-bundles it and reloads the browser to ensure a consistent state. This process is generally seamless for client-side code, as Vite's client script handles the `full-reload` Hot Module Replacement (HMR) event gracefully.
+
+However, this standard recovery model does not apply to code executing on the server, specifically within our framework's `worker` environment. The "client" in this context is not a browser, but the Cloudflare `CustomModuleRunner` executing inside Miniflare. This server-side runner does not have the same built-in recovery logic as a browser.
+
+When a re-optimization is triggered by a module used during a server render (e.g., inside an SSR'd component or a server action), the system enters an unstable state. Without a robust recovery mechanism designed for this server-side context, this can lead to crashes, hangs, and a frustrating developer experience. The challenge, therefore, is to create a recovery system that makes re-optimization events as seamless for our server environment as they are for a browser.
+
+## The Solution: A Server-Side Recovery System
+
+To solve this, the framework implements a multi-layered system that creates a robust recovery process for the server environment. Building this system required overcoming several technical hurdles that arise from our use of two interconnected Vite environments (`worker` and `ssr`).
+
+### Hurdle 1: Stale Resolution from Cached Module Nodes
+
+Vite's module graph caches a representation of every processed module in a `ModuleNode` object. When a re-optimization occurs, Vite's standard invalidation process sets a flag on these nodes but does not fully remove them, leaving behind a "ghost node". This ghost node retains some old information, including the module's previously resolved ID (e.g., a path with an old version hash).
+
+This creates a problem for our SSR Bridge. When the bridge requests a module by its clean, un-hashed name, Vite's resolver can find this ghost node and, as a shortcut, re-use its stale, version-hashed ID instead of performing a fresh resolution. This leads to a request for an outdated dependency.
+
+**Solution:** The `ssrBridgePlugin` employs **Proactive Hash Resolution**. It avoids this faulty lookup by not relying on Vite's internal resolver for virtual modules. Instead, it proactively determines the correct, up-to-date version hash for any optimized dependency by looking directly at the SSR optimizer's metadata.
+
+### Hurdle 2: Desynchronized Environment Caches
+
+The `worker` and `ssr` environments are isolated; by default, an HMR event in one does not affect the other. This architectural separation becomes a problem during re-optimization. If the `ssr` environment re-optimizes and resets its state, the `worker` environment remains unaware, leaving its own caches (both Vite's module graph and the Cloudflare runner's execution cache) in a stale and inconsistent state.
+
+**Solution:** The `ssrBridgePlugin` is responsible for **Cross-Environment HMR Propagation**. It bridges this gap by intercepting `full-reload` events from the SSR environment's HMR channel and forwarding them to the worker's channel. This ensures that when the `ssr` environment resets, the `worker` environment is also instructed to invalidate its caches in lockstep.
+
+### Hurdle 3: Race Conditions on Re-import
+
+The `CustomModuleRunner` is designed to re-import its entry points immediately after receiving a `full-reload` event. This happens too quickly, hitting the Vite server before it has finished stabilizing, which re-triggers a "stale pre-bundle" error. This necessitates a final safeguard that can gracefully handle this predictable race condition.
+
+### The Debounced Redirect-and-Retry Mechanism
+
+The solution is a final safeguard in the form of an error-handling middleware (`staleDepRetryPlugin`) that performs a **Debounced Retry**. When it catches the predictable "stale pre-bundle" error, it does not immediately retry. Instead, it waits for the server to become "stable" by monitoring Vite's `transform` hook for a period of inactivity.
+
+Once the server is stable, it performs two actions:
+1. **Triggers a client-side reload:** A `full-reload` HMR message is sent to the browser.
+2. **Redirects the failed request:** It responds to the original request with a `307 Temporary Redirect`.
+
+This redirect was chosen over a transparent, server-side retry for two key reasons:
+1. **Technical Feasibility:** A transparent retry for requests with bodies (e.g., `POST` for server actions) is not possible without buffering the request body in advance, an approach that was rejected for performance and dev/prod parity reasons.
+2. **Architectural Safety:** Transparently retrying `POST` requests is risky, as it could cause non-idempotent actions to execute twice.
+
+The `307` redirect forces the client to re-issue the request against a now-stable server. This makes it a simple and universal recovery mechanism that handles all types of requests consistently, whether the original request was for a full HTML document (for pages with or without client-side JS), a `fetch` request from a client-side interaction, or a non-browser request from within the worker itself. While this can make the first client-side interaction appear to be a no-op (when that interaction is the one that triggers a re-optimization), its robustness and simplicity make it the most pragmatic choice.
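To make the recovery flow concrete, here is a hedged TypeScript sketch of what a debounced error-handling middleware along these lines could look like. The quiet-period value, the error-matching heuristic, and the use of `server.ws.send` are assumptions for illustration; the real `staleDepRetryPlugin` may detect and signal the condition differently.

```ts
import type { Plugin, ViteDevServer } from "vite";

export function staleDepRetryPlugin(): Plugin {
  const QUIET_PERIOD_MS = 500; // assumed debounce window
  let lastActivity = Date.now();

  // Resolve only once no module has been transformed for QUIET_PERIOD_MS,
  // i.e. the server has settled after the re-optimization.
  const waitForQuietServer = async () => {
    while (Date.now() - lastActivity < QUIET_PERIOD_MS) {
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
  };

  return {
    name: "rwsdk:stale-dep-retry",
    transform() {
      // Any transform activity marks the server as still "busy".
      lastActivity = Date.now();
      return null;
    },
    configureServer(server: ViteDevServer) {
      // Register after Vite's internal middlewares so their errors reach us.
      return () => {
        server.middlewares.use(
          async (err: any, req: any, res: any, next: any) => {
            const isStaleDep =
              err?.code === "ERR_OUTDATED_OPTIMIZED_DEP" ||
              /stale pre-bundle|outdated optimize dep/i.test(
                String(err?.message ?? ""),
              );
            if (!isStaleDep) {
              return next(err);
            }

            await waitForQuietServer();

            // Tell connected clients to reload against the now-stable server.
            server.ws.send({ type: "full-reload" });

            // Redirect instead of transparently retrying, so the client
            // re-sends the request (including any POST body) itself.
            res.statusCode = 307;
            res.setHeader("Location", req.url ?? "/");
            res.end();
          },
        );
      };
    },
  };
}
```

The 307 status is what preserves the method and body when the client re-issues a POSTed server action, which is why it is preferred here over a 302.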

docs/architecture/endToEndTesting.md

Lines changed: 2 additions & 0 deletions
@@ -96,5 +96,7 @@ pnpm smoke-test --path=../starter
To run the playground E2E tests:
```
pnpm test:e2e
+
```
This hybrid architecture provides a fast, reliable, and scalable foundation for the E2E test suite, allowing for both high-performance and high-isolation testing of RedwoodSDK's features.
+

docs/architecture/index.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ This collection of documents provides a high-level overview of the core architec
- [**The SSR Bridge**](./ssrBridge.md)
  Details the architecture that allows the framework to support two different rendering environments (RSC and traditional SSR) within a single Cloudflare Worker. It explains how the "SSR Bridge" uses Vite's Environments API to manage conflicting dependency requirements between the two runtimes.

+- [**Dev Server Stability**](./devServerStability.md)
+  Explains the multi-layered system that ensures a stable development experience, detailing how the framework handles race conditions and state desynchronization during Vite's dependency re-optimization process.
+
- [**Directive Scanning and Module Resolution**](./directiveScanningAndResolution.md)
  Details the internal `esbuild`-based scanner used to discover `"use client"` and `"use server"` directives, and the context-aware module resolution it employs to handle conditional exports correctly.
