run a step inside a go routine #109

af-md · 2025-09-04T19:48:34Z

closes #90

Summary

Adds support for running steps inside goroutines with deterministic step ID generation.

This is by no means the final solution; it's a PR to get feedback.

Problem

Currently, running steps inside goroutines causes non-deterministic step ID generation due to race conditions: the ID a step receives depends on the order in which the goroutines happen to be scheduled.

Solution

Pre-generate step IDs before launching goroutines:

Added Go() function that pre-generates step IDs and runs steps in goroutines
Added WithNextStepID option to pass pre-generated IDs to RunAsStep
Modified RunAsStep to use pre-generated IDs when provided

    result, err := dbos.Go(ctx, func(ctx context.Context) (string, error) {
        return processItem(ctx, item)
    })

Open Questions

Does this satisfy the UX you had in mind, @maxdml? I am a bit unsure what you had in mind but gave it a go anyway.
It might be worth giving me a fuller example of how you think the UX would work inside the workflow.

ToDos

I still need to add some docs
Still need to write tests for it as well

maxdml · 2025-09-05T17:43:42Z

@af-md thanks for the PR!

The approach has the right fundamentals, i.e., dbos.Go generates a step ID and calls RunAsStep in a goroutine.

I think we can make this simpler by having a package-level Go() that calls the package-level RunAsStep inside a goroutine. That way we don't have to repeat all the logic from the package-level RunAsStep in Go.

With respect to what dbos.Go returns to the user: it should be non-blocking and return a channel to the caller (typed with the output type of the step). That way I can write:

chans := make([]chan int, 0, 10)
for range 10 {
    resChan, err := dbos.Go(ctx, StepFnClosure)
    // handle err
    chans = append(chans, resChan)
}

// Read from each channel here

The res channel can hold types similar to the workflow outcome chan:

type stepOutcome[R any] struct {
	result R
	err    error
}

af-md · 2025-09-07T08:41:34Z

@maxdml does this feature have any conflict with what @apoliakov said about pre generating stepIDs: https://discord.com/channels/1156433345631232100/1166779411920597002/1413954852618244267

It makes sense to run steps inside goroutines, as they tend to be more performant than sequential execution. However, should users be advised to write their code so that it waits for a step to complete (be committed to the DB) before moving on to the next step? That's probably what you were thinking of anyway...

apoliakov · 2025-09-07T13:10:23Z

@maxdml does this feature have any conflict with what @apoliakov said about pre generating stepIDs: https://discord.com/channels/1156433345631232100/1166779411920597002/1413954852618244267

It makes sense to run steps inside Go routines - as they tend to be better performant compared to standard execution - however the users should be told to write their code to wait for a step to complete (committed into DB) and then move onto the next step? probably that's what you were thinking of anyway...

Ah... what I said was a comment on how Python works. Here we may have an opportunity to make it act differently. But Max or Peter will need to opine on that

maxdml · 2025-09-08T17:02:07Z

@maxdml @apoliakov there is a small misunderstanding here. The problem we are solving with this PR is the non-deterministic generation of step IDs that results from executing steps in goroutines.

This PR serializes the generation of step IDs within the workflow. Step IDs are then generated deterministically, before the RunAsStep code executes, so each step always gets the same step ID regardless of the order in which the steps complete.
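
As an illustration of that idea (not the PR's actual code), the sketch below reserves each ID on the calling goroutine before launching the step. `workflowState`, `reserveStepID` and `runSteps` are hypothetical names, and the random sleep only simulates steps finishing in an arbitrary order.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// workflowState is a hypothetical stand-in for per-workflow bookkeeping.
type workflowState struct {
	mu     sync.Mutex
	nextID int
}

// reserveStepID hands out the next ID. It is called on the workflow's own
// goroutine, before any step goroutine is launched.
func (w *workflowState) reserveStepID() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	id := w.nextID
	w.nextID++
	return id
}

// runSteps launches one goroutine per step name and returns name -> step ID.
func runSteps(names []string) map[string]int {
	state := &workflowState{}
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		ids = make(map[string]int, len(names))
	)
	for _, name := range names {
		id := state.reserveStepID() // program order, not scheduling order
		wg.Add(1)
		go func(name string, id int) {
			defer wg.Done()
			// Simulate work so that completion order varies between runs.
			time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)
			mu.Lock()
			ids[name] = id
			mu.Unlock()
		}(name, id)
	}
	wg.Wait()
	return ids
}

func main() {
	fmt.Println(runSteps([]string{"fetch", "transform", "store"}))
}
```

Because the ID is captured before the `go` statement, the mapping from step to ID depends only on the order of the launches in the workflow code.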

maxdml

The low level implementation looks good, see my comments for the test.

I am realizing we need to change the API to support mocking. Specifically we should have a mirror Go method on the DBOSContext interface, that would be typeless (returns (stepOutcome[any], error)). The reason we've been doing this for all DBOS methods is to allow the mocking of DBOSContext in users' tests.

The package-level Go would, like the package-level RunAsStep does with its interface counterpart, call the interface Go with a type-erased function and set the stepName in the options.

The interface-level Go will do the step introspection, increment the stepID, then call the interface-level RunAsStep and return a typeless (stepOutcome[any]) channel, which we can pipe to the generic one (see an example in RunWorkflow).
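
The piping step could look roughly like the following sketch. It is self-contained and does not reproduce the RunWorkflow code; `pipeTyped` is a hypothetical helper, and the choice to report a type mismatch through the outcome's `err` field is an assumption.

```go
package main

import "fmt"

type stepOutcome[R any] struct {
	result R
	err    error
}

// pipeTyped (hypothetical helper) forwards the single outcome of a
// type-erased channel to a typed channel, then closes the typed channel.
func pipeTyped[R any](in <-chan stepOutcome[any]) <-chan stepOutcome[R] {
	out := make(chan stepOutcome[R], 1)
	go func() {
		defer close(out)
		o, ok := <-in
		switch {
		case !ok:
			out <- stepOutcome[R]{err: fmt.Errorf("outcome channel closed without a value")}
		case o.err != nil:
			out <- stepOutcome[R]{err: o.err}
		case o.result == nil:
			// A step may legitimately return nil: forward the zero value of R.
			out <- stepOutcome[R]{}
		default:
			typed, isR := o.result.(R)
			if !isR {
				out <- stepOutcome[R]{err: fmt.Errorf("unexpected result type: expected %T, got %T", *new(R), o.result)}
				return
			}
			out <- stepOutcome[R]{result: typed}
		}
	}()
	return out
}

func main() {
	erased := make(chan stepOutcome[any], 1)
	erased <- stepOutcome[any]{result: 42}
	close(erased)

	o := <-pipeTyped[int](erased)
	fmt.Println(o.result, o.err)
}
```

The caller always receives a usable channel: every failure path sends an outcome with `err` set instead of leaving the channel empty.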

dbos/workflow.go

maxdml · 2025-09-12T17:44:43Z

dbos/workflows_test.go

+		// Test step IDs are deterministic and in the order of execution
+		steps, err := GetWorkflowSteps(dbosCtx, handle.GetWorkflowID())
+		require.NoError(t, err, "failed to get workflow steps")
+		require.Len(t, steps, numSteps, "expected %d steps, got %d", numSteps, len(steps))
+		for i := 0; i < numSteps; i++ {
+			assert.Equal(t, i, steps[i].StepID, "expected step ID to be %d, got %d", i, steps[i].StepID)
+		}


This does not test what you think it does: GetWorkflowSteps returns all the workflow steps sorted by ascending step ID, so you're testing the SQL, not the step ID attribution.

The way to test this would be to have each step take its ID as input and return it. Then you can iterate over the channels and check that the iteration index equals the step result read from the channel.

Channels should be ordered by stepID

If the correct ID was attributed to the step, the step return value will be equal to the channel iterator
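
A rough, library-free sketch of that test shape follows. `echoStep`, `launchEchoSteps` and `checkOrder` are hypothetical names; in the real test the ID would come from DBOS rather than from a local counter.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// echoStep returns the ID it was given, after a random delay so that
// steps finish in an arbitrary order.
func echoStep(id int) int {
	time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)
	return id
}

// launchEchoSteps reserves IDs in program order, starts one goroutine per
// step, and returns the result channels in launch order.
func launchEchoSteps(n int) []<-chan int {
	chans := make([]<-chan int, 0, n)
	nextID := 0
	for i := 0; i < n; i++ {
		id := nextID
		nextID++
		ch := make(chan int, 1)
		go func() {
			defer close(ch)
			ch <- echoStep(id)
		}()
		chans = append(chans, ch)
	}
	return chans
}

// checkOrder fails if any channel yields a value different from its index.
func checkOrder(chans []<-chan int) error {
	for i, ch := range chans {
		if got := <-ch; got != i {
			return fmt.Errorf("channel %d returned step ID %d", i, got)
		}
	}
	return nil
}

func main() {
	if err := checkOrder(launchEchoSteps(10)); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("step IDs match channel order")
}
```

Unlike reading the steps back through a sorted query, this compares what each step actually received against its position in the launch order.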

As a second part, we should also exercise recovery, either by running recoverPendingWorkflow or simply by executing the workflow again with the same workflowID. If step IDs were attributed non-deterministically, DBOS would throw an error during the second RunAsStep, which it shouldn't.

For this to happen, however, we must ensure the workflow stays PENDING and does not return during the first run, which we can achieve with an event (see this example). If we don't do that, re-running the workflow will just fetch the workflow outcome rather than going through the steps again.

maxdml · 2025-11-06T20:06:52Z

dbos/workflow.go

+// Go runs a step inside a Go routine and returns a channel to receive the result.
+// Go generates a deterministic step ID for the step before running the step in a routine, since routines are not deterministic.
+// The step ID is used to track the steps within the same workflow and use the step ID to perform recovery.
+// The folliwing examples shows how to use Go:


Suggested change

// The folliwing examples shows how to use Go:

// The following example shows how to use Go:

maxdml · 2025-11-06T20:14:34Z

dbos/workflow.go

+
+func (c *dbosContext) Go(ctx DBOSContext, fn StepFunc, opts ...StepOption) (chan StepOutcome[any], error) {
+	// create a determistic step ID
+	stepName := runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()


This is not the step name we want to display: it is the name of the type-erased step. We can retrieve the user-provided function's name by inspecting the options:

// Process functional options
stepOpts := &stepOptions{}
for _, opt := range opts {
	opt(stepOpts)
}
name := stepOpts.stepName
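
The difference between the two names can be seen in a small standalone sketch. All names here (`withStepName`, `eraseType`, `resolveName`, `processItem`) are hypothetical and only imitate the pattern; they are not the package's real identifiers.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

type stepOptions struct{ stepName string }

type stepOption func(*stepOptions)

// withStepName is a hypothetical option that records the user function's name.
func withStepName(name string) stepOption {
	return func(o *stepOptions) { o.stepName = name }
}

// funcName returns the runtime name of a function value.
func funcName(fn any) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

// processItem plays the role of the user-provided step.
func processItem() (string, error) { return "done", nil }

// eraseType wraps a typed step in a type-erased closure and captures the
// original function's name in an option before the type is lost.
func eraseType[R any](fn func() (R, error)) (func() (any, error), stepOption) {
	wrapped := func() (any, error) {
		res, err := fn()
		return res, err
	}
	return wrapped, withStepName(funcName(fn))
}

// resolveName applies the functional options and reads the step name back.
func resolveName(opts ...stepOption) string {
	so := &stepOptions{}
	for _, opt := range opts {
		opt(so)
	}
	return so.stepName
}

func main() {
	wrapped, opt := eraseType(processItem)
	fmt.Println("name seen through the wrapper:", funcName(wrapped))
	fmt.Println("name carried by the options:  ", resolveName(opt))
}
```

Introspecting the wrapper yields the closure's compiler-generated name, whereas the option still holds the name of the function the user passed in.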

maxdml · 2025-11-06T20:17:58Z

dbos/workflow.go

+	// Step function could return a nil result
+	if result == nil {
+		return *new(chan StepOutcome[R]), err
+	}


We need to close the result chan. Let's do it after reading the outcome from it.

maxdml · 2025-11-06T20:21:44Z

dbos/workflow.go

+	// Otherwise type-check and cast the result
+	typedResult, ok := outcome.result.(R)
+	if !ok {
+		return *new(chan StepOutcome[R]), fmt.Errorf("unexpected result type: expected %T, got %T", *new(R), result)
+	}
+	outcomeChan <- StepOutcome[R]{
+		result: typedResult,
+		err:    nil,
+	}


We're going to have to modify this after #175 is merged. Specifically, it is possible that the step output was returned from the database, encoded, in which case we'll need to decode it. We can mimic the code in RunAsStep that does the same.

Also, in case of an error, just return outcomeChan rather than a nil channel (receiving from a nil channel blocks forever). That's a bit of a footgun for the user.

maxdml · 2025-11-06T20:30:39Z

dbos/workflows_test.go

+			for i, resultChan := range resultChans {
+				result1 := <-resultChan
+				if result1.err != nil {
+					errors <- result1.result.Error


Why are we piping result1.result.Error? Is it ever set to anything? If not, let's just remove it from the result struct to clarify the code.

maxdml · 2025-11-06T20:31:36Z

dbos/workflows_test.go

+		close(results)
+		close(errors)


nit: defer the closing after creation

maxdml · 2025-11-06T20:38:17Z

dbos/workflows_test.go

+	stepDeterminismStartEvent.Set()
+	fmt.Println("stepThatBlocks: started to block")
+	stepDeterminismEvent.Wait()
+	fmt.Println("stepThatBlocks: unblocked")


remove print statements pls

maxdml · 2025-11-06T20:39:43Z

dbos/workflows_test.go

+		// Run the second workflow
+		handle2, err := RunWorkflow(dbosCtx, goWorkflow, "test-input", WithWorkflowID(handle.GetWorkflowID()))
+
+		// If it throws an error, it's because of steps not being deterministically executed when using Go routines in the first workflow


The comment is not exactly accurate: determinism errors would come from handle2.GetResult()

af-md requested a review from maxdml September 4, 2025 20:02

af-md changed the title ~~Run as step inside go routines~~ run a step inside a go routine Sep 7, 2025

af-md marked this pull request as ready for review September 10, 2025 08:05

af-md self-assigned this Sep 10, 2025

maxdml reviewed Sep 12, 2025

af-md force-pushed the runAsStep-inside-Go-routines branch from 5ce5275 to f01ca09 Compare October 23, 2025 11:37

maxdml reviewed Nov 6, 2025

af-md added 18 commits November 6, 2025 15:31

feat: add dbos.Go to run steps inside Go routine

f716ae9

run step and get result using channel

a89b214

add comments

20a2beb

append stepID to options

ba8d253

assign stepID

6d3a84a

remove extra stepID argument

723a6db

refactor: change Go function to return a channel of stepOutcome and simplify result handling

ac7a1b1

refactor: remove Go function from DBOSContext interface

154ea28

test: add tests for Go function execution within workflows and introduce stepWithSleep function

09a949f

fix: ensure results and errors channels are closed after workflow execution in tests

3b21184

fix: include step name in error message when workflow state is not found in context

81e86be

docs: enhance documentation for Go function, detailing its usage and step ID generation

df31e88

test: add validation for deterministic step IDs in Go workflow execution

7d0a4a1

feat: add Go function to DBOSContext for executing steps in Go routines with result channel

c6f4f1d

refactor: improve error handling in Go function and update tests for custom output types

346bdc0

cleanup: remove commented TODO in Go function and tidy up test code by removing unnecessary line

573a17b

refactor: reorganize test code for step execution and enhance determinism checks in Go workflows

d53546f

refactor: rename StepOptions to stepOptions for consistency and update related function signatures

27e3e9b

maxdml force-pushed the runAsStep-inside-Go-routines branch from ebeaa35 to 27e3e9b Compare November 6, 2025 23:31
