fix(stepfunctions): lambda invoke grant all versions #34330

Closed

Y-JayKim
Contributor

@Y-JayKim Y-JayKim commented May 1, 2025

Issue # (if applicable)

Closes #17515.

Reason for this change

AWS CDK-generated Step Function roles break in-flight Step Function executions when using versioned Lambda functions. During deployment, the Step Function’s IAM role is updated to include permissions for the new Lambda version but removes permissions for the previous version. This causes lambda:InvokeFunction permission failures in in-flight executions that were started before the deployment and are still trying to invoke the previous Lambda version.

This issue is particularly problematic when using Step Function Aliases with deployment preferences for traffic shaping, as a percentage of new executions are directed to the previous version of the state machine, which attempts to invoke a Lambda version it no longer has permissions for.
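
The failure mode described above can be illustrated with a deliberately simplified model. This is only a sketch: the ARN, account ID, and the helper names `policyForVersion` and `canInvoke` are hypothetical, and real IAM policy evaluation is considerably richer than an exact string comparison.

```typescript
// Simplified model of the role policy: an invoke is allowed only if the
// target ARN is listed verbatim. This is NOT how IAM evaluates policies in
// full; it only illustrates why in-flight executions lose access.
type RolePolicy = { resources: string[] };

// Hypothetical function ARN, used for illustration only.
const fnArn = "arn:aws:lambda:us-east-1:123456789012:function:my-fn";

// Current behavior as described in this PR: only the version referenced by
// the task is granted.
function policyForVersion(version: number): RolePolicy {
  return { resources: [`${fnArn}:${version}`] };
}

function canInvoke(policy: RolePolicy, targetArn: string): boolean {
  return policy.resources.some((r) => r === targetArn);
}

// An execution starts while version 1 is deployed and keeps targeting it.
const inFlightTarget = `${fnArn}:1`;
const policyBeforeDeploy = policyForVersion(1);

// A deployment then replaces the policy with one scoped to version 2.
const policyAfterDeploy = policyForVersion(2);

console.log(canInvoke(policyBeforeDeploy, inFlightTarget)); // true
console.log(canInvoke(policyAfterDeploy, inFlightTarget)); // false
```

In this model the execution that started before the deployment is denied afterwards, which corresponds to the `lambda:InvokeFunction` failures reported in the linked issue.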

Description of changes

Implemented a feature flag STEPFUNCTIONS_TASKS_LAMBDA_INVOKE_GRANT_ALL_VERSIONS to control IAM permissions granted when using Lambda versions with Step Functions:

  • Added a new feature flag in cx-api/lib/features.ts with detailed documentation
  • Modified LambdaInvoke task implementation to check for this flag:
      • When enabled: grants permissions to both the specific Lambda version AND all versions using a wildcard pattern (function-arn:*)
      • When disabled (default behavior): maintains current behavior of granting permission only to the specific version
  • Updated API documentation to clearly explain the feature flag usage
  • Updated the README.md to include examples showing how to enable the feature flag

This approach maintains backward compatibility while giving users an opt-in solution to prevent in-flight executions from failing during deployments.
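
The flag-dependent grant described above could be sketched roughly as follows. This is not the code from the PR or from aws-cdk-lib: `invokeResources` and its `grantAllVersions` parameter are hypothetical names, and the sketch assumes the input is a versioned Lambda ARN with eight colon-separated segments.

```typescript
// Hypothetical helper, not the actual LambdaInvoke implementation.
// Returns the IAM resources to grant lambda:InvokeFunction on.
function invokeResources(versionArn: string, grantAllVersions: boolean): string[] {
  if (!grantAllVersions) {
    // Flag disabled (default): only the specific version is granted.
    return [versionArn];
  }
  // Assumed ARN shape: arn:aws:lambda:region:account:function:name:version
  // The first seven segments form the unqualified function ARN.
  const functionArn = versionArn.split(":").slice(0, 7).join(":");
  // Flag enabled: the specific version plus a wildcard over all versions.
  return [versionArn, `${functionArn}:*`];
}

// Illustrative ARN with a placeholder account ID.
const exampleArn = "arn:aws:lambda:us-east-1:123456789012:function:my-fn:3";
console.log(invokeResources(exampleArn, false));
console.log(invokeResources(exampleArn, true));
```

Keeping the specific version ARN in the list when the flag is enabled mirrors the "both the specific Lambda version AND all versions" wording of the description.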

Describe any new or updated permissions being added

When the feature flag is enabled, the Step Function's IAM role will now include an additional IAM permission that grants access to all versions of the Lambda function using a wildcard pattern, e.g.:

  • Before: "Resource": ["arn:aws:lambda:region:account:function:name:version"]
  • After: "Resource": ["arn:aws:lambda:region:account:function:name:version", "arn:aws:lambda:region:account:function:name:*"]
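
To see what the added wildcard entry covers, the following sketch matches ARNs against resource patterns where `*` stands for any run of characters. The helper names `matchesResource` and `isAllowed` and the sample ARN are assumptions for illustration; the matcher handles only `*` and is not a full model of IAM evaluation.

```typescript
// Match an ARN against a resource pattern in which "*" matches any run of
// characters. Only "*" is handled here; other wildcard forms are ignored.
function matchesResource(pattern: string, arn: string): boolean {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${escaped}$`).test(arn);
}

function isAllowed(resources: string[], arn: string): boolean {
  return resources.some((r) => matchesResource(r, arn));
}

// Hypothetical function ARN with a placeholder account ID.
const base = "arn:aws:lambda:us-east-1:123456789012:function:my-fn";
const beforeResources = [`${base}:2`];
const afterResources = [`${base}:2`, `${base}:*`];

console.log(isAllowed(beforeResources, `${base}:1`)); // false
console.log(isAllowed(afterResources, `${base}:1`)); // true
console.log(isAllowed(afterResources, base)); // false
```

Note that in this sketch the pattern ending in `:*` matches any qualifier after the function name but not the unqualified function ARN itself, since the unqualified ARN has no trailing colon segment.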

Description of how you validated changes

  • Added comprehensive unit tests that verify both behaviors (with feature flag enabled and disabled)
  • Updated integration tests to demonstrate both scenarios with and without the feature flag
  • Created test suites to verify behavior with both versioned Lambda functions and non-versioned Lambda functions

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team May 1, 2025 17:43
@github-actions github-actions bot added bug This issue is a bug. effort/small Small work item – less than a day of effort p2 labels May 1, 2025
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label May 1, 2025
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter fails with the following errors:

❌ Fixes must contain a change to an integration test file and the resulting snapshot.

If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 2476873
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@shikha372 shikha372 self-assigned this May 5, 2025

CDK's `LambdaInvoke` construct currently generates IAM permissions only for the specific Lambda version referenced, but these permissions don't persist across deployments when new versions are created.

## Proposed Solution: Configurable Version Permission Behavior

I like this proposed solution, but I'm a bit confused because I expected it to be implemented in this CR. What's the reason for not implementing it?

Contributor

@shikha372 shikha372 May 9, 2025

We had a team discussion about this and decided it's better to grant more permissive permissions (if the security team has no concerns) than to add a new prop, since a new prop can be confusing for end customers and adds to the maintenance burden.

Will this file be committed to the repo? When is it necessary to create files like this, and what is their purpose?

Contributor

I don't think this is meant for the repo; it should be removed from this commit.

Not related to this file, but shouldn't we have the same issue in ECS here?

@paulhcsun
Contributor

Closing as a duplicate of #34398

@paulhcsun paulhcsun closed this May 12, 2025
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 12, 2025
Labels
bug This issue is a bug. contribution/core This is a PR that came from AWS. effort/small Small work item – less than a day of effort p2
5 participants