[BUG] [SM] Error Revoking Default security group ingress rule in Perimeter-Phase1 #1284

Open
Mobilise-PALZ opened this issue May 9, 2025 · 1 comment · May be fixed by #1285

@Mobilise-PALZ
Mobilise-PALZ commented May 9, 2025

Bug reports which fail to provide the required information will be closed without action.

Required Basic Info

  • Accelerator Version: 1.6.3
  • Install Type: Upgrade
  • Upgrade from version: 1.6.2 -> 1.6.3
  • Which State did the Main State Machine Fail in: Deploy Phase 1

Describe the bug
Failure Info

  • What error messages have you identified, if any:
    Following the normal upgrade steps to 1.6.3, we encountered no errors until the Main State Machine failed at the Deploy Phase 1 step. The CodeBuild logs show that the stack PerimeterPhase1 in our External Communications account failed to deploy with the message InvalidPermission.NotFound: The specified rule does not exist in this security group.
    In the External Communications account, the CloudFormation stack Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource shows the error:

This Custom::VpcDefaultSecurityGroup1 resource is in a CREATE_FAILED state.
Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group.

We have found that the default security group in the External Communications account has different rules from those the Lambda is trying to remove. We have not touched the default security group rules in this account, so we are unsure why they differ.

The security group currently has this rule: Type: Custom TCP, Protocol: TCP, Port range: 0, Source: 0.0.0.0/0.
This differs from the rule the Lambda appears to be trying to remove:
`
"ipProtocol": "-1",
"fromPort": -1,
"toPort": -1,

`

  • What symptoms have you identified, if any:
    The Main state machine failed.

Required files

  • Please provide a copy of your config.json file (sanitize if required)
  • If a CodeBuild step failed- please provide the full CodeBuild Log

`Failed resources:

PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource | 7:22:49 PM | CREATE_FAILED | Custom::VpcDefaultSecurityGroup1 | PerimeterPhase1/VpcStackPerimeter/Perimeter/VpcDefaultSecurityGroup/Resource1/Default (PerimeterVpcDefaultSecurityGroupResource1) Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group.
PALZ-Perimeter-Phase1 | 7:22:58 PM | UPDATE_FAILED | AWS::CloudFormation::Stack | PerimeterPhase1/VpcStackPerimeter.NestedStack/VpcStackPerimeter.NestedStackResource (VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResourceXXX) Embedded stack arn:aws:cloudformation:eu-west-2:XXXX:stack/PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource-XXX/XXX was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [PerimeterVpcDefaultSecurityGroupResourceXXX].
{"stackName":"PerimeterPhase1 (PALZ-Perimeter-Phase1)","stackEnvironment":{"account":"XXX","region":"eu-west-2","name":"aws://XXX/eu-west-2"},"assumeRoleArn":"arn:aws:iam::XXX:role/PALZ-PipelineRole","message":"Failed to deploy: Error: The stack named PALZ-Perimeter-Phase1 failed to deploy: UPDATE_ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group., Embedded stack arn:aws:cloudformation:eu-west-2:XXX:stack/PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource-XX/XX was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [PerimeterVpcDefaultSecurityGroupResourceXXX]. ","messageType":"ERROR"}`

  • If a Lambda step failed - please provide the full Lambda CloudWatch Log
    In the External Communications account, CloudTrail shows the following event for the Lambda PALZ-Perimeter-Phase1-Vpc-CustomVpcDefaultSecurity:

`
"eventTime": "2025-05-08T19:25:16Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "RevokeSecurityGroupIngress",
"awsRegion": "eu-west-2",
"sourceIPAddress": "XXXXXXXX",
"userAgent": "aws-sdk-nodejs/2.1473.0 linux/v22.14.0 exec-env/AWS_Lambda_nodejs22.x promise",
"errorCode": "Client.InvalidPermission.NotFound",
"errorMessage": "The specified rule does not exist in this security group.",
"requestParameters": {
  "groupId": "sg-XXXXXXX",
  "ipPermissions": {
    "items": [
      {
        "ipProtocol": "-1",
        "fromPort": -1,
        "toPort": -1,
        "groups": {
          "items": [
            {
              "groupId": "sg-XXXXXXXX"
            }
          ]
        },
        "ipRanges": {},
        "ipv6Ranges": {},
        "prefixListIds": {}
      }
    ]
  }
},

`

  • In many cases it would be helpful if you went into the failed sub-account and region, CloudFormation, and provided a screenshot of the Events section of the failed, deleted, or rolled back stack including the last successful item, including the first couple of error messages (bottom up)
    In the CloudFormation stack, after the error message shown above, an UPDATE_ROLLBACK_IN_PROGRESS occurred for [PerimeterVpcDefaultSecurityGroupResourceXXX], and the resources created earlier (such as the Role and the Role Policy CustomVpcDefaultSecurityGroup1RoleDefaultPolicy) were also deleted and cleaned up.

Steps To Reproduce

  1. Follow the normal upgrade steps (using the GitHub template): https://aws-samples.github.io/aws-secure-environment-accelerator/v1.5.6-a/installation/upgrades/#13-summary-of-upgrade-steps-all-versions-except-v150
  2. After the template is updated and ASEA-InstallerPipeline is released, the Main State Machine runs automatically and fails at the Deploy Phase 1 step.

Expected behaviour
We expected the state machine execution for the upgrade to 1.6.3 to succeed. No errors appeared until the state machine execution.

Additional context
The default security group in the External Communications account has different rules from those the Lambda is trying to remove. We have not touched the default security group rules in this account.

The security group currently has this rule: Type: Custom TCP, Protocol: TCP, Port range: 0, Source: 0.0.0.0/0. This differs from the rule the Lambda appears to be trying to remove.

We have currently rolled back to 1.6.2 which has been successful.
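
To make the mismatch concrete, here is a minimal TypeScript sketch that compares the rule observed on the group with the rule in the CloudTrail request. The `IngressRule` shape and the `sameRule` and `hasRule` helpers are hypothetical names for illustration, not part of ASEA or the AWS SDK, and the sketch assumes a revoke only succeeds when protocol, port range and source all match an existing rule.

```typescript
// Hypothetical, simplified view of a security group ingress rule.
interface IngressRule {
  ipProtocol: string;       // "-1" means all protocols
  fromPort: number;
  toPort: number;
  sourceGroupIds: string[]; // security groups referenced as the source
  cidrs: string[];          // IPv4 CIDR ranges used as the source
}

// Order-insensitive comparison of two string lists.
function sameMembers(a: string[], b: string[]): boolean {
  const left = a.slice().sort();
  const right = b.slice().sort();
  return left.length === right.length && left.every((v, i) => v === right[i]);
}

// Two rules are treated as the same only if every field matches.
function sameRule(a: IngressRule, b: IngressRule): boolean {
  return (
    a.ipProtocol === b.ipProtocol &&
    a.fromPort === b.fromPort &&
    a.toPort === b.toPort &&
    sameMembers(a.sourceGroupIds, b.sourceGroupIds) &&
    sameMembers(a.cidrs, b.cidrs)
  );
}

// True when the wanted rule is present among the existing rules.
function hasRule(existing: IngressRule[], wanted: IngressRule): boolean {
  return existing.some((rule) => sameRule(rule, wanted));
}

// Rule currently on the default security group, as described above.
const observed: IngressRule = {
  ipProtocol: "tcp",
  fromPort: 0,
  toPort: 0,
  sourceGroupIds: [],
  cidrs: ["0.0.0.0/0"],
};

// Rule the Lambda asked EC2 to revoke, taken from the CloudTrail event.
const requested: IngressRule = {
  ipProtocol: "-1",
  fromPort: -1,
  toPort: -1,
  sourceGroupIds: ["sg-XXXXXXXX"],
  cidrs: [],
};

console.log(hasRule([observed], requested)); // false
```

Because the requested rule is not present on the group, the revoke has nothing to remove, which is consistent with the InvalidPermission.NotFound error.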

@oliviergaumond
Contributor

This Custom Resource Lambda attempts to remove the default rules from the default security group to align with security best practices. The linked pull request updates the logic to avoid failing if the rules don't exist.

As a workaround you can:

  1. If you are not using the default security group (as recommended), you can manually remove all rules from the group and re-run the state machine.
  2. If you are using the default security group and have customized its rules, you can re-create the default security group rules and re-run the state machine to let ASEA remove them. The other rules should be left untouched, but we recommend you record a copy of their configuration beforehand.
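
For readers curious what "avoid failing if the rules don't exist" could look like, here is a minimal TypeScript sketch. It is not the code from the linked pull request. The helper names `isRuleNotFound` and `revokeIgnoringMissing` are hypothetical, and the sketch assumes the SDK error exposes the code `InvalidPermission.NotFound` in either a `code` or a `name` property.

```typescript
type RevokeOutcome = "revoked" | "already-absent";

// Assumption: the SDK error carries "InvalidPermission.NotFound"
// in `code` (AWS SDK v2 style) or in `name` (AWS SDK v3 style).
function isRuleNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null) {
    return false;
  }
  const e = err as { code?: unknown; name?: unknown };
  return (
    e.code === "InvalidPermission.NotFound" ||
    e.name === "InvalidPermission.NotFound"
  );
}

// Runs the supplied revoke call and treats "rule not found" as success,
// since the desired end state (rule absent) already holds.
// Any other error is re-thrown so real failures still surface.
async function revokeIgnoringMissing(
  revoke: () => Promise<unknown>
): Promise<RevokeOutcome> {
  try {
    await revoke();
    return "revoked";
  } catch (err) {
    if (isRuleNotFound(err)) {
      return "already-absent";
    }
    throw err;
  }
}
```

In a Lambda using AWS SDK v2, the `revoke` callback would wrap the real call, for example `() => ec2.revokeSecurityGroupIngress(params).promise()`, where `ec2` and `params` are whatever client and request the handler already builds.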
