
[cdk_infra] Add node auto scaling to EKS clusters #1059

@bryan-aguilar

During the Collector CI, multiple EKS test cases may run against a single cluster at the same time. This can overutilize node resources, which prevents pods from being scheduled. So far we have only hit caps due to CPU requests. Current resource quotas for deployments can be found by searching for limits = in the terraform directory. Example here. Currently, in most cases no request quota is set, but CPU limits are set at 0.2.
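To make the CPU cap concrete, here is a minimal sketch of the arithmetic. It relies on one Kubernetes behavior: when a container sets a CPU limit but no CPU request, the request defaults to the limit, so a 0.2 CPU limit still reserves 200 millicores at scheduling time. The node's allocatable CPU (1900m) and the names `PodSpec`, `effectiveCpuRequestMillis`, and `schedulablePods` are hypothetical and chosen only for illustration; they do not come from the test framework.

```typescript
// Hypothetical model of CPU-based scheduling capacity; values are illustrative.
interface PodSpec {
  cpuRequestMillis?: number; // explicit CPU request, in millicores
  cpuLimitMillis?: number;   // CPU limit, in millicores
}

// Kubernetes uses the limit as the request when only a limit is set.
function effectiveCpuRequestMillis(pod: PodSpec): number {
  return pod.cpuRequestMillis ?? pod.cpuLimitMillis ?? 0;
}

// Number of identical pods that fit on one node, considering CPU requests only.
function schedulablePods(allocatableMillis: number, pod: PodSpec): number {
  const request = effectiveCpuRequestMillis(pod);
  if (request <= 0) {
    return Infinity; // no CPU reservation, so CPU never blocks scheduling
  }
  return Math.floor(allocatableMillis / request);
}

// A pod with only a 0.2 CPU limit, on a node assumed to have 1900m allocatable.
const testPod: PodSpec = { cpuLimitMillis: 200 };
const perNode = schedulablePods(1900, testPod);
console.log(perNode); // 9
```

Under these assumptions a node accepts nine such pods, and the tenth stays Pending regardless of how little CPU the running pods actually use.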

EKS clusters should be set up in a way that does not restrict how many tests can run in parallel. We should also not have to continually tweak requests/limits based on how many test cases may be running in parallel. To better accommodate this, we could set up a node autoscaler that can handle the increased test load on the clusters.
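As a rough illustration of what a node autoscaler would do for the test clusters, the sketch below computes how many nodes a given number of pending pods needs and clamps the result to the node group's bounds. This is a simplified model, not the actual algorithm of the Kubernetes Cluster Autoscaler or any other autoscaler, and every number and name in it (`nodesNeeded`, pods per test, pods per node, min/max sizes) is a hypothetical assumption.

```typescript
// Simplified scale-up model: enough nodes to fit all pods, within group bounds.
function nodesNeeded(
  totalPods: number,
  podsPerNode: number,
  minNodes: number,
  maxNodes: number,
): number {
  if (podsPerNode <= 0) {
    throw new Error("podsPerNode must be positive");
  }
  const wanted = Math.ceil(totalPods / podsPerNode);
  // Never go below the configured minimum or above the configured maximum.
  return Math.min(maxNodes, Math.max(minNodes, wanted));
}

// Assumed: each test case deploys 6 pods, and 9 pods fit on one node.
const podsPerTest = 6;
const singleTest = nodesNeeded(1 * podsPerTest, 9, 2, 10);
const fourTests = nodesNeeded(4 * podsPerTest, 9, 2, 10);
console.log(singleTest, fourTests); // 2 3
```

With autoscaling, the node count follows the number of parallel test cases, so the requests/limits in the test definitions do not need to change as CI concurrency grows.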

A temporary solution would be to increase the minimum number of nodes in the managed node group. This comes with a tradeoff in cost and should not be considered a long-term solution.

Labels: EKS (EKS related issues)
