Description
During the Collector CI, multiple EKS test cases may run against a single cluster. This could overutilize node resources, which would prevent pods from being scheduled. So far, we have only seen issues with hitting caps due to CPU requests. Current resource quotas for deployments can be found by searching for limits = in the terraform directory. Example here. Currently, in most cases no request is set, but CPU limits are set at 0.2.
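To illustrate why a limit-only configuration can still exhaust CPU requests: when a container specifies a CPU limit but no request, and nothing else (such as a LimitRange) supplies a default, Kubernetes uses the limit as the request. The sketch below works through that arithmetic. The 0.2 CPU limit comes from this issue; the node's allocatable CPU is a hypothetical value chosen only for illustration, and the function names are made up for this example.

```python
def effective_cpu_request_m(request_m, limit_m):
    """Return the CPU request (in millicores) that the scheduler would count.

    If no request is given but a limit is, the limit is used as the request.
    This mirrors Kubernetes defaulting when no other default is applied.
    """
    if request_m is not None:
        return request_m
    return limit_m if limit_m is not None else 0


def pods_per_node(allocatable_m, request_m):
    """How many pods with the given CPU request fit on one node, by CPU alone."""
    if request_m <= 0:
        raise ValueError("request must be positive to bound pod count by CPU")
    return allocatable_m // request_m


LIMIT_M = 200          # 0.2 CPU limit, as described in this issue
ALLOCATABLE_M = 1930   # hypothetical allocatable CPU of one node (assumption)

request_m = effective_cpu_request_m(None, LIMIT_M)
print(f"effective request: {request_m}m")
print(f"pods per node (CPU only): {pods_per_node(ALLOCATABLE_M, request_m)}")
```

This ignores CPU already requested by system pods on the node, so the real headroom would be lower.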
EKS clusters should be set up in a way that does not restrict how many tests can be run in parallel. We should also not have to continually tweak requests/limits based on how many test cases may be running in parallel. To better accommodate this, we could set up a node autoscaler that can handle the increased test load on the clusters.
A temporary solution would be to increase the minimum number of nodes in the managed node group. This comes with a tradeoff in cost and should not be considered a long-term solution.
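As a rough way to reason about how many nodes a given level of parallelism would need, the following sketch estimates a node count from CPU requests alone. Every input except the 0.2 CPU value is a hypothetical assumption (number of parallel test cases, pods per test case, allocatable CPU per node), and the function name is invented for this example. It is not a sizing recommendation.

```python
def min_nodes_for_tests(parallel_tests, pods_per_test, request_m, allocatable_m):
    """Estimate the nodes needed so all test pods can be scheduled by CPU request.

    All quantities are in millicores. Memory, system pods, and pod-count
    limits per node are ignored, so treat the result as a lower bound.
    """
    per_node = allocatable_m // request_m
    if per_node == 0:
        raise ValueError("a single pod's request exceeds node allocatable CPU")
    total_pods = parallel_tests * pods_per_test
    # Integer ceiling division: round up to whole nodes.
    return -(-total_pods // per_node)


# Hypothetical inputs (assumptions, not measured values from the CI):
nodes = min_nodes_for_tests(
    parallel_tests=10,   # test cases running at once
    pods_per_test=3,     # pods each test case deploys
    request_m=200,       # 0.2 CPU per pod, as in this issue
    allocatable_m=1930,  # allocatable CPU per node
)
print(f"nodes needed (CPU only): {nodes}")
```

A fixed minimum has to be sized for the peak value of parallel_tests, whereas an autoscaler would only add nodes when pods are actually pending.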