Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -20,12 +20,27 @@ resources:
description: my awesome k8s cluster
network: <FIXME:network_name>
subnetwork: <FIXME:subnet_name>
nodeConfig:
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
nodePools:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No initialClusterVersion value like in other templates

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was because cluster version changes frequently and as an example, this will just default to the default version. the other could probably be changed though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be beneficial to have some consistency among the examples. If a user starts with the gke_regional.yaml example file, and does not know they need to add initialClusterVersion themselves (since it's not in the example for gke_regional.yaml), they will end up with a broken default version from gke.py.schema and the following error message:

Waiting for create [operation-1542046313989-57a7ba2c5c389-96c2ff4c-4b16d716]...failed.                                                                       
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1542046313989-57a7ba2c5c389-96c2ff4c-4b16d716]: errors:
- code: RESOURCE_ERROR
  location: /deployments/my-gke-regional/resources/myk8sregional
  message: '{"ResourceType":"gcp-types/container-v1beta1:projects.locations.clusters","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Master
    version \"1.9.7-gke.6\" is unsupported.","status":"INVALID_ARGUMENT","statusMessage":"Bad
    Request","requestPath":"https://container.googleapis.com/v1beta1/projects/cft-test-220920/locations/us-east1/clusters","httpMethod":"POST"}}'

They will then have to figure out themselves which versions are working at the moment and add those to the example themselves, or they will need to go ahead and start modifying the default in gke.py.schema, which is the less elegant but more intuitive thing to do. Both of those seem not ideal.

Additionally, I'm curious why are we specifying initialClusterVersion in gke.py as a required parameter (and not an optional one) to begin with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I'm curious why are we specifying initialClusterVersion in gke.py as a required parameter (and not an optional one) to begin with?

We shouldn't make initialClusterVersion a required parameter... I see this in the .py file

'properties':
            {
                'cluster':
                    {
                        'name':
                            name + '-cluster',
                        'initialNodeCount':
                            propc.get('initialNodeCount'),
                        'initialClusterVersion':
                            propc.get('initialClusterVersion')
                    }
            }

Completely missed it first pass - but this essentially makes this field required.

- name: uniq1
initialNodeCount: 1
config:
localSsdCount: 1
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
taints:
- key: mykey1
value: value1
effect: NO_SCHEDULE
Copy link

@erikkugel erikkugel Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying NO_SCHEDULE for a minNodeCount: 1 and maxNodeCount: 3 with the specified node type results in un-scheduled system pods for k8s. To address this without increasing autoscaling definitions, use PREFER_NO_SCHEDULE instead of NO_SCHEDULE

https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/NodeConfig#NodeTaint

autoscaling:
enabled: True
minNodeCount: 1
maxNodeCount: 3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration results in unschedulable pods for K8s management pods:

Problem
Your cluster has 7 unschedulable pods

Details
Unschedulable pods
event-exporter-v0.1.9-5c8fb98cdb-8ggrr
heapster-v1.5.2-6d67555d76-sqhn9
kube-dns-autoscaler-69c5cbdcdd-pbzfh
kube-dns-db496df86-j9k9b
kubernetes-dashboard-7dbf5b9c66-6zmvw
and other 2 unschedulable pods.

Possible Actions
Increase maximum size limit for autoscaling in one or more node pools that have autoscaling enabled.

management:
autoUpgrade: True
autoRepair: True
locations:
- us-east1-c
- us-east1-b
Original file line number Diff line number Diff line change
Expand Up @@ -15,45 +15,70 @@ imports:
name: gke.py

resources:
- name: myk8sregional
- name: myk8regprivate
type: gke.py
properties:
clusterLocationType: Regional
region: us-east1
cluster:
name: myk8sregional
description: my awesome k8s cluster
name: k8v1private
description: k8regional 2 pool
network: <FIXME:network_name>
subnetwork: <FIXME:subnet_name>
intialNodeCount: 1
initialClusterVersion: 1.10.6-gke.3
nodeConfig:
localSsdCount: 1
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
taints:
- key: mykey1
value: value1
effect: NO_SCHEDULE
- key: mykey2
value: value2
effect: NO_EXECUTE
subnetwork: <FIXME:subnet_name>
initialClusterVersion: 1.10.6-gke.6
nodePools:
- name: uniq1
initialNodeCount: 1
config:
localSsdCount: 1
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
taints:
- key: mykey1
value: value1
effect: NO_SCHEDULE
- key: mykey2
value: value2
effect: NO_EXECUTE
autoscaling:
enabled: True
minNodeCount: 1
maxNodeCount: 3
management:
autoUpgrade: True
autoRepair: True
- name: uniq2
initialNodeCount: 1
config:
localSsdCount: 1
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
taints:
- key: mykey3
value: value3
effect: NO_SCHEDULE
- key: mykey4
value: value4
effect: NO_EXECUTE
autoscaling:
enabled: True
minNodeCount: 1
maxNodeCount: 2
management:
autoUpgrade: True
autoRepair: True
locations:
- us-east1-c
- us-east1-b
autoScaling:
enabled: True
minNodeCount: 1
maxNodeCount: 2
management:
autoUpgrade: True
autoRepair: True
masterAuth:
username: <FIXME:user_name>
password: <FIXME:password>
username: <FIXME:username>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really be using basic auth here? It's considered a security weakness: https://cloud.google.com/kubernetes-engine/docs/concepts/security-overview

Also, trailing backspace might cause confusion in YAML

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was provided as an example and in no means best practice. Also, basic auth is still the default behavior when creating via Gui.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the GUI cautions against using basic auth with the following message "Basic authentication is not recommended because it provides no confidentiality protection for transmitted credentials".

Starting with version 1.12, clusters will have basic authentication and client certificate issuance disabled by default, even via GUI.

password: <FIXME:password>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really be using basic auth here? It's considered a security weakness: https://cloud.google.com/kubernetes-engine/docs/concepts/security-overview

Also, trailing backspace might cause confusion in YAML

loggingService: logging.googleapis.com
monitoringService: monitoring.googleapis.com
privateCluster: True
Expand Down
23 changes: 17 additions & 6 deletions community/cloud-foundation/templates/gke/examples/gke_zonal.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,20 @@ resources:
description: my awesome k8s cluster
network: <FIXME:network_name>
subnetwork: <FIXME:subnet_name>
nodeConfig:
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
nodePools:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Malformed YAML, extra-spaces. nodePools should align with subnetwork

- name: uniq1
initialNodeCount: 1
config:
localSsdCount: 1
oauthScopes:
- https://www.googleapis.com/auth/compute
- https://www.googleapis.com/auth/devstorage.read_only
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
autoscaling:
Copy link

@erikkugel erikkugel Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no 'taints' specified for the zonal cluster, but we specify them for global and global private. We should consider being consistent.

enabled: True
minNodeCount: 1
maxNodeCount: 2
management:
autoUpgrade: True
autoRepair: True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best practice to add a line at EOF

10 changes: 5 additions & 5 deletions community/cloud-foundation/templates/gke/gke.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,15 +33,14 @@ def generate_config(context):
{
'name':
name + '-cluster',
'initialNodeCount':
propc.get('initialNodeCount'),
'initialClusterVersion':
propc.get('initialClusterVersion')
}
}
}

if cluster_type == 'Regional':
#TODO: container-v1 was released and this will need to be updated
provider = 'gcp-types/container-v1beta1:projects.locations.clusters'
if not properties.get('region'):
raise KeyError(
Expand Down Expand Up @@ -69,7 +68,7 @@ def generate_config(context):

optional_props = [
'description',
'nodeConfig',
'nodePools',
'masterAuth',
'loggingService',
'monitoringService',
Expand Down Expand Up @@ -128,8 +127,9 @@ def generate_config(context):
output_obj['value'] = '$(ref.' + name + \
'.masterAuth.' + outprop + ')'
elif outprop == 'instanceGroupUrls':
output_obj['value'] = '$(ref.' + name + \
'.nodePools[0].' + outprop + ')'
for index, _ in enumerate(propc['nodePools']):
output_obj['value'] = '$(ref.' + name + \
'.nodePools[' + str(index) + '].' + outprop + ')'
else:
output_obj['value'] = '$(ref.' + name + '.' + outprop + ')'

Expand Down
Loading