From a7d045f711ef273a41ec7ea6ebec2fdbe2e4db41 Mon Sep 17 00:00:00 2001 From: Cory O'Daniel Date: Tue, 20 Aug 2024 10:47:32 -0700 Subject: [PATCH 1/4] Updating operator guide w/ design decisions and runbook --- operator.md | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 operator.md diff --git a/operator.md b/operator.md new file mode 100644 index 0000000..122f58f --- /dev/null +++ b/operator.md @@ -0,0 +1,137 @@ +### Azure AKS (Azure Kubernetes Service) + +Azure AKS (Azure Kubernetes Service) is a managed Kubernetes service that makes it easy to deploy, manage, and scale containerized applications using Kubernetes. AKS takes care of the heavy lifting of cluster management and provides features to enhance operational efficiency. + +### Design Decisions + +1. **Azure Policy**: Enabling Azure Policy ensures compliance and governance across the AKS resources. +2. **RBAC and AAD Integration**: Role-Based Access Control (RBAC) combined with Azure Active Directory (AAD) integration ensures secure access and management. +3. **Automatic Upgrade**: Clusters are set to stable automatic channel upgrade to ensure they are up-to-date with the latest stable features. +4. **Log Analytics**: Integration with Azure Log Analytics to provide monitoring and logging capabilities. +5. **Auto-scaling**: Both default and additional node pools are configured with auto-scaling capabilities to manage workloads efficiently. +6. **Networking**: The networking profile uses the "azure" network plugin and policy to integrate seamlessly with Azure's ecosystem. + +### Runbook + +#### Unable to Connect to AKS + +If you encounter connectivity issues with your AKS cluster: + +Use `kubectl` to check the cluster nodes' status. + +```sh +kubectl get nodes +``` + +Ensure the nodes are all in a "Ready" state. + +```sh +kubectl describe node +``` + +Check for events or descriptions that might indicate issues. 
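The node check above can also be scripted. The following is a minimal sketch, not part of the bundle: it assumes the default column layout of `kubectl get nodes --no-headers` (name first, status second), and the function name `not_ready_nodes` is hypothetical.

```sh
# Hypothetical helper: print the names of nodes whose STATUS column
# is not exactly "Ready". Reads `kubectl get nodes --no-headers`
# output on stdin (assumed columns: NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 }'
}

# Usage (assumes kubectl is already configured for the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

A cordoned node reports `Ready,SchedulingDisabled`, so this sketch lists it as well; that is a deliberate choice to surface anything that is not plainly `Ready`.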
+ +#### Pod Not Starting + +If a pod is stuck in `Pending` or `CrashLoopBackOff`: + +Describe the pod to check for detailed error messages. + +```sh +kubectl describe pod <pod-name> +``` + +Look in the Events section for reasons such as insufficient resources or failed image pulls. + +Check the logs of the pod to understand why it is crashing. + +```sh +kubectl logs <pod-name> +``` + +#### DNS Resolution Not Working + +If the applications inside the cluster have DNS issues: + +Check if the CoreDNS pod is running correctly. + +```sh +kubectl get pods -n kube-system -l k8s-app=kube-dns +``` + +If CoreDNS is running, describe the CoreDNS pod to check for issues. + +```sh +kubectl describe pod <coredns-pod-name> -n kube-system +``` + +#### AKS Cluster Scaling Issues + +If the cluster does not add or remove nodes as expected: + +Check the cluster autoscaler logs. + +```sh +kubectl -n kube-system logs -l component=cluster-autoscaler +``` + +Identify if there are any errors or warnings that prevent scaling events from being processed. + +#### Permissions and Roles Issue + +If users are having trouble accessing resources: + +List current role bindings. + +```sh +kubectl get rolebinding --all-namespaces +``` + +Describe a particular role binding to verify its settings. + +```sh +kubectl describe rolebinding <role-binding-name> -n <namespace> +``` + +#### Checking Azure AD Integration + +If there are authentication issues via Azure AD: + +Verify the AKS cluster's AAD integration status. + +```sh +az aks show --resource-group <resource-group> --name <cluster-name> --query "aadProfile.enableAzureRbac" +``` + +Ensure it returns `true` if you have Azure RBAC enabled. + +#### Checking Cluster Metrics & Logs + +To check metrics and logs if Azure Monitor is configured: + +Use the Azure CLI to query logs in the Log Analytics workspace. + +```sh +az monitor log-analytics query -w <workspace-id> --analytics-query "KubePodInventory | summarize count() by ClusterId, Computer" +``` + +This returns a count of pod inventory records grouped by cluster and node.
+ +#### Certificate Issues + +If cert-manager has trouble obtaining certificates: + +Describe the certificate request: + +```sh +kubectl describe certificaterequest +``` + +Check whether the challenge is failing: + +```sh +kubectl describe challenge +``` + +This should provide details on why the validation is failing, such as DNS issues or misconfiguration. + From db5ebf3f1fef409724110309aad0f83778e67de2 Mon Sep 17 00:00:00 2001 From: Cory O'Daniel Date: Tue, 20 Aug 2024 17:42:35 -0700 Subject: [PATCH 2/4] removing draft --- operator.mdx | 103 --------------------------------------------------- 1 file changed, 103 deletions(-) delete mode 100644 operator.mdx diff --git a/operator.mdx b/operator.mdx deleted file mode 100644 index 66613fd..0000000 --- a/operator.mdx +++ /dev/null @@ -1,103 +0,0 @@ -# azure-aks-cluster - -Azure Kubernetes Service (AKS) is a simple means of creating a managed Kubernetes cluster in Microsoft Azure. Rather than manually configuring the cluster, you allow Azure to manage the Kubernetes masters and handle most of the mundane but critical tasks including health monitoring and maintenance, leaving you responsible only for the agent nodes. AKS enables rapid development and deployment of cloud-native apps with less management effort and the added protection of interoperability with Microsoft Azure security. - -## Use Cases -When you deploy an AKS cluster, the Kubernetes control plane and all nodes are deployed and configured for you. Azure handles management of the masters, allowing you to focus on the agent nodes. -### Web applications -Serve your web application out of Kubernetes, and leverage the high availability of running across availability zones and the ease of autoscaling your servers with web traffic. -### Microservices -Build large complex systems out of many small microservices, increasing your overall resiliency by isolating failure domains. 
-### Workflows -Gain the power of the open-source community by using services like Kubeflow and Argo Workflows for ETL (extract, transform, load) or machine-learning capabilities. -### Cloud agnostic -If your application can run on Kubernetes, you can run on any cluster, whether it's Amazon Elastic Kubernetes (EKS), Google Kubernetes Engine (GKE), AKS, or even your own on-premises cluster. - -## Configuration Presets -### Development -The development preset creates the default node group using a two-core burstable vCPU with 4 GB of memory. No additional node groups are created. Use this preset for development only. -### Production -The production preset creates a default node group using an autoscaling standard vCPU (starting with two cores) with 8 GB of memory. One additional node group is also created using another autoscaling standard vCPU (starting with two cores) with 8 GB of memory. This preset has sufficient performance for production environments. - -## Design -Our bundle includes the following design choices to help simplify your deployment: -### Cluster Autoscaling -The AKS Cluster Autoscaler is enabled by default to adjust automatically the number of nodes that run your workloads. The cluster autoscaler component can watch for pods in your cluster that cannot be scheduled because of resource constraints. When the autoscaler detects issues, it will increase the number of nodes in a node pool to meet the application demand. -### Azure CNI -With Azure Container Networking Interface (CNI), every pod gets an IP address from the subnet and can be accessed directly. These IP addresses must be unique across your network space and can be used to connect resources together. -### Ingress Controller -The Ingress is a Kubernetes resource that lets you configure an HTTP load balancer for applications running on Kubernetes, represented by one or more Services, which are abstractions to permit these applications to appear as network services. 
Such a load balancer is necessary to deliver those applications to clients outside of the Kubernetes cluster. -### DNS and SSL -If you choose to specify an Azure DNS Zone, external-dns and cert-manager will be automatically installed to manage your DNS records dynamically and to generate SSL certificates to ensure that all internet traffic is encrypted. - -## Best Practices -The bundle includes a number of best practices without needing any additional work on your part. It uses Azure CNI instead of Kubenet so that other resources can use node IPs. We have also enabled autoscaling for all node pools. For monitoring and collecting metrics, we have set up metrics-server and kube-state-metrics. - -## Security -To improve security, node groups are deployed into a private subnet. Also, an Azure service principal with minimal privileges is created for AKS to manage Azure DNS zones and Azure Container Registry. - -## Observability -Both metrics-server and kube-state-metrics are installed automatically to provide you with metrics. - -## Connecting -After you have deployed a Kubernetes cluster through Massdriver, you may want to interact with the cluster using the powerful [kubectl](https://kubernetes.io/docs/reference/kubectl/) command line tool. - -### Install Kubectl - -You will first need to install `kubectl` to interact with the kubernetes cluster. Installation instructions for Windows, Mac and Linux can be found [here](https://kubernetes.io/docs/tasks/tools/#kubectl). - -Note: While `kubectl` generally has forwards and backwards compatibility of core capabilities, it is best if your `kubectl` client version is matched with your kubernetes cluster version. This ensures the best stability and compability for your client. - - -The standard way to manage connection and authentication details for kubernetes clusters is through a configuration file called a [`kubeconfig`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file. 
- -### Download the Kubeconfig File - -The standard way to manage connection and authentication details for kubernetes clusters is through a configuration file called a [`kubeconfig`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file. The `kubernetes-cluster` artifact that is created when you make a kubernetes cluster in Massdriver contains the basic information needed to create a `kubeconfig` file. Because of this, Massdriver makes it very easy for you to download a `kubeconfig` file that will allow you to use `kubectl` to query and administer your cluster. - -To download a `kubeconfig` file for your cluster, navigate to the project and target where the kubernetes cluster is deployed and move the mouse so it hovers over the artifact connection port. This will pop a windows that allows you to download the artifact in raw JSON, or as a `kubeconfig` yaml. Select "Kube Config" from the drop down, and click the button. This will download the `kubeconfig` for the kubernetes cluster to your local system. - -![Download Kubeconfig](https://github.com/massdriver-cloud/azure-aks-cluster/blob/main/images/kubeconfig-download.gif?raw=true) - -### Use the Kubeconfig File - -Once the `kubeconfig` file is downloaded, you can move it to your desired location. By default, `kubectl` will look for a file named `config` located in the `$HOME/.kube` directory. If you would like this to be your default configuration, you can rename and move the file to `$HOME/.kube/config`. - -A single `kubeconfig` file can hold multiple cluster configurations, and you can select your desired cluster through the use of [`contexts`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/#context). Alternatively, you can have multiple `kubeconfig` files and select your desired file through the `KUBECONFIG` environment variable or the `--kubeconfig` flag in `kubectl`. 
- -Once you've configured your environment properly, you should be able to run `kubectl` commands. Here are some commands to try: - -```bash -# get a list of all pods in the current namespace -kubectl get pods - -# get a list of all pods in the kube-system namespace -kubectl get pods --namespace kube-system - -# get a list of all the namespaces -kubectl get namespaces - -# view the logs of a running pod in the default namespace -kubectl logs --namespace default - -# describe the status of a deployment in the foo namespace -kubectl describe deployment --namespace foo - -# get a list of all the resources the kubernetes cluster can manage -kubectl api-resources -``` - -## Addons - -### Grafana - -Connecting to [Grafana](https://grafana.com/docs/grafana/latest/introduction/) on your AKS cluster requires setting up `kubectl` from above. After `kubectl` is set up, you can [port forward](https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/#access-grafana-on-managed-k8s-providers) the service locally using: `kubectl port-forward service/massdriver-grafana 3000:80 --namespace=md-observability` and then browsing to `http://localhost:3000`. The username is `admin` and the password is the password you set on the bundle configuration page. - -## Trade-offs -* Please note that a default node group must be created and cannot be manipulated, as there can be only one default node group. Additional node groups can be provisioned as needed. -* We do not currently support filtering compute size options by region or subscription-plan availability. 
-* We do not support the following: - * API server availability with SLA configuration - * Kubenet network configuration - * Enabling Azure Policy - * Integrating AKS and Azure Key Vault From ce1cab8919b8236c011ba4b7ce5d4d74539a4b9b Mon Sep 17 00:00:00 2001 From: Michael Lacore Date: Wed, 21 Aug 2024 11:36:48 -0700 Subject: [PATCH 3/4] Update operator.md --- operator.md | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/operator.md b/operator.md index 122f58f..1df5b4d 100644 --- a/operator.md +++ b/operator.md @@ -9,7 +9,36 @@ Azure AKS (Azure Kubernetes Service) is a managed Kubernetes service that makes 3. **Automatic Upgrade**: Clusters are set to stable automatic channel upgrade to ensure they are up-to-date with the latest stable features. 4. **Log Analytics**: Integration with Azure Log Analytics to provide monitoring and logging capabilities. 5. **Auto-scaling**: Both default and additional node pools are configured with auto-scaling capabilities to manage workloads efficiently. -6. **Networking**: The networking profile uses the "azure" network plugin and policy to integrate seamlessly with Azure's ecosystem. +6. **Networking**: The networking profile uses the `azure` network plugin and policy to integrate seamlessly with Azure's ecosystem. + +### Connecting + +After you have deployed a Kubernetes cluster through Massdriver, you may want to interact with the cluster using the powerful [kubectl](https://kubernetes.io/docs/reference/kubectl/) command line tool. + +#### Install Kubectl + +You will first need to install `kubectl` to interact with the kubernetes cluster. Installation instructions for Windows, Mac and Linux can be found [here](https://kubernetes.io/docs/tasks/tools/#kubectl). + +Note: While `kubectl` generally has forwards and backwards compatibility of core capabilities, it is best if your `kubectl` client version is matched with your kubernetes cluster version. 
This ensures the best stability and compatibility for your client. + + +The standard way to manage connection and authentication details for kubernetes clusters is through a configuration file called a [`kubeconfig`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file. + +#### Download the Kubeconfig File + +The standard way to manage connection and authentication details for kubernetes clusters is through a configuration file called a [`kubeconfig`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file. The `kubernetes-cluster` artifact that is created when you make a kubernetes cluster in Massdriver contains the basic information needed to create a `kubeconfig` file. Because of this, Massdriver makes it very easy for you to download a `kubeconfig` file that will allow you to use `kubectl` to query and administer your cluster. + +To download a `kubeconfig` file for your cluster, navigate to the project and target where the kubernetes cluster is deployed and move the mouse so it hovers over the artifact connection port. This will pop a windows that allows you to download the artifact in raw JSON, or as a `kubeconfig` yaml. Select "Kube Config" from the drop down, and click the button. This will download the `kubeconfig` for the kubernetes cluster to your local system. + +![Download Kubeconfig](https://github.com/massdriver-cloud/azure-aks-cluster/blob/main/images/kubeconfig-download.gif?raw=true) + +#### Use the Kubeconfig File + +Once the `kubeconfig` file is downloaded, you can move it to your desired location. By default, `kubectl` will look for a file named `config` located in the `$HOME/.kube` directory. If you would like this to be your default configuration, you can rename and move the file to `$HOME/.kube/config`. 
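As an illustration of the step above, here is a minimal sketch of installing a downloaded `kubeconfig` as the `kubectl` default. The function name `install_kubeconfig` and the `config.bak` backup name are hypothetical, and the sketch copies the file rather than moving it so the download stays in place.

```sh
# Hypothetical helper: install a downloaded kubeconfig as the kubectl default.
# $1 is the path to the downloaded file. An existing default config is
# backed up to config.bak before being overwritten.
install_kubeconfig() {
  src="$1"
  dest_dir="${HOME}/.kube"
  mkdir -p "$dest_dir"
  if [ -f "$dest_dir/config" ]; then
    cp "$dest_dir/config" "$dest_dir/config.bak"
  fi
  cp "$src" "$dest_dir/config"
  # The kubeconfig holds credentials, so restrict it to the current user.
  chmod 600 "$dest_dir/config"
}

# Usage (the path is an example, not a fixed download location):
#   install_kubeconfig "$HOME/Downloads/kubeconfig.yaml"
```

Backing up first is a cautious choice: overwriting `$HOME/.kube/config` would otherwise discard any cluster entries already stored there.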
+ +A single `kubeconfig` file can hold multiple cluster configurations, and you can select your desired cluster through the use of [`contexts`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/#context). Alternatively, you can have multiple `kubeconfig` files and select your desired file through the `KUBECONFIG` environment variable or the `--kubeconfig` flag in `kubectl`. + +Once you've configured your environment properly, you should be able to run `kubectl` commands. ### Runbook From f035beea62be0846011b546c421c4ad9211ab153 Mon Sep 17 00:00:00 2001 From: Michael Lacore Date: Wed, 21 Aug 2024 11:39:27 -0700 Subject: [PATCH 4/4] Update operator.md --- operator.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/operator.md b/operator.md index 1df5b4d..5be8b2b 100644 --- a/operator.md +++ b/operator.md @@ -28,9 +28,7 @@ The standard way to manage connection and authentication details for kubernetes The standard way to manage connection and authentication details for kubernetes clusters is through a configuration file called a [`kubeconfig`](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) file. The `kubernetes-cluster` artifact that is created when you make a kubernetes cluster in Massdriver contains the basic information needed to create a `kubeconfig` file. Because of this, Massdriver makes it very easy for you to download a `kubeconfig` file that will allow you to use `kubectl` to query and administer your cluster. -To download a `kubeconfig` file for your cluster, navigate to the project and target where the kubernetes cluster is deployed and move the mouse so it hovers over the artifact connection port. This will pop a windows that allows you to download the artifact in raw JSON, or as a `kubeconfig` yaml. Select "Kube Config" from the drop down, and click the button. This will download the `kubeconfig` for the kubernetes cluster to your local system. 
- -![Download Kubeconfig](https://github.com/massdriver-cloud/azure-aks-cluster/blob/main/images/kubeconfig-download.gif?raw=true) +To download a `kubeconfig` file for your cluster, navigate to the project and environment where the Kubernetes cluster is deployed and open the Details configuration pane. Click the download option, which provides the artifact as raw JSON or as a `kubeconfig` YAML file. Select "Kube Config" from the drop-down and click the button. This downloads the `kubeconfig` for the Kubernetes cluster to your local system. #### Use the Kubeconfig File