[Track] [Enhancement][WIP] KCL x Webhook Gatekeeper

## Motivation

Kubernetes offers several mechanisms for access control and admission, including RBAC, admission webhooks, and built-in CEL policies. Managing resources with webhooks, however, raises more complex permission requirements. For example, we want to control which resources a given webhook may mutate or validate. Kubernetes does not constrain webhooks with a permission model of their own: a webhook's scope is whatever its webhook configuration declares. As a result, a webhook can act on resources it was never meant to touch and change cluster behavior.

## User Story

As a Kubernetes cluster administrator, I want fine-grained permission control over webhooks, so that only specific webhooks can operate on specific resources, with permissions granular down to individual mutations and validations. I need a way to define and enforce these rules so that webhooks cannot accidentally affect resources they should not touch and cause abnormal cluster behavior.

## Goals

The problem comes down to two scenarios, and this proposal aims to address both.

Goal 1: Panic in webhook

- K8s webhooks can either admit or reject API requests. **There is one problem with webhooks that makes them more dangerous, though: admission request failures also result in rejection by default.** This is serious: if a webhook crashes, times out, or is unreachable, every request it matches is rejected.

Goal 2: Webhook works, but its logic has bugs

- **The `kube-system` namespace deserves special attention, because a mistake in the configuration can easily lead to complete cluster failure. The most common mistake is a missing label on the `kube-system` namespace object that would exclude it from request matching. A single webhook request failure can prevent Kubernetes components from starting, causing a ripple effect that brings down the whole cluster. Bottom line: always exclude `kube-system` from mutations and validations unless you have a very good reason not to.**

## Proposal

### Goal 1: Panic in webhook

What should happen after a webhook fails depends on the use case. Webhook resources should therefore offer failure-handling (recovery) policies that users can choose from for their scenario, and users should also be able to write their own.

```yaml
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
    failureAction: "abort"  # or "warn", "skip", or a function that takes further action as needed
  source: < kcl code >
```
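To make the proposed `failureAction` concrete, here is a minimal Python sketch of how a webhook server might turn an internal failure of the KCL function into an AdmissionReview response. The `abort`/`warn`/`skip` semantics are assumptions about this proposal, not existing KCL behavior; the response fields (`uid`, `allowed`, `status`, `warnings`) follow the `admission.k8s.io/v1` AdmissionReview schema.

```python
from typing import Any, Dict


def failure_response(request_uid: str, error: Exception, failure_action: str) -> Dict[str, Any]:
    """Build an admission.k8s.io/v1 AdmissionReview response after the KCL
    function failed. The action names come from this proposal (hypothetical)."""
    response: Dict[str, Any] = {"uid": request_uid}
    if failure_action == "abort":
        # Reject the request and surface the error to the client.
        response["allowed"] = False
        response["status"] = {"code": 500, "message": f"webhook failed: {error}"}
    elif failure_action == "warn":
        # Admit the object unchanged, but return a warning to the client.
        response["allowed"] = True
        response["warnings"] = [f"webhook failed, object admitted unchanged: {error}"]
    elif failure_action == "skip":
        # Admit the object unchanged without telling the client.
        response["allowed"] = True
    else:
        raise ValueError(f"unknown failureAction: {failure_action!r}")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Because the webhook answers the request itself in every case, the API server's own `failurePolicy` only comes into play when the webhook cannot respond at all (for example, a crash or timeout).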

### Goal 2: Webhook works, but its logic has bugs

1. RBAC permissions are inherited: the mutation and validation resources that an account applies inherit that account's RBAC permissions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: restricted-namespace # The specific namespace
  name: restricted-role
rules:
- apiGroups: [""]
  resources: ["pods", "services"] # The specific resources
  verbs: ["get", "list", "watch", "create", "update", "delete"] # The specific action
```
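A minimal Python sketch of how a Role's `rules` could be evaluated locally for a given API group, resource, and verb. It follows RBAC's documented semantics (rules are purely additive, and `"*"` matches any value) but ignores `resourceNames`, `nonResourceURLs`, and the Role's namespace for brevity. In a real operator, asking the API server through a SubjectAccessReview is more robust than re-implementing RBAC.

```python
from typing import Dict, List


def _matches(allowed: List[str], value: str) -> bool:
    # "*" in an RBAC rule field matches every value.
    return "*" in allowed or value in allowed


def role_allows(rules: List[Dict[str, List[str]]], api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule grants `verb` on `resource` in `api_group`.
    RBAC has no deny rules, so a single matching rule is enough.
    The caller must separately check that the object is in the Role's namespace."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# The rules of `restricted-role` above.
restricted_role = [
    {"apiGroups": [""], "resources": ["pods", "services"],
     "verbs": ["get", "list", "watch", "create", "update", "delete"]},
]
```

With these rules, a webhook acting for this account could update Pods but not Deployments, and not even `patch` Pods, since `patch` is not listed.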

2. On top of RBAC, a label selector further narrows the set of objects a webhook applies to.

```yaml
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  selector: # select the object
    matchLabels:
      app: my-app
    namespace: my-namespace
    resourceKind: Pod
    resourceName: my-pod
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
  source: < kcl code >
```
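The `selector` fields above (`matchLabels`, `namespace`, `resourceKind`, `resourceName`) are part of this proposal, not an existing API. A minimal Python sketch of how they might be matched against an incoming object, assuming every field is optional and all specified fields must match:

```python
from typing import Any, Dict


def selector_matches(selector: Dict[str, Any], obj: Dict[str, Any]) -> bool:
    """Hypothetical matcher for the proposed KCLRun spec.selector.
    Unset fields match everything; set fields must all match (AND semantics)."""
    meta = obj.get("metadata", {})
    labels = meta.get("labels") or {}
    # Every matchLabels entry must be present on the object with the same value.
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    checks = {
        "namespace": meta.get("namespace"),
        "resourceKind": obj.get("kind"),
        "resourceName": meta.get("name"),
    }
    for field, actual in checks.items():
        expected = selector.get(field)
        if expected is not None and expected != actual:
            return False
    return True
```

The kind comparison here is exact and case-sensitive, so `resourceKind` would need to use the canonical kind name (`Pod`, `Deployment`).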

3. The bidirectional selection mechanism

The webhook resource declares the objects it wants to act on:

```yaml
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: conditionally-add-annotations
spec:
  selector:
    matchLabels:
      app: my-app
    namespace: my-namespace
    resourceKind: Deployment
    resourceName: my-deployment
    strictMode: true
  params:
    toMatch:
      config.kubernetes.io/local-config: "true"
    toAdd:
      configmanagement.gke.io/managed: disabled
    failureAction: "abort"  # or "warn", "skip", or other actions as needed
  source: < kcl code >
```

Each resource, in turn, declares which webhooks may act on it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  namespace: my-namespace
  labels:
    app: my-app
  annotations:
    # Annotation values must be strings, so the lists are JSON-encoded.
    canMutate: '["conditionally-add-annotations"]'
    canValidate: '["conditionally-validate-annotations"]'
```
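A minimal Python sketch of the bidirectional check: a webhook may mutate an object only if the webhook selects the object *and* the object lists the webhook in its `canMutate` annotation. Because annotation values must be strings, the sketch assumes the list is stored as a JSON-encoded string; selection is reduced to `matchLabels` to keep it short.

```python
import json
from typing import Any, Dict


def opted_in(obj: Dict[str, Any], webhook_name: str, annotation_key: str) -> bool:
    """True if the object's annotation (a JSON array of webhook names) lists webhook_name."""
    raw = (obj.get("metadata", {}).get("annotations") or {}).get(annotation_key)
    if raw is None:
        return False
    try:
        names = json.loads(raw)
    except json.JSONDecodeError:
        return False  # a malformed annotation grants nothing
    return isinstance(names, list) and webhook_name in names


def may_mutate(webhook: Dict[str, Any], obj: Dict[str, Any]) -> bool:
    """Both directions must agree: the webhook selects the object,
    and the object accepts the webhook."""
    labels = obj.get("metadata", {}).get("labels") or {}
    wanted = webhook["spec"]["selector"].get("matchLabels") or {}
    selected = all(labels.get(k) == v for k, v in wanted.items())
    return selected and opted_in(obj, webhook["metadata"]["name"], "canMutate")
```

Defaulting to "not opted in" when the annotation is missing or malformed keeps a buggy webhook away from objects that never asked for it.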

The following still require detailed design:

1. How RBAC verbs map to validating webhooks, mutating webhooks, or custom verbs.
2. A selector that lets a webhook select resources accurately.
3. A selector that lets a resource select webhooks accurately.

## [WIP] Design Details

1. Write webhooks in KCL

```kcl
params = option("params") or {}  # hidden from the user
set_func = lambda params {
    annotations: {str:str} = {k = v for k, v in params.annotations or {}}
    # The last expression is the lambda's return value, so return the list itself.
    [item | {
        metadata.annotations: annotations
    } for item in option("items") or []]
}
items = set_func(params)  # hidden from the user
```

2. Bind RBAC permissions to the webhook

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mutater
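  namespace: default

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mutater-role
  namespace: default
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# A Role grants nothing until it is bound to a subject.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mutater-rolebinding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: mutater
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: mutater-role
---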
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: set-annotations
spec:
  serviceAccountName: mutater
  params:
    annotations:
      config.kubernetes.io/local-config: "true"
  source: oci://ghcr.io/kcl-lang/set-annotations
```

3. Debug/Test with KCL

```kcl
test_set_func = lambda {
    ...
    item = set_func(param)
    assert item.annotation == 'kcl2'
}
```
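For comparison, a Python sketch of the behavior the KCL `set_func` above is meant to have, together with a unit test in the same spirit as `test_set_func`. Modeling `items` as plain dicts and merging (rather than replacing) existing annotations are assumptions made for this sketch:

```python
import copy
from typing import Any, Dict, List


def set_annotations(items: List[Dict[str, Any]], params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of `items` with params["annotations"] merged into each
    object's metadata.annotations. The input list is not modified."""
    annotations = dict(params.get("annotations") or {})
    result = []
    for item in items:
        new_item = copy.deepcopy(item)
        meta = new_item.setdefault("metadata", {})
        # New annotations win over existing keys; other existing keys are kept.
        meta["annotations"] = {**(meta.get("annotations") or {}), **annotations}
        result.append(new_item)
    return result


def test_set_annotations() -> None:
    items = [{"kind": "ConfigMap", "metadata": {"name": "cm", "annotations": {"owner": "team-a"}}}]
    out = set_annotations(items, {"annotations": {"managed-by": "kcl2"}})
    assert out[0]["metadata"]["annotations"] == {"owner": "team-a", "managed-by": "kcl2"}
    assert "managed-by" not in items[0]["metadata"]["annotations"]  # input untouched
```

Returning new objects instead of mutating the input keeps the function easy to test in isolation, which is the property the KCL test above relies on as well.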

4. Extend `Print` or provide a built-in `log` module

For example (proposed syntax):
```
Print("aaa", io.stdout)
log.SetLevel(Debug)
log.Print("aaa", io.stdout)
```
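The `log` calls above are a proposal, not an existing KCL API. To illustrate the intended behavior (messages below the configured level are dropped, and the output stream is selectable), here is a minimal sketch using Python's standard `logging` module; the logger name is hypothetical:

```python
import io
import logging
from typing import TextIO


def make_logger(stream: TextIO, level: int) -> logging.Logger:
    """Return a logger that writes to `stream` and drops messages below `level`."""
    logger = logging.getLogger("kcl-webhook")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # do not also emit through the root logger
    return logger


# Rough equivalent of: log.SetLevel(Info); log.Print("aaa", <stream>)
buffer = io.StringIO()
log = make_logger(buffer, logging.INFO)
log.debug("dropped: DEBUG is below INFO")
log.info("aaa")
```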

5. Error Recovery
```yaml
apiVersion: krm.kcl.dev/v1alpha1
kind: KCLRun
metadata:
  name: set-annotations
spec:
  serviceAccountName: mutater
  recoveryPolicy: panic # or skip
  params:
    annotations:
      config.kubernetes.io/local-config: "true"
  source: oci://ghcr.io/kcl-lang/set-annotations
```
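One way an operator could honor `recoveryPolicy` when the webhook cannot respond at all (crash, timeout) is to translate it into the `failurePolicy` of the generated webhook configuration, which Kubernetes supports as `Fail` or `Ignore`. The mapping (`panic` to `Fail`, `skip` to `Ignore`), the default, and the webhook naming scheme below are assumptions for illustration. The `namespaceSelector` uses the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace, to keep `kube-system` out of scope as recommended in the Goals section.

```python
from typing import Any, Dict

# Hypothetical mapping from the proposed recoveryPolicy to Kubernetes' failurePolicy.
RECOVERY_TO_FAILURE_POLICY = {"panic": "Fail", "skip": "Ignore"}


def webhook_entry(kclrun: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the webhook entry an operator might generate for a KCLRun."""
    policy = kclrun["spec"].get("recoveryPolicy", "panic")  # assumed default, mirrors Fail
    if policy not in RECOVERY_TO_FAILURE_POLICY:
        raise ValueError(f"unsupported recoveryPolicy: {policy!r}")
    return {
        "name": f"{kclrun['metadata']['name']}.krm.kcl.dev",  # hypothetical naming scheme
        "failurePolicy": RECOVERY_TO_FAILURE_POLICY[policy],
        # Never send kube-system requests to this webhook.
        "namespaceSelector": {
            "matchExpressions": [{
                "key": "kubernetes.io/metadata.name",
                "operator": "NotIn",
                "values": ["kube-system"],
            }]
        },
        "sideEffects": "None",
        "admissionReviewVersions": ["v1"],
    }
```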

6. KCL as glue

```
req --> KCL-operator --> KCLRun --> KCL --> kclplugin --> go/py/rust...
               |--- Filters out the list of resources that 
               |--- the webhook can access based on RBAC
```
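The RBAC filtering step in the diagram could be delegated to the API server instead of re-implementing RBAC: for each candidate object, the operator asks whether the KCLRun's service account may perform the verb, using a `SubjectAccessReview` (`authorization.k8s.io/v1`). A minimal Python sketch that builds the review bodies and filters with an injected checker; the `check` callable is a hypothetical stand-in for POSTing the review and reading `status.allowed`:

```python
from typing import Any, Callable, Dict, List


def service_account_user(namespace: str, name: str) -> str:
    # Kubernetes authenticates service accounts under this username format.
    return f"system:serviceaccount:{namespace}:{name}"


def access_review(sa_namespace: str, sa_name: str, verb: str, group: str,
                  resource: str, obj: Dict[str, Any]) -> Dict[str, Any]:
    """Build a SubjectAccessReview asking whether the service account may
    perform `verb` on this specific object."""
    meta = obj.get("metadata", {})
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": service_account_user(sa_namespace, sa_name),
            "groups": ["system:serviceaccounts", f"system:serviceaccounts:{sa_namespace}"],
            "resourceAttributes": {
                "namespace": meta.get("namespace", ""),
                "verb": verb,
                "group": group,
                "resource": resource,
                "name": meta.get("name", ""),
            },
        },
    }


def filter_permitted(items: List[Dict[str, Any]], sa_namespace: str, sa_name: str,
                     verb: str, group: str, resource: str,
                     check: Callable[[Dict[str, Any]], bool]) -> List[Dict[str, Any]]:
    """Keep only the items for which `check(review)` reports the access as allowed."""
    return [item for item in items
            if check(access_review(sa_namespace, sa_name, verb, group, resource, item))]
```

Injecting `check` keeps the filtering logic testable without a cluster; in the operator it would wrap a real API call.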

## Community Tech

https://www.likakuli.com/posts/kinitiras-all/

### Kubernetes Webhook

When a webhook is registered, Kubernetes lets it be scoped with `rules` and `namespaceSelector`:

```yaml
webhooks:
  - name: webhook-example.github.com
    clientConfig:
      service:
        name: webhook-example
        namespace: default
        path: "/mutate"
      caBundle: ${CA_BUNDLE}
    admissionReviewVersions: [ "v1" ]
    sideEffects: None
    rules:
      - operations: [ "CREATE" ]
        apiGroups: ["apps", ""]
        apiVersions: ["v1"]
        resources: ["deployments"] # Here !
    namespaceSelector:
      matchLabels:
        webhook-example: enabled # Here !
```
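Roughly, the API server calls the webhook only when the request matches one of the `rules` and the namespace's labels match `namespaceSelector`. A simplified Python sketch of that decision for the configuration above; it ignores `matchExpressions`, `objectSelector`, `scope`, subresources, and `matchPolicy`:

```python
from typing import Any, Dict, List


def _field_matches(values: List[str], actual: str) -> bool:
    # "*" matches any value in webhook rules.
    return "*" in values or actual in values


def should_call_webhook(webhook: Dict[str, Any], operation: str, group: str,
                        version: str, resource: str, namespace_labels: Dict[str, str]) -> bool:
    """Simplified version of the API server's webhook matching."""
    rule_hit = any(
        _field_matches(rule["operations"], operation)
        and _field_matches(rule["apiGroups"], group)
        and _field_matches(rule["apiVersions"], version)
        and _field_matches(rule["resources"], resource)
        for rule in webhook.get("rules", [])
    )
    wanted = (webhook.get("namespaceSelector") or {}).get("matchLabels") or {}
    ns_hit = all(namespace_labels.get(k) == v for k, v in wanted.items())
    return rule_hit and ns_hit


example = {
    "rules": [{"operations": ["CREATE"], "apiGroups": ["apps", ""],
               "apiVersions": ["v1"], "resources": ["deployments"]}],
    "namespaceSelector": {"matchLabels": {"webhook-example": "enabled"}},
}
```

Because this `namespaceSelector` requires an opt-in label, namespaces that lack it, such as `kube-system`, are never sent to the webhook.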

### Kubernetes CEL Policy

A CEL `ValidatingAdmissionPolicy` declares which objects its rules apply to:
```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"] # Here !
  validations:
    - expression: "object.spec.replicas <= 5"
```

A `ValidatingAdmissionPolicyBinding` then restricts the policy to selected namespaces:
```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "demo-binding-test.example.com"
spec:
  policyName: "demo-policy.example.com"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test  # Here !
```
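A binding narrows a policy: a request is validated only if it matches both the policy's `matchConstraints` and the binding's `matchResources`. A simplified Python sketch of the example policy and binding, with the CEL expression `object.spec.replicas <= 5` hand-translated to Python and the namespace selector reduced to a single label check:

```python
from typing import Any, Dict


def evaluate(operation: str, group: str, resource: str,
             namespace_labels: Dict[str, str], obj: Dict[str, Any]) -> str:
    """Return "not-matched", "allowed", or "denied" for the example policy."""
    # Policy matchConstraints: apps/deployments on CREATE or UPDATE.
    if not (group == "apps" and resource == "deployments" and operation in ("CREATE", "UPDATE")):
        return "not-matched"
    # Binding matchResources: the namespace must be labeled environment=test.
    if namespace_labels.get("environment") != "test":
        return "not-matched"
    # Validation: object.spec.replicas <= 5; validationActions [Deny] rejects on failure.
    return "allowed" if obj["spec"]["replicas"] <= 5 else "denied"
```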

### OPA Gatekeeper

OPA can enforce a rule that prevents a given user from modifying objects in a namespace:

```rego
package kubernetes.admission

operations = {"CREATE", "UPDATE", "DELETE"}

deny[msg] {
    username := input.request.userInfo.username
    username == "user1"
    operations[input.request.operation]
    # request.namespace is also set for DELETE, where request.object is null.
    namespace := input.request.namespace
    namespace == "ns1"
    msg := sprintf("Unauthorized: %v is not permitted to modify objects in namespace %v", [username, namespace])
}
```

https://support.tools/post/opa-gatekeeper-require-labels/
https://stackoverflow.com/questions/71547292/opa-rego-policy-to-block-access-to-kubernetes-namespace

### Kyverno

Kyverno can enforce a rule that blocks resource creation in a namespace:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-create-in-forbidden-namespace
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: disallow-create-in-forbidden-namespace
    # Match (not exclude) requests in the forbidden namespace.
    match:
      any:
      - resources:
          kinds:
          - '*'
          namespaces:
          - forbidden-namespace
    validate:
      message: "Creating resources in the forbidden-namespace is not allowed."
      deny:
        conditions:
          any:
          - key: "{{ request.operation }}"
            operator: Equals
            value: CREATE
```

### Chainsaw

Chainsaw is an end-to-end, declarative testing tool that anyone can use to test Kubernetes operators.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: example
spec:
  steps:
  - try:
    - assert:
        resource:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: coredns
            namespace: kube-system
          spec:
            replicas: 2
```

When Chainsaw executes the assertion above, it looks for a Deployment named `coredns` in the `kube-system` namespace and compares the existing resource with the (partial) resource definition in the assertion.
In this case, the assertion passes if `spec.replicas` is 2 in the existing resource, and fails otherwise.
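The comparison can be pictured as a recursive subset check: every field in the expected (partial) resource must exist in the actual resource with an equal value, while extra fields in the actual resource are ignored. A simplified Python sketch; Chainsaw's real matching supports more (for example, expressions and its own array semantics), so treat this as an approximation:

```python
from typing import Any


def is_subset(expected: Any, actual: Any) -> bool:
    """True if `actual` contains everything in `expected`.
    Maps are compared key by key; lists and scalars must be equal."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual


# The partial resource from the assertion above.
expected = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "coredns", "namespace": "kube-system"},
    "spec": {"replicas": 2},
}
```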

### [WIP] FluxCD Multi Tenancy

**Flux defers to Kubernetes’ native RBAC to specify which operations are authorised when processing its custom resources.** By default, this means operations are constrained by the service account under which the controllers run, which has the cluster-admin role bound to it. This is convenient for a deployment in which all users are trusted.

In a multi-tenant deployment, each tenant needs to be restricted in the operations that can be done on their behalf. Since tenants control Flux via its API objects, **this becomes a matter of attaching RBAC rules to Flux API objects**.

To give users control over the authorisation, **the Flux controllers can impersonate (assume the identity of) a service account mentioned in the apply specification** (e.g., the field .spec.serviceAccountName in a [Kustomization object](https://fluxcd.io/flux/components/kustomize/kustomizations/#role-based-access-control) or in a [HelmRelease object](https://fluxcd.io/flux/components/helm/helmreleases/#role-based-access-control)) for both accessing resources and applying configuration. This lets a user constrain the operations performed by the Flux controllers with RBAC.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: webapp
spec:
  serviceAccountName: webapp-reconciler
  interval: 5m
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
```
https://fluxcd.io/flux/components/helm/helmreleases/#role-based-access-control
https://fluxcd.io/flux/installation/configuration/multitenancy/ 
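Impersonation is a plain Kubernetes feature: a client holding the `impersonate` permission sends `Impersonate-User` (and optionally `Impersonate-Group`) headers, and the API server then authorizes the request as that identity. A KCL operator could apply the same idea to `spec.serviceAccountName`. A minimal Python sketch of the headers for the HelmRelease above; including the service-account groups is an assumption, not necessarily what Flux sends:

```python
from typing import List, Tuple


def impersonation_headers(namespace: str, service_account: str) -> List[Tuple[str, str]]:
    """HTTP headers that make the API server treat a request as coming from
    the given service account. A list is used because Impersonate-Group may repeat."""
    return [
        ("Impersonate-User", f"system:serviceaccount:{namespace}:{service_account}"),
        ("Impersonate-Group", "system:serviceaccounts"),
        ("Impersonate-Group", f"system:serviceaccounts:{namespace}"),
    ]
```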

### KusionStack Controller Mesh

```yaml
apiVersion: ctrlmesh.kusionstack.io/v1alpha1
kind: ShardingConfig
metadata:
  name: sharding-demo
  namespace: operator-demo
spec:
  controller:
    leaderElectionName: operator-leader
  webhook:
    certDir: /tmp/webhook-certs
    port: 9443
  selector:
    matchExpressions:
    - key: statefulset.kubernetes.io/pod-name
      operator: In
      values:
      - operator-demo-0
```

## Reference

+ https://hookdeck.com/webhooks/guides/best-practices-deploy-webhooks-production
+ https://mp.weixin.qq.com/s/v7y6i4uLwjf9gKWsW944-g
+ https://liangyuanpeng.com/post/k8s-admissionregistration-with-cel/
+ https://blog.fleeto.us/post/opa-gatekeeper-101/
+ https://cloud.google.com/kubernetes-engine/docs/how-to/pod-security-policies-with-gatekeeper?hl=zh-cn
+ https://www.cnblogs.com/charlieroro/p/15829201.html
+ https://www.alibabacloud.com/help/zh/ack/product-overview/gatekeeper
+ https://yusank.space/posts/policy-engine/
+ https://www.51cto.com/article/650191.html
+ https://zhuanlan.zhihu.com/p/452996426
+ https://github.com/fluxcd/flux2-multi-tenancy/blob/main/README.md
+ https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
+ KusionStack Controller Mesh: https://github.com/KusionStack/controller-mesh


