Cortex Kubernetes Image Pull Policy #483

@ihmad007

Add configurable imagePullPolicy for Kubernetes Job Runner

Request Type

Feature Request

Work Environment

Question | Answer
OS version (server) | Kubernetes-based deployment
Cortex version / git hash | Latest / main branch
Package Type | Docker, Kubernetes
Deployment Environment | Private/On-premise with Harbor registry

Problem Description

When deploying Cortex analyzers in a Kubernetes environment with a private image registry (such as Harbor), the current implementation of K8sJobRunnerSrv.scala does not provide a way to configure the imagePullPolicy for the Kubernetes Jobs it creates.

This causes issues in the following scenarios:

  1. Private Registry Deployments: When new analyzer versions are pushed to a private registry under an existing tag (e.g., stable, or environment-specific tags like dev, staging), Kubernetes will not pull the updated image if the node already has an image with that tag cached. The one exception is the tag latest, for which Kubernetes defaults to Always when no policy is set.

  2. Development/Testing Environments: In rapidly iterating development environments, developers frequently push updated analyzer images with the same tag. Without the ability to set imagePullPolicy: Always, these updates are not picked up automatically.

  3. CI/CD Pipeline Integration: CI/CD pipelines (e.g., GitLab CI with Harbor) often reuse mutable tags such as ${OWNER}-latest. Because such a tag is not exactly latest, Kubernetes treats it like any other tag, and the inability to force image pulls can lead to stale analyzer versions being executed. Tags that are unique per build, like build-${COMMIT_SHA}, are not affected.

Current Behavior

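K8sJobRunnerSrv.scala creates Kubernetes Jobs without specifying an imagePullPolicy, so Kubernetes applies its own default:

  • IfNotPresent for any tag other than latest (and for images referenced by digest): the image is pulled only if it doesn't exist locally
  • Always only when the tag is latest or no tag is given
  • For every other reused tag, this prevents automatic updates when new images are pushed to the registry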
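
The default that Kubernetes applies when imagePullPolicy is omitted depends on the image reference. The rule, as described in the Kubernetes documentation, can be sketched as follows; DefaultPullPolicy is a hypothetical helper for illustration only, not part of Cortex or of the Kubernetes client:

```scala
object DefaultPullPolicy {

  /** Policy Kubernetes assigns when a container spec omits imagePullPolicy. */
  def forImage(image: String): String =
    if (image.contains("@")) {
      // Image pinned by digest (name@sha256:...): content cannot change.
      "IfNotPresent"
    } else {
      // Look for the tag only in the last path segment, so that a registry
      // port such as "harbor.local:5000/..." is not mistaken for a tag.
      val lastSegment = image.substring(image.lastIndexOf('/') + 1)
      val tag = lastSegment.split(":", 2) match {
        case Array(_, t) if t.nonEmpty => Some(t)
        case _                         => None
      }
      tag match {
        case None | Some("latest") => "Always"       // no tag, or exactly "latest"
        case Some(_)               => "IfNotPresent" // any other tag, e.g. "dev"
      }
    }
}
```

Under this rule, a tag such as dev or stable resolves to IfNotPresent, which is the case this request is about.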

Desired Behavior

Add a configurable imagePullPolicy parameter that:

  1. Can be set in application.conf, with the default value declared in reference.conf
  2. Defaults to IfNotPresent, which matches the current behavior for every tag other than latest
  3. Can be overridden to Always, IfNotPresent, or Never as needed
  4. Is applied to the Kubernetes Job container specification
  5. Can be configured via Helm chart values for Kubernetes deployments
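
Since Kubernetes accepts only the three case-sensitive values listed above, the configured string could be validated at startup rather than failing later at Job creation. A minimal sketch, assuming a hypothetical ImagePullPolicy helper (the name and the error message are illustrative, not existing Cortex code):

```scala
object ImagePullPolicy {
  // The only values the Kubernetes container spec accepts (case-sensitive).
  val Allowed: Set[String] = Set("Always", "IfNotPresent", "Never")
  val Default: String      = "IfNotPresent"

  /** Returns the policy to use, or an error message for an unknown value. */
  def parse(raw: Option[String]): Either[String, String] =
    raw.map(_.trim) match {
      case None                           => Right(Default)
      case Some(v) if Allowed.contains(v) => Right(v)
      case Some(v) =>
        Left(s"Invalid imagePullPolicy '$v', expected one of: Always, IfNotPresent, Never")
    }
}
```

The injected constructor could then pass config.getOptional[String]("job.kubernetes.imagePullPolicy") through parse and fail fast on a Left.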

Proposed Solution

1. Modify K8sJobRunnerSrv.scala:

Add an imagePullPolicy parameter to the class constructor and apply it to the Kubernetes Job container spec:

@Singleton
class K8sJobRunnerSrv(
    client: DefaultKubernetesClient,
    jobBaseDirectory: Path,
    persistentVolumeClaimName: Option[String],
    imagePullPolicy: String,  // Add this parameter
    implicit val system: ActorSystem
) {

  @Inject()
  def this(config: Configuration, system: ActorSystem) =
    this(
      new DefaultKubernetesClient(),
      Paths.get(config.get[String]("job.directory")),
      config.getOptional[String]("job.kubernetes.persistentVolumeClaimName"),
      config.getOptional[String]("job.kubernetes.imagePullPolicy").getOrElse("IfNotPresent"),  // Add this line
      system: ActorSystem
    )

Apply the policy in the run method:

.addNewContainer()
  .withName("neuron")
  .withImage(dockerImage)
  .withImagePullPolicy(imagePullPolicy)  // Add this line
  .withArgs("/job")

2. Update conf/reference.conf:

job {
  timeout = 30 minutes
  runners = [kubernetes, docker, process]
  directory = ${java.io.tmpdir}
  dockerDirectory = ${job.directory}
  keepJobFolder = false
  
  kubernetes {
    # Name of the PersistentVolumeClaim to use for job storage (required for k8s runner)
    # persistentVolumeClaimName = "cortex-jobs-pvc"
    
    # Image pull policy for Kubernetes jobs
    # Options: Always, IfNotPresent, Never
    # Default: IfNotPresent
    # Set to "Always" for private registries with frequently updated images
    imagePullPolicy = "IfNotPresent"
  }
}

3. Helm Chart Integration (Optional):

For Kubernetes deployments, this can be exposed via Helm chart values.yaml:

cortex:
  kubernetes:
    persistentVolumeClaimName: "cortex-jobs-pvc"
    imagePullPolicy: "Always"  # or "IfNotPresent", "Never"

Benefits

  1. Private Registry Support: Enables proper operation with private registries (Harbor, ECR, ACR, GCR)
  2. Backward Compatible: Defaults to IfNotPresent, matching the current behavior for all tags except latest (which Kubernetes pulls with Always when no policy is set)
  3. Flexible Deployment: Different policies can be used for dev/staging/production environments
  4. CI/CD Friendly: Supports modern continuous deployment workflows
  5. Industry Standard: Exposes the standard imagePullPolicy field of the Kubernetes container spec instead of a custom mechanism

Use Cases

  • Development Environment: Set to Always to ensure latest analyzer versions are always pulled
  • Production Environment: Set to IfNotPresent to reduce registry load and improve startup time
  • Air-Gapped/Offline: Set to Never to require pre-loaded images on all nodes

Implementation Status

This feature has been implemented in a fork and tested successfully with:

  • Harbor private registry
  • GitLab CI/CD pipeline
  • LinCloud Kubernetes environment

Complementary Information

Configuration Example for Private Registry:

job {
  kubernetes {
    persistentVolumeClaimName = "cortex-jobs-pvc"
    imagePullPolicy = "Always"
  }
}

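Environment Variable Override:

JOB_KUBERNETES_IMAGEPULLPOLICY=Always

Typesafe Config does not map environment variables onto configuration keys by default, so this variable only takes effect if the configuration references it, for example with the optional substitution imagePullPolicy = ${?JOB_KUBERNETES_IMAGEPULLPOLICY} placed after the default value.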
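
For the variable above to work without extra wiring in the configuration file, the runner could also read it directly. A minimal sketch, assuming a hypothetical PullPolicyResolver helper (not part of Cortex) and this precedence: environment variable first, then the configured value, then IfNotPresent:

```scala
object PullPolicyResolver {
  // Variable name taken from the example above.
  val EnvVar: String = "JOB_KUBERNETES_IMAGEPULLPOLICY"

  /** Environment variable wins over the configured value; IfNotPresent is the fallback. */
  def resolve(env: Map[String, String], configured: Option[String]): String =
    env
      .get(EnvVar)
      .filter(_.nonEmpty) // treat an empty variable as unset
      .orElse(configured)
      .getOrElse("IfNotPresent")
}
```

In the injected constructor this would be called as PullPolicyResolver.resolve(sys.env, config.getOptional[String]("job.kubernetes.imagePullPolicy")); taking the environment as a parameter keeps the function easy to test.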

This enhancement would significantly improve Cortex's usability in enterprise and private cloud environments where private image registries are the norm.
