Deploy Apache YuniKorn batch scheduler for Amazon EMR on EKS

As organizations grow their Apache Spark workloads on Amazon EMR on EKS, they may seek to optimize resource scheduling to further improve cluster utilization, minimize job queuing, and maximize performance. Although Kubernetes' default scheduler, kube-scheduler, works well for most containerized applications, it lacks the features needed to manage complex big data workloads with requirements such as gang scheduling, resource quotas, job priorities, multi-tenancy, and hierarchical queue management. This gap can result in inefficient resource utilization, longer job completion times, and increased operational costs for organizations running large-scale data processing workloads.

Apache YuniKorn addresses these limitations with a custom resource scheduler designed specifically for big data and machine learning (ML) workloads on Kubernetes. Unlike kube-scheduler, YuniKorn offers gang scheduling (ensuring all pods of a Spark application start together), resource fairness among multiple tenants, priority and preemption capabilities, and hierarchical queue management. For data engineering and platform teams managing large-scale Spark workloads on Amazon EMR on EKS, YuniKorn can improve resource utilization rates, reduce job completion times, and allocate resources more predictably in multi-tenant clusters. This is particularly valuable for organizations running mixed workloads with varying resource requirements, strict SLAs, or complex resource sharing policies across different teams and applications.
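
To make gang scheduling concrete: YuniKorn expresses it through pod annotations that declare task groups and their minimum members. The following is a minimal sketch based on YuniKorn's documented annotation keys; the group names, member counts, and resource sizes are illustrative and not taken from this repository.

# Illustrative driver-pod annotations for YuniKorn gang scheduling.
# The yunikorn.apache.org annotation keys are standard; the values are examples.
annotations:
  yunikorn.apache.org/task-group-name: "spark-driver"
  yunikorn.apache.org/task-groups: |-
    [{
      "name": "spark-driver",
      "minMember": 1,
      "minResource": {"cpu": "1", "memory": "2Gi"}
    }, {
      "name": "spark-executor",
      "minMember": 4,
      "minResource": {"cpu": "1", "memory": "4Gi"}
    }]

With annotations like these, YuniKorn reserves capacity for all declared task group members before starting the application, so executors are not starved after the driver launches.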

This repository and the accompanying blog post show how to deploy Apache YuniKorn as a custom scheduler for Amazon EMR on EKS; configure scheduling policies, including gang scheduling; establish resource quotas; and manage queues for big data workloads. These features are demonstrated with practical Spark job examples.

Solution Overview

The following diagram shows the high-level architecture of the YuniKorn scheduler running on Amazon EMR on EKS.

[Architecture diagram: YuniKorn scheduler on Amazon EMR on EKS]

This solution includes a secure bastion host (not shown in the architecture diagram) that provides access to the EKS cluster through AWS Systems Manager (SSM) Session Manager. The bastion host is deployed in a private subnet with all necessary tools pre-installed, including kubectl configured with the permissions needed for seamless cluster interaction.
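
Because access goes through Session Manager rather than SSH, you can open an interactive shell on the bastion host with the AWS CLI alone. A minimal sketch; the instance ID is a placeholder you would look up in your account (for example, in the CloudFormation stack resources).

# Open an interactive shell on the bastion host via SSM Session Manager.
# <BASTION_INSTANCE_ID> is a placeholder; find the actual ID in your stack's resources.
aws ssm start-session --target <BASTION_INSTANCE_ID> --region $AWS_REGION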

YuniKorn Queue Architecture

To demonstrate the various YuniKorn features, the following four queues are configured:

# Analytics Queue - Time-sensitive workloads
analytics-queue:
  guaranteed: 10 vCPUs, 38GB memory (30% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 100 (highest)
  policy: FIFO (predictable scheduling)

# Marketing Queue - Large batch jobs
marketing-queue:
  guaranteed: 8 vCPUs, 32GB memory (25% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 75 (medium)
  policy: Fair Share (balanced resource distribution)

# Data Science Queue - Experimental workloads
datascience-queue:
  guaranteed: 6 vCPUs, 26GB memory (20% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 50 (low)
  policy: Fair Share (experimental workload balancing)

# Default Queue - Fallback for unmatched jobs
default:
  guaranteed: 6 vCPUs, 26GB memory (20% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 25 (lowest, for fallback jobs)
  policy: FIFO (predictable job submission)
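
In YuniKorn, queues like these are defined under the root queue in the scheduler's ConfigMap. The following is a minimal sketch of how the analytics queue could be expressed using YuniKorn's queues.yaml schema; it illustrates the format and is not the exact ConfigMap shipped in this repository.

# Sketch of a YuniKorn queue definition (queues.yaml schema); values mirror
# the analytics queue above, but the repository's actual ConfigMap may differ.
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: analytics-queue
            properties:
              application.sort.policy: fifo
              priority.offset: "100"
            resources:
              guaranteed:
                vcore: "10"
                memory: 38Gi
              max:
                vcore: "24"
                memory: 96Gi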

YuniKorn Placement Rule

The demonstration uses the provided placement rule to match each job to the queue named in its annotations, and a fixed placement rule to route any job with an unmatched or unset queue to root.default.

placementrules:
  - name: provided
    create: false
  - name: fixed
    value: root.default

Job-Level Queue Assignment

Queue assignment is set through annotations on the driver and executor specs of the SparkApplication custom resource managed by the Spark Operator:

driver:
  annotations:
    yunikorn.apache.org/queue: "root.analytics-queue"
executor:
  annotations:
    yunikorn.apache.org/queue: "root.analytics-queue"
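
For context, here is a minimal SparkApplication manifest showing where those annotations sit. The apiVersion and field names follow the Spark Operator's v1beta2 API; the application name, image, service account, Spark version, and resource sizes are hypothetical placeholders, not values from this repository.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: analytics-example            # hypothetical name
  namespace: emr
spec:
  type: Python
  mode: cluster
  image: <SPARK_IMAGE>               # placeholder; use your EMR on EKS Spark image
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.0"              # hypothetical version
  driver:
    cores: 1
    memory: 2g
    serviceAccount: spark            # hypothetical service account
    annotations:
      yunikorn.apache.org/queue: "root.analytics-queue"
  executor:
    instances: 2
    cores: 1
    memory: 4g
    annotations:
      yunikorn.apache.org/queue: "root.analytics-queue"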

Prerequisites

Before you deploy this solution, make sure the required prerequisites are in place. At a minimum, the walkthrough assumes you have the AWS CLI, kubectl, and the AWS Session Manager plugin installed locally (the plugin is required for the port forwarding in step 3.1).

Deployment Guide

1. Set up Prerequisite Infrastructure

Complete the following steps to set up the prerequisite infrastructure:

1.1. Clone the repository and set environment variables

Clone the repository to your local machine and set the two environment variables. Replace <AWS_REGION> with the AWS Region where you want to deploy these resources.

git clone https://github.com/aws-samples/sample-emr-eks-yunikorn-scheduler.git
cd sample-emr-eks-yunikorn-scheduler
export REPO_DIR=$(pwd)
export AWS_REGION=<AWS_REGION>

1.2. Execute the infrastructure setup script

cd $REPO_DIR/infrastructure
./setup-infra.sh

1.3. Verify successful deployment

To verify successful infrastructure deployment, open the AWS CloudFormation console, choose your stack, and check the Events, Resources, and Outputs tabs for completion status, details, and list of resources created.
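
Alternatively, check the stack status from the CLI. A minimal sketch; <STACK_NAME> is a placeholder for the stack created by setup-infra.sh.

# Query the CloudFormation stack status; expect CREATE_COMPLETE on success.
aws cloudformation describe-stacks \
  --stack-name <STACK_NAME> \
  --region $AWS_REGION \
  --query 'Stacks[0].StackStatus' \
  --output text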

2. Deploy YuniKorn on EMR on EKS

Execute the following YuniKorn setup script. The script deploys the YuniKorn Helm chart and updates the ConfigMap with the queues and placement rules described earlier.

cd $REPO_DIR/yunikorn/
./deploy-yunikorn-via-bastion.sh
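
For reference, a standard upstream YuniKorn Helm installation looks like the following. This is a sketch of the procedure documented by the YuniKorn project, not the exact contents of deploy-yunikorn-via-bastion.sh, which additionally applies this solution's queue and placement configuration.

# Standard upstream YuniKorn Helm installation (reference only).
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
helm install yunikorn yunikorn/yunikorn --namespace yunikorn --create-namespace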

3. Establish EKS Cluster Connectivity

Complete the following steps to establish secure connectivity to your private EKS cluster:

3.1. Start port forwarding for EKS and YuniKorn UI

Execute the following script in a new terminal window. The script establishes port forwarding through the bastion host so that your private EKS cluster is accessible from your local machine, and it also exposes the YuniKorn web UI. Keep this terminal window open and running throughout your work session. Review the Prerequisites section for AWS Session Manager plugin setup.

export REPO_DIR=/path/to/repository
export AWS_REGION=<AWS_REGION>

cd $REPO_DIR/port-forward
./eks-connect.sh --start

3.2. Test kubectl connectivity

Test kubectl connectivity in the main terminal window to verify that you can successfully communicate with the EKS cluster. You should see the EKS worker nodes listed, confirming that the port forwarding is working correctly and you can proceed with YuniKorn deployment verification.

kubectl get nodes

3.3. Verify successful YuniKorn deployment

Verify the deployment by listing all Kubernetes objects in the yunikorn namespace.

kubectl get all -n yunikorn

3.4. Access the YuniKorn web UI

The YuniKorn UI is made available when you run ./eks-connect.sh --start in step 3.1. Access the web UI by navigating to http://127.0.0.1:9889 in your browser; port 9889 is the default port for the YuniKorn web UI.

# macOS
open http://127.0.0.1:9889

# Linux
xdg-open http://127.0.0.1:9889

# Windows
start http://127.0.0.1:9889

4. Set Up Spark Jobs

Execute the following steps to set up the Spark jobs environment.

# Navigate to the spark-jobs directory
cd $REPO_DIR/spark-jobs

# Run the setup script
./setup-spark-jobs.sh

# Verify Spark Operator installation
kubectl get pods -n spark-operator

5. Run Demo Jobs

Submit analytics, marketing, and data science Spark jobs using the following commands.

# Spark Operator approach
kubectl apply -f spark-operator/analytics-job.yaml
kubectl apply -f spark-operator/marketing-job.yaml
kubectl apply -f spark-operator/datascience-job.yaml

# Monitor job status
kubectl get sparkapplications -A
kubectl get pods -n emr
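
You can also inspect queue placement from the command line. A sketch that assumes the step 3.1 port forward is active and that the forwarded web service proxies YuniKorn's documented /ws/v1 REST API; the driver pod name is a placeholder.

# List queues and their current usage via the YuniKorn REST API.
curl -s http://127.0.0.1:9889/ws/v1/partition/default/queues

# Confirm the queue annotation on a running driver pod.
kubectl get pod <DRIVER_POD_NAME> -n emr \
  -o jsonpath='{.metadata.annotations.yunikorn\.apache\.org/queue}'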

6. Check Spark Job Execution in the YuniKorn UI

In the YuniKorn web UI, check each Spark job's queue placement and resource allocation.

Cleanup

1. Stop Port Forwarding Sessions

First, stop the port forwarding sessions:

# Navigate to port-forwarding directory
cd $REPO_DIR/port-forward

# Stop all sessions
./eks-connect.sh --stop

2. Clean Up Infrastructure

Remove all created AWS resources:

# Navigate to root directory
cd $REPO_DIR

# Run cleanup script
./cleanup.sh
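
If you want to confirm that the infrastructure is gone, you can poll the stack from the CLI; <STACK_NAME> is again a placeholder for the stack created by setup-infra.sh.

# After deletion completes, this call returns a "stack does not exist" error.
aws cloudformation describe-stacks --stack-name <STACK_NAME> --region $AWS_REGION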

Additional Notes

Connection Management

The eks-connect.sh script provides several management options:

# Change directory
cd $REPO_DIR/port-forward

# Start all port forwarding (kubectl + YuniKorn UI) - run in dedicated terminal
./eks-connect.sh --start

# Check status of port forwarding sessions - run from any terminal
./eks-connect.sh --status

# Stop all sessions when done - run from any terminal
./eks-connect.sh --stop

# Show help and usage information
./eks-connect.sh --help

Contributing

See CONTRIBUTING for more information.

License

See the LICENSE for more information.

Disclaimer

This solution deploys the open source software Apache YuniKorn in the AWS Cloud. AWS makes no claims regarding the security properties of any open source software. Evaluate all open source software, including Apache YuniKorn, according to your organization's security best practices before implementing this solution.
