As organizations successfully grow their Apache Spark workloads on Amazon EMR on EKS, they may seek to optimize resource scheduling to further enhance cluster utilization, minimize job queuing, and maximize performance. Although Kubernetes’ default scheduler, kube-scheduler, works well for most containerized applications, it lacks the features needed to manage complex big data workloads with specific requirements such as gang scheduling, resource quotas, job priorities, multi-tenancy, and hierarchical queue management. This limitation can result in inefficient resource utilization, longer job completion times, and increased operational costs for organizations running large-scale data processing workloads.
Apache YuniKorn addresses these limitations by providing a custom resource scheduler specifically designed for big data and machine learning (ML) workloads running on Kubernetes. Unlike kube-scheduler, YuniKorn offers gang scheduling (which makes sure that all pods of a Spark application are scheduled together), resource fairness among multiple tenants, priority and preemption capabilities, and queue management with hierarchical resource allocation. For data engineering and platform teams managing large-scale Spark workloads on Amazon EMR on EKS, YuniKorn can improve resource utilization, reduce job completion times, and provide better resource allocation in multi-tenant clusters. This is particularly valuable for organizations running mixed workloads with varying resource requirements, strict SLA requirements, or complex resource-sharing policies across different teams and applications.
This repository and the accompanying blog post show how to deploy Apache YuniKorn as a custom scheduler for Amazon EMR on EKS, configure scheduling policies including gang scheduling, establish resource quotas, and manage queues for big data workloads. They also demonstrate these features with practical Spark job examples.
The following diagram shows the high-level architecture of the YuniKorn scheduler running on Amazon EMR on EKS.
This solution also includes a secure bastion host, not shown in the architecture diagram, that provides access to the EKS cluster through AWS Systems Manager (SSM) Session Manager. The bastion host is deployed in a private subnet with all necessary tools pre-installed, including kubectl configured with the permissions needed to interact with the cluster.
To demonstrate the various YuniKorn features, the following four queues are configured:
# Analytics Queue - Time-sensitive workloads
analytics-queue:
  guaranteed: 10 vCPUs, 38GB memory (30% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 100 (highest)
  policy: FIFO (predictable scheduling)
# Marketing Queue - Large batch jobs
marketing-queue:
  guaranteed: 8 vCPUs, 32GB memory (25% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 75 (medium)
  policy: Fair Share (balanced resource distribution)
# Data Science Queue - Experimental workloads
datascience-queue:
  guaranteed: 6 vCPUs, 26GB memory (20% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 50 (lower)
  policy: Fair Share (experimental workload balancing)
# Default Queue - Fallback for unmatched jobs
default:
  guaranteed: 6 vCPUs, 26GB memory (20% of cluster)
  max: 24 vCPUs, 96GB memory (80% burst capacity)
  priority: 25 (lowest, for fallback jobs)
  policy: FIFO (predictable job submission)
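For reference, the queue summary above could be expressed in YuniKorn's queues.yaml format roughly as follows. This is a sketch under stated assumptions, not the ConfigMap the repository's setup script applies: it maps "policy" to the application.sort.policy queue property (fifo or fair) and "priority" to the priority.offset queue property, uses the "G" memory notation, and writes to a file named queues.yaml. All of these choices, and the helper names write_queue and generate_queues_yaml, are assumptions; check them against the YuniKorn version you deploy.

```shell
#!/usr/bin/env bash
# Sketch only: emit a YuniKorn queues.yaml that mirrors the queue summary above.
# Assumptions (not taken from the repository): the output file name, the "G"
# memory notation, and mapping "priority" to the priority.offset queue property.

# Print one child queue of root.
# Args: name, guaranteed vcore, guaranteed memory, max vcore, max memory,
#       priority offset, application sort policy (fifo or fair)
write_queue() {
  cat <<EOF
          - name: $1
            resources:
              guaranteed:
                vcore: $2
                memory: $3
              max:
                vcore: $4
                memory: $5
            properties:
              priority.offset: "$6"
              application.sort.policy: $7
EOF
}

generate_queues_yaml() {
  # Quoted delimiter: the partition header is printed literally.
  cat <<'EOF'
partitions:
  - name: default
    queues:
      - name: root
        submitacl: '*'
        queues:
EOF
  write_queue analytics-queue   10 38G 24 96G 100 fifo
  write_queue marketing-queue    8 32G 24 96G 75  fair
  write_queue datascience-queue  6 26G 24 96G 50  fair
  write_queue default            6 26G 24 96G 25  fifo
}

generate_queues_yaml > queues.yaml
```

To compare the sketch with what is actually running, you could inspect the live configuration with kubectl get configmap yunikorn-configs -n yunikorn -o yaml; yunikorn-configs is the ConfigMap name used by recent YuniKorn Helm charts, but verify it for your installation.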
The demonstration uses the provided placement rule to match each job to the queue it requests, and a fixed placement rule that sends every job with an unmatched or unset queue to root.default.
placementrules:
  - name: provided
    create: false
  - name: fixed
    value: root.default
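YuniKorn evaluates placement rules in order. The following sketch reproduces that decision in plain shell so the fallback behavior is easy to see; it is an illustration, not YuniKorn code. It assumes fully qualified queue names and hard-codes the four leaf queues from the summary above, whereas YuniKorn reads them from its live configuration. EXISTING_QUEUES and resolve_queue are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustration of the two placement rules above (not YuniKorn's implementation).
# Assumption: existing leaf queues are hard-coded and fully qualified.
EXISTING_QUEUES="root.analytics-queue root.marketing-queue root.datascience-queue root.default"

resolve_queue() {
  local requested="${1:-}" q
  # Rule 1 "provided" (create: false): use the queue from the
  # yunikorn.apache.org/queue annotation, but only if it already exists.
  if [ -n "$requested" ]; then
    for q in $EXISTING_QUEUES; do
      if [ "$q" = "$requested" ]; then
        echo "$requested"
        return 0
      fi
    done
  fi
  # Rule 2 "fixed": any job not placed by rule 1 lands in root.default.
  echo "root.default"
}
```

For example, resolve_queue root.finance-queue prints root.default: because create is false, the provided rule cannot create the missing queue, so the fixed rule applies.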
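Each job selects its queue through the yunikorn.apache.org/queue annotation, set on both the driver and the executor in the job's SparkApplication manifest (the custom resource defined by the Spark Operator's Custom Resource Definition, or CRD).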
driver:
  annotations:
    yunikorn.apache.org/queue: "root.analytics-queue"
executor:
  annotations:
    yunikorn.apache.org/queue: "root.analytics-queue"
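Because the driver and executor annotations are set separately, a mismatch would place a single job's pods in different queues. A minimal pre-submit check might look like the following. It is a grep-based sketch, not a YAML parser, and it assumes each annotation sits on one line in the quoted form shown above; check_queue_annotations is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the single YuniKorn queue requested in a SparkApplication manifest,
# or fail if the manifest requests zero queues or more than one.
# Assumption: annotations look like yunikorn.apache.org/queue: "<queue>" on one line.
check_queue_annotations() {
  local manifest="$1" queues count
  queues=$(grep -o 'yunikorn\.apache\.org/queue: *"[^"]*"' "$manifest" \
    | sed 's/.*: *"\(.*\)"/\1/' | sort -u)
  # Count non-empty lines, that is, distinct queue names.
  count=$(printf '%s\n' "$queues" | grep -c .)
  if [ "$count" -eq 1 ]; then
    echo "$queues"
    return 0
  fi
  echo "expected exactly one queue, found $count: $(printf '%s\n' "$queues" | tr '\n' ' ')" >&2
  return 1
}
```

Run from the spark-jobs directory, check_queue_annotations spark-operator/analytics-job.yaml would print the queue that the analytics job requests, if the manifest follows the format above.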
Before you deploy this solution, make sure the following prerequisites are in place:
- Access to a valid AWS account
- The AWS Command Line Interface (AWS CLI) is installed on your local machine
- The AWS Session Manager plugin is installed for secure bastion host access
- Git, Docker, eksctl, kubectl, Helm, and jq are installed on your local machine
- Permissions to create AWS resources
- Familiarity with Kubernetes, Apache Spark, Amazon EKS, and Amazon EMR on EKS
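To confirm the local tools before you start, a quick check like the following reports any prerequisite command that is not on your PATH. It is a sketch: the binary names (for example, session-manager-plugin for the AWS Session Manager plugin) assume standard installations, and missing_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print each missing command (one per line) and return 1 if any are missing.
# With no arguments, checks the prerequisite tools listed above.
# Assumption: each tool is installed under its standard binary name.
missing_tools() {
  local tool missing=()
  if [ "$#" -eq 0 ]; then
    set -- aws session-manager-plugin git docker eksctl kubectl helm jq
  fi
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
  return 0
}
```

Running missing_tools || echo "Install the tools listed above before continuing." prints nothing but the hint's trigger list when something is absent.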
Complete the following steps to set up the prerequisite infrastructure:
Clone the repository to your local machine and set the two environment variables. Replace <AWS_REGION> with the AWS Region where you want to deploy these resources.
git clone https://github.com/aws-samples/sample-emr-eks-yunikorn-scheduler.git
cd sample-emr-eks-yunikorn-scheduler
export REPO_DIR=$(pwd)
export AWS_REGION=<AWS_REGION>
cd $REPO_DIR/infrastructure
./setup-infra.sh
To verify successful infrastructure deployment, open the AWS CloudFormation console, choose your stack, and check the Events, Resources, and Outputs tabs for completion status, details, and list of resources created.
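If you prefer the command line to the console, a sketch like the following reads a stack's status with the AWS CLI and classifies it. The stack name is not given in this guide, so <STACK_NAME> below is a placeholder for the name shown in the CloudFormation console; stack_status and classify_stack_status are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Classify a CloudFormation stack status string.
# Rollbacks are checked first so that ROLLBACK_COMPLETE counts as a failure.
classify_stack_status() {
  case "$1" in
    *ROLLBACK*|*FAILED*) echo "failed" ;;
    *_COMPLETE) echo "complete" ;;
    *_IN_PROGRESS) echo "in-progress" ;;
    *) echo "unknown" ;;
  esac
}

# Read the current status of the stack named in $1 (uses the exported AWS_REGION).
stack_status() {
  aws cloudformation describe-stacks \
    --stack-name "$1" --region "$AWS_REGION" \
    --query 'Stacks[0].StackStatus' --output text
}
```

For example, classify_stack_status "$(stack_status <STACK_NAME>)" prints complete once creation succeeds. To block until creation finishes instead of polling, aws cloudformation wait stack-create-complete --stack-name <STACK_NAME> is an alternative.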
Run the following YuniKorn setup script. It deploys the YuniKorn Helm chart and updates the YuniKorn ConfigMap with the queue and placement rule configuration.
cd $REPO_DIR/yunikorn/
./deploy-yunikorn-via-bastion.sh
Complete the following steps to establish secure connectivity to your private EKS cluster:
Run the following script in a new terminal window. It establishes port forwarding through the bastion host so that your private EKS cluster is reachable from your local machine, and it enables access to the YuniKorn web UI. Keep this terminal window open and running throughout your work session. Review the Prerequisites section for the AWS Session Manager plugin setup.
export REPO_DIR=/path/to/repository
export AWS_REGION=<AWS_REGION>
cd $REPO_DIR/port-forward
./eks-connect.sh --start
Test kubectl connectivity in the main terminal window to verify that you can communicate with the EKS cluster. You should see the EKS worker nodes listed, which confirms that port forwarding is working and that you can proceed to verify the YuniKorn deployment.
kubectl get nodes
Verify the YuniKorn deployment by listing the Kubernetes resources in the yunikorn namespace.
kubectl get all -n yunikorn
The YuniKorn UI is made available when you run ./eks-connect.sh --start in step 2.1. Access the YuniKorn web UI by navigating to http://127.0.0.1:9889 in your browser. Port 9889 is the default port for the YuniKorn web UI.
# macOS
open http://127.0.0.1:9889
# Linux
xdg-open http://127.0.0.1:9889
# Windows
start http://127.0.0.1:9889
Complete the following steps to set up the Spark jobs environment.
# Navigate to the spark-jobs directory
cd $REPO_DIR/spark-jobs
# Run the setup script
./setup-spark-jobs.sh
# Verify Spark Operator installation
kubectl get pods -n spark-operator
Submit the analytics, marketing, and data science Spark jobs using the following commands.
# Spark Operator approach
kubectl apply -f spark-operator/analytics-job.yaml
kubectl apply -f spark-operator/marketing-job.yaml
kubectl apply -f spark-operator/datascience-job.yaml
# Monitor job status
kubectl get sparkapplications -A
kubectl get pods -n emr
Check each Spark job's queue placement and resource allocation in the YuniKorn UI.
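Instead of rerunning kubectl get sparkapplications by hand, you could poll an application until it finishes. This sketch assumes the applications run in the emr namespace (override it with the second argument), that the Spark Operator reports progress in the .status.applicationState.state field, and that the job has no restart policy that retries after a failure. Take application names from kubectl get sparkapplications -A; wait_for_spark_app and is_terminal_state are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Succeeds for states treated as final here (assuming no retrying restart policy).
is_terminal_state() {
  case "$1" in
    COMPLETED|FAILED|SUBMISSION_FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll SparkApplication $1 in namespace $2 (default: emr) every 10 seconds
# until it reaches a terminal state; return 1 if kubectl itself fails.
wait_for_spark_app() {
  local name="$1" namespace="${2:-emr}" state=""
  until is_terminal_state "$state"; do
    sleep 10
    state=$(kubectl get sparkapplication "$name" -n "$namespace" \
      -o jsonpath='{.status.applicationState.state}') || return 1
    echo "$name: ${state:-<no state yet>}"
  done
}
```

For example, wait_for_spark_app <APP_NAME> prints the state on each poll and returns once the application completes or fails.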
First, stop the port forwarding sessions:
# Navigate to the port-forward directory
cd $REPO_DIR/port-forward
# Stop all sessions
./eks-connect.sh --stop
Remove all created AWS resources:
# Navigate to root directory
cd $REPO_DIR
# Run cleanup script
./cleanup.sh
The eks-connect.sh script provides several management options:
# Change directory
cd $REPO_DIR/port-forward
# Start all port forwarding (kubectl + YuniKorn UI) - run in dedicated terminal
./eks-connect.sh --start
# Check status of port forwarding sessions - run from any terminal
./eks-connect.sh --status
# Stop all sessions when done - run from any terminal
./eks-connect.sh --stop
# Show help and usage information
./eks-connect.sh --help
See CONTRIBUTING for more information.
See the LICENSE for more information.
This solution deploys the open source software Apache YuniKorn in the AWS Cloud. AWS makes no claims regarding the security properties of any open source software. Evaluate all open source software, including Apache YuniKorn, according to your organization's security best practices before implementing the solution.
