OKDP/spark-web-proxy

Monitor live Spark applications within Spark History Server UI on Kubernetes.

spark-web-proxy acts as a reverse proxy for the Spark History Server and the Spark UI. It complements the Spark History Server by seamlessly integrating the UIs of live (running) Spark applications. The web proxy enables real-time, dynamic discovery and monitoring of running Spark applications (without delay) alongside completed applications, all within your existing Spark History Server web UI.

The proxy is non-intrusive and independent of any specific version of Spark or the Spark History Server. It supports all Spark application deployment modes, including Kubernetes jobs, the Spark Operator, and notebooks (Jupyter, etc.).

Screenshot: Spark History Server UI

Requirements

Note

A running Spark History Server is required. You can use the following Spark History Server helm chart.

Installation

To deploy the Spark Web Proxy, refer to the helm chart README for customization options and installation guidelines.

The web proxy can also be deployed as a sidecar container alongside your existing Spark History Server. In that case, make sure the property configuration.spark.service is set to localhost.

In both cases, you need to expose the Spark Web Proxy ingress instead of your Spark History Server ingress.

Spark History Server and Spark Jobs Configuration

Both the Spark History Server and the Spark jobs themselves must be configured to write event logs to the same shared, writable directory.

Spark History Server:

spark.history.fs.logDirectory /path/to/the/same/shared/event/logs

Spark Jobs:

spark.eventLog.enabled true
spark.eventLog.dir /path/to/the/same/shared/event/logs
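
For example, when creating a session from PySpark, a minimal sketch might look like the following (the application name is hypothetical and the log directory is a placeholder that must match the one configured on the Spark History Server side):

from pyspark.sql import SparkSession

# Minimal sketch: enable event logging so that running applications show up
# alongside completed ones. The directory below is a placeholder.
spark = (
    SparkSession.builder
    .appName("event-log-example")  # hypothetical application name
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "/path/to/the/same/shared/event/logs")
    .getOrCreate()
)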

Spark Reverse Proxy Support

The web proxy supports Spark's reverse proxy feature for Spark web UIs, enabled with the property spark.ui.reverseProxy=true in your Spark jobs. In that case, the web proxy configuration property configuration.spark.ui.proxyBase should be set to /proxy.
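
For example, a job submitted from PySpark could enable it on its SparkConf (a minimal sketch; in a notebook you would typically reuse the conf object shown further below):

from pyspark import SparkConf

# Minimal sketch: enable Spark's built-in reverse proxy for the job's web UI.
conf = SparkConf().set("spark.ui.reverseProxy", "true")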

For more configuration properties, refer to the Spark Monitoring configuration page.

Spark jobs deployment

Cluster mode

In cluster mode, no additional configuration is needed: by default, Spark adds the label spark-role: driver and the spark-ui port to the Spark driver pods, as shown in the following:

apiVersion: v1
kind: Pod
metadata:
  labels:
    ...
    spark-role: driver
spec:
  containers:
  - args:
    - driver
    name: spark-kubernetes-driver
    ports:
    ...
    - containerPort: 4040
      name: spark-ui
      protocol: TCP
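
For illustration only (this is not necessarily how the web proxy is implemented internally), such driver pods can be discovered through the Kubernetes API by their spark-role=driver label. The sketch below uses the kubernetes Python client and assumes in-cluster or kubeconfig credentials are available:

from kubernetes import client, config

# Illustration: list Spark driver pods by label and print the pod IP and the
# spark-ui container port. This is a sketch of label-based discovery, not the
# web proxy's actual code.
try:
    config.load_incluster_config()   # running inside the cluster
except config.ConfigException:
    config.load_kube_config()        # fall back to a local kubeconfig

v1 = client.CoreV1Api()
pods = v1.list_pod_for_all_namespaces(label_selector="spark-role=driver")
for pod in pods.items:
    ui_ports = [
        p.container_port
        for c in pod.spec.containers
        for p in (c.ports or [])
        if p.name == "spark-ui"
    ]
    print(pod.metadata.name, pod.status.pod_ip, ui_ports)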

Notebooks and Client mode

In client mode, the web proxy relies on the Spark History Server REST API: /api/v1/applications/[app-id]/environment to get the Spark driver IP and UI port, and /api/v1/applications/[app-id] to get the application status.
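
For illustration, the same information can be fetched manually from the Spark History Server REST API. The sketch below uses the requests library; the server URL is an assumption (18080 is the default History Server port) and [app-id] is a placeholder:

import requests

base_url = "http://spark-history-server:18080/api/v1"  # assumed History Server address
app_id = "[app-id]"  # placeholder: the running application's ID

# Application status (e.g. whether the attempt is still running).
app = requests.get(f"{base_url}/applications/{app_id}", timeout=10).json()

# Environment of the application; sparkProperties is a list of [key, value]
# pairs from which the driver host and UI port can be read.
env = requests.get(f"{base_url}/applications/{app_id}/environment", timeout=10).json()
spark_props = dict(env["sparkProperties"])
print(app["name"], spark_props.get("spark.driver.host"), spark_props.get("spark.ui.port"))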

By default, Spark does not expose the spark.ui.port property in the environment properties, so you should set it explicitly during job submission or using a listener.

Here is an example of how to set spark.ui.port in a Jupyter notebook:

import socket
from pyspark import SparkConf

def find_available_port(start_port=4041, max_port=4100):
    """Find the next available port starting from start_port."""
    for port in range(start_port, max_port):
        # connect_ex returns a non-zero error code when nothing is listening,
        # meaning the port is free and can be used for the Spark UI.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            if s.connect_ex(("localhost", port)) != 0:
                return port
    raise Exception(f"No available ports found in range {start_port}-{max_port}")

conf = SparkConf()  # or reuse the SparkConf you already build in the notebook
conf.set("spark.ui.port", str(find_available_port()))

Authentication

The Spark Web Proxy is independent of any specific authentication mechanism. It simply forwards credentials and headers to the running Spark instances without modifying or enforcing authentication itself.

This allows you to use the Spark Authentication Filter or any other authentication solution to secure both the Spark History Server and the Spark jobs, ensuring user authentication and authorization.