From 805d955290e377f69a0c89965678700a3ea3af27 Mon Sep 17 00:00:00 2001
From: Wong Songhan
Date: Mon, 21 Apr 2025 15:01:09 +0000
Subject: [PATCH] Add Databricks setup instructions.

---
 docs/examples/how-to-run.md | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/docs/examples/how-to-run.md b/docs/examples/how-to-run.md
index f78e6beb3..b2c5c1351 100644
--- a/docs/examples/how-to-run.md
+++ b/docs/examples/how-to-run.md
@@ -26,3 +26,41 @@ cd lithops/aws # or whichever executor/cloud combination you are using
 | | Google | [modal/gcp/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/modal/gcp/README.md) |
 | Coiled | AWS | [coiled/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/coiled/aws/README.md) |
 | Beam | Google | [dataflow/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/dataflow/README.md) |
+
+## Databricks
+
+If you want to run Cubed on Databricks, we recommend using the Spark executor, which is still at an experimental stage (see [#499](https://github.com/cubed-dev/cubed/issues/499)).
+
+You will need to set up your compute cluster with [Dedicated Access Mode](https://docs.databricks.com/aws/en/compute/single-user-fgac), since the Spark executor uses Spark RDDs, which are not supported by [Serverless](https://docs.databricks.com/aws/en/compute/serverless/limitations#limitations-overview) or [Standard access mode](https://docs.databricks.com/aws/en/compute/access-mode-limitations#standard-access-mode-limitations-on-unity-catalog).
+
+### Configuration
+
+Note that if you use a local directory for `work_dir`, you can only use a single-node Spark cluster, since the Spark worker nodes will not have access to the driver node's local directory.
+
+Using a Unity Catalog volume for `work_dir` is not recommended, since it is significantly slower.
+
+```py
+import cubed
+
+spec = cubed.Spec(
+    executor_name="spark",
+    work_dir="/tmp/",  # a local directory on the driver node, so the cluster must be single node
+    allowed_mem="2GB",
+)
+```
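+
+The spec can then be passed to Cubed's array creation functions, and `compute()` will run the calculation on the cluster. Below is a minimal sketch adapted from the usage example in the Cubed README; the tiny arrays and chunk sizes are purely illustrative:
+
+```py
+import cubed
+import cubed.array_api as xp
+
+spec = cubed.Spec(executor_name="spark", work_dir="/tmp/", allowed_mem="2GB")
+
+# Two small chunked arrays; Cubed bounds the memory used per task by allowed_mem
+a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2), spec=spec)
+b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2), spec=spec)
+c = xp.add(a, b)
+
+result = c.compute()  # runs the tasks on the Spark executor
+```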