This project is an example implementation of a [Databricks Asset Bundle](https://docs.databricks.com/aws/en/dev-tools/bundles/) using a [Databricks Free Edition](https://www.databricks.com/learn/free-edition) workspace.
The project is configured using `pyproject.toml` (Python specifics) and `databricks.yaml` (Databricks Bundle specifics) and uses [uv](https://docs.astral.sh/uv/) to manage the Python project and dependencies.
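As a rough illustration, a minimal `databricks.yaml` for such a bundle could look like the sketch below; the bundle name, target and workspace host are placeholders and assumptions, not this project's actual values:

```yaml
# Sketch only -- names and host are placeholders, not this project's real config
bundle:
  name: dab_project        # assumed to match the Python package directory

targets:
  dev:
    mode: development      # development mode isolates resources per user
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com   # placeholder
```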
## Repository Structure
| Directory | Description |
|-----------|-------------|
|`.github/workflows`| CI/CD jobs to test and deploy bundle |
|`dab_project`| Python project (used in Databricks Workflow as Python-Wheel-Task) |
|`dbt`|[dbt](https://github.com/dbt-labs/dbt-core) project<br/>* Used in Databricks Workflow as dbt-Task<br/>* dbt models used from https://github.com/dbt-labs/jaffle_shop_duckdb|
|`resources`| Resources such as Databricks Workflows or Databricks Volumes/Schemas<br/>* Python-based workflow: https://docs.databricks.com/aws/en/dev-tools/bundles/python<br/>* YAML-based workflow: https://docs.databricks.com/aws/en/dev-tools/bundles/resources#job|
|`scripts`| Python script to set up groups, service principals and catalogs used in a Databricks (Free Edition) workspace |
|`tests`| Unit tests running on Databricks (via Connect) or locally<br/>* Used in [ci.yml](.github/workflows/ci.yml) jobs |
## Databricks Workspace
Sync entire `uv` environment with dev dependencies:

```bash
uv sync --extra dev
```
> **Note:** We install Databricks Connect in a follow-up step.
#### (Optional) Activate virtual environment
Windows:

```
.venv\Scripts\activate
```
### Databricks Connect
Install `databricks-connect` in the active environment. This requires authentication to be set up via the Databricks CLI.
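For example, authentication can be provided through a profile in `~/.databrickscfg`; the host below is a placeholder, and this is a sketch of one possible setup rather than this project's required configuration:

```ini
# Sketch of ~/.databrickscfg -- host is a placeholder
[DEFAULT]
host      = https://<your-workspace>.cloud.databricks.com
auth_type = databricks-cli
```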
```bash
uv pip uninstall pyspark
uv pip install databricks-connect==16.3.5
```
> **Note:** The `databricks-connect` version should match the cluster's Databricks Runtime, here 16.3.
See https://docs.databricks.com/aws/en/dev-tools/vscode-ext/ for using the Databricks Connect extension in VS Code.
### Unit-Tests
```bash
uv run pytest -v
```
Depending on whether Databricks Connect is enabled, the unit tests either use a Databricks cluster or start a local Spark session with Delta support.
* On Databricks, the unit tests currently assume that the catalog `lake_dev` exists.
> **Note:** Local Spark requires Java. On Windows, Spark/Delta requires Hadoop libraries and generally does not run well; opt for `wsl` instead.
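The session-selection logic described above could be sketched as follows; the helper names and the import-based heuristic are assumptions for illustration, not the project's actual fixture code (the local branch also omits the Delta configuration a real setup would need):

```python
import importlib.util

def connect_available() -> bool:
    # True when the databricks-connect package is importable in this environment
    try:
        return importlib.util.find_spec("databricks.connect") is not None
    except ModuleNotFoundError:
        return False

def get_spark():
    """Return a Spark session for the tests (sketch only).

    Uses Databricks Connect when installed, otherwise falls back
    to a local Spark session.
    """
    if connect_available():
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    from pyspark.sql import SparkSession
    return SparkSession.builder.master("local[*]").getOrCreate()
```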
### Checks
```bash
# Linting
uv run ruff check --fix
# Formatting
uv run ruff format
```
### Setup Databricks Workspace
The following script sets up a Databricks (Free Edition) workspace for this project with additional catalogs, groups and service principals. It uses both the Databricks SDK and Databricks Connect (Serverless).
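In outline, the SDK part of such a script might look like the sketch below; the function name, parameters and loop bodies are illustrative assumptions, not the actual contents of `scripts/setup_workspace.py`:

```python
def setup_workspace(catalog_names, group_names):
    """Sketch of a workspace setup routine (names are assumptions).

    Requires the databricks-sdk package and CLI authentication;
    the import is deferred so the sketch can be read without the SDK installed.
    """
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    for name in catalog_names:
        w.catalogs.create(name=name)        # Unity Catalog catalog
    for name in group_names:
        w.groups.create(display_name=name)  # workspace group
    return w
```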
```bash
uv run ./scripts/setup_workspace.py
```
The `dbt` project is based on https://github.com/dbt-labs/jaffle_shop_duckdb with the following changes: