Write-Audit-Publish with Bauplan and Temporal

A reference implementation of the write-audit-publish (WAP) pattern with Bauplan and Temporal

Overview

A common need on S3-backed analytics systems (e.g. a data lakehouse) is safely ingesting new data into tables available to downstream consumers.

Because lakehouses are distributed and the data to be bulk-inserted is large, ingestion into a lakehouse is more delicate than the equivalent operation on a traditional database.

Data engineering best practices suggest the Write-Audit-Publish (WAP) pattern, which consists of three main logical steps:

Write: ingest data into a "staging" / "temporary" section of the lakehouse - the data is not yet visible to downstream consumers;
Audit: run quality checks on the staged data to verify its integrity and quality (avoiding the "garbage in, garbage out" problem);
Publish: if the quality checks succeed, publish the data to the main section of the lakehouse - the data is now visible to downstream consumers; otherwise, raise an error, clean up the staged data, etc.
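
The three steps above can be sketched in plain Python. This is a toy, in-memory illustration of the pattern only, not the code in this repository and not the Bauplan API: the "lakehouse" is a dictionary of branches, and the names write_audit_publish, AuditError and no_missing_fares are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A toy "lakehouse": each branch name maps table names to lists of rows.
Lakehouse = Dict[str, Dict[str, List[Row]]]


class AuditError(Exception):
    """Raised when the staged data fails a quality check."""


def write_audit_publish(
    lakehouse: Lakehouse,
    table: str,
    staging_branch: str,
    new_rows: List[Row],
    audit: Callable[[List[Row]], bool],
) -> None:
    # Write: stage the rows on a separate branch, invisible to readers of "main".
    lakehouse[staging_branch] = {table: list(new_rows)}
    try:
        # Audit: run the quality check against the staged rows only.
        if not audit(lakehouse[staging_branch][table]):
            raise AuditError(f"audit failed for table {table!r}")
        # Publish: append the staged rows to the table on "main".
        staged = lakehouse[staging_branch][table]
        lakehouse["main"].setdefault(table, []).extend(staged)
    finally:
        # Clean up the staging branch whether or not the audit passed.
        del lakehouse[staging_branch]


def no_missing_fares(rows: List[Row]) -> bool:
    """Hypothetical quality check: every row has a non-null 'fare'."""
    return all(row.get("fare") is not None for row in rows)
```

If the audit fails, "main" is left untouched and the caller sees an AuditError; in both cases the staging branch is removed, which mirrors the "raise an error / clean up" branch of the Publish step.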

This repository showcases how Temporal and Bauplan can be used to implement WAP in a few lines of no-nonsense Python code: no knowledge of the JVM, SQL or Iceberg is required.

Quick link: if you only have 3 minutes, you can start by watching our video walkthrough, in which we run the repo from scratch: installing the packages locally and launching the workflow!

Setup

Bauplan

Bauplan is the programmable lakehouse: you can load, transform, and query data, all from your code (CLI or Python).

If you don't have a Bauplan key for the free sandbox, request one here. Complete the 3-minute tutorial to check your setup and get familiar with the platform.

Note: the current SDK version is 0.0.3a337 but it is subject to change as the platform evolves - ping us if you need help with any of the APIs used in this project.

Setup your Python environment

Install the required dependencies in a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Make a copy of the local.env file as .env and fill in the required values for the WAP flow: the name of the table, the name of the branch (in the usual format: user_name.branch_name), the S3 path containing the files to ingest, the namespace. For example:

TABLE_NAME=temporal_test
BRANCH_NAME=jacopo.temporal_test
S3_PATH=s3://my-public-bucket/taxi-2024/yellow_tripdata_2024-01.parquet
NAMESPACE=temporal
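
As a sanity check before launching the flow, the values above can be parsed and validated with the standard library alone. This is a minimal sketch and not how the repository itself loads its configuration; parse_env, validate_config and REQUIRED_KEYS are hypothetical names, and the checks only encode what is stated above (four required values, a user_name.branch_name branch, an S3 path).

```python
from typing import Dict

REQUIRED_KEYS = ("TABLE_NAME", "BRANCH_NAME", "S3_PATH", "NAMESPACE")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate_config(values: Dict[str, str]) -> Dict[str, str]:
    """Raise ValueError if a required value is missing or malformed."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")
    # The branch is expected in the form user_name.branch_name.
    user, dot, branch = values["BRANCH_NAME"].partition(".")
    if not (user and dot and branch):
        raise ValueError("BRANCH_NAME must look like user_name.branch_name")
    if not values["S3_PATH"].startswith("s3://"):
        raise ValueError("S3_PATH must start with s3://")
    return values


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as env_file:
        print(validate_config(parse_env(env_file.read())))
```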

Temporal

Install the Temporal server locally (for example using brew, or any other supported method) and start it in one terminal:

brew install temporal
temporal server start-dev
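
To confirm the development server is reachable before launching the workflow, a small TCP check can help. The sketch below assumes the default address used by temporal server start-dev (localhost:7233); is_port_open is a hypothetical helper, not part of this repository or of the Temporal SDK, and it only tests that something accepts connections on that port.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 7233, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Temporal dev server is accepting connections")
    else:
        print("Nothing is listening on localhost:7233 - is the server running?")
```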

Run the flow

With the Temporal server still running in the first terminal, open a new terminal (with the virtual environment activated) and start the Temporal application:

python run_workflow.py

Where to go from here

You can continue your exploration of Bauplan by checking the end-to-end examples covering common use cases in ML / AI / data infrastructure. If you want to learn more about the underlying architecture and design choices, please refer to our latest paper.

To learn more about Temporal Cloud, you can head over to their website to start a free trial.

License

The code in this project is licensed under the MIT License (Temporal and Bauplan are the property of their respective owners and have their own licenses).
