Commit 773a649

alkaline-0, rudolfix, and sh-rp
authored
Feat/3154 convert script preprocess docs to python and add destination capabilities section to destination pages (#3188)
* Add DLT destination capabilities tags to documentation files. Introduces the `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tag in the destination documentation files: athena.md, bigquery.md, clickhouse.md, databricks.md, destination.md, dremio.md, duckdb.md, ducklake.md, filesystem.md, lancedb.md, motherduck.md, mssql.md, postgres.md, qdrant.md, redshift.md, snowflake.md, sqlalchemy.md, synapse.md, and weaviate.md.
* Enhance documentation by adding destination capabilities sections. Adds a `## Destination capabilities` section along with the corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tag to the same set of destination documentation files.
* Add a new script for inserting DLT destination capabilities.
* Update package.json and package-lock.json to include the new script for inserting destination capabilities; the lock file is updated to reflect the dependency changes. The new script allows for better integration of destination capabilities into the documentation process.
* Revert "Update package.json and package-lock.json to include new script for inserting destination capabilities". This reverts commit cd5d6c2.
* Add a script for inserting destination capabilities into documentation. Introduces a new Python script, `insert_destination_capabilities.py`; it contains only a placeholder for now that prints to the console for testing the setup.
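The `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` marker mechanism described above can be sketched as a simple regex substitution over each markdown file. This is an illustrative sketch only: `render_capabilities_table`, `insert_capabilities`, and the sample document are hypothetical stand-ins, not the actual script's API.

```python
import re

def render_capabilities_table(destination: str) -> str:
    # Hypothetical stand-in: the real script derives this table from dlt's
    # destination capabilities; here we return a fixed two-column table.
    return f"| capability | value |\n| --- | --- |\n| destination | {destination} |"

# Matches the capabilities marker and captures the destination name.
MARKER = re.compile(r"<!--@@@DLT_DESTINATION_CAPABILITIES (\w+)-->")

def insert_capabilities(markdown: str) -> str:
    # Replace every marker with the rendered table for that destination.
    return MARKER.sub(lambda m: render_capabilities_table(m.group(1)), markdown)

doc = "Intro\n\n<!--@@@DLT_DESTINATION_CAPABILITIES athena-->\n\n## Setup guide\n"
print(insert_capabilities(doc))
```

In the real tooling this runs as a preprocessing step over `docs/`, writing the expanded files to a separate output directory rather than in place.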
* Add destination capabilities execution. Introduces a new function, `executeDestinationCapabilities`, which executes a Python script to insert destination capabilities into the documentation build.
* Enhance the destination capabilities insertion script. Refines `insert_destination_capabilities.py` to dynamically generate and insert destination capabilities tables into documentation files; introduces a new data structure for capabilities, improves the file processing logic so only relevant files are processed, and enhances error handling and logging for better traceability.
* Refactor the destination capabilities insertion script. Dynamically retrieves supported destination names from the source directory and processes only files with an available destination; further improves error handling and logging.
* Refactor and enhance the destination capabilities insertion script. Adds dynamic retrieval and markdown-table formatting of destination capabilities, validation of destination names, and pre-checks for the source and target directories in the main function for a more robust execution flow.
* Refactor and improve the destination capabilities insertion script. Introduces new patterns for documentation links, improves error handling, and optimizes the processing of relevant capabilities; streamlines the file processing logic and ensures only valid capabilities are included in the output.
* Remove destination capabilities sections from the destination documentation files. Removes the `## Destination capabilities` sections and their corresponding `<!--@@@DLT_DESTINATION_CAPABILITIES <destination>-->` tags from the files listed above.
* Add destination capabilities sections to the destination documentation files. Re-adds the `## Destination capabilities` sections and their tags to the same files, giving users clear insight into each destination's capabilities.
* Update documentation for various destinations with formatting improvements. Improves the formatting of warnings, notes, and tips across the destination pages, plus minor content adjustments for clarity and consistency.
* Remove destination capabilities sections from various documentation files.
* Update destinations with the capabilities marker.
* Added a type guard to guard against `Any`.
* Temporarily commit preprocessed docs.
* Add new constants for documentation preprocessing and update requirements. Introduces `constants.py` with directory paths, file extensions, timing settings, and markers; updates `requirements.txt` to add the `watchdog` and `requests` packages.
* Add a tuba links processing script and remove an unused line from constants. Introduces `preprocess_tuba.py`, which fetches and formats tuba links for documentation, with functions for fetching configuration, extracting tags, and inserting links into markdown files.
* Refactor tuba link processing and extract a utility function. Moves `extract_marker_content` into a new `utils.py` for reusability, simplifies the TUBA marker check, and clarifies the link formatting function.
* Add snippet processing functionality for documentation. Introduces `preprocess_snippets.py`, with functions for building a map of code snippets, retrieving snippets from files, and inserting them into markdown documents; also adds utility functions for directory traversal and marker content extraction to `utils.py`.
* Add an example processing script for documentation generation. Introduces `process_examples.py`, which generates example documentation from Python files by extracting headers, comments, and code snippets while handling exclusions and errors gracefully; adds a `trim_array` utility to `utils.py` for managing line arrays.
* Enhance documentation preprocessing with Python integration. Updates `package.json` with a script for installing Python dependencies, wires Python preprocessing into the start and build scripts, and introduces `preprocess_docs.py`, which inserts code snippets, manages links, and syncs examples; adds the `python-debouncer` dependency to `requirements.txt`.
* Refactor the preprocessing scripts for improved async handling. Integrates asynchronous file handling and a lock mechanism into `preprocess_docs.py`, updates the start script for better coordination, adds `preprocess_examples.py` for example generation, and aligns line reading in `preprocess_snippets.py`.
* Refactor for efficiency and caching. Removes the lock file mechanism from the start script and `preprocess_docs.py`, and adds a cache for the tuba configuration in `preprocess_tuba.py` to reduce redundant network requests.
* Refactor file change handling. Introduces `handle_change_impl` in `preprocess_docs.py`, removes the `should_process` function to streamline the decision-making, and cleans up whitespace in `preprocess_docs.py` and `preprocess_tuba.py`.
* Add destination capabilities processing and refactor related scripts. Introduces `preprocess_destination_capabilities.py`, which generates destination capabilities tables with caching and shared constants; `insert_destination_capabilities` is now called from `preprocess_docs.py`, and the standalone `insert_destination_capabilities.py` script is removed.
* Update package-lock.json and package.json. Refreshes dependency versions and streamlines the `start` and `preprocess-docs` scripts by moving Python dependency installation out of the start command and adjusting environment variable settings.
* Add `docs_processed` to .gitignore so preprocessed documentation files are excluded from version control.
* Stop tracking the `docs_processed` directory.
* Remove the `preprocess_docs.js` script, which handled documentation preprocessing tasks including snippet insertion and link management; its functionality now lives in the Python tooling.
* Refactor the destination capabilities script for type hinting and formatting. Adds type hints for the caching variables and makes the capabilities table output consistent, appending a trailing newline for readability.
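The caching of the tuba configuration to reduce redundant network requests could look roughly like the time-based cache below. All names here (`TTLCache`, `fake_fetch_config`) are illustrative assumptions, not the real script's code.

```python
import time
from typing import Any, Callable, Optional, Tuple

class TTLCache:
    """Caches a single value and re-fetches it only after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._entry: Optional[Tuple[float, Any]] = None  # (fetched_at, value)

    def get_or_fetch(self, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        if self._entry is not None and now - self._entry[0] < self.ttl:
            return self._entry[1]  # still fresh, skip the network call
        value = fetch()
        self._entry = (now, value)
        return value

calls = 0
def fake_fetch_config() -> dict:
    # Stand-in for an HTTP request fetching the tuba link configuration.
    global calls
    calls += 1
    return {"links": ["a", "b"]}

cache = TTLCache(ttl_seconds=60)
cache.get_or_fetch(fake_fetch_config)
cache.get_or_fetch(fake_fetch_config)
print(calls)  # fetched only once
```

In a file watcher that re-processes docs on every save, this keeps repeated runs from hammering the remote config endpoint.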
* Remove unnecessary argument documentation from the processing scripts. Simplifies the docstring of `insert_destination_capabilities` in `preprocess_destination_capabilities.py` by dropping the detailed argument and return type documentation, and streamlines the docstring of `format_tuba_links_section` in `preprocess_tuba.py` while keeping the essential information.
* Update package.json to streamline the documentation processing scripts. Adds a script for installing Python dependencies and updates the `start` and `build` scripts for a more efficient workflow.
* Added dependency installation in start.
* Refactor package.json scripts. Streamlines the `start`, `build`, and `build:cloudflare` scripts by removing redundant installation of Python dependencies; `preprocess-docs` is now defined separately.
* Add type checking configurations for additional modules in mypy.ini. Adds `ignore_missing_imports` settings for `constants` and the various preprocess modules to reduce false positives during type analysis.
* Enhance type hinting in the preprocessing scripts. Updates `preprocess_destination_capabilities.py`, `preprocess_snippets.py`, and `preprocess_tuba.py` with more specific type information, including casts for constants and refined list and dictionary annotations.
* Update dependencies and refactor the processing scripts. Adds the `python-debouncer` dependency to `pyproject.toml` for improved event handling, separates the `preprocess-docs` command in `package.json`, optimizes the `start` script, and switches `preprocess_docs.py` to lazy imports for certain modules.
* Remove requirements.txt and clean up whitespace in preprocess_docs.py. The `requirements.txt` file is no longer needed.
* Update documentation for the Databricks and DuckLake destinations. Adds a note about loading data to Managed Iceberg tables for Databricks, refines the descriptions of table and column-level hints, and recommends a more explicit catalog name in the DuckLake configuration examples.
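The event handling that `python-debouncer` provides can be approximated with a hand-rolled debounce decorator: a burst of file-change events collapses into a single processing run. This sketch is an assumption about the approach, not the actual dependency's API, and the wiring to watchdog events is omitted.

```python
import threading

def debounce(wait_seconds: float):
    """Delay calls to fn; only the last call in a burst actually runs."""
    def decorator(fn):
        timer: threading.Timer | None = None
        lock = threading.Lock()

        def wrapper(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # a newer event supersedes the pending one
                timer = threading.Timer(wait_seconds, fn, args, kwargs)
                timer.start()

        return wrapper
    return decorator

processed: list[str] = []

@debounce(0.05)
def process_docs(path: str) -> None:
    # Stand-in for re-running the markdown preprocessing pipeline.
    processed.append(path)

# Five rapid change events for the same file...
for _ in range(5):
    process_docs("docs/foo.md")

threading.Event().wait(0.2)  # let the last timer fire
print(processed)
```

Saving a file in an editor often emits several filesystem events in quick succession; debouncing keeps the preprocessor from running once per event.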
* Enhance documentation for various destinations and add requirements.txt for project dependencies.
* Fix a typo in the DuckDB documentation regarding spatial extension installation.
* Remove the destination capabilities section from the AWS Athena documentation.
* Feat/adds workspace (#3171):
  * ports toml config provider with profiles
  * supports run context with profiles
  * separates pluggy hooks from impls, uses pyproject and `__plugins__.py` for self-plugging
  * implements workspace run context with profiles and a basic cli
  * displays workspace name and profile name before executing cli commands if the run context supports profiles
  * exposes `dlt.current.workspace()`
  * converts the run context protocol into an abstract class
  * fixes plugins tests
  * refactors `_workspace`: private and public modules
  * adds workspace test cases
  * launches workspace and pipeline mcp with cli, sse by default
  * tests basic workspace behaviors
  * refactors code to switch context and profile
  * adds a default profile to the run context interface
  * ports pipeline and oss mcp, changes derivation structure
  * adds safeguards and tests to the workspace cleanup cli helper
  * adds `run_context` to `SupportsPipeline`, checks run context change on pipeline activation
  * adds the mcp dependency to the workspace extra, fixes types
  * renames a test fixture
  * mcp export tweak
  * updates cli reference and common ci workflow
  * disables dlt-plus deps in ci
  * removes df from mcp tools, fixes workspace tests
  * fixes tests
* Fix build scripts for Cloudflare integration in package.json.
* Fix the preprocess-docs:cloudflare script to use python directly instead of uv.
* Restore the preprocess-docs scripts in package.json for consistency.
* Update the preprocess-docs:cloudflare script to include requirements installation.
* Add an `__init__.py` file to the tools directory.
* Refactor import statements to use relative imports in the preprocessing scripts.
* Update import statements to use absolute paths for consistency across the preprocessing scripts.
* Add mypy configuration for additional modules to ignore missing imports.
* Removed a duplicated line.
* Add mypy configuration to ignore missing imports for the tools module.
* Update ducklake.md.
* Temporarily add the netlify build command back.
* Fix typing in snippets and update mypy.ini a bit.
* Reverse the build commands back to the previous order.
* Fixed watch by changing the implementation to a queue and locks.
* Refactor package.json for improved script organization and maintainability.
* Add mypy configuration to ignore missing imports for additional modules.
* Remove the mypy configuration for preprocess_examples to streamline settings.
* Update mypy configuration: rename the dlt hub section to dlt plus and remove unused preprocess settings.
* Refactor import statements to remove the 'tools' prefix, improving module accessibility across the preprocess scripts.
* Refactor import statements in the preprocessing scripts to use relative imports, enhancing module organization and consistency.
* Refactor import statements in the preprocessing scripts to use absolute imports from the tools module, improving clarity and consistency.
* Update mypy.ini.
* Fix formatting in the `_generate_doc_link` function by removing unnecessary whitespace in the return statement.
* Fix linting and script execution.
* Remove sleeping after preprocessing in favor of predictable processing before the docusaurus launch.
* Remove unnecessary whitespace in preprocess_docs.py for cleaner code.
* Update the deployment script in package.json and enhance file change handling in preprocess_docs.py; remove the obsolete preprocess_change.py.
* Refactor preprocess_docs.py to improve file change handling; replace the change counter with a pending-changes flag for better processing control and enhance logging for file modifications.
* Enhance capabilities table generation in preprocess_destination_capabilities.py by adding a descriptive header and introductory text for clarity and context.
* Remove destination capabilities sections from multiple destination documentation files for consistency.
* Fix formatting in the start script of package.json for improved readability.
* Enhance capabilities table generation by improving destination name formatting; streamline file change handling in preprocess_docs.py by removing unnecessary print statements.
* Update files incrementally only in watcher mode; make tuba link generation random per day with a seed.
* Fix the duplicate page at examples error.
* Remove the outdated docs deploy action.
* Add a build docs action for better debuggability.
* Revert an unintentional change to a md file.
* Add info about where capabilities links should go.
* refactor: improve documentation link generation for capabilities.
* fix: update documentation link for replace strategy and improve link formatting.

---------

Co-authored-by: rudolfix <rudolfix@rudolfix.org>
Co-authored-by: dave <shrps@posteo.net>
1 parent 986978d commit 773a649


42 files changed (+1479 −793 lines)

.github/workflows/build_docs.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: docs | build docs
+
+on:
+  workflow_call:
+  workflow_dispatch:
+
+jobs:
+  build_docs:
+    name: docs | build docs
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out
+        uses: actions/checkout@master
+
+      - uses: pnpm/action-setup@v2
+        with:
+          version: 9.13.2
+
+      - uses: actions/setup-node@v5
+        with:
+          node-version: '22'
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install node dependencies
+        run: cd docs/website && npm install
+
+      - name: Install python dependencies
+        run: cd docs/website && pip install -r requirements.txt
+
+      - name: Build docs
+        run: cd docs/website && npm run build:cloudflare
.github/workflows/main.yml

Lines changed: 5 additions & 0 deletions
@@ -30,6 +30,11 @@ jobs:
   test_docs_snippets:
     name: test snippets in docs
     uses: ./.github/workflows/test_docs_snippets.yml
+
+  # NOTE: we build docs the same way as on cloudflare, so we can catch problems early
+  build_docs:
+    name: build docs
+    uses: ./.github/workflows/build_docs.yml
 
   lint:
     name: lint on all python versions

.github/workflows/tools_deploy_docs.yml

Lines changed: 0 additions & 20 deletions
This file was deleted.

docs/package-lock.json

Lines changed: 6 additions & 0 deletions
Some generated files are not rendered by default.

docs/website/docs/dlt-ecosystem/destinations/athena.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ keywords: [aws, athena, glue catalog]
 
 The Athena destination stores data as Parquet files in S3 buckets and creates [external tables in AWS Athena](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html). You can then query those tables with Athena SQL commands, which will scan the entire folder of Parquet files and return the results. This destination works very similarly to other SQL-based destinations, with the exception that the merge write disposition is not supported at this time. The `dlt` metadata will be stored in the same bucket as the Parquet files, but as iceberg tables. Athena also supports writing individual data tables as Iceberg tables, so they may be manipulated later. A common use case would be to strip GDPR data from them.
 
+<!--@@@DLT_DESTINATION_CAPABILITIES athena-->
+
 ## Install dlt with Athena
 **To install the dlt library with Athena dependencies:**
 ```sh

docs/website/docs/dlt-ecosystem/destinations/bigquery.md

Lines changed: 3 additions & 2 deletions
@@ -13,6 +13,7 @@ keywords: [bigquery, destination, data warehouse]
 ```sh
 pip install "dlt[bigquery]"
 ```
+<!--@@@DLT_DESTINATION_CAPABILITIES bigquery-->
 
 ## Setup guide
 
@@ -228,8 +229,8 @@ BigQuery supports the following [column hints](../../general-usage/schema#tables
 
 :::warning
 **Deprecation Notice:**
-Per-column `cluster` hints are deprecated and will be removed in a future release.
-**To migrate, use the `cluster` argument of the `bigquery_adapter` instead.**
+Per-column `cluster` hints are deprecated and will be removed in a future release.
+**To migrate, use the `cluster` argument of the `bigquery_adapter` instead.**
 See the [example below](#use-an-adapter-to-apply-hints-to-a-resource) for how to specify clustering columns with the adapter.
 :::

docs/website/docs/dlt-ecosystem/destinations/clickhouse.md

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ keywords: [ clickhouse, destination, data warehouse ]
 pip install "dlt[clickhouse]"
 ```
 
+<!--@@@DLT_DESTINATION_CAPABILITIES clickhouse-->
+
 ## Setup guide
 
 ### 1. Initialize the dlt project

docs/website/docs/dlt-ecosystem/destinations/databricks.md

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ Databricks supports both **Delta** (default) and **Apache Iceberg** table format
 pip install "dlt[databricks]"
 ```
 
+<!--@@@DLT_DESTINATION_CAPABILITIES databricks-->
+
 ## Set up your Databricks workspace
 
 To use the Databricks destination, you need:

docs/website/docs/dlt-ecosystem/destinations/destination.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ To install `dlt` without additional dependencies:
 pip install dlt
 ```
 
+<!--@@@DLT_DESTINATION_CAPABILITIES destination-->
+
 ## Set up a destination function for your pipeline
 
 The custom destination decorator differs from other destinations in that you do not need to provide connection credentials, but rather you provide a function that gets called for all items loaded during a pipeline run or load operation. With the `@dlt.destination`, you can convert any function that takes two arguments into a `dlt` destination.

docs/website/docs/dlt-ecosystem/destinations/dremio.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ keywords: [dremio, iceberg, aws, glue catalog]
 pip install "dlt[dremio,s3]"
 ```
 
+<!--@@@DLT_DESTINATION_CAPABILITIES dremio-->
+
 ## Setup guide
 ### 1. Initialize the dlt project
