This repository was archived by the owner on Sep 4, 2024. It is now read-only.

Add support for a matrix of Airflow versions #4

Merged
merged 18 commits into main on Mar 10, 2023

Conversation

@pankajkoti (Collaborator) commented Mar 10, 2023

While testing the Databricks workflow and notebook operators across various Airflow versions, we observed that several attributes used by the current code are only available in the latest Airflow version.
To work around these limitations on older Airflow versions, this PR adds the following fixes:

  1. `DatabricksNotebookOperator`: the `task_group` attribute is not available on `BaseOperator`, so we fetch the task group from the `TaskGroupContext` (see the sketch after this list).
  2. `DatabricksWorkflowTaskGroup`: as in `DatabricksNotebookOperator`, we need to fetch the task group from the `TaskGroupContext`; additionally, the `dag` attribute is not available on the `TaskGroup` base class, so we fetch it from the `DagContext`.
  3. `plugin.py` module: the `airflow_flask_app` utility is unavailable, so we use `current_app` from the Flask context to get the Airflow Flask app instance. Additionally, `task_group` is not available on the operator, so we added a custom implementation to fetch the task group on legacy Airflow versions and also fixed how the URLs for the views are constructed.
  4. `conftest.py`: the `EmptyOperator` used here is unavailable, so we use `DummyOperator` instead.
  5. `dag.test()` is unavailable, so we introduce local utilities that leverage the `test_dag` utility to run the example DAGs without the scheduler.
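
As a rough illustration of the fallbacks described in items 1 and 2, the compatibility helpers could look something like the sketch below. The names `get_task_group_compat` and `get_dag_compat` and their bodies are assumptions for illustration, not this PR's exact implementation; the only real APIs relied on are Airflow's `TaskGroupContext.get_current_task_group` and `DagContext.get_current_dag`.

```python
from typing import Optional

from airflow.models import BaseOperator
from airflow.models.dag import DAG, DagContext
from airflow.utils.task_group import TaskGroup, TaskGroupContext


def get_task_group_compat(operator: BaseOperator) -> Optional[TaskGroup]:
    # Newer Airflow versions expose ``task_group`` directly on the operator;
    # older versions do not, so fall back to the TaskGroupContext stack.
    task_group = getattr(operator, "task_group", None)
    if task_group is None:
        dag = operator.dag if operator.has_dag() else None
        task_group = TaskGroupContext.get_current_task_group(dag)
    return task_group


def get_dag_compat(task_group: TaskGroup) -> Optional[DAG]:
    # Older TaskGroup base classes do not carry a ``dag`` attribute; fall back
    # to the DAG currently being defined on the DagContext stack.
    dag = getattr(task_group, "dag", None)
    if dag is None:
        dag = DagContext.get_current_dag()
    return dag
```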

Additionally, to support running the example DAGs, i.e. running concurrent Databricks jobs with the same notebooks across multiple Airflow and Python versions, we have introduced the following changes:

  1. Provide a unique `job_cluster_key` to the Databricks jobs (a sketch of one approach follows this list).
  2. Add the Airflow (`2.2.4`, `2.3`, `2.4`, `2.5`) and Python (`3.8`, `3.9`) versions to the test matrix in `ci.yml`.
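
As a minimal sketch of the unique `job_cluster_key` idea in item 1, one way to avoid collisions between concurrent matrix runs is to suffix the key with a short random token. The helper name below is hypothetical and not necessarily how the PR implements it:

```python
import uuid


def make_unique_job_cluster_key(base_key: str) -> str:
    # Append a short random suffix so that example DAGs running concurrently
    # across several Airflow/Python matrix combinations do not reuse the same
    # Databricks job cluster key.
    return f"{base_key}_{uuid.uuid4().hex[:8]}"


# Hypothetical usage when defining an example DAG:
# job_cluster_key = make_unique_job_cluster_key("astro_databricks_cluster")
```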

closes: #3

Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
@@ -141,18 +163,81 @@ def _get_api_client():
)


def _get_launch_task_key(current_task_key: TaskInstanceKey, group_id: str):
if group_id:
def get_task_group_legacy(operator: BaseOperator) -> TaskGroup:
A collaborator commented:
We could probably benefit by having unit tests for this

A collaborator replied:
We agreed to address this as part of a separate ticket: #14
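
For reference, a unit test for the legacy helper (deferred to #14) might look roughly like the sketch below. Both the import path and the assertion are assumptions: they presume `get_task_group_legacy` is importable from the plugin module and falls back to `TaskGroupContext.get_current_task_group` when the operator has no `task_group` attribute, which may not match the eventual implementation.

```python
from unittest import mock

from airflow.utils.task_group import TaskGroupContext

# Assumed import location; adjust to wherever get_task_group_legacy lives.
from astro_databricks.plugins.plugin import get_task_group_legacy


def test_get_task_group_legacy_falls_back_to_context():
    # Simulate an operator on an older Airflow version with no ``task_group``
    # attribute; the helper should resolve the group from TaskGroupContext.
    operator = mock.MagicMock()
    del operator.task_group
    with mock.patch.object(
        TaskGroupContext, "get_current_task_group", return_value=mock.sentinel.group
    ):
        assert get_task_group_legacy(operator) is mock.sentinel.group
```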

@codecov-commenter

Codecov Report

Patch coverage: 45.37% and project coverage change: -11.29% ⚠️

Comparison is base (c7f4e2a) 89.60% compared to head (8ffd25a) 78.31%.

Additional details and impacted files
@@             Coverage Diff             @@
##             main       #4       +/-   ##
===========================================
- Coverage   89.60%   78.31%   -11.29%     
===========================================
  Files           5        5               
  Lines         327      415       +88     
  Branches       31       47       +16     
===========================================
+ Hits          293      325       +32     
- Misses         24       73       +49     
- Partials       10       17        +7     
| Impacted Files | Coverage Δ |
|---|---|
| src/astro_databricks/operators/workflow.py | 83.76% <20.00%> (-6.06%) ⬇️ |
| src/astro_databricks/operators/notebook.py | 92.39% <25.00%> (-3.12%) ⬇️ |
| src/astro_databricks/plugins/plugin.py | 68.47% <48.93%> (-16.57%) ⬇️ |


☔ View full report in Codecov by Sentry.

@tatiana (Collaborator) left a comment


Thanks for the great work, @pankajkoti!
We will improve the coverage in a separate PR; that is the main issue left.
The only change I was unsure about is keeping the example DAGs inside tests, but we can move them out later in a separate PR if we all agree.

@pankajkoti pankajkoti merged commit c7c1707 into main Mar 10, 2023
tatiana pushed a commit that referenced this pull request Mar 13, 2023
Successfully merging this pull request may close these issues.

Support Airflow version 2.2.4, 2.3 and 2.4 and enable CI run on it