Can not define tasks as a variable and then overwrite them #2436

Open
keviloper opened this issue Mar 5, 2025 · 3 comments
Labels
DABs DABs related issues

Describe the issue

I have two jobs that share common tasks, so I defined those tasks once in a complex variable. I also want to override some cluster settings per target. With this setup, I get an error:
Error: cannot merge string with sequence

Configuration

Here is the minimal configuration that will produce the error.

bundle:
  name: testing-bundle
  databricks_cli_version: ">= 0.229.0"

variables:
  job_tasks:
    description: "Common tasks"
    type: complex
    default:
      - task_key: Task_One
        spark_python_task:
          python_file: ./main_dummy.py
        job_cluster_key: Small
      - task_key: Task_Two
        depends_on:
          - task_key: Task_One
        run_if: "ALL_SUCCESS"
        spark_python_task:
          python_file: ./main_dummy.py
        job_cluster_key: Small

resources:
  jobs:
    job_one:
      tasks: ${var.job_tasks}
    job_two:
      tasks: ${var.job_tasks}


targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://...
    resources:
      jobs:
        job_one:
          tasks:
            - task_key: Task_One
              job_cluster_key: Medium
          job_clusters:
            - job_cluster_key: Small
              new_cluster:
                spark_version: 15.4.x-scala2.12
                node_type_id: Standard_DS3_v2
                data_security_mode: USER_ISOLATION
                runtime_engine: STANDARD
                num_workers: 1
            - job_cluster_key: Medium
              new_cluster:
                spark_version: 14.3.x-scala2.12
                node_type_id: Standard_DS5_v2
                data_security_mode: USER_ISOLATION
                runtime_engine: STANDARD
                num_workers: 1
        job_two:
          job_clusters:
            - job_cluster_key: Small
              new_cluster:
                spark_version: 15.4.x-scala2.12
                node_type_id: Standard_DS3_v2
                data_security_mode: USER_ISOLATION
                runtime_engine: STANDARD
                num_workers: 2

When I replace the value of the tasks keys with the following, it works, but every index of the sequence has to be referenced individually.

tasks:
  - ${var.job_tasks[0]}
  - ${var.job_tasks[1]}

Steps to reproduce the behavior

  1. Use the above databricks.yaml
  2. Run databricks bundle validate or databricks bundle deploy
  3. See error

Expected Behavior

Validation and deployment should succeed, because the variable resolves to a sequence, not a string.

Actual Behavior

Validation fails with: Error: cannot merge string with sequence

OS and CLI version

OS: Ubuntu 22
Databricks CLI v0.243.0

Is this a regression?

Did this work in a previous version of the CLI? - No

Debug Logs

18:13:54 Info: start pid=27961 version=0.243.0 args="databricks, bundle, validate, --log-level=debug"
18:13:54 Debug: Found bundle root at /path/to/folder (file /path/to/databricks.yaml) pid=27961
18:13:54 Info: Phase: load pid=27961
18:13:54 Debug: Apply pid=27961 mutator=EntryPoint
18:13:54 Debug: Apply pid=27961 mutator=scripts.preinit
18:13:54 Debug: No script defined for preinit, skipping pid=27961 mutator=scripts.preinit
18:13:54 Debug: Apply pid=27961 mutator=ProcessRootIncludes
18:13:54 Debug: Apply pid=27961 mutator=VerifyCliVersion
18:13:54 Debug: Apply pid=27961 mutator=EnvironmentsToTargets
18:13:54 Debug: Apply pid=27961 mutator=ComputeIdToClusterId
18:13:54 Debug: Apply pid=27961 mutator=InitializeVariables
18:13:54 Debug: Apply pid=27961 mutator=DefineDefaultTarget(default)
18:13:54 Debug: Apply pid=27961 mutator=PythonMutator(load)
18:13:54 Debug: Apply pid=27961 mutator=validate:unique_resource_keys
18:13:54 Debug: Apply pid=27961 mutator=SelectDefaultTarget
18:13:54 Debug: Apply pid=27961 mutator=SelectDefaultTarget mutator=SelectTarget(dev)
18:13:54 Debug: Error: cannot merge string with sequence pid=27961 mutator=SelectDefaultTarget mutator=SelectTarget(dev)
18:13:54 Debug: Error: cannot merge string with sequence pid=27961 mutator=SelectDefaultTarget
Error: cannot merge string with sequence

Name: testing-bundle

Found 1 error
18:13:54 Debug: failed execution pid=27961 exit_code=1
@keviloper keviloper added the DABs DABs related issues label Mar 5, 2025

github-actions bot commented May 5, 2025

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the Stale label May 5, 2025
@keviloper
Author

🆙

@github-actions github-actions bot removed the Stale label May 6, 2025
denik added a commit that referenced this issue May 9, 2025
denik added a commit that referenced this issue May 9, 2025
@denik
Contributor

denik commented May 9, 2025

Hi @keviloper, thanks for the report. This is a known issue: we currently merge targets first and then resolve variables. This works when variables are scalar but breaks when they are complex. We have an internal proposal for addressing it and are going to discuss its priority.
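
To illustrate the ordering problem described in this comment, here is a minimal toy sketch in Python. It is not the Databricks CLI implementation: the `merge`, `resolve`, and `kind` helpers are hypothetical, and the error text is only shaped to resemble the one in the report. It assumes a merge step that refuses to combine a scalar with a sequence, and shows that the same configuration merges fine once the variable reference has been replaced by its value.

```python
# Toy model of "merge targets first, resolve variables second".
# NOT the Databricks CLI code; all helper names here are hypothetical.

def kind(value):
    """Name a value's shape the way the error message in the report does."""
    if isinstance(value, dict):
        return "map"
    if isinstance(value, list):
        return "sequence"
    return "string"


def merge(base, override):
    """Merge a target override into the root configuration."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        # Simplification: plain concatenation, no merging by task_key.
        return base + override
    if isinstance(base, (dict, list)) or isinstance(override, (dict, list)):
        raise TypeError(f"cannot merge {kind(base)} with {kind(override)}")
    return override  # two scalars: the override wins


def resolve(value, variables):
    """Replace whole-value ${var.NAME} references with the variable's value."""
    if isinstance(value, dict):
        return {k: resolve(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, variables) for v in value]
    if isinstance(value, str) and value.startswith("${var.") and value.endswith("}"):
        return variables[value[len("${var."):-1]]
    return value


variables = {"job_tasks": [{"task_key": "Task_One", "job_cluster_key": "Small"}]}
root = {"tasks": "${var.job_tasks}"}  # still a string before resolution
target = {"tasks": [{"task_key": "Task_One", "job_cluster_key": "Medium"}]}

# Order 1: merge first. The root value is an unresolved string, so the merge fails.
try:
    merge(root, target)
    merge_first_error = None
except TypeError as exc:
    merge_first_error = str(exc)

# Order 2: resolve first. Both sides are sequences, so the merge succeeds.
resolved_first = merge(resolve(root, variables), target)

print(merge_first_error)
print(len(resolved_first["tasks"]))
```

In this sketch the failure comes purely from the order of the two steps: at merge time the root `tasks` value is the literal text `${var.job_tasks}`, while the target supplies a list. The per-index references from the report avoid this because the root `tasks` value is then already a sequence when the merge runs.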

github-merge-queue bot pushed a commit that referenced this issue May 9, 2025