
join_paths' interaction with Files that are *really* directories is confusing #721

@adamnovak


The new WDL 1.2 `join_paths()` function has these overloads:

```wdl
File join_paths(File, String)
File join_paths(File, Array[String]+)
File join_paths(Array[String]+)
```

But this isn't a `urljoin()`-like operation, where joining `dirname/file1.txt` with `file2.txt` gives `dirname/file2.txt`. Instead, the `File`-typed first argument of the first two overloads "must specify a directory", and I'm having trouble interpreting that requirement. Specifically, I'm confused about how the rules for localizing and delocalizing `File` values as they move in and out of tasks apply to these values, and how one of these directory-specifying `File` values behaves differently from a `Directory` initialized with the same string.
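To make the contrast concrete, here is a small Python sketch. `urllib.parse.urljoin` is the real standard-library function; `join_paths_like` is a hypothetical helper that models directory-style joining with `posixpath.join`. The rule that extra components must be relative is an assumption of this sketch, not something quoted from the spec here.

```python
import posixpath
from urllib.parse import urljoin


def join_paths_like(base: str, *parts: str) -> str:
    """Hypothetical model of directory-style joining (not the spec's text).

    `base` is always treated as a directory, even if its name looks like a
    file, and each component is appended after it. Assumption: absolute
    components are rejected instead of resetting the path, which is what
    posixpath.join would otherwise do with them.
    """
    for part in parts:
        if posixpath.isabs(part):
            raise ValueError(f"component must be relative: {part!r}")
    return posixpath.join(base, *parts)


# urljoin() resolves relative to the *parent* of the base, replacing the
# base's last path component.
print(urljoin("dirname/file1.txt", "file2.txt"))          # dirname/file2.txt

# Directory-style joining keeps the whole base and nests under it.
print(join_paths_like("dirname/file1.txt", "file2.txt"))  # dirname/file1.txt/file2.txt
print(join_paths_like("/usr", "bin", "echo"))             # /usr/bin/echo
```

The middle call is the confusing case: under this model, a `File` whose value looks like a file name is still used as a directory prefix.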

The description of the `File` type doesn't really explain what it means for a `File` to specify a directory rather than a file. It does say that a string literal for a non-optional `File` pointing to a nonexistent path is an error, but it doesn't say what happens if the path exists but points to something other than a file (e.g., a directory).
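The three cases a non-optional `File` literal can hit can be sketched in Python. `classify_file_literal` is a hypothetical helper, not part of any WDL engine. Per the `File` type description, only the "missing" case is defined as an error; the "directory" case is the one left unspecified.

```python
import os
import tempfile


def classify_file_literal(path: str) -> str:
    """Hypothetical check an engine might run on a non-optional File literal."""
    if not os.path.exists(path):
        return "missing"    # defined by the spec: an error
    if os.path.isdir(path):
        return "directory"  # exists, but is not a file: behavior unspecified
    return "file"           # the ordinary case


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "a.txt")
    with open(f, "w") as fh:
        fh.write("x")
    print(classify_file_literal(f))                        # file
    print(classify_file_literal(d))                        # directory
    print(classify_file_literal(os.path.join(d, "nope")))  # missing
```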

The `join_paths` example constructs a non-null `File` that points to a non-file, in one of the task inputs' default values:

```wdl
File abs_file = "/usr"
```

These values seem mildly cursed and raise a lot of unanswered questions. Do they work identically to `Directory` values, bringing the directory contents along recursively when passed between workflows and tasks, or not?

Even if they don't bring the directory contents with them, do they follow the same localization rules that apply to `File` values referring to files and `Directory` values referring to directories? In particular, are they bound by the required relationships with sibling `File`s and `Directory`s that share the same parent, and is the execution engine allowed to change the text of the path as seen within a task?

Specifically, if I make a call like:

```wdl
call some_task {
  input:
    some_directory_input = "/usr/bin",         # This input has a Directory type
    some_other_directory_input = "/usr/share", # This input has a Directory type
    some_file_input = "/usr/bin"               # This input has a File type
}
```

My reading of the localization rules is that `some_directory_input` should give me the contents of the host's `/usr/bin`, and that `some_directory_input` and `some_file_input` must receive the same path. But that would suggest that `join_paths()` on `some_file_input` should read the host's version of that directory, while the function's description seems to be written as if I should expect to see the container's version.
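This reading can be modeled with a toy localizer in Python. `localize` and the `/mnt/inputs` mount root are hypothetical, not any real engine's behavior. The only rule the sketch encodes is the one this reading relies on: inputs that share a host parent directory land in one shared container directory with their basenames kept, so identical host paths get identical container paths regardless of whether the input is a `File` or a `Directory`.

```python
import posixpath


def localize(inputs, mount_root="/mnt/inputs"):
    """Hypothetical localizer.

    `inputs` maps input name -> (wdl_type, host_path). Each distinct host
    parent directory gets one container directory under `mount_root`, and
    every input keeps its basename inside it.
    """
    parent_slots = {}
    localized = {}
    # The WDL type is deliberately ignored: under this reading it does not
    # affect where the path is localized.
    for name, (_wdl_type, host_path) in inputs.items():
        parent = posixpath.dirname(host_path)
        if parent not in parent_slots:
            parent_slots[parent] = posixpath.join(mount_root, str(len(parent_slots)))
        localized[name] = posixpath.join(parent_slots[parent], posixpath.basename(host_path))
    return localized


inputs = {
    "some_directory_input": ("Directory", "/usr/bin"),
    "some_other_directory_input": ("Directory", "/usr/share"),
    "some_file_input": ("File", "/usr/bin"),
}
paths = localize(inputs)
print(paths["some_directory_input"])  # /mnt/inputs/0/bin
print(paths["some_file_input"])       # /mnt/inputs/0/bin

# Joining onto the File value then points into the localized copy of the
# *host* /usr/bin, not the container image's own /usr/bin.
print(posixpath.join(paths["some_file_input"], "ls"))  # /mnt/inputs/0/bin/ls
```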

Labels:

- `K-clarification` (Kind): Clarifications regarding the WDL specification.
- `S03-pre-rfc-discussion` (State): A discussion that happens before an RFC is proposed.
- `T-types` (Topic): Issues related to the WDL type system.

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions