Description
The table of comparison operators says you can compare `File` values with `==`. It doesn't actually say you can compare `Directory` values, but nothing seems to explain why you wouldn't be able to, so that looks like an omission.
The rules about file localization say that two local `File` values that originate from the same "parent directory" need to be localized to a shared parent directory for the task, and that file basenames need to be preserved.
So say I have three input `File` values that a workflow takes. It sends them all to the same task. The task can use conditional `if...then...else` expressions to make the Bash code executed depend on the WDL-level equality of the `File` values as seen by the task command. The Bash code can look at the string substituted-in values of the files and do Bash-level string comparisons on them. And the workflow can also use conditional blocks or expressions to make what the workflow executes depend on the equality relationships between the `File` values as seen by the workflow.
Say I pass in these input paths to my execution engine:
{
  "wf_name.file_a": "/home/anovak/file1.txt",
  "wf_name.file_b": "/home/anovak/../anovak/file1.txt",
  "wf_name.file_c": "/home/anovak/file2.txt"
}
Or at workflow scope I say:
File file_a = "/home/anovak/file1.txt"
File file_b = "/home/anovak/../anovak/file1.txt"
File file_c = "/home/anovak/file2.txt"
Despite using different strings for the paths, `file_a` and `file_b` are the same underlying file. That file is in the same "parent directory" as `file_c`, as I interpret it, because multiple paths are being used to refer to one on-disk directory data structure, with one identifying device and inode.
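To make the device-and-inode reading concrete, here is a minimal Python sketch. It is not part of WDL or of any engine; it builds a throwaway directory tree (the temporary root and the helper names `file_identity` and `demo_same_file` are my own assumptions) and checks that two different path strings resolve to one file, and that the parent directories resolve to one directory.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device, inode) pair the OS uses to identify a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_same_file():
    # Build a throwaway tree that mirrors the example: <root>/anovak/file1.txt
    with tempfile.TemporaryDirectory() as root:
        home = os.path.join(root, "anovak")
        os.mkdir(home)
        for name in ("file1.txt", "file2.txt"):
            with open(os.path.join(home, name), "w") as handle:
                handle.write(name)

        file_a = os.path.join(home, "file1.txt")
        file_b = os.path.join(home, "..", "anovak", "file1.txt")
        file_c = os.path.join(home, "file2.txt")

        return {
            # The path strings differ...
            "strings_equal": file_a == file_b,
            # ...but both name the same on-disk file.
            "same_file": file_identity(file_a) == file_identity(file_b),
            # The parents of file_b and file_c are one directory.
            "same_parent_dir": file_identity(os.path.dirname(file_b))
            == file_identity(os.path.dirname(file_c)),
            # file_a and file_c are distinct files.
            "a_is_c": os.path.samefile(file_a, file_c),
        }
```

Whether an engine is expected to use this notion of identity, rather than comparing path strings, is exactly what the spec leaves open.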
When localized for a task, then, `file_a` and `file_c` must have the same parent directory, `file_b` and `file_c` must have the same parent directory, and both `file_a` and `file_b` need to have the same basename. So `file_a` and `file_b` must be presented to the task as the same file: the engine can't download two different copies and present them both.
(There's also a rule that "Two inputs with the same basename must be located separately, to avoid name collision." But I read that as referring to two distinct files being input, not two distinct input slots. Otherwise you could never pass the same file twice to a task if you were also passing a sibling of that file.)
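The reading of the localization rules above can be sketched in Python as a planning function. This is only an illustration under my interpretation, not how any engine is known to work: `plan_localization`, the `/task/inputs` root, and the numbered staging directories are all hypothetical. It resolves each input's parent directory, gives each distinct source directory one localized directory, and preserves basenames.

```python
import os


def plan_localization(inputs, task_root="/task/inputs"):
    """Map each input name to a hypothetical localized absolute path.

    Files from the same resolved source directory share one localized
    directory, and every file keeps its basename.
    """
    dir_slots = {}  # resolved source directory -> localized directory
    plan = {}
    for name, path in inputs.items():
        # Resolve only the parent directory so the basename is preserved.
        # realpath collapses ".." and follows any symlinks that exist.
        src_dir = os.path.realpath(os.path.dirname(path))
        basename = os.path.basename(path)
        if src_dir not in dir_slots:
            dir_slots[src_dir] = os.path.join(task_root, str(len(dir_slots)))
        plan[name] = os.path.join(dir_slots[src_dir], basename)
    return plan


example_plan = plan_localization(
    {
        "file_a": "/home/anovak/file1.txt",
        "file_b": "/home/anovak/../anovak/file1.txt",
        "file_c": "/home/anovak/file2.txt",
    }
)
```

Under this plan `file_a` and `file_b` receive the same localized path string, so there is nowhere to put a second copy, and a string comparison in the task command would find them equal.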
So in the task, `file_a == file_b` should really be `true`, because these two variables refer to the same file. And at the Bash level, they must be substituted with the same string and be equal by a Bash string comparison, because "the absolute path to the localized file/directory is substituted" into the command, and for one file there is only one absolute path.
But is `file_a == file_b` true at workflow scope? Outside of a task, the files have not been localized, so nothing is constraining them to have any particular on-disk relationship, and in many implementations there might not really be any on-disk files when a comparison is made at workflow scope.
I think it is least confusing to guarantee that equality relationships are always the same between `File` values before and after the localization transformation. But this means that, at workflow scope, `file_a`, which was initialized from one string value, needs to be the same as `file_b`, which was initialized from a different string value. If the two `File` values compare equal, then at workflow scope do they coerce to equal `String` values? Or do they coerce to the nonequal `String` values used to initialize them?
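The tension in that question can be modeled with a small Python class. This is a hypothetical model of one possible semantics, not something the spec defines: `FileValue` and its two string accessors are names I made up, and `os.path.realpath` stands in for whatever canonicalization an engine might use.

```python
import os


class FileValue:
    """Hypothetical model of a WDL File value at workflow scope."""

    def __init__(self, initializer):
        # The string the user wrote in the inputs or the declaration.
        self.initializer = initializer
        # Canonical form used for equality (an assumption of this sketch).
        self._identity = os.path.realpath(initializer)

    def __eq__(self, other):
        if not isinstance(other, FileValue):
            return NotImplemented
        return self._identity == other._identity

    def __hash__(self):
        return hash(self._identity)

    def as_original_string(self):
        """Coercion option 1: return the initializing string unchanged."""
        return self.initializer

    def as_canonical_string(self):
        """Coercion option 2: return the canonical path."""
        return self._identity


file_a = FileValue("/home/anovak/file1.txt")
file_b = FileValue("/home/anovak/../anovak/file1.txt")
file_c = FileValue("/home/anovak/file2.txt")
```

With option 1, two `File` values that compare equal coerce to unequal strings; with option 2, coercion no longer round-trips the string the user supplied. Either choice would have to be stated somewhere for workflows to be portable.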
Similar concerns apply for `Directory`, but with a `Directory` it's easier to get multiple paths to the same thing, and you can even do it while having the same string for the parent path, and the same basename:
{
  "wf_name.dir_a": "/home/anovak/dir1/",
  "wf_name.dir_b": "/home/anovak/dir1",
  "wf_name.dir_c": "/home/anovak/dir2/"
}
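As a rough check of the trailing-slash case, the Python sketch below uses `pathlib.PurePosixPath`, which drops a trailing slash when parsing. The `describe` helper is my own, and Python's path parsing is only a stand-in here: WDL's own notion of parent and basename may differ.

```python
from pathlib import PurePosixPath

dir_a = "/home/anovak/dir1/"
dir_b = "/home/anovak/dir1"
dir_c = "/home/anovak/dir2/"


def describe(path_string):
    """Return (parent path, basename) as pathlib parses the string."""
    parsed = PurePosixPath(path_string)
    return (str(parsed.parent), parsed.name)


# The raw strings differ only by the trailing slash.
strings_equal = dir_a == dir_b
# Parent and basename are nevertheless identical for dir_a and dir_b.
same_parts = describe(dir_a) == describe(dir_b)
```

So a comparison on the raw strings and a comparison on parent plus basename give different answers for `dir_a` and `dir_b`, with no `..` components or symlinks involved.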