Description
Problem
The logging capabilities of both WES and TES currently offer little support for structured information, which makes it difficult for clients to interpret workflow and task run logs. It is not well defined which information implementations should log, where, and in what form. Very few properties are required, support for external (GA4GH or third-party) schemas is missing, and the expected logging behavior differs between WES and TES perhaps more than it needs to.
Proposed solution
This issue is a sandbox for discussing, in a single place, improvements to the logging capabilities provided by the WES and TES specs. For reference, the relevant schemas are included in section "Additional context".
Improvements could consist of, for example (not exhaustive):
- Streamlined, consistent log handling within and across WES and TES
- Extended documentation of expected logging behavior
- Support for (more) structured logs, including those of external schemas
The primary goal of these improvements is to make WES and TES logs more useful for clients. Increased maintainability of the specifications and better integration of WES and TES behavior are secondary goals.
Please open individual issues that address individual improvements, then link them in the task list below (more info on GitHub task lists).
In this way, we can keep track of all log-related issues and have a single place where we can consider how all proposed changes would impact one another.
Task list
- Add support for external logging schemas #215
- Consider using externalized common logging base model for WES and TES #217
Additional context
This section includes references to the relevant schemas of the most recent WES (1.1.0) and TES (1.1) releases, including minimal examples, preliminary notes, and obvious issues that caught my eye while compiling this information.
Relevant WES schemas
Log
- Usage:
  - Schema for the run_log property in RunLog, which is the schema for the response of GET /runs/{run_id}
  - Alternative to TaskLog as array item schema for the deprecated task_logs property in RunLog, which is the schema for the response of GET /runs/{run_id}
  - Schema that the TaskLog schema inherits from
- Schema:
Log:
  title: Log
  type: object
  properties:
    name:
      type: string
      description: The task or workflow name
    cmd:
      type: array
      items:
        type: string
      description: The command line that was executed
    start_time:
      type: string
      description: When the command started executing, in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
    end_time:
      type: string
      description: When the command stopped executing (completed, failed, or cancelled), in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
    stdout:
      type: string
      description: A URL to retrieve standard output logs of the workflow run or task. This URL may change between status requests, or may not be available until the task or workflow has finished execution. Should be available using the same credentials used to access the WES endpoint.
    stderr:
      type: string
      description: A URL to retrieve standard error logs of the workflow run or task. This URL may change between status requests, or may not be available until the task or workflow has finished execution. Should be available using the same credentials used to access the WES endpoint.
    exit_code:
      type: integer
      description: Exit code of the program
      format: int32
    system_logs:
      type: array
      items:
        type: string
      description: |-
        System logs are any logs the system decides are relevant, which are not tied directly to a workflow. Content is implementation specific: format, size, etc. System logs may be collected here to provide convenient access. For example, the system may include an error message that caused a SYSTEM_ERROR state (e.g. disk is full), etc.
  description: Log and other info
- Minimal example:
{}
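As a minimal sketch of how a client might consume a WES Log, the following reads the optional start_time and end_time fields and computes the runtime using the "%Y-%m-%dT%H:%M:%SZ" pattern quoted in the schema. It assumes Python 3 and plain dictionaries as returned by json.loads. The helper names parse_wes_time and log_duration are hypothetical. Because Log has no required properties, the sketch treats the minimal example {} as valid input rather than an error.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Pattern quoted in the WES Log schema; the trailing "Z" is matched as a literal.
WES_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_wes_time(value: str) -> datetime:
    """Parse a WES timestamp and mark it as UTC (implied by the 'Z' suffix)."""
    return datetime.strptime(value, WES_TIME_FORMAT).replace(tzinfo=timezone.utc)


def log_duration(log: dict) -> Optional[timedelta]:
    """Return end_time - start_time for a WES Log, or None if either is absent.

    Every Log property is optional, so even the minimal example {} must be
    handled without raising.
    """
    start, end = log.get("start_time"), log.get("end_time")
    if start is None or end is None:
        return None
    return parse_wes_time(end) - parse_wes_time(start)


if __name__ == "__main__":
    print(log_duration({}))  # None: minimal example, no timestamps
    print(log_duration({"start_time": "2020-10-02T15:00:00Z",
                        "end_time": "2020-10-02T16:00:00Z"}))  # 1:00:00
```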
TaskLog
- Usage:
  - Schema for the response of GET /runs/{run_id}/tasks/{task_id}
  - Array item schema for the task_logs property in TaskListResponse, which is itself the schema for the response of GET /runs/{run_id}/tasks
  - Alternative to Log as array item schema for the deprecated task_logs property in RunLog, which is the schema for the response of GET /runs/{run_id}
- Schema:
TaskLog:
  title: TaskLog
  allOf:
    - $ref: '#/components/schemas/Log'
    - type: object
      properties:
        id:
          type: string
          description: A unique identifier which may be used to reference the task
        system_logs:
          type: array
          items:
            type: string
          description: |-
            System logs are any logs the system decides are relevant, which are not tied directly to a task. Content is implementation specific: format, size, etc. System logs may be collected here to provide convenient access. For example, the system may include the name of the host where the task is executing, an error message that caused a SYSTEM_ERROR state (e.g. disk is full), etc.
        tes_uri:
          type: string
          description: An optional URL pointing to an extended task definition defined by a [TES api](https://github.com/ga4gh/task-execution-schemas)
      required:
        - id
        - name
  description: Runtime information for a given task
- Minimal example:
{ "id": "some_id", "name": "some_name" }
- Notes:
  - Extends the Log schema with the additional properties id, system_logs and tes_uri. Unlike Log, which has no required properties at all, TaskLog requires its own id property and the name property inherited from Log
  - The definition of system_logs is redundant, because the property is already inherited from Log, where it is defined almost identically. The only difference is the wording of the description, which could easily be generalized
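To make the two TaskLog notes concrete, here is a small Python sketch that resolves the $ref/allOf composition by hand. It works over a trimmed-down copy of the two schemas that keeps only property names and required lists, and it reports the effective required properties plus any property defined in more than one branch. It is not a full JSON Schema validator; a real client would more likely use an existing validator library. The names SCHEMAS, resolve, missing_required and redefined_properties are hypothetical.

```python
from __future__ import annotations

# Trimmed copies of the WES 1.1.0 Log and TaskLog schemas: only the
# structure needed to follow $ref/allOf, the property names and "required".
SCHEMAS = {
    "Log": {
        "type": "object",
        "properties": {name: {} for name in [
            "name", "cmd", "start_time", "end_time",
            "stdout", "stderr", "exit_code", "system_logs"]},
    },
    "TaskLog": {
        "allOf": [
            {"$ref": "#/components/schemas/Log"},
            {
                "type": "object",
                "properties": {name: {} for name in ["id", "system_logs", "tes_uri"]},
                "required": ["id", "name"],
            },
        ],
    },
}


def resolve(schema: dict) -> tuple[list[str], list[str]]:
    """Flatten a schema into (property names in definition order, required names)."""
    if "$ref" in schema:
        # Only local component references are handled in this sketch.
        return resolve(SCHEMAS[schema["$ref"].rsplit("/", 1)[-1]])
    properties = list(schema.get("properties", {}))
    required = list(schema.get("required", []))
    for part in schema.get("allOf", []):
        sub_properties, sub_required = resolve(part)
        properties += sub_properties
        required += sub_required
    return properties, required


def missing_required(instance: dict, schema_name: str) -> list[str]:
    """List required properties (own or inherited) absent from the instance."""
    _, required = resolve(SCHEMAS[schema_name])
    return [key for key in dict.fromkeys(required) if key not in instance]


def redefined_properties(schema_name: str) -> list[str]:
    """List properties that appear in more than one allOf branch."""
    properties, _ = resolve(SCHEMAS[schema_name])
    seen, duplicates = set(), []
    for prop in properties:
        if prop in seen and prop not in duplicates:
            duplicates.append(prop)
        seen.add(prop)
    return duplicates


if __name__ == "__main__":
    print(missing_required({}, "Log"))      # []
    print(missing_required({}, "TaskLog"))  # ['id', 'name']
    print(missing_required({"id": "some_id", "name": "some_name"}, "TaskLog"))  # []
    print(redefined_properties("TaskLog"))  # ['system_logs']
```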
Relevant TES schemas
tesTaskLog
- Usage:
  - Array item schema for the logs property in tesTask, which is the schema for the body of POST /tasks and the response of GET /tasks/{id}
- Schema:
tesTaskLog:
  required:
    - logs
    - outputs
  type: object
  properties:
    logs:
      type: array
      description: Logs for each executor
      items:
        $ref: '#/components/schemas/tesExecutorLog'
    metadata:
      type: object
      additionalProperties:
        type: string
      description: Arbitrary logging metadata included by the implementation.
      example:
        host: worker-001
        slurmm_id: 123456
    start_time:
      type: string
      description: When the task started, in RFC 3339 format.
      example: 2020-10-02T10:00:00-05:00
    end_time:
      type: string
      description: When the task ended, in RFC 3339 format.
      example: 2020-10-02T11:00:00-05:00
    outputs:
      type: array
      description: |-
        Information about all output files. Directory outputs are flattened into separate items.
      items:
        $ref: '#/components/schemas/tesOutputFileLog'
    system_logs:
      type: array
      description: |-
        System logs are any logs the system decides are relevant, which are not tied directly to an Executor process. Content is implementation specific: format, size, etc. System logs may be collected here to provide convenient access. For example, the system may include the name of the host where the task is executing, an error message that caused a SYSTEM_ERROR state (e.g. disk is full), etc. System logs are only included in the FULL task view.
      items:
        type: string
  description: TaskLog describes logging information related to a Task.
- Minimal example:
{ "logs": [], "outputs": [] }
- Notes:
  - Unlike WES, has a metadata property to provide arbitrary task-level logging information as key-value pairs
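The timestamp conventions also differ between the two specs. WES Log uses the fixed UTC pattern "%Y-%m-%dT%H:%M:%SZ", whereas tesTaskLog (like tesExecutorLog) uses RFC 3339 with arbitrary offsets, as in the schema example 2020-10-02T10:00:00-05:00. The minimal Python sketch below shows that the WES pattern rejects the TES example and how a client could normalize both forms to UTC. It assumes Python 3.7+ and covers only the timestamp shapes shown in the two schemas, not every RFC 3339 variant. The helper names to_utc and matches_wes_format are hypothetical.

```python
from datetime import datetime, timezone

WES_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # pattern quoted in the WES Log schema


def to_utc(value: str) -> datetime:
    """Normalize a WES ('...Z') or TES ('...-05:00') timestamp to aware UTC.

    Before Python 3.11, datetime.fromisoformat() does not accept a trailing
    'Z', so it is rewritten to '+00:00' first.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp without UTC offset: {value!r}")
    return parsed.astimezone(timezone.utc)


def matches_wes_format(value: str) -> bool:
    """Check whether a timestamp follows the WES pattern exactly."""
    try:
        datetime.strptime(value, WES_TIME_FORMAT)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    tes_start = "2020-10-02T10:00:00-05:00"  # example from tesTaskLog.start_time
    print(matches_wes_format(tes_start))  # False
    print(to_utc(tes_start) == to_utc("2020-10-02T15:00:00Z"))  # True
    print(to_utc(tes_start).strftime(WES_TIME_FORMAT))  # 2020-10-02T15:00:00Z
```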
tesExecutorLog
- Usage:
  - Array item schema for the logs property in tesTaskLog, which is itself the array item schema for the logs property in tesTask, the schema for the body of POST /tasks and the response of GET /tasks/{id}
- Schema:
tesExecutorLog:
  required:
    - exit_code
  type: object
  properties:
    start_time:
      type: string
      description: Time the executor started, in RFC 3339 format.
      example: 2020-10-02T10:00:00-05:00
    end_time:
      type: string
      description: Time the executor ended, in RFC 3339 format.
      example: 2020-10-02T11:00:00-05:00
    stdout:
      type: string
      description: |-
        Stdout content. This is meant for convenience. No guarantees are made about the content. Implementations may chose different approaches: only the head, only the tail, a URL reference only, etc. In order to capture the full stdout client should set Executor.stdout to a container file path, and use Task.outputs to upload that file to permanent storage.
    stderr:
      type: string
      description: |-
        Stderr content. This is meant for convenience. No guarantees are made about the content. Implementations may chose different approaches: only the head, only the tail, a URL reference only, etc. In order to capture the full stderr client should set Executor.stderr to a container file path, and use Task.outputs to upload that file to permanent storage.
    exit_code:
      type: integer
      description: Exit code.
      format: int32
  description: ExecutorLog describes logging information related to an Executor.
- Minimal example:
{ "exit_code": 0 }
- Notes:
  - More or less a subset of the WES Log schema, without system_logs (available upstream at tesTask.logs), name (available upstream in tesTask) and cmd (available as the command property of each item in tesTask.executors[]), and with exit_code being required
  - Handling of STDOUT and STDERR differs significantly from WES. Here, stdout and stderr hold content for convenience only (possibly just the head, the tail or a URL reference) rather than a URL. Full STDOUT and STDERR are expected to be provided via tesExecutor.stdout and tesExecutor.stderr (set to container file paths) and tesTask.outputs (file upload)
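The subset relationship described in the notes can be illustrated with a Python sketch that assembles WES-Log-shaped dictionaries from a TES task:

- name comes from tesTask.
- cmd comes from the matching tesTask.executors[] entry.
- system_logs comes from the enclosing tesTaskLog.
- exit_code and the timestamps come from each tesExecutorLog.

The sketch makes two assumptions. First, executor logs line up with tesTask.executors by index. Second, the last tesTaskLog entry is the attempt of interest. stdout and stderr are deliberately not copied, because TES returns content where WES expects URLs. Timestamps are passed through unchanged in RFC 3339 form. The function name and the sample task are hypothetical.

```python
from __future__ import annotations


def executor_logs_as_wes_logs(tes_task: dict) -> list[dict]:
    """Build WES-Log-shaped dicts from the executor logs of a TES task.

    Hypothetical mapping for illustration only. It assumes executor logs are
    ordered like tesTask.executors and uses the last tesTaskLog entry.
    """
    task_logs = tes_task.get("logs") or []
    if not task_logs:
        return []  # no task logs reported (yet)
    task_log = task_logs[-1]
    executors = tes_task.get("executors") or []
    wes_logs = []
    for index, executor_log in enumerate(task_log.get("logs", [])):
        wes_log = {"exit_code": executor_log["exit_code"]}  # required in TES
        if "name" in tes_task:
            wes_log["name"] = tes_task["name"]
        if index < len(executors) and "command" in executors[index]:
            wes_log["cmd"] = list(executors[index]["command"])
        for key in ("start_time", "end_time"):
            if key in executor_log:
                wes_log[key] = executor_log[key]  # RFC 3339, not converted
        if "system_logs" in task_log:
            wes_log["system_logs"] = list(task_log["system_logs"])
        # stdout/stderr intentionally skipped: TES gives content, WES expects URLs.
        wes_logs.append(wes_log)
    return wes_logs


if __name__ == "__main__":
    task = {
        "name": "hello",
        "executors": [{"image": "alpine", "command": ["echo", "hello"]}],
        "logs": [{
            "logs": [{"exit_code": 0, "stdout": "hello\n",
                      "start_time": "2020-10-02T10:00:00-05:00"}],
            "outputs": [],
        }],
    }
    print(executor_logs_as_wes_logs(task))
```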
@patmagee @kellrott @vsmalladi @lbeckman314 @briandoconnor @dglazer