<a href="https://wfcommons.org" target="_blank"><img src="https://wfcommons.org/images/wfcommons-horizontal.png" width="350" /></a>
WfFormat: The WfCommons JSON Schema
- Current schema version: 1.5
- Schema file: wfcommons-schema.json
- Schema validator: wfcommons-validator.py (see documentation at the end of this file)
Documentation
This documentation provides an overview of the WfCommons JSON schema. Although it attempts to cover all aspects of the schema, we strongly recommend validating your own workflow execution instances or workflow descriptions with a JSON schema validator before using them. Required properties are identified with a marked checkbox symbol.
General Instance Properties
- name: Representative name for the instance.
- description: A concise description of the instance. It should help researchers understand the purpose of the execution.
- createdAt: Instance creation date in the ISO 8601 format (e.g., 2020-03-20T15:19:28-08:00).
- schemaVersion: Version of the schema the instance conforms to, chosen from an enumerated list of supported versions.
- runtimeSystem: An object describing the runtime system used to execute the workflow.
- workflow: An object describing the workflow characteristics and performance metrics.
- author: An object describing the author/institution who created/generated the instance.
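To make the top-level layout concrete, here is a minimal sketch of an instance built as a Python dictionary and serialized with the standard `json` module. All values (names, e-mail address, URL, timestamps, runtime system) are hypothetical placeholders, and the nested objects are left nearly empty, so treat this as an illustration of the property names above rather than a file guaranteed to pass the validator. It also assumes `schemaVersion` is stored as a string.

```python
import json

# Hypothetical values; only the property names come from the documentation above.
instance = {
    "name": "example-workflow-run",
    "description": "A two-task toy workflow used to illustrate the format.",
    "createdAt": "2020-03-20T15:19:28-08:00",
    "schemaVersion": "1.5",  # assumed to be a string from the enumerated versions
    "runtimeSystem": {
        "name": "my-runtime",  # hypothetical runtime system
        "version": "0.1",
        "url": "https://example.org",
    },
    "workflow": {
        "specification": {"tasks": [], "files": []},
        "execution": {
            "makespanInSeconds": 0,
            "executedAt": "2020-04-01T15:10:53-08:00",
            "tasks": [],
            "machines": [],
        },
    },
    "author": {
        "name": "Jane Doe",
        "email": "jane@example.org",
        "institution": "Example University",
        "country": "US",
    },
}

# Serialize to the JSON text that would be stored in an instance file.
print(json.dumps(instance, indent=2))
```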
Runtime System Property
The runtimeSystem property documents the runtime system used to run the workflow. It has the following sub-properties:
- name: Runtime system name.
- version: Runtime system version.
- url: URL of the main runtime system website.
Workflow Property
The workflow property is the core element of the instance file. It contains the workflow structure (tasks, dependencies, and files), as well as task characteristics and performance information. It is composed of the following sub-properties:
- specification: Workflow specification (does not contain any execution information).
- execution: Workflow execution information.
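Because the specification describes structure while the execution section records what happened at run time, tools often need both views of a task. The sketch below assumes (this is not stated explicitly in the documentation) that a task's `id` is the same in both sections, and merges the two records into one dictionary per task. The function name `join_tasks` and all example values are hypothetical.

```python
def join_tasks(workflow: dict) -> dict:
    """Merge specification and execution records that share the same task 'id'."""
    # Copy each specification task so the original instance is not modified.
    merged = {t["id"]: dict(t) for t in workflow["specification"]["tasks"]}
    for t in workflow["execution"]["tasks"]:
        # setdefault keeps tasks that appear only in the execution section.
        merged.setdefault(t["id"], {}).update(t)
    return merged


workflow = {
    "specification": {
        "tasks": [{"name": "split", "id": "ID0000001", "parents": [], "children": [],
                   "inputFiles": ["input.txt"], "outputFiles": ["part.txt"]}],
        "files": [{"id": "input.txt", "sizeInBytes": 1024},
                  {"id": "part.txt", "sizeInBytes": 512}],
    },
    "execution": {"tasks": [{"id": "ID0000001", "runtimeInSeconds": 12.5}]},
}
merged = join_tasks(workflow)
print(merged["ID0000001"]["name"], merged["ID0000001"]["runtimeInSeconds"])  # split 12.5
```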
Specification Property
Tasks Property (Specification)
This property lists all tasks of the workflow, describing their relationships and file dependencies. Each task is described as an object with 6 properties:
- name: Task name (often set to the name of the program executed by a task or to some notion of task type or category).
- id: Unique task ID (e.g., ID0000001).
- parents: List of parent tasks (references to other workflow tasks by their id).
- children: List of child tasks (references to other workflow tasks by their id).
- inputFiles: List of input file IDs.
- outputFiles: List of output file IDs.
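Since parents and children reference tasks by id, the two lists describe the same edges from both ends. The sketch below is one plausible consistency check (unique IDs, every referenced task exists, and each edge appears on both sides); it is an illustration, not necessarily what wfcommons-validator.py does. The function name `check_task_graph` and the example tasks are hypothetical.

```python
from __future__ import annotations


def check_task_graph(tasks: list[dict]) -> list[str]:
    """Return a list of problems found in a specification 'tasks' list (empty if none)."""
    problems = []
    by_id = {t["id"]: t for t in tasks}
    if len(by_id) != len(tasks):
        problems.append("duplicate task IDs")
    for t in tasks:
        # Every parent must exist and must list this task among its children.
        for p in t["parents"]:
            if p not in by_id:
                problems.append(f"{t['id']}: unknown parent {p}")
            elif t["id"] not in by_id[p]["children"]:
                problems.append(f"{t['id']}: parent {p} does not list it as a child")
        # Symmetric check for children.
        for c in t["children"]:
            if c not in by_id:
                problems.append(f"{t['id']}: unknown child {c}")
            elif t["id"] not in by_id[c]["parents"]:
                problems.append(f"{t['id']}: child {c} does not list it as a parent")
    return problems


# Hypothetical two-task workflow: "split" feeds "count".
tasks = [
    {"name": "split", "id": "ID0000001", "parents": [], "children": ["ID0000002"],
     "inputFiles": ["input.txt"], "outputFiles": ["part.txt"]},
    {"name": "count", "id": "ID0000002", "parents": ["ID0000001"], "children": [],
     "inputFiles": ["part.txt"], "outputFiles": ["counts.txt"]},
]
print(check_task_graph(tasks))  # []
```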
Files Property (Specification)
This property lists all data files in the workflow that are used as input/output by tasks. Each file is described as an object with 2 properties:
- id: Unique file ID (e.g., a file name, a path, an arbitrary string).
- sizeInBytes: File size in bytes.
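The inputFiles and outputFiles lists of each task refer to entries in this files list by id. The following sketch, under that reading, reports task file IDs that have no matching file entry and sums the declared file sizes. The function name `check_file_references` and the example data are hypothetical.

```python
from __future__ import annotations


def check_file_references(spec: dict) -> tuple[list[str], int]:
    """Report task file IDs not declared in spec['files'] and the total declared size."""
    sizes = {f["id"]: f["sizeInBytes"] for f in spec["files"]}
    missing = []
    for t in spec["tasks"]:
        for fid in t["inputFiles"] + t["outputFiles"]:
            if fid not in sizes:
                missing.append(f"{t['id']}: undeclared file {fid}")
    return missing, sum(sizes.values())


# Hypothetical specification; "counts.txt" is deliberately left undeclared.
spec = {
    "tasks": [
        {"name": "split", "id": "ID0000001", "parents": [], "children": ["ID0000002"],
         "inputFiles": ["input.txt"], "outputFiles": ["part.txt"]},
        {"name": "count", "id": "ID0000002", "parents": ["ID0000001"], "children": [],
         "inputFiles": ["part.txt"], "outputFiles": ["counts.txt"]},
    ],
    "files": [
        {"id": "input.txt", "sizeInBytes": 1048576},
        {"id": "part.txt", "sizeInBytes": 524288},
    ],
}
print(check_file_references(spec))
# (['ID0000002: undeclared file counts.txt'], 1572864)
```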
Execution Property
- makespanInSeconds: Overall workflow execution time in seconds.
- executedAt: Workflow start timestamp in the ISO 8601 format (e.g., 2020-04-01T15:10:53-08:00).
- tasks: List of workflow tasks.
- machines: List of compute machines used for running the workflow tasks.
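The makespan is recorded separately from the per-task executedAt and runtimeInSeconds values (described in the Tasks Property (Execution) subsection), so a quick cross-check can be useful. The sketch below computes the span from the earliest task start to the latest task end. This is an assumed sanity check, not the definition of makespan used by WfCommons; a recorded makespan may legitimately be larger, for example if it includes time before the first task starts. The function name and example tasks are hypothetical.

```python
from datetime import datetime, timedelta


def task_span_seconds(tasks: list) -> float:
    """Seconds from the earliest task start to the latest task end.

    Assumes every task has an ISO 8601 'executedAt' with a UTC offset
    and a numeric 'runtimeInSeconds'.
    """
    starts = [datetime.fromisoformat(t["executedAt"]) for t in tasks]
    ends = [s + timedelta(seconds=t["runtimeInSeconds"]) for s, t in zip(starts, tasks)]
    return (max(ends) - min(starts)).total_seconds()


tasks = [
    {"id": "ID0000001", "executedAt": "2020-04-01T15:10:53-08:00", "runtimeInSeconds": 30.0},
    {"id": "ID0000002", "executedAt": "2020-04-01T15:11:23-08:00", "runtimeInSeconds": 45.5},
]
print(task_span_seconds(tasks))  # 75.5
```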
Tasks Property (Execution)
This property lists all tasks of the workflow, describing their characteristics and performance metrics. Each task is described as an object with 13 properties:
- id: Unique task ID (e.g., ID0000001).
- runtimeInSeconds: Task runtime in seconds.
- executedAt: Task start timestamp in the ISO 8601 format (e.g., 2020-04-01T15:10:53-08:00).
- command: An object describing the task's command.
- coreCount: Number of cores required by the task, possibly fractional (e.g., 1.5).
- avgCPU: Average CPU utilization in % (e.g., 93.78).
- readBytes: Total bytes read.
- writtenBytes: Total bytes written.
- memoryInBytes: Memory (resident set) size of the process in bytes.
- energyInKWh: Total energy consumption in kWh.
- avgPowerInW: Average power consumption in W.
- priority: Task priority as an integer value.
- machines: List of node names of the machines on which the task executed.
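When both energyInKWh and avgPowerInW are present, they should be roughly consistent with runtimeInSeconds, because average power multiplied by duration is energy: E [kWh] = P [W] × t [s] / 3,600,000 (1 kWh = 3.6 × 10^6 J). The sketch below shows a hypothetical task record using all 13 properties (every value is invented) together with this cross-check. Whether real instances satisfy it exactly depends on how the measurements were taken, so a mismatch is a hint rather than an error.

```python
import math

# Hypothetical execution record for one task, using all 13 documented properties.
task = {
    "id": "ID0000002",
    "runtimeInSeconds": 360.0,
    "executedAt": "2020-04-01T15:11:23-08:00",
    "command": {"program": "count", "arguments": ["--in", "part.txt"]},
    "coreCount": 1.5,
    "avgCPU": 93.78,
    "readBytes": 524288,
    "writtenBytes": 4096,
    "memoryInBytes": 104857600,
    "energyInKWh": 0.005,
    "avgPowerInW": 50.0,
    "priority": 0,
    "machines": ["node-01"],
}


def energy_kwh_from_power(avg_power_w: float, runtime_s: float) -> float:
    """Energy implied by an average power over a runtime (1 kWh = 3.6e6 W*s)."""
    return avg_power_w * runtime_s / 3.6e6


implied = energy_kwh_from_power(task["avgPowerInW"], task["runtimeInSeconds"])
print(math.isclose(implied, task["energyInKWh"]))  # True
```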
Command Property (Execution)
The command property describes the program and arguments used by a task. It is composed of the following properties:
- program: Program name.
- arguments: List of task arguments.
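Because program and arguments are stored separately, a readable command line has to be reassembled. A minimal sketch, assuming each element of arguments is a single argv token, uses the standard library's `shlex.join` (Python 3.8+) so that arguments containing spaces are quoted correctly. The function name `to_command_line` and the example command are hypothetical.

```python
import shlex


def to_command_line(command: dict) -> str:
    """Rebuild a shell-quoted command line from a 'command' object.

    Assumes each element of 'arguments' is a single argv token.
    """
    return shlex.join([command["program"], *command["arguments"]])


cmd = {"program": "count", "arguments": ["--in", "part file.txt", "-v"]}
line = to_command_line(cmd)
print(line)  # count --in 'part file.txt' -v
```

If an instance instead stores several tokens in one argument string, the quoting above will not reproduce the original invocation.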
Machines Property (Execution)
The machines property lists all distinct machines used to execute workflow tasks. Each machine is described by the following properties:
- system: Machine operating system (linux, macos, or windows).
- architecture: Machine architecture (e.g., x86_64).
- nodeName: Machine node name.
- release: Machine (operating system) release.
- memoryInBytes: Total RAM in bytes.
- cpu: An object describing the machine's CPU information.
CPU Property (Execution)
The cpu property describes the machine's CPU. It has the following sub-properties:
- coreCount: Number of CPU cores; fractions of cores are supported and expressed as floating-point numbers.
- speedInMHz: CPU speed in MHz.
- vendor: CPU vendor.
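Most machine fields map onto values exposed by Python's standard `platform` and `os` modules. The sketch below builds a best-effort machines entry (including its cpu object) for the host it runs on. The mapping of `platform.system()` names to linux/macos/windows is an assumption based on the values listed above, `os.cpu_count()` reports logical CPUs (which may differ from how an instance author counts cores), and speedInMHz is omitted because the standard library has no portable source for it. The function name `describe_this_machine` is hypothetical.

```python
import os
import platform

# platform.system() returns names such as 'Linux', 'Darwin' or 'Windows';
# map them onto the lowercase values listed above (assumed mapping).
_SYSTEMS = {"linux": "linux", "darwin": "macos", "windows": "windows"}


def describe_this_machine() -> dict:
    """Build a best-effort 'machines' entry for the host running this code."""
    system = platform.system().lower()
    machine = {
        "system": _SYSTEMS.get(system, system),
        "architecture": platform.machine(),  # e.g., 'x86_64'
        "nodeName": platform.node(),
        "release": platform.release(),
    }
    cpu = {}
    cores = os.cpu_count()  # logical CPUs; None if undetermined
    if cores is not None:
        cpu["coreCount"] = cores
    vendor = platform.processor()  # frequently empty on Linux
    if vendor:
        cpu["vendor"] = vendor
    machine["cpu"] = cpu
    try:
        # POSIX-only: number of physical pages times the page size.
        machine["memoryInBytes"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        pass  # e.g., os.sysconf does not exist on Windows
    return machine


print(describe_this_machine())
```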
Author Property
The author property should contain contact information for the person or team who created the instance. It is composed of the following properties:
- name: Author name.
- email: Author email.
- institution: Author institution.
- country: Author country (preferably an ISO 3166-1 alpha-2 country code).
Validator
WfCommons provides a Python-based instance validator script for verifying both the syntax of JSON instance files and their semantics, e.g., whether all file and parent IDs refer to valid entries.
Prerequisite: The validator script requires the Python jsonschema and requests modules, which can be installed as follows:
$ pip install jsonschema
$ pip install requests
The validator script signature is defined as follows:
```
usage: wfcommons-validator.py [-h] [-s SCHEMA_FILE] [-d] JSON_FILE

Validate JSON file against wfcommons-schema.

positional arguments:
  JSON_FILE       JSON instance file

optional arguments:
  -h, --help      show this help message and exit
  -s SCHEMA_FILE  JSON schema file
  -d, --debug     Print debug messages to stderr
```