cwlprov
Profile for provenance research object of a CWL workflow run.
Cite as
Peer-reviewed paper about CWLProv:
Farah Zaib Khan, Stian Soiland-Reyes, Richard O Sinnott, Andrew Lonie, Carole Goble, Michael R Crusoe (2019):
Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv.
GigaScience 8(11):giz095 https://doi.org/10.1093/gigascience/giz095
Quicklinks
- CWLProv Examples:
- revsort-run-1 - execution of revsort.cwl (CWLProv 0.4.0)
- RunTimeResearchObject execution of a sequence alignment workflow (CWLProv 0.2.0) - from 10.5281/zenodo.1215611
- CWLProv posters:
- CWLProv slides:
- CWLProv Retrospective provenance capture and its challenges, BOSC 2018
- Reproducible BioCompute Objects using Common Workflow Language, BioCompute Object
- CWL Research Objects, ELIXIR
- Challenges in interoperable provenance capture, Research Data Alliance
- The Archive and Package (arcp) URI scheme
- Papers:
Overview
CWLProv is an informal profile to define how to record provenance of a workflow run (typically CWL or Nextflow), captured as a research object using Linked Data standards.
There are three parts to this profile:
- CWLProv BagIt, how the resources of an execution are packaged using BagIt
- CWLProv Research Object, how the resources of an execution are related in an RO
- CWLProv PROV, how the workflow execution provenance is modelled in W3C PROV
This repository may later also include formal profiles for computational validation, e.g. BagIt profile of included resources, ShEx for manifest content, and PROV Template to document PROV structures.
The CWLProv white paper describes the background and motivation for this profile. For the avoidance of doubt, from CWLProv 0.3.0 this GitHub repository is the authoritative source for the CWLProv specifications.
Known implementations
- cwltool --provenance (reference implementation)
- cwlprov-py (command line tool to inspect)
- nextflow -with-prov (work in progress, approaching CWLProv without CWL)
- toil (planned)
License
This repository is distributed under Apache License, version 2.0.
See the file LICENSE.txt for details, and NOTICE for required notices.
Contributing
CWLProv is maintained at https://github.com/common-workflow-language/cwlprov/
Feel free to raise an issue or a pull request to contribute to CWLProv. Contributions are assumed to be covered by section 5 of the Apache License.
You may also want to contribute a corresponding issue or pull request in the cwltool reference implementation, in particular cwltool/provenance.py and documentation on cwltool --provenance support.
For an informal CWLProv discussion with other developers, join the (relatively quiet) Gitter room common-workflow-language/cwlprov, or the (busier) common-workflow-language/common-workflow-language.
Code of Conduct
The CWL Project is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, age, race, or religion. We do not tolerate harassment of participants in any form. This code of conduct applies to all CWL Project spaces, including the Google Group, the Gitter chat room, the Google Hangouts chats, both online and off. Anyone who violates this code of conduct may be sanctioned or expelled from these spaces at the discretion of the leadership team.
For more details, see our Code of Conduct.
Requirements Language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in documents of this repository are to be interpreted as described in RFC 2119.
Versions
- 0.6.0 https://w3id.org/cwl/prov/0.6.0 Adds metadata/logs, no longer snapshot input files (they are also under data/) (introduced in cwltool 1.0.20181012180214)
- 0.5.0 https://w3id.org/cwl/prov/0.5.0 Adds workflow/primary-output.json (introduced in cwltool 1.0.20180912090223)
- 0.4.0 https://w3id.org/cwl/prov/0.4.0 Declares directories and secondary files (introduced in cwltool 1.0.20180819175200)
- 0.3.0 https://w3id.org/cwl/prov/0.3.0 Semantic versioning of CWLProv (introduced in cwltool 1.0.20180711112827)
- 0.2.0 https://w3id.org/cwl/prov/0.2.0 Prototype as exemplified in https://doi.org/10.5281/zenodo.1215611 (Note: No semantic versioning, conformsTo https://doi.org/10.5281/zenodo.1208477 on PROV files)
- 0.1.0 https://w3id.org/cwl/prov/0.1.0 Prototype as exemplified in https://doi.org/10.5281/zenodo.1208478 (Note: No self-declaration of CWLProv version)
CWLProv is versioned using Semantic Versioning, following the pattern MAJOR.MINOR.PATCH (e.g. 1.2.0).
To determine version compatibility we consider the packaging of a CWLProv RO as a kind of "API". Examples of changes to CWLProv:
- Major version change: Removal of resource type, change of format of PROV, removing annotations, changing namespaces, removing PROV statement patterns
- Minor version change: Adding other resources, adding annotations, additional properties, changing entity identifier scheme, change of file paths in RO, minor change of underlying syntax and package version, adding/augmenting PROV statement patterns, conformance to PROV constraints
- Patch version change: Fixing syntactical typos (e.g. invalid or inefficient JSON-LD), inconsistencies in textual language, adding inferred PROV statements
This means that consumers of CWLProv can make strong assumptions on backwards and forwards compatibility:
- Major: Unsupported major versions can't be safely parsed
- Minor: Can safely parse (but not reproduce) newer versions. Parsing older versions is safe if later CWLProv additions are handled as optional.
- Patch: Differences can usually be safely ignored
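The compatibility rules above can be sketched as a small check that a CWLProv consumer might run before parsing a research object. This is a minimal illustration, not part of the profile: the function names and the returned labels ("unsupported", "parse-only", "older-minor", "compatible") are assumptions made for this example.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a MAJOR.MINOR.PATCH string such as '0.6.0'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def compatibility(supported: str, found: str) -> str:
    """Classify a found CWLProv version against the version a consumer supports.

    The labels returned here are hypothetical names chosen for this sketch.
    """
    ours, theirs = parse_version(supported), parse_version(found)
    if ours.major != theirs.major:
        # Major: unsupported major versions can't be safely parsed.
        return "unsupported"
    if theirs.minor > ours.minor:
        # Minor: newer versions can be parsed, but not reproduced.
        return "parse-only"
    if theirs.minor < ours.minor:
        # Minor: older versions are safe if later additions are optional.
        return "older-minor"
    # Patch: differences can usually be safely ignored.
    return "compatible"


if __name__ == "__main__":
    for found in ("0.6.0", "0.6.3", "0.7.0", "0.3.0", "1.0.0"):
        print(found, compatibility("0.6.0", found))
```

Note that while the major version is 0, a consumer may want to be stricter than this sketch, since disruptive changes can still occur between releases.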
Unless a patch version is affecting the output, the declared profile SHOULD have patch version 0 even if the code was implemented with a later CWLProv.
Tip: You may spot that change of file paths is classified as minor. That is because paths can be found dynamically by following links from the manifest, its annotations and the PROV traces. This is similar to REST principles where URI templates should not be assumed, but followed from links.
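To illustrate the idea of following links rather than assuming paths, the sketch below looks up resources in a manifest by their declared media type instead of a hard-coded file name. The manifest shown is a hypothetical, heavily simplified example; the file names, the exact keys and the helper name find_by_mediatype are assumptions for illustration and are not taken from the CWLProv specification.

```python
import json

# Hypothetical, simplified manifest used only for this illustration.
# A real research object manifest contains far more detail.
MANIFEST_JSON = """
{
  "aggregates": [
    {"uri": "example/workflow.cwl", "mediatype": "text/x+yaml"},
    {"uri": "example/trace.nt", "mediatype": "application/n-triples"},
    {"uri": "example/trace.json", "mediatype": "application/json"}
  ]
}
"""


def find_by_mediatype(manifest: dict, mediatype: str) -> list:
    """Return the URIs of aggregated resources that declare the given media type."""
    return [
        entry["uri"]
        for entry in manifest.get("aggregates", [])
        if entry.get("mediatype") == mediatype
    ]


if __name__ == "__main__":
    manifest = json.loads(MANIFEST_JSON)
    # The consumer discovers the path from the manifest instead of assuming it.
    print(find_by_mediatype(manifest, "application/n-triples"))
```

A consumer written this way keeps working if a later minor version moves a file, as long as the manifest still links to it.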
The current version of CWLProv has major version 0, indicating that disruptive changes may occur before the profile stabilizes at 1.x.y.
Each CWLProv version has a w3id.org permalink that SHOULD be declared inside the RO to indicate its conformance.
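As a minimal sketch of working with these permalinks, the snippet below builds a permalink from a version and extracts the version from a declared permalink. It assumes the URI pattern seen in the version list above (https://w3id.org/cwl/prov/MAJOR.MINOR.PATCH); the helper names are hypothetical, and building a URI this way does not guarantee that a permalink for that version has actually been published.

```python
import re
from typing import Optional, Tuple

PERMALINK_PREFIX = "https://w3id.org/cwl/prov/"
_PERMALINK_PATTERN = re.compile(
    r"^https://w3id\.org/cwl/prov/(\d+)\.(\d+)\.(\d+)$"
)


def declared_permalink(major: int, minor: int, patch: int = 0) -> str:
    """Build the permalink to declare; patch defaults to 0 as recommended."""
    return f"{PERMALINK_PREFIX}{major}.{minor}.{patch}"


def version_from_permalink(uri: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a permalink, or None if it does not match."""
    match = _PERMALINK_PATTERN.match(uri)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


if __name__ == "__main__":
    uri = declared_permalink(0, 6)
    print(uri)
    print(version_from_permalink(uri))
```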