1. Making workflow provenance FAIR across workflow systems with Workflow Run RO-Crate
- Author
Leo, Simone; Rodríguez-Navas, Laura; Fernández, José M.; De Geest, Paul; Pireddu, Luca; Crusoe, Michael R.; Garijo, Daniel; Colonnelli, Iacopo; Sirvent, Raül; and Soiland-Reyes, Stian
- Subjects
Nextflow, RO-Crate, Galaxy, workflow, provenance, scientific workflow, CWL, FAIR
- Abstract
Workflow Run RO-Crate (https://w3id.org/ro/wfrun/) is a set of profiles of RO-Crate (https://doi.org/10.3233/DS-210053) that capture workflow provenance in a lightweight FAIR data package, in order to support traceability, reproducibility and interoperable description of diverse computational analyses. We implemented the profiles in multiple workflow systems, including Galaxy, COMPSs, StreamFlow, WfExS, Sapporo and Autosubmit. The command line tool runcrate (https://pypi.org/project/runcrate/) can convert from the precursor CWLProv (https://doi.org/10.1093/gigascience/giz095), and display or validate crates according to the profiles, with prototype support for repeating a previous execution. The profiles are organised by increasing level of detail, allowing gradual adoption: from arbitrary sets of computational processes (implied user-driven workflows), through a WorkflowHub-compatible crate with a workflow definition, to a full provenance trace of each step with its input and output values. This use of RO-Crate allows a computational workflow and its execution to be contextualized, e.g. by relating them to people, organisations, projects, funding, data sources and wider research questions and studies. For instance, in the TRE-FX project (https://trefx.uk/) such crates are used as a lingua franca across federated Trusted Research Environments, as they can also address the security and review aspects. The Workflow Run working group collaborates across ELIXIR nodes and EU-wide projects (BY-COVID, EOSC-Life, EJP-RD, EuroHPC, eFlows4HPC, EuroScienceGateway, BioExcel-2) as well as national projects. After this first stable release of the profiles, we are now expanding to more workflow systems and to tracking computational resources such as containers and memory usage.
- Published
- 2023
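As an illustration of the least detailed level described in the abstract (an arbitrary computational process with its inputs and outputs), the following Python sketch assembles RO-Crate metadata for a single tool run. It is a minimal sketch under stated assumptions, not the output of runcrate or of any of the workflow systems listed: the function name `build_process_run_crate`, the identifiers `#tool` and `#run-1`, the file names, and the profile version in `PROCESS_PROFILE` are all chosen here for illustration, and the result has not been validated against the published profiles.

```python
import json

# Assumed profile identifier; the version suffix is an assumption and
# should be checked against https://w3id.org/ro/wfrun/ before use.
PROCESS_PROFILE = "https://w3id.org/ro/wfrun/process/0.1"


def build_process_run_crate(tool_name, input_path, output_path):
    """Return RO-Crate metadata (a dict) describing one tool execution.

    Hypothetical helper: the run is modelled as a schema.org CreateAction
    whose instrument is the tool, object is the input file and result is
    the output file.
    """
    graph = [
        {   # Metadata descriptor required by RO-Crate 1.1
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset, declaring the (assumed) profile it follows
            "@id": "./",
            "@type": "Dataset",
            "conformsTo": {"@id": PROCESS_PROFILE},
            "hasPart": [{"@id": input_path}, {"@id": output_path}],
            "mentions": {"@id": "#run-1"},
        },
        {"@id": "#tool", "@type": "SoftwareApplication", "name": tool_name},
        {   # The execution itself
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": "#tool"},
            "object": [{"@id": input_path}],
            "result": [{"@id": output_path}],
        },
        {"@id": input_path, "@type": "File"},
        {"@id": output_path, "@type": "File"},
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


crate = build_process_run_crate("sort", "input.txt", "sorted.txt")
metadata_json = json.dumps(crate, indent=2)
```

Writing `metadata_json` to a file named `ro-crate-metadata.json` next to `input.txt` and `sorted.txt` would give a directory laid out like a crate. The more detailed profiles mentioned in the abstract add a workflow definition and per-step actions on top of this structure.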