Perspectives on automated composition of workflows in the life sciences [version 1; peer review: 2 approved]

Authors :
Anna-Lena Lamprecht
Magnus Palmblad
Jon Ison
Veit Schwämmle
Mohammad Sadnan Al Manir
Ilkay Altintas
Christopher J. O. Baker
Ammar Ben Hadj Amor
Salvador Capella-Gutierrez
Paulos Charonyktakis
Michael R. Crusoe
Yolanda Gil
Carole Goble
Timothy J. Griffin
Paul Groth
Hans Ienasescu
Pratik Jagtap
Matúš Kalaš
Vedran Kasalica
Alireza Khanteymoori
Tobias Kuhn
Hailiang Mei
Hervé Ménager
Steffen Möller
Robin A. Richardson
Vincent Robert
Stian Soiland-Reyes
Robert Stevens
Szoke Szaniszlo
Suzan Verberne
Aswin Verhoeven
Katherine Wolstencroft
Author Affiliations :
1 Utrecht University, 3584 CS Utrecht, The Netherlands
2 Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
3 French Institute of Bioinformatics, 91057 Évry, France
4 University of Southern Denmark, 5230 Odense M, Denmark
5 University of Virginia, Charlottesville, VA, 22903, USA
6 University of California San Diego, La Jolla, CA, 92093, USA
7 University of New Brunswick, Saint John, E2L 4L5, Canada
8 IPSNP Computing Inc., Saint John, E2L 4S6, Canada
9 Westerdijk Institute, 3584 CT, Utrecht, The Netherlands
10 Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain
11 Gnosis Data Analysis PC, GR-700 13 Heraklion, Greece
12 VU Amsterdam, 1081 HV Amsterdam, The Netherlands
13 University of Southern California, Marina Del Rey, CA, 90292, USA
14 Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
15 Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
16 University of Amsterdam, 1090 GH Amsterdam, The Netherlands
17 Technical University of Denmark, 2800 Kongens Lyngby, Denmark
18 University of Bergen, 5020 Bergen, Norway
19 Bioinformatics Group, University of Freiburg, 79110 Freiburg, Germany
20 Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
21 Institut Pasteur, 75015 Paris, France
22 IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
23 Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
24 Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
25 Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
Source :
F1000Research. 10:897
Publication Year :
2021
Publisher :
London, UK: F1000 Research Limited, 2021.

Abstract

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.

Details

ISSN :
20461402
Volume :
10
Database :
F1000Research
Journal :
F1000Research
Notes :
[version 1; peer review: 2 approved]
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.54159.1
Document Type :
opinion-article
Full Text :
https://doi.org/10.12688/f1000research.54159.1