Managing genomic variant calling workflows with Swift/T
- Source :
- PLoS ONE, Vol 14, Iss 7, p e0211608 (2019), PLoS ONE
- Publication Year :
- 2019
- Publisher :
- Cold Spring Harbor Laboratory, 2019.
Abstract
- Genomic variant discovery is frequently performed using the GATK Best Practices variant calling pipeline, a complex workflow with multiple steps, fans/merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Here we describe a wrapper for the GATK-based variant calling workflow using the Swift/T parallel scripting language. Standard built-in features include the flexibility to split by chromosome before variant calling, optionally permitting the analysis to continue when faulty samples are detected, and allowing users to analyze multiple samples in parallel within each cluster node. The use of Swift/T conveys two key advantages: (1) Thanks to the embedded ability of Swift/T to transparently operate in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.), a single workflow is trivially portable across numerous clusters; (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, conditional on the analyst’s choice, which makes the workflow easy to maintain. This modular design permits separation of the workflow into multiple stages and the request of resources optimal for each stage of the pipeline. While Swift/T’s implicit data-level parallelism eliminates the need for the developer to code parallel analysis of multiple samples, it does make debugging of the workflow somewhat more difficult, as is the case with any implicitly parallel code. With the above features, users have a powerful and portable way to scale up their variant calling analysis to run in many traditional computer cluster architectures. Code: https://github.com/ncsa/Swift-T-Variant-Calling Documentation: http://swift-t-variant-calling.readthedocs.io/en/latest/
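The fan-out pattern described in the abstract (split each sample by chromosome, run several samples concurrently, and optionally continue past faulty samples) can be illustrated with a minimal sketch. It is written in Python rather than Swift/T and is not the authors' implementation. Every name in it (`CHROMOSOMES`, `call_variants`, `process_sample`, `run_batch`, `continue_on_failure`) is hypothetical, and the variant calling step is a stand-in that only builds an output file name.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of chromosomes, for illustration only.
CHROMOSOMES = ["chr1", "chr2", "chrX"]


def call_variants(sample, chromosome):
    """Hypothetical stand-in for one per-chromosome variant calling step."""
    if sample.startswith("bad"):
        # Simulate a faulty sample being detected.
        raise ValueError(f"faulty sample: {sample}")
    return f"{sample}.{chromosome}.vcf"


def process_sample(sample):
    """Fan out over chromosomes and collect the per-chromosome outputs."""
    return [call_variants(sample, chrom) for chrom in CHROMOSOMES]


def run_batch(samples, continue_on_failure=True, max_workers=4):
    """Run samples concurrently; optionally keep going past faulty samples."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {s: pool.submit(process_sample, s) for s in samples}
        for sample, future in futures.items():
            try:
                results[sample] = future.result()
            except ValueError as err:
                if not continue_on_failure:
                    raise
                failures[sample] = str(err)
    return results, failures


if __name__ == "__main__":
    ok, failed = run_batch(["s1", "bad2", "s3"])
    print(sorted(ok), sorted(failed))
```

In this sketch the parallelism is explicit (a thread pool), whereas the abstract notes that Swift/T provides it implicitly at the data level, so the developer does not write the per-sample loop at all.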
- Subjects :
- Man-Computer Interface
Swift
Economics
Computer science
Data management
Big data
Social Sciences
Cloud computing
computer.software_genre
Workflow
Computer Architecture
Database and Informatics Methods
0302 clinical medicine
Computer cluster
Psychology
Graphical User Interfaces
Language
Data Management
computer.programming_language
media_common
0303 health sciences
Multidisciplinary
Genomics
computer.file_format
Scalability
Engineering and Technology
Medicine
Executable
Workflow management system
Research Article
Employment
Computer and Information Sciences
Bioinformatics
Science
media_common.quotation_subject
Jobs
Research and Analysis Methods
Genome Complexity
Software portability
03 medical and health sciences
Genetics
Animals
Humans
Engines
030304 developmental biology
business.industry
Mechanical Engineering
Cognitive Psychology
Computational Biology
Biology and Life Sciences
Chromosome
Genome Analysis
Debugging
Scripting language
Labor Economics
Human Factors Engineering
Operating system
Cognitive Science
business
Software engineering
computer
Software
030217 neurology & neurosurgery
Neuroscience
User Interfaces
Details
- Database :
- OpenAIRE
- Journal :
- PLoS ONE, Vol 14, Iss 7, p e0211608 (2019), PLoS ONE
- Accession number :
- edsair.doi.dedup.....cdfaeca87794aab4dad97cc5fe87c5cf
- Full Text :
- https://doi.org/10.1101/524645