1. Introducing Data Primitives: Data Formats for the SKED Framework
- Author
- Trippe, Elizabeth D., Aguilar, Jacob B., Yan, Yi H., Nural, Mustafa V., Brady, Jessica A., and Gutierrez, Juan B.
- Subjects
Quantitative Biology - Quantitative Methods
- Abstract
Background: The past few years have seen a tremendous increase in the size and complexity of datasets. Scientific and clinical studies must incorporate datasets that cross multiple spatial and temporal scales to describe a particular phenomenon. Storing these heterogeneous datasets and making them accessible in a way that is useful to researchers, yet extensible to new data types, is a major challenge. Methods: To overcome these obstacles, we propose the use of data primitives as a common currency between analytical methods. The four data primitives we have identified are time series, text, annotated graph, and triangulated mesh, each with associated metadata. Using only data primitives to store data and to serve as algorithm input, output, and intermediate results promotes interoperability, scalability, and reproducibility in scientific studies. Results: Data primitives were used in a multi-omic, multi-scale systems biology study of malaria infection in non-human primates to perform many types of integrative analysis quickly and efficiently. Conclusions: Using data primitives as a common currency, both for data storage and for cross talk between analytical methods, enables the analysis of complex multi-omic, multi-scale datasets in a reproducible, modular fashion.
Comment: 10 pages, 3 figures
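To make the idea in the abstract concrete, the four data primitives could be sketched as plain Python containers. This is only an illustrative sketch: the class names, field names, and the `moving_average` step are assumptions made here and are not taken from the paper or from the SKED framework itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical containers for the four primitives named in the abstract.
# All field names are illustrative assumptions, not the paper's schema.
@dataclass
class TimeSeries:
    times: list[float]
    values: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Text:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class AnnotatedGraph:
    nodes: dict[str, dict[str, Any]]              # node id -> annotations
    edges: list[tuple[str, str, dict[str, Any]]]  # (source, target, annotations)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class TriangulatedMesh:
    vertices: list[tuple[float, float, float]]    # 3D coordinates
    triangles: list[tuple[int, int, int]]         # indices into vertices
    metadata: dict[str, Any] = field(default_factory=dict)


def moving_average(ts: TimeSeries, window: int) -> TimeSeries:
    """Hypothetical analysis step: takes a primitive and returns a primitive."""
    if window < 1 or window > len(ts.values):
        raise ValueError("window must be between 1 and the series length")
    smoothed = [
        sum(ts.values[i:i + window]) / window
        for i in range(len(ts.values) - window + 1)
    ]
    # Carry the input metadata forward and record how the result was derived.
    derived = {**ts.metadata, "derived_by": f"moving_average(window={window})"}
    return TimeSeries(times=ts.times[window - 1:], values=smoothed, metadata=derived)
```

Because the hypothetical step both consumes and produces a `TimeSeries`, its output can be stored or passed to another method without conversion, which is the kind of interoperability the abstract attributes to data primitives.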
- Published
- 2017