
Towards integration of data-driven agronomic experiments with data provenance.

Authors :
Cruz, Sérgio Manuel Serra da
Nascimento, José Antonio Pires do
Source :
Computers & Electronics in Agriculture. Jun2019, Vol. 161, p14-28. 15p.
Publication Year :
2019

Abstract

• Integration of data and provenance plays a central role in agronomic research.
• RFlow is a framework for capturing the provenance of data-centric agronomic experiments.
• It allows reuse of R scripts encapsulated by generic scientific meta-workflows.
• RFlow saves data and metadata in repositories compatible with the W3C PROV recommendation.
• Researchers or referees can browse information or protocols using Web interfaces.

With improvements in computing and communications, the amount of scientific data in agriculture has been exploding. Thus, researchers must rely on computational simulations to model data-driven in silico agronomic experiments, that is, experiments executed entirely with computer models. Reproducibility, transparency, and independent verification are major features of science. However, even agricultural research of exemplary quality may have irreproducible empirical findings because of random or systematic error. Funding agencies, researchers, and reviewers are demanding improved processes and the use of open data to increase the reproducibility of those experiments. Currently, there are no scientific criteria to evaluate the integration of data-driven agronomic experiments with data provenance. We propose RFlow, a framework that helps researchers manage, share, and enact the scientific in silico experiments of research projects that use reusable R scripts. The framework uses open data standards and transparently captures the provenance of the agronomic experiments. RFlow is non-intrusive, can be connected to workflow systems, and does not require researchers to change the way they work. Our computational experiments show that the framework can collect provenance metadata and enrich a scientific project. This study shows how RFlow can serve as the primary integration platform for statistical systems, like R, with implications for other data- and compute-intensive agronomic projects.
As a proof of concept, we demonstrate the effectiveness and expressive power of RFlow, which was evaluated through a set of data-driven agronomic in silico experiments and provenance SQL queries that exemplify the kind of information gathered. [ABSTRACT FROM AUTHOR]
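
To give a sense of what a provenance SQL query of the kind mentioned in the abstract might look like, the following is a minimal sketch in Python using the standard sqlite3 module. It is not RFlow's schema or API, which this record does not describe. The table names (entity, activity, used, was_generated_by), the sample file names, and the helper functions build_provenance_db and inputs_of are all hypothetical; they only loosely follow the W3C PROV notions of entities, activities, "used", and "wasGeneratedBy".

```python
import sqlite3


def build_provenance_db():
    """Create an in-memory store with a tiny PROV-style schema.

    Hypothetical schema for illustration only; it is NOT RFlow's repository.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity   (id TEXT PRIMARY KEY, label TEXT);
        CREATE TABLE activity (id TEXT PRIMARY KEY, label TEXT, started_at TEXT);
        -- An activity consumed an entity (cf. prov:used).
        CREATE TABLE used (
            activity_id TEXT REFERENCES activity(id),
            entity_id   TEXT REFERENCES entity(id)
        );
        -- An entity was produced by an activity (cf. prov:wasGeneratedBy).
        CREATE TABLE was_generated_by (
            entity_id   TEXT REFERENCES entity(id),
            activity_id TEXT REFERENCES activity(id)
        );
        """
    )
    # Made-up sample records: one script run over one data set.
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?)",
        [
            ("e1", "soil_samples.csv"),
            ("e2", "anova.R"),
            ("e3", "anova_report.txt"),
        ],
    )
    conn.execute(
        "INSERT INTO activity VALUES (?, ?, ?)",
        ("a1", "run anova.R", "2019-01-01T10:00:00"),
    )
    conn.executemany("INSERT INTO used VALUES (?, ?)", [("a1", "e1"), ("a1", "e2")])
    conn.execute("INSERT INTO was_generated_by VALUES (?, ?)", ("e3", "a1"))
    conn.commit()
    return conn


def inputs_of(conn, output_id):
    """Return (activity label, input label) pairs behind a given output entity."""
    return conn.execute(
        """
        SELECT a.label, e.label
        FROM was_generated_by AS g
        JOIN activity AS a ON a.id = g.activity_id
        JOIN used     AS u ON u.activity_id = a.id
        JOIN entity   AS e ON e.id = u.entity_id
        WHERE g.entity_id = ?
        ORDER BY e.label
        """,
        (output_id,),
    ).fetchall()


if __name__ == "__main__":
    db = build_provenance_db()
    for activity_label, input_label in inputs_of(db, "e3"):
        print(f"{activity_label} used {input_label}")
```

A query such as inputs_of(db, "e3") answers the question "which script and which data set produced this report?", which is the sort of lineage question a referee might ask when checking whether an in silico experiment can be reproduced.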

Details

Language :
English
ISSN :
01681699
Volume :
161
Database :
Academic Search Index
Journal :
Computers & Electronics in Agriculture
Publication Type :
Academic Journal
Accession number :
136497774
Full Text :
https://doi.org/10.1016/j.compag.2019.01.044