1. Semantic Management of Data from Biodiversity and Ecosystem Studies: Toward an Integrated Workflow from Collection to Publication. Application to Plankton Data from Lake Geneva
- Author
Pichot, Christian; Maurice, Damien; Monet, Ghislaine; Yahiaoui, Rachid; Clastre, Philippe; Jaillet, Benjamin; Ecologie des Forêts Méditerranéennes (URFM), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); SILVA (SILVA), AgroParisTech-Université de Lorraine (UL)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); Centre Alpin de Recherche sur les Réseaux Trophiques et Ecosystèmes Limniques (CARRTEL), Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); InfoSol (InfoSol); ANR-18-CE23-0017, D2KAB, Des Données aux Connaissances en Agronomie et Biodiversité (2018); ANR-11-INBS-0001, ANAEE-FR, ANAEE-Services (2011); and European Project: 654182, H2020, H2020-INFRADEV-1-2014-1, ENVRI PLUS (2015)
- Subjects
modelling, FAIR data, entity property, plankton, pipeline, interoperability, ontology, [SDE.BE]Environmental Sciences/Biodiversity and Ecology, biodiversity
- Abstract
International audience; Biodiversity is a key player in ecosystem characteristics and dynamics. Acting as a driver, it also results from ecosystem functioning. Understanding this complex interplay between biological and physical components is one of the main current challenges in the context of land use changes and climate warming. The acquisition of knowledge on biodiversity requires multidisciplinary approaches and mobilises numerous research teams. Data are collected or computed in large quantities but are most often poorly standardised and therefore heterogeneous. In this context, the development of semantic interoperability is a major challenge for the sharing and reuse of these data. This objective is implemented within the framework of the AnaEE (Analysis and Experimentation on Ecosystems) Research Infrastructure, dedicated to experimentation on ecosystems and biodiversity. A distributed Information System (IS) is being developed, based on the semantic interoperability of its components, which use common vocabularies (the AnaeeThes thesaurus and an OBOE-based ontology extended for disciplinary needs) to model the studied system. This modelling covers the measured variables, including biodiversity, as well as the different components of the experimental or observational context, from sensor to plot and network. Driven by the ontology, the approach relies on the atomic decomposition of each component into observed entities, their characteristics and qualifiers, and their units or naming standards. The modelling of the system allows the semantic annotation of relational databases or flat files for the production of URI-based graph databases. A first pipeline automates the annotation process and the production of the semantic data. A second pipeline is devoted to the exploitation of these semantic data by generating i) metadata records formatted according to the geospatial extension of the Data Catalog Vocabulary standard and the ISO 19139 standard, and ii) Network Common Data Form data files.
The implementation of this integrated semantic management of data is presented here for phyto- and zooplankton data collected from water columns in Lake Geneva over a 30-year period, as well as for environmental data on water temperature and nutrients. The work carried out contributes to the development and use of semantic vocabularies within the biodiversity and ecology research community, leading to semantically enriched metadata records and interoperable data sets. The genericity of the tools makes them usable in different contexts of data production and management, and with different ontologies involved in semantic modelling.
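The atomic decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variable` class, the `annotate` function, the `http://example.org/` base URI and the sample value are all hypothetical, and the `oboe:` property names are only loosely modelled on OBOE-style terms and should be treated as assumptions. The sketch shows how one measured value from a flat file might be split into an observed entity, a characteristic and a unit, and then expressed as URI-based triples.

```python
from dataclasses import dataclass

# Hypothetical namespaces, for illustration only.
BASE = "http://example.org/data/"
OBOE = "oboe:"  # stands in for an OBOE-style ontology prefix (assumption)


@dataclass(frozen=True)
class Variable:
    """Atomic decomposition of one measured variable."""
    entity: str          # observed entity, e.g. "Water"
    characteristic: str  # measured characteristic, e.g. "Temperature"
    unit: str            # unit or naming standard, e.g. "DegreeCelsius"


def annotate(row_id: str, value: float, var: Variable) -> list[tuple[str, str, str]]:
    """Return subject-predicate-object triples for one measured value."""
    obs = f"{BASE}observation/{row_id}"
    meas = f"{BASE}measurement/{row_id}"
    return [
        (obs, "rdf:type", OBOE + "Observation"),
        (obs, OBOE + "ofEntity", BASE + "entity/" + var.entity),
        (obs, OBOE + "hasMeasurement", meas),
        (meas, "rdf:type", OBOE + "Measurement"),
        (meas, OBOE + "ofCharacteristic", BASE + "characteristic/" + var.characteristic),
        (meas, OBOE + "usesStandard", BASE + "standard/" + var.unit),
        (meas, OBOE + "hasValue", str(value)),
    ]


# Made-up example row: a water temperature reading.
water_temp = Variable("Water", "Temperature", "DegreeCelsius")
triples = annotate("r1", 12.5, water_temp)
for s, p, o in triples:
    print(s, p, o)
```

Keeping the entity, characteristic and unit as separate fields is what would let the same annotation function serve plankton counts, temperatures or nutrient concentrations without changes; a real implementation would likely rely on an RDF library and the project's own vocabularies rather than plain strings.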
- Published
- 2021