Data dictionary cookbook for research data and software interoperability at global scale
- Author
- Romain David, Laurent Bouveret, Lorraine Coché, Pedro Pizzigatti Corrêa, Rorie Edmunds, Ana Heredia, Jean-Luc Jung, Yasuhisa Kondo, Iwan Le Berre, Yvan Le Bras, Emilie Lerigoleur, Laurence Mabile, Jeaneth Machicao, Bénédicte Madon, Yasuhiro Murayama, Margaret O'Brien, Takeshi Osawa, Hervé Raoul, Audrey Richard, Solange Santos, Alison Specht, Shelley Stall, Diana Stepanyan, Danton Ferreira Vellenich, Lesley Wyborn
- Affiliations: European Research Infrastructure on Highly Pathogenic Agents (ERINHA-AISBL), Mathématiques, Informatique et STatistique pour l'Environnement et l'Agronomie (MISTEA), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut Agro - Montpellier SupAgro, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Observatoire des Mammiferes Marins de l'Archipel Guadeloupeen (OMMAG), Institut Universitaire Européen de la Mer (IUEM), Institut de Recherche pour le Développement (IRD)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Brest (UBO)-Centre National de la Recherche Scientifique (CNRS), University of São Paulo (USP), Facultad de Agronomía (E.E.F.A.S), World Data System, ORCID, Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle (MNHN)-École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université des Antilles (UA), RCAST, Littoral, Environnement, Télédétection, Géomatique (LETG - Brest), Littoral, Environnement, Télédétection, Géomatique UMR 6554 (LETG), Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Normandie Université (NU)-Université d'Angers (UA)-École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Brest (UBO)-Université de Rennes 2 (UR2), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Centre National de la Recherche Scientifique (CNRS)-Institut de Géographie et d'Aménagement Régional de l'Université de Nantes (IGARUN), Université de Nantes (UN)-Université de Nantes (UN)-Université de Caen Normandie (UNICAEN), Université de Nantes (UN)-Université de Nantes (UN), Département de Radiologie [Niort] (DR - Niort), CH Niort, Géographie de l'environnement (GEODE), Université Toulouse - Jean Jaurès (UT2J)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, American Geophysical Union, Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Centre National de la Recherche Scientifique (CNRS)-Université Toulouse - Jean Jaurès (UT2J), Universidade de São Paulo = University of São Paulo (USP), Muséum national d'Histoire naturelle (MNHN)-École Pratique des Hautes Études (EPHE), Normandie Université (NU)-Normandie Université (NU)-Université d'Angers (UA)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Brest (UBO)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut de Géographie et d'Aménagement Régional de l'Université de Nantes (IGARUN), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), and American Geophysical Union [Washington]
- Funding: PARSEC is funded by the Belmont Forum through the National Science Foundation (NSF), the São Paulo Research Foundation (FAPESP), the French National Research Agency (ANR), and the Japan Science and Technology Agency (JST). ERINHA-Advance is funded by the ERINHA-Advance European program under grant agreement No. 824061. The Kakila database is funded by the LabEx DRIIHM, French program 'Investissements d'Avenir' (ANR-11-LABX-0010), and supported by the SO-DRIIHM project (ANR-19-DATA-0022). This work is partially funded by the EOSC-Life European program (grant agreement No. 824087). ANR-11-LABX-0010, DRIIHM / IRDHEI, Dispositif de recherche interdisciplinaire sur les Interactions Hommes-Milieux (2011); ANR-19-DATA-0022, SO-DRIIHM, Impulser la science ouverte au sein du Dispositif de Recherche Interdisciplinaire sur les Interactions Hommes-Milieux (DRIIHM) : co-design d'une e-infrastructure intégrant les principes FAIR (2019).
- Subjects
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Data dictionary, cookbook, Research Data Management, Interoperability, reproducibility, FAIR Data, Data Reuse, Data Aggregation, [SDV]Life Sciences [q-bio], OHM Littoral Caraibe, [SDE.ES]Environmental Sciences/Environment and Society, [SDV.EE.ECO]Life Sciences [q-bio]/Ecology, environment/Ecosystems, [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET], [SDV.SPEE]Life Sciences [q-bio]/Public health and epidemiology, 14. Life underwater, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], [SDE.BE]Environmental Sciences/Biodiversity and Ecology, LABEX DRIIHM, [SDV.MHEP]Life Sciences [q-bio]/Human health and pathology
- Abstract
We are now facing profound changes (in biodiversity, climate, pandemics, etc.). Understanding human impacts and mitigating them will depend on our ability to mobilize research at the global level. The sustainable development of society will largely depend on the sustainable development of global science and of scientific research tools, outputs and ecosystems. This globalization of research requires making our observation and experimentation systems interoperable, in order to better understand these changes and better simulate their effects. The COVID-19 pandemic is now raging around the world. Reproducibility of research and results across regions and contexts should accelerate human responses. Data sharing and the development of synthesis research, with data aggregation at large scale, are critical to enable such processes. The use of common knowledge, vocabularies, standards and procedures at large scale is necessary. The objective of this poster is to report on the challenges met while building data dictionaries in three global projects related to biodiversity and/or disease research: PARSEC, Kakila and ERINHA-Advance. The Kakila database centralizes and harmonizes marine mammal observation data for the AGOA sanctuary around the French archipelago of Guadeloupe, French Antilles. The PARSEC project is building new tools for data sharing and reuse through a transnational investigation of the socioeconomic impact of protected areas. The ERINHA-Advance project aims to support the operations of the ERINHA research infrastructure, which is designed to generate data from transnational-access research activities on highly pathogenic agents. In these three global case studies, similar challenges have arisen: aggregating and interoperating pre-existing heterogeneous data at global scale, and sharing common tools to monitor, maintain quality, scan scale and cope with uncertainty.
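To make the aggregation challenge concrete, here is a minimal sketch of how a shared data dictionary can be used to harmonize records from heterogeneous sources. It is not the method used in Kakila, PARSEC or ERINHA-Advance. The variable names, the two sources `survey_a` and `survey_b`, their column names and the `harmonize` function are all hypothetical. The sketch assumes that each source maps its local columns to a dictionary variable plus a unit converter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Variable:
    """One entry of a hypothetical shared data dictionary."""
    name: str        # agreed variable name
    definition: str  # community-agreed definition
    unit: str        # canonical unit ("" when categorical)


# Hypothetical shared dictionary agreed on by all contributing projects.
DICTIONARY = {
    v.name: v
    for v in (
        Variable("species_name", "Scientific name of the observed taxon", ""),
        Variable("group_size", "Number of individuals observed together", "count"),
        Variable("distance_to_coast", "Distance from the observation to the nearest coast", "km"),
    )
}


def identity(value):
    return value


def metres_to_km(value):
    return value / 1000


# Hypothetical per-source mappings: local column -> (dictionary variable, unit converter).
SOURCE_MAPPINGS: dict[str, dict[str, tuple[str, Callable]]] = {
    "survey_a": {
        "espece": ("species_name", identity),
        "nb_individus": ("group_size", identity),
        "dist_cote_m": ("distance_to_coast", metres_to_km),
    },
    "survey_b": {
        "species": ("species_name", identity),
        "count": ("group_size", identity),
        "coast_km": ("distance_to_coast", identity),
    },
}


def harmonize(source: str, record: dict) -> dict:
    """Rename and convert one source record into dictionary variables.

    Raises KeyError when a column is not mapped, so unknown fields are
    surfaced for discussion instead of being silently dropped.
    """
    mapping = SOURCE_MAPPINGS[source]
    harmonized = {}
    for column, value in record.items():
        if column not in mapping:
            raise KeyError(f"{source}: column {column!r} has no data dictionary mapping")
        variable, convert = mapping[column]
        if variable not in DICTIONARY:
            raise KeyError(f"{source}: {variable!r} is not defined in the data dictionary")
        harmonized[variable] = convert(value)
    return harmonized
```

For example, `harmonize("survey_a", {"espece": "Physeter macrocephalus", "nb_individus": 3, "dist_cote_m": 2500})` and `harmonize("survey_b", {"species": "Physeter macrocephalus", "count": 3, "coast_km": 2.5})` yield the same harmonized record. Failing loudly on unmapped columns is a design choice: it pushes undefined variables back to the community instead of losing them during aggregation.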
This poster proposes a draft common methodology, a data dictionary cookbook, which will provide a roadmap towards building large-scale data dictionaries. Topics proposed for such a cookbook include: how to search for existing and appropriate data dictionaries, controlled vocabularies or other semantic resources (before building a new one); the first steps in building a data dictionary; data dictionary literacy (and why it is mandatory work); how to define all scientific objects and aspects (or reuse existing definitions) and agree on the definitions with the whole community; building or proposing variables and indicators with ontology models, schemas, variable naming rules and context awareness; and finally, addressing dimension issues in each context. The shared experience of our three projects showed that we need to proceed step by step, as simply as possible, and ensure that each step is understandable by the whole community. It is necessary to improve access to, and reuse of, all existing semantic materials rather than trying to build a cathedral with a little spoon.
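Two of the cookbook topics, searching for existing semantic resources before creating a new term and applying variable naming rules, can be illustrated with a small sketch. Everything here is an assumption for illustration, not a rule from the cookbook: the lowercase snake_case pattern, the 40-character limit, the `DictionaryEntry` fields, the `propose_entry` helper and the `example.org` vocabulary URIs are all hypothetical. When several vocabulary concepts match, the sketch simply takes the first; in practice that ambiguity is exactly what should be discussed with the community.

```python
import re
from dataclasses import dataclass, field

# Hypothetical naming rule: lowercase snake_case, starting with a letter.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
MAX_NAME_LENGTH = 40  # hypothetical limit


@dataclass
class DictionaryEntry:
    name: str
    definition: str
    unit: str = ""
    vocabulary_uri: str = ""  # existing semantic resource reused, if any
    context: dict = field(default_factory=dict)  # e.g. spatial or temporal scale


def naming_problems(name: str) -> list[str]:
    """Return the naming-rule violations for a proposed variable name."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("not lowercase snake_case")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    return problems


def find_existing(label: str, vocabulary: dict[str, list[str]]) -> list[str]:
    """Return URIs of vocabulary concepts with a matching label (case-insensitive)."""
    wanted = label.strip().lower()
    return [
        uri
        for uri, labels in vocabulary.items()
        if any(wanted == candidate.strip().lower() for candidate in labels)
    ]


def propose_entry(label, name, definition, vocabulary, unit="", context=None):
    """Check the name, then reuse an existing concept URI when one matches."""
    problems = naming_problems(name)
    if problems:
        raise ValueError(f"{name!r}: " + ", ".join(problems))
    matches = find_existing(label, vocabulary)
    return DictionaryEntry(
        name=name,
        definition=definition,
        unit=unit,
        vocabulary_uri=matches[0] if matches else "",  # first match only
        context=context or {},
    )
```

With a hypothetical vocabulary such as `{"https://example.org/vocab/sst": ["Sea surface temperature", "SST"]}`, proposing `"sea surface temperature"` links the new entry to the existing concept instead of minting a new one, while a name like `"SeaTemp"` is rejected by the naming rule.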
- Published
- 2021