Data quality monitoring in clinical and observational epidemiologic studies: the role of metadata and process information

Authors :
Richter, Adrian
Schössow, Janka
Werner, André
Schauer, Birgit
Radke, Dörte
Henke, Jörg
Struckmann, Stephan
Schmidt, Carsten Oliver
Source :
GMS Medizinische Informatik, Biometrie und Epidemiologie, Vol 15, Iss 1, p Doc08 (2019)
Publication Year :
2019
Publisher :
German Medical Science GMS Publishing House, 2019.

Abstract

High data quality is fundamental for valid inferences in health research. Metadata, i.e. “data that describe other data”, are essential for implementing data quality assessments, but more guidance is needed on which metadata to use. Similarly, the selection and use of variables describing the measurement process should be illustrated with examples to improve the design and conduct of observational health studies. This work provides a conceptual framework and an overview of metadata and process information for systematic data quality reports, based on implementations within the population-based cohort Study of Health in Pomerania (SHIP). In previous years, the data dictionary was augmented to establish a prerequisite for automated data quality checks; the added information, up to 20 different characteristics per variable, is used for data quality assessments and triggers diverse data quality checks. Conceptually, we distinguish static metadata, variable metadata, and process variables. Examples of static metadata are the expected probability distribution, plausibility limits, and the data type. Variable metadata may be, for example, the reference limits of a laboratory marker. The information contained in these metadata allows data quality flaws to be detected by comparing observed with expected data characteristics. In contrast, process variables, such as the observer or device ID, also allow the sources of data quality issues to be identified, even when the characteristics defined in the metadata are not violated. Metadata and process variables can be used alone or in combination to implement a versatile and efficient data quality assessment. Setting up a comprehensive collection of metadata and process variables is an extensive task, particularly in studies involving large data collections. Nonetheless, the gain in transparency and efficacy of data curation and quality reporting after this setup is considerable.
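
The distinction drawn in the abstract can be illustrated with a minimal Python sketch. It is not the SHIP implementation: the variable name `sbp`, the plausibility limits, the record layout, and both helper functions are hypothetical and chosen only for illustration. The first check compares observed values against static metadata (data type and plausibility limits); the second groups values by a process variable (the observer ID) to expose a possible source of a data quality issue even when no limit is violated.

```python
from statistics import mean

# Hypothetical static metadata for one variable (illustrative values only).
METADATA = {
    "sbp": {"data_type": float, "plausibility_limits": (60.0, 260.0)},
}


def check_limits(records, variable, metadata):
    """Return indices of records that violate the data type or plausibility limits."""
    spec = metadata[variable]
    low, high = spec["plausibility_limits"]
    flagged = []
    for i, rec in enumerate(records):
        value = rec[variable]
        # The type check comes first so that non-numeric values are never compared.
        if not isinstance(value, spec["data_type"]) or not (low <= value <= high):
            flagged.append(i)
    return flagged


def mean_by_process_variable(records, variable, process_variable):
    """Group values by a process variable (e.g. observer ID) and return group means."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[process_variable], []).append(rec[variable])
    return {key: mean(values) for key, values in groups.items()}


# Made-up example records, not study data.
records = [
    {"sbp": 118.0, "observer_id": "A"},
    {"sbp": 122.0, "observer_id": "A"},
    {"sbp": 141.0, "observer_id": "B"},
    {"sbp": 139.0, "observer_id": "B"},
    {"sbp": 400.0, "observer_id": "A"},  # outside the plausibility limits
]

flagged = check_limits(records, "sbp", METADATA)
valid = [rec for i, rec in enumerate(records) if i not in flagged]
observer_means = mean_by_process_variable(valid, "sbp", "observer_id")
```

In this made-up data, the metadata check flags only the last record, while the comparison by observer shows a systematic difference between observers A and B among values that all lie within the plausibility limits.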

Details

Language :
German, English
ISSN :
1860-9171
Volume :
15
Issue :
1
Database :
Directory of Open Access Journals
Journal :
GMS Medizinische Informatik, Biometrie und Epidemiologie
Publication Type :
Academic Journal
Accession number :
edsdoj.988e4cbf8b63489597738d1e668d452a
Document Type :
article
Full Text :
https://doi.org/10.3205/mibe000202