Data Lake, Data Warehouse, Datamart, and Feature Store: Their Contributions to the Complete Data Reuse Pipeline.

Authors :
Lamer A
Saint-Dizier C
Paris N
Chazard E
Source :
JMIR medical informatics [JMIR Med Inform] 2024 Jul 17; Vol. 12, pp. e54590. Date of Electronic Publication: 2024 Jul 17.
Publication Year :
2024

Abstract

The growing adoption and use of health information technology has generated a wealth of clinical data in electronic format, offering opportunities for data reuse beyond direct patient care. However, as data are distributed across multiple software, it becomes challenging to cross-reference information between sources due to differences in formats, vocabularies, and technologies and the absence of common identifiers among software. To address these challenges, hospitals have adopted data warehouses to consolidate and standardize these data for research. Additionally, as a complement or alternative, data lakes store both source data and metadata in a detailed and unprocessed format, empowering exploration, manipulation, and adaptation of the data to meet specific analytical needs. Subsequently, datamarts are used to further refine data into usable information tailored to specific research questions. However, for efficient analysis, a feature store is essential to pivot and denormalize the data, simplifying queries. In conclusion, while data warehouses are crucial, data lakes, datamarts, and feature stores play essential and complementary roles in facilitating data reuse for research and analysis in health care.

(© Antoine Lamer, Chloé Saint-Dizier, Nicolas Paris, Emmanuel Chazard. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).)
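The abstract's remark that a feature store serves to "pivot and denormalize the data" can be illustrated with a minimal sketch. It is not taken from the article: the row layout, the field names (`stay_id`, `feature`, `value`), the sample values, and the helper `pivot_to_features` are all hypothetical, chosen only to show a long-format table being turned into one wide record per hospital stay.

```python
from collections import defaultdict

# Hypothetical long-format datamart rows: one row per (stay, feature).
# Field names and values are illustrative assumptions, not from the article.
long_rows = [
    {"stay_id": 1, "feature": "heart_rate_max", "value": 112},
    {"stay_id": 1, "feature": "creatinine_last", "value": 1.4},
    {"stay_id": 2, "feature": "heart_rate_max", "value": 95},
]


def pivot_to_features(rows, key="stay_id"):
    """Pivot long rows into one denormalized (wide) record per key."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[key]][row["feature"]] = row["value"]
    # Use the same set of columns for every record; missing features become None.
    columns = sorted({row["feature"] for row in rows})
    return [
        {key: k, **{c: feats.get(c) for c in columns}}
        for k, feats in sorted(wide.items())
    ]


feature_table = pivot_to_features(long_rows)
for record in feature_table:
    print(record)
```

With one record per stay, an analysis can read every feature of a stay from a single row instead of joining and filtering the long table each time, which is the kind of query simplification the abstract attributes to the feature store.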

Details

Language :
English
ISSN :
2291-9694
Volume :
12
Database :
MEDLINE
Journal :
JMIR medical informatics
Publication Type :
Academic Journal
Accession number :
39037339
Full Text :
https://doi.org/10.2196/54590