Learning debiased graph representations from the OMOP common data model for synthetic data generation.

Authors :: Schulz, Nicolas Alexander
Carus, Jasmin
Wiederhold, Alexander Johannes
Johanns, Ole
Peters, Frederik
Rath, Natalie
Rausch, Katharina
Holleczek, Bernd
Katalinic, Alexander
Nennecke, Alice
Kusche, Henrik
Heinrichs, Vera
Eberle, Andrea
Luttmann, Sabine
Abnaof, Khalid
Kim-Wanner, Soo-Zin
Handels, Heinz
Germer, Sebastian
Halber, Marco
Richter, Martin
Source :: BMC Medical Research Methodology; 6/22/2024, Vol. 24 Issue 1, p1-13, 13p
Publication Year :: 2024
Abstract: Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models that do not allow for expert verification or intervention. We propose a highly available method which enables synthetic data generation from real patient records in a privacy-preserving and compliant fashion, is interpretable, and allows for expert intervention. Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthesis method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules using two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem statement at hand. Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable. [ABSTRACT FROM AUTHOR]
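
Illustrative note (not part of the authors' abstract): the Markov chain step named in the Methods can be pictured with a minimal sketch. It assumes each patient record has already been reduced to an ordered list of event labels. The function name, the event labels, and the three toy sequences below are hypothetical and are not taken from the paper or from any OMOP or Synthea API. The sketch estimates first-order transition probabilities by counting consecutive event pairs, which yields one graph node per observed event and one weighted edge per observed transition.

```python
from collections import Counter, defaultdict


def learn_markov_transitions(sequences):
    """Estimate first-order transition probabilities from event sequences.

    `sequences` is an iterable of lists of event labels, one list per patient.
    Returns a dict mapping each state to a dict of next-state probabilities.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every pair of consecutive events (current -> next).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    transitions = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize the counts so the outgoing probabilities sum to 1.
        transitions[state] = {n: c / total for n, c in next_counts.items()}
    return transitions


# Hypothetical, made-up event sequences (assumption: one list per patient).
patients = [
    ["diagnosis", "surgery", "follow_up"],
    ["diagnosis", "chemotherapy", "follow_up"],
    ["diagnosis", "surgery", "chemotherapy", "follow_up"],
]

transitions = learn_markov_transitions(patients)
for state, edges in transitions.items():
    print(state, edges)
```

Because every distinct event becomes a state and every observed consecutive pair becomes an edge, a graph built this way grows with the number of distinct codes in the data, which is consistent with the abstract's remark that the Markov chain yields extremely large graphs.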

Subjects :: REPRESENTATIONS of graphs
ELECTRONIC health record standards
MEDICAL informatics
DATA modeling
NURSING informatics
MARKOV processes

Details

Language :: English
ISSN :: 14712288
Volume :: 24
Issue :: 1
Database :: Complementary Index
Journal :: BMC Medical Research Methodology
Publication Type :: Academic Journal
Accession number :: 178026817
Full Text :: https://doi.org/10.1186/s12874-024-02257-8
