Generation of a Realistic Synthetic Laryngeal Cancer Cohort for AI Applications.

Authors :
Katalinic, Mika
Schenk, Martin
Franke, Stefan
Katalinic, Alexander
Neumuth, Thomas
Dietz, Andreas
Stoehr, Matthaeus
Gaebel, Jan
Source :
Cancers. Feb2024, Vol. 16 Issue 3, p639. 14p.
Publication Year :
2024

Abstract

Simple Summary: The use of synthetic patient data can help address patient privacy concerns and the limited availability of clinical data. It can overcome the challenges associated with obtaining real patient data for use in medical research, healthcare analytics, and clinical decision support. We propose an approach to provide synthetic patient data for laryngeal cancer, a relatively rare but complex disease. We adapted an existing synthesis technology to produce realistic prevalence and age distributions in the generated patient datasets. We verified the methodology and validated the results using real patient data from a German cancer registry.
Background: Obtaining large amounts of real patient data involves great effort and expense, and processing this data is fraught with data protection concerns. Consequently, data sharing might not always be possible, particularly when large, open science datasets are needed, as is the case for AI development. For such purposes, the generation of realistic synthetic data may be the solution. Our project aimed to generate realistic cancer data with the use case of laryngeal cancer.
Methods: We used the open-source software Synthea and programmed an additional module covering the development, treatment, and follow-up of laryngeal cancer, using external, real-world (RW) evidence from guidelines and cancer registries from Germany. To generate an incidence-based cohort view, we randomly drew laryngeal cancer cases from the simulated population and deceased persons, stratified by the real-world age and sex distributions at diagnosis.
Results: A module with age- and stage-specific treatment and prognosis for laryngeal cancer was successfully implemented. The synthesized population reflects RW prevalence well, and a cohort of 50,000 laryngeal cancer patients was extracted from it. Descriptive data on stage-specific and 5-year overall survival were in accordance with published data.
Conclusions: We developed a large cohort of realistic synthetic laryngeal cancer cases with Synthea. Such data can be shared and published open source without data protection issues. [ABSTRACT FROM AUTHOR]
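
The stratified random draw described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the record fields (`age_group`, `sex`), the two age bands, and the target shares are all hypothetical placeholders chosen for illustration, not registry values. The sketch only assumes that each simulated case carries an age group and sex at diagnosis and that the target share of each stratum is known.

```python
import random
from collections import Counter


def draw_stratified_cohort(cases, target_shares, cohort_size, seed=0):
    """Randomly draw cases so that each (age_group, sex) stratum
    fills its target share of the cohort (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the simulated cases by stratum.
    by_stratum = {}
    for case in cases:
        key = (case["age_group"], case["sex"])
        by_stratum.setdefault(key, []).append(case)

    cohort = []
    for stratum, share in target_shares.items():
        quota = round(share * cohort_size)
        pool = by_stratum.get(stratum, [])
        if quota > len(pool):
            raise ValueError(f"not enough simulated cases in stratum {stratum}")
        # Sample without replacement within the stratum.
        cohort.extend(rng.sample(pool, quota))
    return cohort


# Toy simulated population: 100 cases per stratum (illustrative only).
strata = [("<65", "M"), ("<65", "F"), ("65+", "M"), ("65+", "F")]
cases = [
    {"id": f"{age}-{sex}-{i}", "age_group": age, "sex": sex}
    for age, sex in strata
    for i in range(100)
]

# Placeholder target shares; real values would come from registry data.
shares = {("<65", "M"): 0.4, ("<65", "F"): 0.1, ("65+", "M"): 0.4, ("65+", "F"): 0.1}

cohort = draw_stratified_cohort(cases, shares, cohort_size=50)
counts = Counter((c["age_group"], c["sex"]) for c in cohort)
print(counts)
```

Sampling without replacement inside each stratum keeps every drawn case unique, and the explicit quota check makes it visible when the simulated population is too small to fill a stratum.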

Details

Language :
English
ISSN :
20726694
Volume :
16
Issue :
3
Database :
Academic Search Index
Journal :
Cancers
Publication Type :
Academic Journal
Accession number :
175373906
Full Text :
https://doi.org/10.3390/cancers16030639