A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data.

Authors :: Khalid, Sara
Yang, Cynthia
Blacketer, Clair
Duarte-Salles, Talita
Fernández-Bertolín, Sergio
Kim, Chungsoo
Park, Rae Woong
Park, Jimyung
Schuemie, Martijn J.
Sena, Anthony G.
Suchard, Marc A.
You, Seng Chan
Rijnbeek, Peter R.
Reps, Jenna M.
Source :: Computer Methods & Programs in Biomedicine. Nov2021, Vol. 211, pN.PAG-N.PAG. 1p.
Publication Year :: 2021
Abstract:
• Harmonization and quality control of originally heterogeneous observational databases.
• Large-scale application of machine learning methods in a distributed data network.
• Transparent use of open-source software tools and publicly shared analytical code.
As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models has been found to be reliable. Models are commonly assessed as being at risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation.
When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world. [ABSTRACT FROM AUTHOR]
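The abstract compares models by discrimination (internal and external) and by calibration. As a rough illustration of what those two terms measure, here is a minimal Python sketch, assuming binary outcome labels (1 = death within 30 days) and predicted risks in [0, 1]. It is not the OHDSI software used in the paper; the function names `auc` and `calibration_in_the_large` are hypothetical, and the toy labels and risks below are invented for illustration, not study data.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Discrimination: area under the ROC curve (pairwise / Mann-Whitney form).

    Over every (positive, negative) pair, count how often the patient with
    the outcome received the higher predicted risk; ties count as half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Calibration summary: observed event rate minus mean predicted risk.

    A value near 0 means the model neither over- nor under-predicts risk
    on average; it says nothing about calibration within risk strata.
    """
    if len(y_true) == 0 or len(y_true) != len(y_score):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(y_true) / len(y_true) - sum(y_score) / len(y_score)


if __name__ == "__main__":
    # Hypothetical toy data: four patients, two of whom have the outcome.
    labels = [0, 0, 1, 1]
    risks = [0.10, 0.40, 0.35, 0.80]
    print("AUC:", auc(labels, risks))
    print("Calibration-in-the-large:", calibration_in_the_large(labels, risks))
```

In an external validation setting, the same two functions would simply be applied to a model's predictions on a database that was not used for training, which is what makes internal and external results directly comparable.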

Subjects :: *PREDICTION models
*COVID-19 pandemic
*RANDOM forest algorithms
*COVID-19
*SOFTWARE development tools
*DECISION trees
*PIPELINE inspection

Details

Language :: English
ISSN :: 0169-2607
Volume :: 211
Database :: Academic Search Index
Journal :: Computer Methods & Programs in Biomedicine
Publication Type :: Academic Journal
Accession number :: 153173150
Full Text :: https://doi.org/10.1016/j.cmpb.2021.106394