1. Predictive modeling of nontuberculous mycobacterial pulmonary disease epidemiology using German health claims data
- Author
- Raphael Ewen, M Obradovic, Roald van der Laan, Bondo Ben Monga, Felix C. Ringshausen, Jan Multmeier, and Roland Diel
- Subjects
0301 basic medicine; Lung Diseases; Male; Epidemiology; Comorbidity; German; Machine Learning; 0302 clinical medicine; Risk Factors; Germany; Prevalence; Medicine; 030212 general & internal medicine; Aged, 80 and over; education.field_of_study; biology; Incidence (epidemiology); Insurance claims analysis; Incidence; Nontuberculous Mycobacteria; General Medicine; Middle Aged; Infectious Diseases; language; Female; Microbiology (medical); Adult; medicine.medical_specialty; Adolescent; 030106 microbiology; Population; Pulmonary disease; Mycobacterium Infections, Nontuberculous; Sample (statistics); Nontuberculous mycobacterium infections; lcsh:Infectious and parasitic diseases; 03 medical and health sciences; Insurance Claim Review; Young Adult; Health claims on food labels; Humans; lcsh:RC109-216; education; Aged; Retrospective Studies; Models, Statistical; business.industry; biology.organism_classification; bacterial infections and mycoses; language.human_language; Case-Control Studies; Nontuberculous mycobacteria; business; Probability learning; Demography
- Abstract
Objectives: Administrative claims data tend to underestimate the burden of non-tuberculous mycobacterial pulmonary disease (NTM-PD). Methods: We developed machine learning-based algorithms using historical claims data from cases with NTM-PD to identify patients with a high probability of having previously undiagnosed NTM-PD and to estimate actual prevalence and incidence. Adults with incident NTM-PD were identified by the International Classification of Diseases, 10th revision code A31.0 in a representative 5% sample of the German population covered by statutory health insurance during 2011–2016. Pre-diagnosis characteristics (patient demographics, comorbidities, diagnostic and therapeutic procedures, and medications) were extracted and compared with those of a control group without NTM-PD to identify risk factors. Results: Applying a random forest model (area under the curve 0.847; total error 19.4%) and a risk threshold of >99%, prevalence and incidence rates in 2016 increased 5-fold and 9-fold to 19 and 15 cases/100,000 population, respectively, when coded and non-coded cases were counted together rather than coded cases alone. Conclusions: A machine learning-based algorithm applied to German statutory health insurance claims data predicted a considerable number of previously unreported NTM-PD cases with high probability.
- Published
- 2020
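
The abstract's Results step (apply a risk threshold of >99% to model output, then report cases per 100,000 population) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the population size, the case counts, and the risk scores below are all hypothetical, chosen only to show the arithmetic. The final two lines back-calculate the coded-only rates implied by the reported fold increases; because the fold figures are rounded in the abstract, these values are approximations.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 population."""
    return 100_000 * cases / population


def count_above_threshold(risks, threshold=0.99):
    """Count patients whose predicted risk strictly exceeds the threshold (">99%")."""
    return sum(1 for r in risks if r > threshold)


# Hypothetical inputs for illustration only (not study data).
population = 200_000
coded_cases = 8  # patients carrying the diagnosis code
# Hypothetical model risk scores for patients WITHOUT the diagnosis code.
model_risks = [0.999, 0.995, 0.97, 0.40, 0.992, 0.10, 0.9901, 0.99]

# A score of exactly 0.99 is not counted, because the threshold is strict.
predicted_uncoded = count_above_threshold(model_risks)

coded_rate = rate_per_100k(coded_cases, population)
combined_rate = rate_per_100k(coded_cases + predicted_uncoded, population)
fold_increase = combined_rate / coded_rate

# Approximate coded-only rates implied by the abstract's rounded figures:
# a 5-fold rise to 19/100,000 and a 9-fold rise to 15/100,000.
implied_coded_prevalence = 19 / 5
implied_coded_incidence = 15 / 9
```

Using a strict inequality mirrors the ">99%" wording in the abstract; whether the study treated a score of exactly 99% as above or below the threshold is not stated in this record.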