262 results on '"Pfaff, Emily"'
Search Results
102. 371 – Using a Machine Learning Program - the Clinical Annotation Research Kit (Clark!) - to Identify Patients with Undiagnosed NAFLD
- Author
-
Kim, Hannah P., primary, Bradford, Robert L., additional, Pfaff, Emily, additional, and Barritt, Alfred S., additional
- Published
- 2019
103. A novel approach for exposing and sharing clinical data: the Translator Integrated Clinical and Environmental Exposures Service
- Author
-
Fecho, Karamarie, primary, Pfaff, Emily, additional, Xu, Hao, additional, Champion, James, additional, Cox, Steve, additional, Stillwell, Lisa, additional, Peden, David B, additional, Bizon, Chris, additional, Krishnamurthy, Ashok, additional, Tropsha, Alexander, additional, and Ahalt, Stanley C, additional
- Published
- 2019
104. Semantic Integration of Clinical Laboratory Tests from Electronic Health Records for Deep Phenotyping and Biomarker Discovery
- Author
-
Zhang, Xingmin Aaron, primary, Yates, Amy, additional, Vasilevsky, Nicole, additional, Gourdine, JP, additional, Carmody, Leigh C., additional, Danis, Daniel, additional, Joachimiak, Marcin P., additional, Ravanmehr, Vida, additional, Pfaff, Emily R., additional, Champion, James, additional, Robasky, Kimberly, additional, Xu, Hao, additional, Fecho, Karamarie, additional, Walton, Nephi A., additional, Zhu, Richard, additional, Ramsdill, Justin, additional, Mungall, Chris, additional, Köhler, Sebastian, additional, Haendel, Melissa A., additional, McDonald, Clem, additional, Vreeman, Daniel J., additional, Peden, David B., additional, Chute, Christopher G., additional, and Robinson, Peter N., additional
- Published
- 2019
105. Ensuring a safe(r) harbor: Excising personally identifiable information from structured electronic health record data.
- Author
-
Pfaff, Emily R., Haendel, Melissa A., Kostka, Kristin, Lee, Adam, Niehaus, Emily, Palchuk, Matvey B., Walters, Kellie, and Chute, Christopher G.
- Subjects
PERSONALLY identifiable information ,ELECTRONIC health records ,DATA recorders & recording ,COMMUNITIES ,TELEPHONE numbers
- Abstract
Recent findings have shown that the continued expansion of the scope and scale of data collected in electronic health records are making the protection of personally identifiable information (PII) more challenging and may inadvertently put our institutions and patients at risk if not addressed. As clinical terminologies expand to include new terms that may capture PII (e.g., Patient First Name, Patient Phone Number), institutions may start using them in clinical data capture (and in some cases, they already have). Once in use, PII-containing values associated with these terms may find their way into laboratory or observation data tables via extract-transform-load jobs intended to process structured data, putting institutions at risk of unintended disclosure. Here we aim to inform the informatics community of these findings, as well as put out a call to action for remediation by the community. [ABSTRACT FROM AUTHOR]
- Published
- 2022
106. Recruiting for a pragmatic trial using the electronic health record and patient portal: successes and lessons learned
- Author
-
Pfaff, Emily, primary, Lee, Adam, additional, Bradford, Robert, additional, Pae, Jinhee, additional, Potter, Clarence, additional, Blue, Paul, additional, Knoepp, Patricia, additional, Thompson, Kristie, additional, Roumie, Christianne L, additional, Crenshaw, David, additional, Servis, Remy, additional, and DeWalt, Darren A, additional
- Published
- 2018
107. FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data
- Author
-
Buranosky, Matt, primary, Stellnberger, Elmar, additional, Pfaff, Emily, additional, Diaz-Sanchez, David, additional, and Ward-Caviness, Cavin, additional
- Published
- 2018
108. Annual Average PM2.5 Exposure Is Associated with Mortality in a Heart Failure Cohort: Results from the EPA CARES Study
- Author
-
Ward-Caviness, Cavin, primary, Weaver, Anne M, additional, Pfaff, Emily, additional, Neas, Lucas, additional, Devlin, Robert, additional, Cascio, Wayne, additional, and Diaz-Sanchez, David, additional
- Published
- 2018
109. Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records
- Author
-
Hoffman, Sarah R., primary, Vines, Anissa I., additional, Halladay, Jacqueline R., additional, Pfaff, Emily, additional, Schiff, Lauren, additional, Westreich, Daniel, additional, Sundaresan, Aditi, additional, Johnson, La-Shell, additional, and Nicholson, Wanda K., additional
- Published
- 2018
110. An Electronic Health Record–Based Strategy to Systematically Assess Medication Use Among Primary Care Patients With Multidrug Regimens: Feasibility Study
- Author
-
Bailey, Stacy Cooper, primary, Oramasionwu, Christine U, additional, Infanzon, Alexandra C, additional, Pfaff, Emily R, additional, Annis, Izabela E, additional, and Reuland, Daniel S, additional
- Published
- 2017
111. Alloimmunization is associated with older age of transfused red blood cells in sickle cell disease
- Author
-
Pfaff, Emily R., Qaqish, Bahjat, Hebden, Leyna M., Park, Yara A., Desai, Payal C., Ataga, Kenneth I., and Deal, Allison M.
- Subjects
circulatory and respiratory physiology
- Abstract
Red blood cell (RBC) alloimmunization is a significant clinical complication of sickle cell disease (SCD). It can lead to difficulty with cross-matching for future transfusions and may sometimes trigger life-threatening delayed hemolytic transfusion reactions. We conducted a retrospective study to explore the association of clinical complications and age of RBC with alloimmunization in patients with SCD followed at a single institution from 2005 to 2012. One hundred and sixty six patients with a total of 488 RBC transfusions were evaluated. Nineteen patients (11%) developed new alloantibodies following blood transfusions during the period of review. The median age of RBC units was 20 days (interquartile range: 14-27 days). RBC antibody formation was significantly associated with the age of RBC units (P = 0.002), with a hazard ratio of 3.5 (95% CI: 1.71-7.11) for a RBC unit that was 7 days old and 9.8 (95% CI: 2.66-35.97) for a unit that was 35 days old, 28 days after the blood transfusion. No association was observed between RBC alloimmunization and acute vaso-occlusive complications. Although increased echocardiography-derived tricuspid regurgitant jet velocity (TRV) was associated with the presence of RBC alloantibodies (P = 0.02), TRV was not significantly associated with alloimmunization when adjusted for patient age and number of transfused RBC units. Our study suggests that RBC antibody formation is significantly associated with older age of RBCs at the time of transfusion. Prospective studies in patients with SCD are required to confirm this finding.
- Published
- 2015
112. Clinical Data: Sources and Types, Regulatory Constraints, Applications.
- Author
-
Ahalt, Stanley C., Chute, Christopher G., Fecho, Karamarie, Glusman, Gustavo, Hadlock, Jennifer, Taylor, Casey Overby, Pfaff, Emily R., Robinson, Peter N., Solbrig, Harold, Ta, Casey, Tatonetti, Nicholas, and Weng, Chunhua
- Subjects
MEDICAL databases ,AIR pollutants
- Abstract
We briefly describe several clinical data types that are commonly employed in clinical and translational research, including fully identified clinical data, HIPAA-limited clinical data, deidentified clinical data, and synthetic data. We highlight several novel approaches for openly exposing clinical data that we have developed as part of the Translator program, namely, HIPAA Safe Harbor Plus (HuSH+) clinical data, clinical profiles, Columbia Open Health Data (COHD), and the Integrated Clinical and Environmental Exposures Service (ICEES). Deidentified clinical data sets may be used for clinical interpretation and scientific inference and discovery but to a lesser extent than HIPAA-limited clinical data sets because of the fact that key variables or covariates may have been removed from the data. Synthetic clinical data sets comprise realistic (but not real) data generated statistically by applying simulation techniques to population distributions of observational patient data. [Extracted from the article]
- Published
- 2019
113. Recruiting for a pragmatic trial using the electronic health record and patient portal: successes and lessons learned.
- Author
-
Pfaff, Emily, Lee, Adam, Bradford, Robert, Pae, Jinhee, Potter, Clarence, Blue, Paul, Knoepp, Patricia, Thompson, Kristie, Roumie, Christianne L, Crenshaw, David, Servis, Remy, and DeWalt, Darren A
- Abstract
Objective: Querying electronic health records (EHRs) to find patients meeting study criteria is an efficient method of identifying potential study participants. We aimed to measure the effectiveness of EHR-driven recruitment in the context of ADAPTABLE (Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-Term Effectiveness), a pragmatic trial aiming to recruit 15 000 patients. Materials and Methods: We compared the participant yield of 4 recruitment methods: in-clinic recruitment by a research coordinator, letters, direct email, and patient portal messages. Taken together, the latter 2 methods comprised our EHR-driven electronic recruitment workflow. Results: The electronic recruitment workflow sent electronic messages to 12 254 recipients; 13.5% of these recipients visited the study website, and 4.2% enrolled in the study. Letters were sent to 427 recipients; 5.6% visited the study website, and 3.3% enrolled in the study. Coordinators recruited 339 participants in clinic; 23.6% visited the study website, and 16.8% enrolled in the study. Five-hundred-nine of the 580 UNC enrollees (87.8%) were recruited using an electronic method. Discussion: Electronic recruitment reached a wide net of patients, recruited many participants to the study, and resulted in a workflow that can be reused for future studies. In-clinic recruitment saw the highest yield, suggesting that a combination of recruitment methods may be the best approach. Future work should account for demographic skew that may result by recruiting from a pool of patient portal users. Conclusion: The success of electronic recruitment for ADAPTABLE makes this workflow well worth incorporating into an overall recruitment strategy, particularly for a pragmatic trial. [ABSTRACT FROM AUTHOR]
- Published
- 2019
114. Association of COVID-19 With Risk of Posttransplant Diabetes Mellitus
- Author
-
Vinson, Amanda J., Anzalone, A. Jerrod, Schissel, Makayla, Dai, Ran, Olex, Amy L., Mannon, Roslyn B., Wilcox, Adam B., Lee, Adam M., Graves, Alexis, Anzalone, Alfred (Jerrod), Manna, Amin, Saha, Amit, Zhou, Andrea, Williams, Andrew E., Southerland, Andrew, Girvin, Andrew T., Walden, Anita, Sharathkumar, Anjali A., Amor, Benjamin, Bates, Benjamin, Hendricks, Brian, Patel, Brijesh, Alexander, Caleb, Bramante, Carolyn, Ward-Caviness, Cavin, Madlock-Brown, Charisse, Suver, Christine, Chute, Christopher, Dillon, Christopher, Wu, Chunlei, Schmitt, Clare, Takemoto, Cliff, Housman, Dan, Gabriel, Davera, Eichmann, David A., Mazzotti, Diego, Brown, Don, Boudreau, Eilis, Hill, Elaine, Zampino, Elizabeth, Carlson Marti, Emily, Pfaff, Emily R., French, Evan, Koraishy, Farrukh M, Mariona, Federico, Prior, Fred, Agarwal, Gaurav, Sokos, George, Martin, Greg, Lehmann, Harold, Spratt, Heidi, Mehta, Hemalkumar, Liu, Hongfang, Sidky, Hythem, Awori Hayanga, J.W., Pincavitch, Jami, Clark, Jaylyn, Richard Harper, Jeremy, Islam, Jessica, Ge, Jin, Gagnier, Joel, Saltz, Joel H., Saltz, Joel, Loomba, Johanna, Buse, John, Mathew, Jomol, Rutter, Joni L., McMurry, Julie A., Guinney, Justin, Starren, Justin, Crowley, Karen, Rebecca Bradwell, Katie, Walters, Kellie M., Wilkins, Ken, Gersing, Kenneth R., Dwain Cato, Kenrick, Murray, Kimberly, Kostka, Kristin, Northington, Lavance, Allan Pyles, Lee, Misquitta, Leonie, Cottrell, Lesley, Portilla, Lili, Deacy, Mariam, Bissell, Mark M., Clark, Marshall, Emmett, Mary, Morrison Saltz, Mary, Palchuk, Matvey B., Haendel, Melissa A., Adams, Meredith, Temple-O’Connor, Meredith, Kurilla, Michael G., Morris, Michele, Qureshi, Nabeel, Safdar, Nasia, Garbarini, Nicole, Sharafeldin, Noha, Sadan, Ofer, Francis, Patricia A., Wung Burgoon, Penny, Robinson, Peter, Payne, Philip R.O., Fuentes, Rafael, Jawa, Randeep, Erwin-Cohen, Rebecca, Patel, Rena, Moffitt, Richard A., Zhu, Richard L., Kamaleswaran, Rishi, Hurley, Robert, Miller, Robert T., Pyarajan, Saiju, 
Michael, Sam G., Bozzette, Samuel, Mallipattu, Sandeep, Vedula, Satyanarayana, Chapman, Scott, O’Neil, Shawn T., Setoguchi, Soko, Hong, Stephanie S., Johnson, Steve, Lee, Stephen, Bennett, Tellen D., Callahan, Tiffany, Topaloglu, Umit, Sheikh, Usman, Gordon, Valery, Subbian, Vignesh, Kibbe, Warren A., Hernandez, Wenndy, Beasley, Will, Cooper, Will, Hillegass, William, and Tanner Zhang, Xiaohan
- Published
- 2024
115. An efficient approach for surveillance of childhood diabetes by type derived from electronic health record data: the SEARCH for Diabetes in Youth Study
- Author
-
Zhong, Victor W, primary, Obeid, Jihad S, additional, Craig, Jean B, additional, Pfaff, Emily R, additional, Thomas, Joan, additional, Jaacks, Lindsay M, additional, Beavers, Daniel P, additional, Carey, Timothy S, additional, Lawrence, Jean M, additional, Dabelea, Dana, additional, Hamman, Richard F, additional, Bowlby, Deborah A, additional, Pihoker, Catherine, additional, Saydah, Sharon H, additional, and Mayer-Davis, Elizabeth J, additional
- Published
- 2016
116. Efficient Surveillance of Childhood Diabetes Using Electronic Health Record Data
- Author
-
Zhong, Victor W., primary, Obeid, Jihad S., additional, Craig, Jean B., additional, Pfaff, Emily R., additional, Thomas, Joan, additional, Jaacks, Lindsay M., additional, Beavers, Daniel P., additional, Carey, Timothy S., additional, Lawrence, Jean M., additional, Dabelea, Dana, additional, Hamman, Richard F., additional, Bowlby, Deborah A., additional, Pihoker, Catherine, additional, Saydah, Sharon H., additional, and Mayer-Davis, Elizabeth J., additional
- Published
- 2016
117. Administrative coding is specific, but not sensitive, for identifying eosinophilic esophagitis
- Author
-
Rybnicek, David A., Hathorn, Kelly E., Pfaff, Emily R., Bulsiewicz, William J., Shaheen, Nicholas J., and Dellon, Evan S.
- Subjects
Adult ,Male ,Academic Medical Centers ,Adolescent ,Databases, Factual ,Incidence ,Infant ,Eosinophilic Esophagitis ,Middle Aged ,Sensitivity and Specificity ,Article ,Young Adult ,International Classification of Diseases ,Child, Preschool ,North Carolina ,Humans ,Female ,Registries ,Child ,Aged ,Retrospective Studies
- Abstract
The use of administrative databases to conduct population-based studies of eosinophilic esophagitis (EoE) in the United States is limited because it is unknown whether the International Classification of Diseases, Ninth Revision (ICD-9) code for EoE, 530.13, accurately identifies those who truly have the disease. The aim of this retrospective study was to validate the ICD-9 code for identifying cases of EoE in administrative data. Confirmed cases of EoE as per consensus guidelines (symptoms of esophageal dysfunction and ≥15 eosinophils per high-power field on biopsy after 8 weeks of twice daily proton pump inhibitor therapy) were identified in the University of North Carolina (UNC) EoE Clinicopathologic Database from 2008 to 2010; 2008 was the first year in which the 530.13 code was approved. Using the Carolina Data Warehouse, the administrative database for patients seen in the UNC system, all diagnostic and procedure codes were obtained for these cases. Then, with the EoE cases as the reference standard, we re-queried the Carolina Data Warehouse over the same time frame for all patients seen in the system (n=308,372) and calculated the sensitivity and specificity of the ICD-9 code 530.13 as a case definition of EoE. To attempt to refine the case definition, we added procedural codes in an iterative fashion to optimize sensitivity and specificity, and restricted our analysis to privately insured patients. We also conducted a sensitivity analysis with 2011 data to identify trends in the operating parameters of the code. We identified 226 cases of EoE at UNC to serve as the reference standard. The ICD-9 code 530.13 yielded a sensitivity of 37% (83/226; 95% confidence interval: 31-43%) and specificity of 99% (308,111/308,146; 95% confidence interval: 98-100%). These operating parameters were not substantially altered if the case definition required a procedure code for endoscopy or if cases were limited to those with commercial insurance. 
However, in 2011, the sensitivity of the code had increased to 61%, while the specificity remained at 99%. The ICD-9 code for EoE, 530.13, had excellent specificity for identifying cases of EoE in administrative data, although this high specificity was achieved at an academic center. Additionally, the sensitivity of the code appears to be increasing over time, and the threshold at which it will stabilize is not known. While use of this administrative code will still miss a number of cases, those identified in this manner are highly likely to have the disease.
- Published
- 2013
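The sensitivity and specificity quoted in the abstract of result 117 follow directly from the counts it reports. The sketch below is illustrative only and not taken from the article: the class and variable names are hypothetical, and the false-negative and false-positive counts are derived by subtraction from the stated fractions (83/226 and 308,111/308,146).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeValidation:
    """Counts for a diagnostic code checked against a reference standard (hypothetical helper)."""
    true_pos: int   # reference-standard cases that carry the code
    false_neg: int  # reference-standard cases without the code
    true_neg: int   # non-cases without the code
    false_pos: int  # non-cases that carry the code

    @property
    def sensitivity(self) -> float:
        # Share of true cases that the code identifies.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of non-cases that the code correctly leaves unflagged.
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def total(self) -> int:
        return self.true_pos + self.false_neg + self.true_neg + self.false_pos


# Counts from the abstract: 83 of 226 cases coded; 308,111 of 308,146 non-cases uncoded.
eoe_2008_2010 = CodeValidation(
    true_pos=83,
    false_neg=226 - 83,
    true_neg=308_111,
    false_pos=308_146 - 308_111,
)

print(f"sensitivity = {eoe_2008_2010.sensitivity:.1%}")
print(f"specificity = {eoe_2008_2010.specificity:.2%}")
print(f"patients    = {eoe_2008_2010.total:,}")
```

The four counts sum to the 308,372 patients stated in the abstract, the sensitivity rounds to the reported 37%, and the specificity is above the reported 99%.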
118. The Effect of Computer and Internet Attitudes and Anxiety on e-Health Search Behaviors
- Author
-
Pfaff, Emily R.
- Subjects
Health--Internet resources ,education ,Information retrieval--Social aspects ,Internet searching
- Abstract
While the need for health information is a seemingly universal concept, comfort using computers and the Internet is not. Yet studies have shown that users within a large range of years of computer experience search the Internet for health information (“e-health information”) at ever-increasing rates. The purpose of the current study is to discover how a searcher’s attitudes toward and self-perception of their computer and Internet competence affect his or her e-health information-seeking behaviors. An online survey was distributed with questions that served to measure participants’ computer and Internet anxiety, as well as questions pertaining to their e-health attitudes and search behaviors. Participants’ anxiety levels had a statistically significant effect on participants’ (1) feeling that their e-health searches are generally successful (or unsuccessful), (2) satisfaction with the information obtained, and (3) tendency to share e-health information with a health care provider.
- Published
- 2011
119. Using EHR data and machine learning approach to facilitate the identification of patients with lung cancer from a pan-cancer cohort.
- Author
-
Yu, Yue, Ruddy, Kathryn Jean, Leventakos, Konstantinos, Liu, Bolun, Huo, Nan, Pachman, Deirdre R., Zong, Nansu, Xiao, Guohui, Chute, Christopher, Pfaff, Emily, Cheville, Andrea L., and Jiang, Guoqian
- Published
- 2023
120. Risk of post-acute sequelae of SARS-CoV-2 infection associated with pre-coronavirus disease obstructive sleep apnea diagnoses: an electronic health record-based analysis from the RECOVER initiative
- Author
-
Mandel, Hannah L, Colleen, Gunnar, Abedian, Sajjad, Ammar, Nariman, Bailey, L Charles, Bennett, Tellen D, Brannock, M Daniel, Brosnahan, Shari B, Chen, Yu, Chute, Christopher G, Divers, Jasmin, Evans, Michael D, Haendel, Melissa, Hall, Margaret A, Hirabayashi, Kathryn, Hornig, Mady, Katz, Stuart D, Krieger, Ana C, Loomba, Johanna, Lorman, Vitaly, Mazzotti, Diego R, McMurry, Julie, Moffitt, Richard A, Pajor, Nathan M, Pfaff, Emily, Radwell, Jeff, Razzaghi, Hanieh, Redline, Susan, Seibert, Elle, Sekar, Anisha, Sharma, Suchetha, Thaweethai, Tanayott, Weiner, Mark G, Yoo, Yun Jae, Zhou, Andrea, and Thorpe, Lorna E
- Published
- 2023
121. Can Eosinophilic Esophagitis be Reliably Diagnosed in Administrative Databases?
- Author
-
Rybnicek, David, primary, Pfaff, Emily, additional, Bulsiewicz, William, additional, Shaheen, Nicholas, additional, and Dellon, Evan, additional
- Published
- 2012
122. COVID-19 outcomes in persons with hemophilia: results from a US-based national COVID-19 surveillance registry
- Author
-
Sharathkumar, Anjali, Wendt, Linder, Ortman, Chris, Srinivasan, Ragha, Chute, Christopher G., Chrischilles, Elizabeth, Takemoto, Clifford M., Wilcox, Adam B., Lee, Adam M., Graves, Alexis, Anzalone, Alfred (Jerrod), Manna, Amin, Saha, Amit, Olex, Amy, Zhou, Andrea, Williams, Andrew E., Southerland, Andrew, Girvin, Andrew T., Walden, Anita, Sharathkumar, Anjali A., Amor, Benjamin, Bates, Benjamin, Hendricks, Brian, Patel, Brijesh, Alexander, Caleb, Bramante, Carolyn, Ward-Caviness, Cavin, Madlock-Brown, Charisse, Suver, Christine, Chute, Christopher, Dillon, Christopher, Wu, Chunlei, Schmitt, Clare, Takemoto, Cliff, Housman, Dan, Gabriel, Davera, Eichmann, David A., Mazzotti, Diego, Brown, Don, Boudreau, Eilis, Hill, Elaine, Zampino, Elizabeth, Marti, Emily Carlson, Pfaff, Emily R., French, Evan, Koraishy, Farrukh M., Mariona, Federico, Prior, Fred, Sokos, George, Martin, Greg, Lehmann, Harold, Spratt, Heidi, Mehta, Hemalkumar, Liu, Hongfang, Sidky, Hythem, Hayanga, J. W. Awori, Pincavitch, Jami, Clark, Jaylyn, Harper, Jeremy Richard, Islam, Jessica, Ge, Jin, Gagnier, Joel, Saltz, Joel H., Saltz, Joel, Loomba, Johanna, Buse, John, Mathew, Jomol, Rutter, Joni L., McMurry, Julie A., Guinney, Justin, Starren, Justin, Crowley, Karen, Bradwell, Katie Rebecca, Walters, Kellie M., Wilkins, Ken, Gersing, Kenneth R., Cato, Kenrick Dwain, Murray, Kimberly, Kostka, Kristin, Northington, Lavance, Pyles, Lee Allan, Misquitta, Leonie, Cottrell, Lesley, Portilla, Lili, Deacy, Mariam, Bissell, Mark M., Clark, Marshall, Emmett, Mary, Saltz, Mary Morrison, Palchuk, Matvey B., Haendel, Melissa A., Adams, Meredith, Temple-O’Connor, Meredith, Kurilla, Michael G., Morris, Michele, Qureshi, Nabeel, Safdar, Nasia, Garbarini, Nicole, Sharafeldin, Noha, Sadan, Ofer, Francis, Patricia A., Burgoon, Penny Wung, Robinson, Peter, Payne, Philip R.O., Fuentes, Rafael, Jawa, Randeep, Erwin-Cohen, Rebecca, Patel, Rena, Moffitt, Richard A., Zhu, Richard L., Kamaleswaran, Rishi, Hurley, Robert, Miller, 
Robert T., Pyarajan, Saiju, Michael, Sam G., Bozzette, Samuel, Mallipattu, Sandeep, Vedula, Satyanarayana, Chapman, Scott, O’Neil, Shawn T., Setoguchi, Soko, Hong, Stephanie S., Johnson, Steve, Bennett, Tellen D., Callahan, Tiffany, Topaloglu, Umit, Sheikh, Usman, Gordon, Valery, Subbian, Vignesh, Kibbe, Warren A., Hernandez, Wenndy, Beasley, Will, Cooper, Will, Hillegass, William, and Zhang, Xiaohan Tanner
- Abstract
Hypercoagulable state contributing to thrombotic complications worsens COVID-19 severity and outcomes, whereas anticoagulation improves outcomes by alleviating hypercoagulability.
- Published
- 2023
123. Additional file 1 of FHIR PIT: an open software application for spatiotemporal integration of clinical data and environmental exposures data
- Author
-
Xu, Hao, Cox, Steven, Stillwell, Lisa, Pfaff, Emily, Champion, James, Ahalt, Stanley C., and Fecho, Karamarie
- Subjects
3. Good health
- Abstract
Additional file 1: Supplementary Table 1. ICEES integrated feature variable tables (v1.0.0, v2.0.0): variable names, descriptions, and binning strategy.*
124. Additional file 1 of FHIR PIT: an open software application for spatiotemporal integration of clinical data and environmental exposures data
- Author
-
Xu, Hao, Cox, Steven, Stillwell, Lisa, Pfaff, Emily, Champion, James, Ahalt, Stanley C., and Fecho, Karamarie
- Subjects
3. Good health
- Abstract
Additional file 1: Supplementary Table 1. ICEES integrated feature variable tables (v1.0.0, v2.0.0): variable names, descriptions, and binning strategy.*
125. National COVID Cohort Collaborative Data Enhancements: A Path for Expanding Common Data Models.
- Author
-
Walters KM, Clark M, Dard S, Hong SS, Kelly E, Kostka K, Lee AM, Miller RT, Morris M, Palchuk MB, and Pfaff ER
- Abstract
Introduction: To support long COVID research in National COVID Cohort Collaborative (N3C), the N3C Phenotype and Data Acquisition team created data designs to aid contributing sites in enhancing their data. Enhancements include: long COVID specialty clinic indicator; Admission, Discharge, and Transfer (ADT) transactions; patient-level social determinants of health; and in-hospital use of oxygen supplementation. Methods: For each enhancement, we defined the scope and wrote guidance on how to prepare and populate the data in a standardized way. Results: As of June 2024, 29 sites have added at least one data enhancement to their N3C pipeline. Discussion: The use of common data models is critical to the success of N3C; however, these data models cannot account for all needs. Project-driven data enhancement is required. This should be done in a standardized way in alignment with CDM specifications. Our approach offers a useful pathway for enhancing data to improve fit for purpose. (© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2024
126. Crowd-sourced machine learning prediction of long COVID using data from the National COVID Cohort Collaborative.
- Author
-
Bergquist T, Loomba J, Pfaff E, Xia F, Zhao Z, Zhu Y, Mitchell E, Bhattacharya B, Shetty G, Munia T, Delong G, Tariq A, Butzin-Dozier Z, Ji Y, Li H, Coyle J, Shi S, Philips RV, Mertens A, Pirracchio R, van der Laan M, Colford JM Jr, Hubbard A, Gao J, Chen G, Velingker N, Li Z, Wu Y, Stein A, Huang J, Dai Z, Long Q, Naik M, Holmes J, Mowery D, Wong E, Parekh R, Getzen E, Hightower J, and Blase J
- Subjects
- Humans, United States epidemiology, Algorithms, Post-Acute COVID-19 Syndrome, Cohort Studies, Crowdsourcing, COVID-19 epidemiology, Machine Learning, SARS-CoV-2 isolation & purification
- Abstract
Background: While many patients seem to recover from SARS-CoV-2 infections, many patients report experiencing SARS-CoV-2 symptoms for weeks or months after their acute COVID-19 ends, even developing new symptoms weeks after infection. These long-term effects are called post-acute sequelae of SARS-CoV-2 (PASC) or, more commonly, Long COVID. The overall prevalence of Long COVID is currently unknown, and tools are needed to help identify patients at risk for developing long COVID. Methods: A working group of the Rapid Acceleration of Diagnostics-radical (RADx-rad) program, comprised of individuals from various NIH institutes and centers, in collaboration with REsearching COVID to Enhance Recovery (RECOVER) developed and organized the Long COVID Computational Challenge (L3C), a community challenge aimed at incentivizing the broader scientific community to develop interpretable and accurate methods for identifying patients at risk of developing Long COVID. From August 2022 to December 2022, participants developed Long COVID risk prediction algorithms using the National COVID Cohort Collaborative (N3C) data enclave, a harmonized data repository from over 75 healthcare institutions from across the United States (U.S.). Findings: Over the course of the challenge, 74 teams designed and built 35 Long COVID prediction models using the N3C data enclave. The top 10 teams all scored above a 0.80 Area Under the Receiver Operator Curve (AUROC) with the highest scoring model achieving a mean AUROC of 0.895. Included in the top submission was a visualization dashboard that built timelines for each patient, updating the risk of a patient developing Long COVID in response to clinical events. Interpretation: As a result of L3C, federal reviewers identified multiple machine learning models that can be used to identify patients at risk for developing Long COVID.
Many of the teams used approaches in their submissions which can be applied to future clinical prediction questions. Funding: Research reported in this RADx® Rad publication was supported by the National Institutes of Health. Timothy Bergquist, Johanna Loomba, and Emily Pfaff were supported by Axle Subcontract: NCATS-STSS-P00438. Competing Interests: Declaration of interests: Danielle Mowery serves as an unpaid member of the Epic Cosmos Governing Council. Romain Pirracchio received funding from the FDA CERSI grant U01FD005978 and the PCORI grant P0562155 and received a consulting honorarium from Phillips. Martin van der Laan received funding from the NIAID grant 5R01AI074345. Johanna Loomba received contract funding from the NIH RECOVER program. Emily Pfaff received funding from the NIH and PCORI. The views expressed in this manuscript are solely those of the authors and do not necessarily represent those of the National Institutes of Health, the U.S. Department of Health and Human Services or the U.S. government. Qi Long was supported by grants from the NIH. (Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2024
127. A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation.
- Author
-
Wen A, Wang L, He H, Fu S, Liu S, Hanauer DA, Harris DR, Kavuluru R, Zhang R, Natarajan K, Pavinkurve NP, Hajagos J, Rajupet S, Lingam V, Saltz M, Elowsky C, Moffitt RA, Koraishy FM, Palchuk MB, Donovan J, Lingrey L, Stone-DerHagopian G, Miller RT, Williams AE, Leese PJ, Kovach PI, Pfaff ER, Zemmel M, Pates RD, Guthe N, Haendel MA, Chute CG, and Liu H
- Abstract
Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). Objective: This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. Methods: We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. Results: An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. Conclusions: The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development.
In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement., (©Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Sijia Liu, David A Hanauer, Daniel R Harris, Ramakanth Kavuluru, Rui Zhang, Karthik Natarajan, Nishanth P Pavinkurve, Janos Hajagos, Sritha Rajupet, Veena Lingam, Mary Saltz, Corey Elowsky, Richard A Moffitt, Farrukh M Koraishy, Matvey B Palchuk, Jordan Donovan, Lora Lingrey, Garo Stone-DerHagopian, Robert T Miller, Andrew E Williams, Peter J Leese, Paul I Kovach, Emily R Pfaff, Mikhail Zemmel, Robert D Pates, Nick Guthe, Melissa A Haendel, Christopher G Chute, Hongfang Liu, National COVID Cohort Collaborative, The RECOVER Initiative. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 09.09.2024.)
- Published
- 2024
128. Effect of Paxlovid Treatment During Acute COVID-19 on Long COVID Onset: An EHR-Based Target Trial Emulation from the N3C and RECOVER Consortia.
- Author
-
Preiss A, Bhatia A, Aragon LV, Baratta JM, Baskaran M, Blancero F, Brannock MD, Chew RF, Diaz I, Fitzgerald M, Kelly EP, Zhou AG, Carton TW, Chute CG, Haendel M, Moffitt R, and Pfaff E
- Abstract
Preventing and treating post-acute sequelae of SARS-CoV-2 infection (PASC), commonly known as Long COVID, has become a public health priority. In this study, we examined whether treatment with Paxlovid in the acute phase of COVID-19 helps prevent the onset of PASC. We used electronic health records from the National COVID Cohort Collaborative (N3C) to define a cohort of 426,352 patients who had COVID-19 since April 1, 2022, and were eligible for Paxlovid treatment due to risk for progression to severe COVID-19. We used the target trial emulation (TTE) framework to estimate the effect of Paxlovid treatment on PASC incidence. We estimated overall PASC incidence using a computable phenotype. We also measured the onset of novel cognitive, fatigue, and respiratory symptoms in the post-acute period. Paxlovid treatment did not have a significant effect on overall PASC incidence (relative risk [RR] = 0.98, 95% confidence interval [CI] 0.95-1.01). However, it had a protective effect on cognitive (RR = 0.90, 95% CI 0.84-0.96) and fatigue (RR = 0.95, 95% CI 0.91-0.98) symptom clusters, which suggests that the etiology of these symptoms may be more closely related to viral load than that of respiratory symptoms.
- Published
- 2024
129. Finding Long-COVID: Temporal Topic Modeling of Electronic Health Records from the N3C and RECOVER Programs.
- Author
-
O'Neil ST, Madlock-Brown C, Wilkins KJ, McGrath BM, Davis HE, Assaf GS, Wei H, Zareie P, French ET, Loomba J, McMurry JA, Zhou A, Chute CG, Moffitt RA, Pfaff ER, Yoo YJ, Leese P, Chew RF, Lieberman M, and Haendel MA
- Abstract
Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease., Competing Interests: The authors declare no competing interests.
- Published
- 2024
130. Increased Incidence of Vestibular Disorders in Patients With SARS-CoV-2.
- Author
-
Lee L, French E, Coelho DH, Manzoor NF, Wilcox AB, Lee AM, Graves A, Anzalone A, Manna A, Saha A, Olex A, Zhou A, Williams AE, Southerland A, Girvin AT, Walden A, Sharathkumar AA, Amor B, Bates B, Hendricks B, Patel B, Alexander C, Bramante C, Ward-Caviness C, Madlock-Brown C, Suver C, Chute C, Dillon C, Wu C, Schmitt C, Takemoto C, Housman D, Gabriel D, Eichmann DA, Mazzotti D, Brown D, Boudreau E, Hill E, Zampino E, Marti EC, Pfaff ER, French E, Koraishy FM, Mariona F, Prior F, Sokos G, Martin G, Lehmann H, Spratt H, Mehta H, Liu H, Sidky H, Awori Hayanga JW, Pincavitch J, Clark J, Harper JR, Islam J, Ge J, Gagnier J, Saltz JH, Saltz J, Loomba J, Buse J, Mathew J, Rutter JL, McMurry JA, Guinney J, Starren J, Crowley K, Bradwell KR, Walters KM, Wilkins K, Gersing KR, Cato KD, Murray K, Kostka K, Northington L, Pyles LA, Misquitta L, Cottrell L, Portilla L, Deacy M, Bissell MM, Clark M, Emmett M, Saltz MM, Palchuk MB, Haendel MA, Adams M, Temple-O'Connor M, Kurilla MG, Morris M, Qureshi N, Safdar N, Garbarini N, Sharafeldin N, Sadan O, Francis PA, Burgoon PW, Robinson P, Payne PRO, Fuentes R, Jawa R, Erwin-Cohen R, Patel R, Moffitt RA, Zhu RL, Kamaleswaran R, Hurley R, Miller RT, Pyarajan S, Michael SG, Bozzette S, Mallipattu S, Vedula S, Chapman S, O'Neil ST, Setoguchi S, Hong SS, Johnson S, Bennett TD, Callahan T, Topaloglu U, Sheikh U, Gordon V, Subbian V, Kibbe WA, Hernandez W, Beasley W, Cooper W, Hillegass W, and Zhang XT
- Abstract
Objective: Determine the incidence of vestibular disorders in patients with SARS-CoV-2 compared to the control population., Study Design: Retrospective., Setting: Clinical data in the National COVID Cohort Collaborative database (N3C)., Methods: Deidentified patient data from the National COVID Cohort Collaborative database (N3C) were queried based on variant peak prevalence (untyped, alpha, delta, omicron 21K, and omicron 23A) from covariants.org to retrospectively analyze the incidence of vestibular disorders in patients with SARS-CoV-2 compared to a control population consisting of patients without documented evidence of COVID infection during the same period., Results: Patients testing positive for COVID-19 were significantly more likely to have a vestibular disorder compared to the control population. Compared to control patients, the odds ratio of vestibular disorders was significantly elevated in patients with untyped (odds ratio [OR], 2.39; confidence intervals [CI], 2.29-2.50; P < 0.001), alpha (OR, 3.63; CI, 3.48-3.78; P < 0.001), delta (OR, 3.03; CI, 2.94-3.12; P < 0.001), omicron 21K variant (OR, 2.97; CI, 2.90-3.04; P < 0.001), and omicron 23A variant (OR, 8.80; CI, 8.35-9.27; P < 0.001)., Conclusions: The incidence of vestibular disorders differed between COVID-19 variants and was significantly elevated in COVID-19-positive patients compared to the control population. These findings have implications for patient counseling, and further research is needed to discern the long-term effects of these findings., Competing Interests: None declared., (Copyright © 2024 The Authors. Published by Wolters Kluwer Health, Inc. on behalf of Otology & Neurotology, Inc.)
- Published
- 2024
131. Genetic and Survey Data Improves Performance of Machine Learning Model for Long COVID.
- Author
-
Wei WQ, Guardo C, Gandireddy S, Yan C, Ong H, Kerchberger V, Dickson A, Pfaff E, Master H, Basford M, Tran N, Mancuso S, Syed T, Zhao Z, Feng Q, Haendel M, Lunt C, Ginsburg G, Chute C, Denny J, and Roden D
- Abstract
Over 200 million SARS-CoV-2 patients have or will develop persistent symptoms (long COVID). Given this pressing research priority, the National COVID Cohort Collaborative (N3C) developed a machine learning model using only electronic health record data to identify potential patients with long COVID. We hypothesized that additional data from health surveys, mobile devices, and genotypes could improve prediction ability. In a cohort of SARS-CoV-2 infected individuals (n=17,755) in the All of Us program, we applied and expanded upon the N3C long COVID prediction model, testing machine learning infrastructures, assessing model performance, and identifying factors that contributed most to the prediction models. For the survey/mobile device information and genetic data, extreme gradient boosting and a convolutional neural network delivered the best performance for predicting long COVID, respectively. Combined survey, genetic, and mobile data increased specificity and the area under the receiver operating characteristic curve (AUROC) versus the original N3C model., Competing Interests: The authors declared no competing interests for this work.
- Published
- 2023
132. Effect of Nirmatrelvir/Ritonavir (Paxlovid) on Hospitalization among Adults with COVID-19: an EHR-based Target Trial Emulation from N3C.
- Author
-
Bhatia A, Preiss AJ, Xiao X, Brannock MD, Alexander GC, Chew RF, Fitzgerald M, Hill E, Kelly EP, Mehta HB, Madlock-Brown C, Wilkins KJ, Chute CG, Haendel M, Moffitt R, and Pfaff ER
- Abstract
This study leverages electronic health record data in the National COVID Cohort Collaborative's (N3C) repository to investigate disparities in Paxlovid treatment and to emulate a target trial assessing its effectiveness in reducing COVID-19 hospitalization rates. From an eligible population of 632,822 COVID-19 patients seen at 33 clinical sites across the United States between December 23, 2021 and December 31, 2022, patients were matched across observed treatment groups, yielding an analytical sample of 410,642 patients. We estimate a 65% reduced odds of hospitalization among Paxlovid-treated patients within a 28-day follow-up period, and this effect did not vary by patient vaccination status. Notably, we observe disparities in Paxlovid treatment, with lower rates among Black and Hispanic or Latino patients, and within socially vulnerable communities. Ours is the largest study of Paxlovid's real-world effectiveness to date, and our primary findings are consistent with previous randomized controlled trials and real-world studies., Competing Interests: No authors have competing interests or disclosures to report.
- Published
- 2023
133. Pre-existing autoimmunity is associated with increased severity of COVID-19: A retrospective cohort study using data from the National COVID Cohort Collaborative (N3C).
- Author
-
Yadaw AS, Afzali B, Hotaling N, Sidky H, Pfaff ER, Sahner DK, and Mathé EA
- Abstract
Importance: Identifying individuals with a higher risk of developing severe COVID-19 outcomes will inform targeted or more intensive clinical monitoring and management., Objective: To examine, using data from the National COVID Cohort Collaborative (N3C), whether patients with pre-existing autoimmune disease (AID) diagnosis and/or immunosuppressant (IS) exposure are at a higher risk of developing severe COVID-19 outcomes., Design Setting and Participants: A retrospective cohort of 2,453,799 individuals diagnosed with COVID-19 between January 1st, 2020, and June 30th, 2022, was created from the N3C data enclave, which comprises data of 15,231,849 patients from 75 USA data partners. Patients were stratified as those with/without a pre-existing diagnosis of AID and/or those with/without exposure to IS prior to COVID-19., Main Outcomes and Measures: Two outcomes of COVID-19 severity, derived from the World Health Organization severity score, were defined, namely life-threatening disease and hospitalization. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated using logistic regression models with and without adjustment for demographics (age, BMI, gender, race, ethnicity, smoking status), and comorbidities (cardiovascular disease, dementia, pulmonary disease, liver disease, type 2 diabetes mellitus, kidney disease, cancer, and HIV infection)., Results: In total, 2,453,799 (16.11% of the N3C cohort) adults (age >18 years) were diagnosed with COVID-19, of which 191,520 (7.81%) had a prior AID diagnosis, and 278,095 (11.33%) had a prior IS exposure. Logistic regression models adjusted for demographic factors and comorbidities demonstrated that individuals with a prior AID (OR = 1.13, 95% CI 1.09-1.17; p = 2.43E-13), prior exposure to IS (OR = 1.27, 95% CI 1.24-1.30; p = 3.66E-74), or both (OR = 1.35, 95% CI 1.29-1.40; p = 7.50E-49) were more likely to have a life-threatening COVID-19 disease. These results were confirmed after adjusting for exposure to antivirals and vaccination in a cohort subset with COVID-19 diagnosis dates after December 2021 (AID OR = 1.18, 95% CI 1.02-1.36; p = 2.46E-02; IS OR = 1.60, 95% CI 1.41-1.80; p = 5.11E-14; AID+IS OR = 1.93, 95% CI 1.62-2.30; p = 1.68E-13). These results were consistent when evaluating hospitalization as the outcome and also when stratifying by race and sex. 
Finally, a sensitivity analysis evaluating specific IS revealed that TNF inhibitors were protective against life-threatening disease (OR = 0.80, 95% CI 0.66-0.96; p = 1.66E-2) and hospitalization (OR = 0.80, 95% CI 0.73-0.89; p = 1.06E-05)., Conclusions and Relevance: Patients with pre-existing AID, exposure to IS, or both are more likely to have a life-threatening disease or hospitalization. These patients may thus require tailored monitoring and preventative measures to minimize negative consequences of COVID-19.
- Published
- 2023
134. Long COVID Risk and Pre-COVID Vaccination: An EHR-Based Cohort Study from the RECOVER Program.
- Author
-
Brannock MD, Chew RF, Preiss AJ, Hadley EC, McMurry JA, Leese PJ, Girvin AT, Crosskey M, Zhou AG, Moffitt RA, Funk MJ, Pfaff ER, Haendel MA, and Chute CG
- Abstract
Importance: Characterizing the effect of vaccination on long COVID allows for better healthcare recommendations., Objective: To determine if, and to what degree, vaccination prior to COVID-19 is associated with eventual long COVID onset, among those with a documented COVID-19 infection., Design Settings and Participants: Retrospective cohort study of adults with evidence of COVID-19 between August 1, 2021 and January 31, 2022 based on electronic health records from eleven healthcare institutions taking part in the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National COVID Cohort Collaborative (N3C)., Exposures: Pre-COVID-19 receipt of a complete vaccine series versus no pre-COVID-19 vaccination., Main Outcomes and Measures: Two approaches to the identification of long COVID were used. In the clinical diagnosis cohort (n=47,752), ICD-10 diagnosis codes or evidence of a healthcare encounter at a long COVID clinic were used. In the model-based cohort (n=199,498), a computable phenotype was used. The association between pre-COVID vaccination and long COVID was estimated using IPTW-adjusted logistic regression and Cox proportional hazards., Results: In both cohorts, when adjusting for demographics and medical history, pre-COVID vaccination was associated with a reduced risk of long COVID (clinic-based cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; model-based cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75)., Conclusions and Relevance: Long COVID has become a central concern for public health experts. Prior studies have considered the effect of vaccination on the prevalence of future long COVID symptoms, but ours is the first to thoroughly characterize the association between vaccination and clinically diagnosed or computationally derived long COVID. 
Our results bolster the growing consensus that vaccines retain protective effects against long COVID even in breakthrough infections., Key Points: Question: Does vaccination prior to COVID-19 onset change the risk of long COVID diagnosis? Findings: Four observational analyses of EHRs showed a statistically significant reduction in long COVID risk associated with pre-COVID vaccination (first cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; second cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Meaning: Vaccination prior to COVID onset has a protective association with long COVID even in the case of breakthrough infections.
- Published
- 2022
135. FHIR-Ontop-OMOP: Building clinical knowledge graphs in FHIR RDF with the OMOP Common data Model.
- Author
-
Xiao G, Pfaff E, Prud'hommeaux E, Booth D, Sharma DK, Huo N, Yu Y, Zong N, Ruddy KJ, Chute CG, and Jiang G
- Subjects
- Data Warehousing, Delivery of Health Care, Electronic Health Records, Humans, Artificial Intelligence, Pattern Recognition, Automated
- Abstract
Background: Knowledge graphs (KGs) play a key role in enabling explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) against heterogeneous electronic health records (EHRs) has been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics; however, the potential of such standards for building CKGs has not been well investigated., Objective: To develop and evaluate methods and tools that expose the OMOP CDM-based clinical data repositories as virtual clinical KGs that are compliant with the FHIR Resource Description Framework (RDF) specification., Methods: We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from the OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of data transformation and the conformance of the generated CKGs to the FHIR RDF specification., Results: A beta version of the system has been released. A total of more than 100 data element mappings from 11 OMOP CDM clinical data, health system and vocabulary tables were implemented in the system, covering 11 FHIR resources. The generated virtual CKG from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observation, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, ensuring the faithfulness of the data transformation. 
The generated CKG in RDF triples for 100 patients was fully conformant with the FHIR RDF specification., Conclusion: The FHIR-Ontop-OMOP system can expose an OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potential that can be enabled by the interoperability between FHIR and OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation to enable explainable AI applications in healthcare., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2022
136. Coding Long COVID: Characterizing a new disease through an ICD-10 lens.
- Author
-
Pfaff ER, Madlock-Brown C, Baratta JM, Bhatia A, Davis H, Girvin A, Hill E, Kelly L, Kostka K, Loomba J, McMurry JA, Wong R, Bennett TD, Moffitt R, Chute CG, and Haendel M
- Abstract
Background: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes Long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of Long COVID are still in flux, and the deployment of an ICD-10-CM code for Long COVID in the US took nearly two years after patients had begun to describe their condition. Here we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified.", Methods: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code ( n = 21,072), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan., Results: We established the diagnoses most commonly co-occurring with U09.9, and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty, high education, and high access to medical care. 
Our results also include a characterization of common procedures and medications associated with U09.9-coded patients., Conclusions: This work offers insight into potential subtypes and current practice patterns around Long COVID, and speaks to the existence of disparities in the diagnosis of patients with Long COVID. This latter finding in particular requires further research and urgent remediation.
- Published
- 2022
137. Risk Factors Associated with Post-Acute Sequelae of SARS-CoV-2 in an EHR Cohort: A National COVID Cohort Collaborative (N3C) Analysis as part of the NIH RECOVER program.
- Author
-
Hill E, Mehta H, Sharma S, Mane K, Xie C, Cathey E, Loomba J, Russell S, Spratt H, DeWitt PE, Ammar N, Madlock-Brown C, Brown D, McMurry JA, Chute CG, Haendel MA, Moffitt R, Pfaff ER, and Bennett TD
- Abstract
Background: More than one-third of individuals experience post-acute sequelae of SARS-CoV-2 infection (PASC, which includes long-COVID)., Objective: To identify risk factors associated with PASC/long-COVID., Design: Retrospective case-control study., Setting: 31 health systems in the United States from the National COVID Cohort Collaborative (N3C)., Patients: 8,325 individuals with PASC (defined by the presence of the International Classification of Diseases, version 10 code U09.9 or a long-COVID clinic visit) matched to 41,625 controls within the same health system., Measurements: Risk factors included demographics, comorbidities, and treatment and acute characteristics related to COVID-19. Multivariable logistic regression, random forest, and XGBoost were used to determine the associations between risk factors and PASC., Results: Among 8,325 individuals with PASC, the majority were >50 years of age (56.6%), female (62.8%), and non-Hispanic White (68.6%). In logistic regression, middle-age categories (40 to 69 years; OR ranging from 2.32 to 2.58), female sex (OR 1.4, 95% CI 1.33-1.48), hospitalization associated with COVID-19 (OR 3.8, 95% CI 3.05-4.73), long (8-30 days, OR 1.69, 95% CI 1.31-2.17) or extended hospital stay (30+ days, OR 3.38, 95% CI 2.45-4.67), receipt of mechanical ventilation (OR 1.44, 95% CI 1.18-1.74), and several comorbidities including depression (OR 1.50, 95% CI 1.40-1.60), chronic lung disease (OR 1.63, 95% CI 1.53-1.74), and obesity (OR 1.23, 95% CI 1.16-1.3) were associated with increased likelihood of PASC diagnosis or care at a long-COVID clinic. Characteristics associated with a lower likelihood of PASC diagnosis or care at a long-COVID clinic included younger age (18 to 29 years), male sex, non-Hispanic Black race, and comorbidities such as substance abuse, cardiomyopathy, psychosis, and dementia. 
More doctors per capita in the county of residence was associated with an increased likelihood of PASC diagnosis or care at a long-COVID clinic. Our findings were consistent in sensitivity analyses using a variety of analytic techniques and approaches to select controls., Conclusions: This national study identified important risk factors for PASC such as middle age, severe COVID-19 disease, and specific comorbidities. Further clinical and epidemiological research is needed to better understand underlying mechanisms and the potential role of vaccines and therapeutics in altering PASC course.
- Published
- 2022
138. NSAID use and clinical outcomes in COVID-19 patients: a 38-center retrospective cohort study.
- Author
-
Reese JT, Coleman B, Chan L, Blau H, Callahan TJ, Cappelletti L, Fontana T, Bradwell KR, Harris NL, Casiraghi E, Valentini G, Karlebach G, Deer R, McMurry JA, Haendel MA, Chute CG, Pfaff E, Moffitt R, Spratt H, Singh JA, Mungall CJ, Williams AE, and Robinson PN
- Subjects
- Anti-Inflammatory Agents, Non-Steroidal adverse effects, COVID-19 Testing, Cohort Studies, Humans, Pandemics, Retrospective Studies, Acute Kidney Injury, COVID-19
- Abstract
Background: Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use., Methods: A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of 19,746 COVID-19 inpatients was constructed by matching cases (treated with NSAIDs at the time of admission) and 19,746 controls (not treated) from 857,061 patients with COVID-19 available for analysis. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis., Results: Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). 
In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations., Conclusions: Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database., (© 2022. The Author(s).)
- Published
- 2022
139. Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration.
- Author
-
Yu Y, Zong N, Wen A, Liu S, Stone DJ, Knaack D, Chamberlain AM, Pfaff E, Gabriel D, Chute CG, Shah N, and Jiang G
- Subjects
- Databases, Factual, Electronic Health Records, Humans, Information Storage and Retrieval, Pandemics, SARS-CoV-2, COVID-19 epidemiology
- Abstract
Objective: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of data extract, transform, and load (ETL) tools between different CDMs cause potential interoperability issues between different data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms the PCORnet CDM format data into the OMOP CDM., Methods: We developed an open-source ETL tool to facilitate the data conversion from the PCORnet CDM to the OMOP CDM. The ETL tool was evaluated using a dataset with 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were conducted to assess the performance of the ETL tool. We designed an experiment to conduct a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool. We also assessed the capacity of the ETL tool for the COVID-19 data surveillance using data collection criteria of the MN EHR Consortium COVID-19 project., Results: After the ETL process, all the records of 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all the concept mapping was less than 0.61%. The string mapping process for the unit concepts lost 2.84% of records. Almost all the fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all the fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and target OMOP CDM separately. 
Finally, all the data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM., Conclusion: We demonstrated that our ETL tool could satisfy the data conversion requirements between the PCORnet CDM and the OMOP CDM. The outcome of the work would facilitate data retrieval, communication, sharing, and analysis between different institutions, not only for COVID-19-related projects but also for other real-world evidence-based observational studies., (Copyright © 2022 Elsevier Inc. All rights reserved.)
- Published
- 2022
140. NSAID use and clinical outcomes in COVID-19 patients: A 38-center retrospective cohort study.
- Author
-
Reese JT, Coleman B, Chan L, Blau H, Callahan TJ, Cappelletti L, Fontana T, Bradwell KR, Harris NL, Casiraghi E, Valentini G, Karlebach G, Deer R, McMurry JA, Haendel MA, Chute CG, Pfaff E, Moffitt R, Spratt H, Singh J, Mungall CJ, Williams AE, and Robinson PN
- Abstract
Background: Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use., Methods: A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of COVID-19 inpatients was constructed by matching cases (treated with NSAIDs) and controls (not treated) from 857,061 patients with COVID-19. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis., Results: Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations., Conclusions: Study interpretation is limited by the observational design. 
Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our findings are the largest EHR-based analysis of the effect of NSAIDs on outcome in COVID-19 patients to date. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.
- Published
- 2021
- Full Text
- View/download PDF
141. Enabling Longitudinal Exploratory Analysis of Clinical COVID Data.
- Author
-
Borland D, Brain I, Fecho K, Pfaff E, Xu H, Champion J, Bizon C, and Gotz D
- Abstract
As the COVID-19 pandemic continues to impact the world, data is being gathered and analyzed to better understand the disease. Recognizing the potential for visual analytics technologies to support exploratory analysis and hypothesis generation from longitudinal clinical data, a team of collaborators worked to apply existing event sequence visual analytics technologies to longitudinal clinical data from a cohort of 998 patients with high rates of COVID-19 infection. This paper describes the initial steps toward this goal, including: (1) the data transformation and processing work required to prepare the data for visual analysis, (2) initial findings and observations, and (3) qualitative feedback and lessons learned which highlight key features as well as limitations to address in future work.
- Published
- 2021
142. Children with SARS-CoV-2 in the National COVID Cohort Collaborative (N3C).
- Author
-
Martin B, DeWitt PE, Russell S, Anand A, Bradwell KR, Bremer C, Gabriel D, Girvin AT, Hajagos JG, McMurry JA, Neumann AJ, Pfaff ER, Walden A, Wooldridge JT, Yoo YJ, Saltz J, Gersing KR, Chute CG, Haendel MA, Moffitt R, and Bennett TD
- Abstract
Importance: SARS-CoV-2., Objective: To determine the characteristics, changes over time, outcomes, and severity risk factors of SARS-CoV-2 affected children within the National COVID Cohort Collaborative (N3C)., Design: Prospective cohort study of patient encounters with end dates before May 27th, 2021., Setting: 45 N3C institutions., Participants: Children <19-years-old at initial SARS-CoV-2 testing., Main Outcomes and Measures: Case incidence and severity over time, demographic and comorbidity severity risk factors, vital sign and laboratory trajectories, clinical outcomes, and acute COVID-19 vs MIS-C contrasts for children infected with SARS-CoV-2., Results: 728,047 children in the N3C were tested for SARS-CoV-2; of these, 91,865 (12.6%) were positive. Among the 5,213 (6%) hospitalized children, 685 (13%) met criteria for severe disease: mechanical ventilation (7%), vasopressor/inotropic support (7%), ECMO (0.6%), or death/discharge to hospice (1.1%). Male gender, African American race, older age, and several pediatric complex chronic condition (PCCC) subcategories were associated with higher clinical severity (p ≤ 0.05). Vital signs (all p≤0.002) and many laboratory tests from the first day of hospitalization were predictive of peak disease severity. Children with severe (vs moderate) disease were more likely to receive antimicrobials (71% vs 32%, p<0.001) and immunomodulatory medications (53% vs 16%, p<0.001). Compared to those with acute COVID-19, children with MIS-C were more likely to be male, Black/African American, 1-to-12-years-old, and less likely to have asthma, diabetes, or a PCCC (p < 0.04). MIS-C cases demonstrated a more inflammatory laboratory profile and more severe clinical phenotype with higher rates of invasive ventilation (12% vs 6%) and need for vasoactive-inotropic support (31% vs 6%) compared to acute COVID-19 cases, respectively (p<0.03)., Conclusions: In the largest U.S. 
SARS-CoV-2-positive pediatric cohort to date, we observed differences in demographics, pre-existing comorbidities, and initial vital sign and laboratory test values between severity subgroups. Taken together, these results suggest that early identification of children likely to progress to severe disease could be achieved using readily available data elements from the day of admission. Further work is needed to translate this knowledge into improved outcomes.
- Published
- 2021
- Full Text
- View/download PDF
143. Clinical Characterization and Prediction of Clinical Severity of SARS-CoV-2 Infection Among US Adults Using Data From the US National COVID Cohort Collaborative.
- Author
-
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, and Chute CG
- Subjects
- Adult, Aged, Aged, 80 and over, Comorbidity, Ethnicity, Extracorporeal Membrane Oxygenation, Female, Humans, Hydrogen-Ion Concentration, Male, Middle Aged, Pandemics, Respiration, Artificial, Retrospective Studies, Risk Factors, SARS-CoV-2, United States, Young Adult, COVID-19 ethnology, COVID-19 mortality, Databases, Factual, Forecasting, Hospitalization, Models, Biological, Severity of Illness Index
- Abstract
Importance: The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy., Objectives: To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity., Design, Setting, and Participants: In a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation)., Main Outcomes and Measures: Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression., Results: The cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). 
Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity., Conclusions and Relevance: This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.
- Published
- 2021
- Full Text
- View/download PDF
144. Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information.
- Author
-
Rando HM, Bennett TD, Byrd JB, Bramante C, Callahan TJ, Chute CG, Davis HE, Deer R, Gagnier J, Koraishy FM, Liu F, McMurry JA, Moffitt RA, Pfaff ER, Reese JT, Relevo R, Robinson PN, Saltz JH, Solomonides A, Sule A, Topaloglu U, and Haendel MA
- Abstract
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust and consistent studies of Long COVID that consistently capture variation in long-term outcomes. Even the condition itself goes by three terms, most widely "Long COVID", but also "post-acute COVID-19 syndrome (PACS)" or "post-acute sequelae of SARS-CoV-2 infection (PASC)". In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic itself. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat., Competing Interests: Declaration of Conflicts of Interest Julie A. McMurry: cofounder, Pryzm Health; Melissa A. Haendel: cofounder, Pryzm Health
- Published
- 2021
- Full Text
- View/download PDF
145. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment.
- Author
-
Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripscak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, and Gersing KR
- Subjects
- Computer Security, Data Analysis, Ethics Committees, Research, Government Regulation, Humans, National Institutes of Health (U.S.), United States, COVID-19, Data Science organization & administration, Information Dissemination, Intersectoral Collaboration
- Abstract
Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers., Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics., Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access., Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19., (© The Author(s) 2020. 
Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2021
- Full Text
- View/download PDF
146. The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction.
- Author
-
Bennett TD, Moffitt RA, Hajagos JG, Amor B, Anand A, Bissell MM, Bradwell KR, Bremer C, Byrd JB, Denham A, DeWitt PE, Gabriel D, Garibaldi BT, Girvin AT, Guinney J, Hill EL, Hong SS, Jimenez H, Kavuluru R, Kostka K, Lehmann HP, Levitt E, Mallipattu SK, Manna A, McMurry JA, Morris M, Muschelli J, Neumann AJ, Palchuk MB, Pfaff ER, Qian Z, Qureshi N, Russell S, Spratt H, Walden A, Williams AE, Wooldridge JT, Yoo YJ, Zhang XT, Zhu RL, Austin CP, Saltz JH, Gersing KR, Haendel MA, and Chute CG
- Abstract
Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy., Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients that served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. 
Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validate observations from smaller studies and provide comprehensive insight into COVID-19 characterization in U.S. patients., Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease., Competing Interests: Declaration of interests Benjamin Amor, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, and Nabeel Qureshi: employees of Palantir Technologies; Brian T. Garibaldi: member of the FDA Pulmonary-Allergy Drugs Advisory Committee (PADAC); Matvey B. Palchuk: employee of TriNetX; Kristin Kostka: employee of IQVIA Inc.; Julie A. McMurry and Melissa A. Haendel: cofounders of Pryzm Health; Chris P. Austin and Ken R. Gersing: employees of the National Institutes of Health. No conflicts of interest reported for all other authors.
- Published
- 2021
- Full Text
- View/download PDF
147. Electronic Medical Record Search Engine (EMERSE): An Information Retrieval Tool for Supporting Cancer Research.
- Author
-
Hanauer DA, Barnholtz-Sloan JS, Beno MF, Del Fiol G, Durbin EB, Gologorskaya O, Harris D, Harnett B, Kawamoto K, May B, Meeks E, Pfaff E, Weiss J, and Zheng K
- Subjects
- Electronic Health Records, Humans, Information Storage and Retrieval, Natural Language Processing, Software, Neoplasms therapy, Search Engine
- Abstract
Purpose: The Electronic Medical Record Search Engine (EMERSE) is a software tool built to aid research spanning cohort discovery, population health, and data abstraction for clinical trials. EMERSE is now live at three academic medical centers, with additional sites currently working on implementation. In this report, we describe how EMERSE has been used to support cancer research based on a variety of metrics., Methods: We identified peer-reviewed publications that used EMERSE through online searches as well as through direct e-mails to users based on audit logs. These logs were also used to summarize use at each of the three sites. Search terms for two of the sites were characterized using the natural language processing tool MetaMap to determine to which semantic types the terms could be mapped., Results: We identified a total of 326 peer-reviewed publications that used EMERSE through August 2019, although this is likely an underestimation of the true total based on the use log analysis. Oncology-related research comprised nearly one third (n = 105; 32.2%) of all research output. The use logs showed that EMERSE had been used by multiple people at each site (nearly 3,500 across all three) who had collectively logged into the system > 100,000 times. Many user-entered search queries could not be mapped to a semantic type, but the most common semantic type for terms that did match was "disease or syndrome," followed by "pharmacologic substance.", Conclusion: EMERSE has been shown to be a valuable tool for supporting cancer research. It has been successfully deployed at other sites, despite some implementation challenges unique to each deployment environment.
- Published
- 2020
- Full Text
- View/download PDF
148. Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning.
- Author
-
Pfaff ER, Crosskey M, Morton K, and Krishnamurthy A
- Abstract
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods., (©Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.01.2020.)
- Published
- 2020
- Full Text
- View/download PDF
149. Fast Healthcare Interoperability Resources (FHIR) as a Meta Model to Integrate Common Data Models: Development of a Tool and Quantitative Validation Study.
- Author
-
Pfaff ER, Champion J, Bradford RL, Clark M, Xu H, Fecho K, Krishnamurthy A, Cox S, Chute CG, Overby Taylor C, and Ahalt S
- Abstract
Background: In a multisite clinical research collaboration, institutions may or may not use the same common data model (CDM) to store clinical data. To overcome this challenge, we proposed to use Health Level 7's Fast Healthcare Interoperability Resources (FHIR) as a meta-CDM, a single standard to represent clinical data., Objective: In this study, we aimed to create an open-source application termed the Clinical Asset Mapping Program for FHIR (CAMP FHIR) to efficiently transform clinical data to FHIR for supporting source-agnostic CDM-to-FHIR mapping., Methods: Mapping with CAMP FHIR involves (1) mapping each source variable to its corresponding FHIR element and (2) mapping each item in the source data's value sets to the corresponding FHIR value set item for variables with strict value sets. To date, CAMP FHIR has been used to transform 108 variables from the Informatics for Integrating Biology & the Bedside (i2b2) and Patient-Centered Outcomes Research Network data models to fields across 7 FHIR resources. It is designed to allow input from any source data model and will support additional FHIR resources in the future., Results: We have used CAMP FHIR to transform data on approximately 23,000 patients with asthma from our institution's i2b2 database. Data quality and integrity were validated against the origin point of the data, our enterprise clinical data warehouse., Conclusions: We believe that CAMP FHIR can serve as an alternative to implementing new CDMs on a project-by-project basis. Moreover, the use of FHIR as a CDM could support rare data sharing opportunities, such as collaborations between academic medical centers and community hospitals.
We anticipate adoption and use of CAMP FHIR to foster sharing of clinical data across institutions for downstream applications in translational research., (©Emily Rose Pfaff, James Champion, Robert Louis Bradford, Marshall Clark, Hao Xu, Karamarie Fecho, Ashok Krishnamurthy, Steven Cox, Christopher G Chute, Casey Overby Taylor, Stan Ahalt. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 16.10.2019.)
- Published
- 2019
- Full Text
- View/download PDF
150. Clinical Data: Sources and Types, Regulatory Constraints, Applications.
- Author
-
Ahalt SC, Chute CG, Fecho K, Glusman G, Hadlock J, Taylor CO, Pfaff ER, Robinson PN, Solbrig H, Ta C, Tatonetti N, and Weng C
- Subjects
- Databases as Topic, Humans, Data Analysis, Social Control, Formal
- Published
- 2019
- Full Text
- View/download PDF