Author: "Pfaff, Emily" / Topic: algorithms - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pfaff, Emily"' showing total 5 results

1. FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data.

Author: Buranosky M, Stellnberger E, Pfaff E, Diaz-Sanchez D, and Ward-Caviness C
Subjects: Datasets as Topic, Algorithms, Data Mining, Programming Languages
Abstract: Functional dependencies (FDs) and candidate keys are essential for table decomposition, database normalization, and data cleansing. In this paper, we present FDTool, a command line Python application to discover minimal FDs in tabular datasets and infer equivalent attribute sets and candidate keys from them. The runtime and memory costs associated with seven published FD discovery algorithms are given with an overview of their theoretical foundations. We conclude that FD_Mine is the most efficient FD discovery algorithm when applied to datasets with many rows (> 100,000 rows) and few columns (< 14 columns). This puts it in a special position to rule mine clinical and demographic datasets, which often consist of long and narrow sets of participant records. The structure of FD_Mine is described and supplemented with a formal proof of the equivalence pruning method used. FDTool is a re-implementation of FD_Mine with additional features added to improve performance and automate typical processes in database architecture. The experimental results of applying FDTool to 12 datasets of different dimensions are summarized in terms of the number of FDs checked, the number of FDs found, and the time it takes for the code to terminate. We find that the number of attributes in a dataset has a much greater effect on the runtime and memory costs of FDTool than does row count. The last section explains in detail how the FDTool application can be accessed, executed, and further developed. Competing Interests: No competing interests were disclosed.
Published: 2018
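To make the notion of a functional dependency in the abstract above concrete, here is a minimal brute-force sketch. It is not FD_Mine and not FDTool's code: the function names `fd_holds` and `minimal_fds`, the `max_lhs` parameter, and the toy rows are all assumptions made for illustration. An FD X → Y holds when no two rows agree on X but differ on Y.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to a single rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = row[rhs]
        # setdefault stores the first rhs value seen for this key;
        # any later row with a different value violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

def minimal_fds(rows, attributes, max_lhs=2):
    """Naive search for minimal FDs lhs -> rhs with at most max_lhs attributes on the left."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip lhs if a smaller left-hand side already determines rhs.
                if any(set(prev) <= set(lhs) for prev, r in found if r == rhs):
                    continue
                if fd_holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found

# Toy data (hypothetical, not from the paper).
rows = [
    {"id": 1, "zip": "27514", "city": "Chapel Hill"},
    {"id": 2, "zip": "27514", "city": "Chapel Hill"},
    {"id": 3, "zip": "27701", "city": "Durham"},
]
fds = minimal_fds(rows, ["id", "zip", "city"])
for lhs, rhs in fds:
    print(", ".join(lhs), "->", rhs)
```

The number of candidate left-hand sides grows combinatorially with the attribute count, which is consistent with the abstract's observation that the number of attributes affects cost much more than row count does.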

2. Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records.

Author: Hoffman SR, Vines AI, Halladay JR, Pfaff E, Schiff L, Westreich D, Sundaresan A, Johnson LS, and Nicholson WK
Subjects: Adolescent, Adult, Biomedical Research, Current Procedural Terminology, Data Collection methods, Female, Humans, Infertility, Female etiology, Infertility, Female physiopathology, International Classification of Diseases, Leiomyoma complications, Menorrhagia etiology, Menorrhagia physiopathology, Middle Aged, Pelvic Pain etiology, Pelvic Pain physiopathology, Phenotype, Uterine Neoplasms complications, Young Adult, Algorithms, Electronic Health Records, Leiomyoma physiopathology, Uterine Neoplasms physiopathology
Abstract: Background: Women with symptomatic uterine fibroids can report a myriad of symptoms, including pain, bleeding, infertility, and psychosocial sequelae. Optimizing fibroid research requires the ability to enroll populations of women with image-confirmed symptomatic uterine fibroids. Objective: Our objective was to develop an electronic health record-based algorithm to identify women with symptomatic uterine fibroids for a comparative effectiveness study of medical or surgical treatments on quality-of-life measures. Using an iterative process and text-mining techniques, an effective computable phenotype algorithm, composed of demographics, and clinical and laboratory characteristics, was developed with reasonable performance. Such algorithms provide a feasible, efficient way to identify populations of women with symptomatic uterine fibroids for the conduct of large traditional or pragmatic trials and observational comparative effectiveness studies. Symptomatic uterine fibroids, due to menorrhagia, pelvic pain, bulk symptoms, or infertility, are a source of substantial morbidity for reproductive-age women. Comparing Treatment Options for Uterine Fibroids is a multisite registry study to compare the effectiveness of hormonal or surgical fibroid treatments on women's perceptions of their quality of life. Electronic health record-based algorithms are able to identify large numbers of women with fibroids, but additional work is needed to develop electronic health record algorithms that can identify women with symptomatic fibroids to optimize fibroid research. We sought to develop an efficient electronic health record-based algorithm that can identify women with symptomatic uterine fibroids in a large health care system for recruitment into large-scale observational and interventional research in fibroid management. Study Design: We developed and assessed the accuracy of 3 algorithms to identify patients with symptomatic fibroids using an iterative approach. The data source was the Carolina Data Warehouse for Health, a repository for the health system's electronic health record data. In addition to International Classification of Diseases, Ninth Revision diagnosis and procedure codes and clinical characteristics, text data-mining software was used to derive information from imaging reports to confirm the presence of uterine fibroids. Results of each algorithm were compared with expert manual review to calculate the positive predictive values for each algorithm. Results: Algorithm 1 was composed of the following criteria: (1) age 18-54 years; (2) either ≥1 International Classification of Diseases, Ninth Revision diagnosis codes for uterine fibroids or mention of fibroids using text-mined key words in imaging records or documents; and (3) no International Classification of Diseases, Ninth Revision or Current Procedural Terminology codes for hysterectomy and no reported history of hysterectomy. The positive predictive value was 47% (95% confidence interval 39-56%). Algorithm 2 required ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids and positive text-mined key words and had a positive predictive value of 65% (95% confidence interval 50-79%). In algorithm 3, further refinements included ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids on separate outpatient visit dates, the exclusion of women who had a positive pregnancy test within 3 months of their fibroid-related visit, and exclusion of incidentally detected fibroids during prenatal or emergency department visits. Algorithm 3 achieved a positive predictive value of 76% (95% confidence interval 71-81%). Conclusion: An electronic health record-based algorithm is capable of identifying cases of symptomatic uterine fibroids with moderate positive predictive value and may be an efficient approach for large-scale study recruitment. (Copyright © 2018 Elsevier Inc. All rights reserved.)
Published: 2018
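The three criteria listed for Algorithm 1 in the abstract above can be read as a simple predicate over a patient record. The sketch below is only an illustration under stated assumptions: the `PatientRecord` class, its field names, and the function `meets_algorithm_1` are hypothetical, since the abstract does not describe the data warehouse schema or the study's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class PatientRecord:
    # All field names are hypothetical; the abstract does not describe a schema.
    age: int
    fibroid_icd9_count: int      # number of ICD-9 diagnosis codes for uterine fibroids
    fibroid_text_mention: bool   # text-mined mention of fibroids in imaging records or documents
    hysterectomy_code: bool      # any ICD-9 or CPT code for hysterectomy
    hysterectomy_history: bool   # reported history of hysterectomy

def meets_algorithm_1(p: PatientRecord) -> bool:
    """Apply the three Algorithm 1 criteria as stated in the abstract."""
    age_ok = 18 <= p.age <= 54                                                # criterion (1)
    fibroid_evidence = p.fibroid_icd9_count >= 1 or p.fibroid_text_mention   # criterion (2)
    no_hysterectomy = not (p.hysterectomy_code or p.hysterectomy_history)    # criterion (3)
    return age_ok and fibroid_evidence and no_hysterectomy

# Hypothetical example record: matches on a text mention alone.
example = PatientRecord(age=35, fibroid_icd9_count=0, fibroid_text_mention=True,
                        hysterectomy_code=False, hysterectomy_history=False)
print(meets_algorithm_1(example))
```

Algorithms 2 and 3 tighten these conditions (for example, requiring ≥2 diagnosis codes), which in the reported results trades a smaller candidate pool for a higher positive predictive value.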

3. An efficient approach for surveillance of childhood diabetes by type derived from electronic health record data: the SEARCH for Diabetes in Youth Study.

Author: Zhong VW, Obeid JS, Craig JB, Pfaff ER, Thomas J, Jaacks LM, Beavers DP, Carey TS, Lawrence JM, Dabelea D, Hamman RF, Bowlby DA, Pihoker C, Saydah SH, and Mayer-Davis EJ
Subjects: Adolescent, Child, Child, Preschool, Clinical Coding, Female, Humans, Infant, Male, Sensitivity and Specificity, Young Adult, Algorithms, Diabetes Mellitus, Type 1 classification, Diabetes Mellitus, Type 2 classification, Electronic Health Records, Population Surveillance methods
Abstract: Objective: To develop an efficient surveillance approach for childhood diabetes by type across 2 large US health care systems, using phenotyping algorithms derived from electronic health record (EHR) data. Materials and Methods: Presumptive diabetes cases <20 years of age from 2 large independent health care systems were identified as those having ≥1 of the 5 indicators in the past 3.5 years, including elevated HbA1c, elevated blood glucose, diabetes-related billing codes, patient problem list, and outpatient anti-diabetic medications. EHRs of all the presumptive cases were manually reviewed, and true diabetes status and diabetes type were determined. Algorithms for identifying diabetes cases overall and classifying diabetes type were either prespecified or derived from classification and regression tree analysis. A surveillance approach was developed based on the best algorithms identified. Results: We developed a stepwise surveillance approach using billing code-based prespecified algorithms and targeted manual EHR review, which efficiently and accurately ascertained and classified diabetes cases by type, in both health care systems. The sensitivity and positive predictive values in both systems were approximately ≥90% for ascertaining diabetes cases overall and classifying cases with type 1 or type 2 diabetes. About 80% of the cases with "other" type were also correctly classified. This stepwise surveillance approach resulted in a >70% reduction in the number of cases requiring manual validation compared to traditional surveillance methods. Conclusion: EHR data may be used to establish an efficient approach for large-scale surveillance for childhood diabetes by type, although some manual effort is still needed. (Published by Oxford University Press on behalf of the American Medical Informatics Association 2016. This work is written by US Government employees and is in the public domain in the United States.)
Published: 2016

4. Use of administrative and electronic health record data for development of automated algorithms for childhood diabetes case ascertainment and type classification: the SEARCH for Diabetes in Youth Study.

Author: Zhong VW, Pfaff ER, Beavers DP, Thomas J, Jaacks LM, Bowlby DA, Carey TS, Lawrence JM, Dabelea D, Hamman RF, Pihoker C, Saydah SH, and Mayer-Davis EJ
Subjects: Adolescent, Adult, Child, Child, Preschool, Diabetes Mellitus, Type 1 epidemiology, Diabetes Mellitus, Type 2 epidemiology, Female, Humans, Infant, Infant, Newborn, Male, Mass Screening methods, Young Adult, Algorithms, Diabetes Mellitus, Type 1 classification, Diabetes Mellitus, Type 1 diagnosis, Diabetes Mellitus, Type 2 classification, Diabetes Mellitus, Type 2 diagnosis, Electronic Health Records standards
Abstract: Background: The performance of automated algorithms for childhood diabetes case ascertainment and type classification may differ by demographic characteristics. Objective: This study evaluated the potential of administrative and electronic health record (EHR) data from a large academic care delivery system to conduct diabetes case ascertainment in youth according to type, age, and race/ethnicity. Subjects: A total of 57 767 children aged <20 yr as of 31 December 2011 and seen at University of North Carolina Health Care System in 2011 were included. Methods: Using an initial algorithm including billing data, patient problem lists, laboratory test results, and diabetes related medications between 1 July 2008 and 31 December 2011, presumptive cases were identified and validated by chart review. More refined algorithms were evaluated by type (type 1 vs. type 2), age (<10 vs. ≥10 yr) and race/ethnicity (non-Hispanic White vs. 'other'). Sensitivity, specificity, and positive predictive value were calculated and compared. Results: The best algorithm for ascertainment of overall diabetes cases was billing data. The best type 1 algorithm was the ratio of the number of type 1 billing codes to the sum of type 1 and type 2 billing codes ≥0.5. A useful algorithm to ascertain youth with type 2 diabetes with 'other' race/ethnicity was identified. Considerable age and racial/ethnic differences were present in type-non-specific and type 2 algorithms. Conclusions: Administrative and EHR data may be used to identify cases of childhood diabetes (any type), and to identify type 1 cases. The performance of type 2 case ascertainment algorithms differed substantially by race/ethnicity. (© 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.)
Published: 2014
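The type 1 rule reported in the abstract above (ratio of type 1 billing codes to the sum of type 1 and type 2 billing codes ≥0.5) is simple enough to sketch directly. The function name `classify_type1`, its parameters, and the choice to return `None` when a patient has no diabetes billing codes are assumptions for illustration; the abstract does not say how that case was handled.

```python
from typing import Optional

def classify_type1(n_type1_codes: int, n_type2_codes: int,
                   threshold: float = 0.5) -> Optional[bool]:
    """Return True if the share of type 1 codes among type 1 + type 2 codes meets the threshold.

    Returns None when there are no type 1 or type 2 codes at all
    (an assumption; the abstract does not specify this case).
    """
    total = n_type1_codes + n_type2_codes
    if total == 0:
        return None
    return n_type1_codes / total >= threshold

# Hypothetical counts for illustration.
print(classify_type1(3, 1))  # ratio 0.75
print(classify_type1(1, 3))  # ratio 0.25
print(classify_type1(2, 2))  # ratio 0.50, which meets the >= 0.5 threshold
```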

5. Crowd-sourced machine learning prediction of long COVID using data from the National COVID Cohort Collaborative.

Author: Bergquist, Timothy, Loomba, Johanna, Pfaff, Emily, Xia, Fangfang, Zhao, Zixuan, Zhu, Yitan, Mitchell, Elliot, Bhattacharya, Biplab, Shetty, Gaurav, Munia, Tamanna, Delong, Grant, Tariq, Abdul, Butzin-Dozier, Zachary, Ji, Yunwen, Li, Haodong, Coyle, Jeremy, Shi, Seraphina, Philips, Rachael, Mertens, Andrew, Pirracchio, Romain, van der Laan, Mark, Colford, John, Hubbard, Alan, Gao, Jifan, Chen, Guanhua, Velingker, Neelay, Li, Ziyang, Wu, Yinjun, Stein, Adam, Huang, Jiani, Dai, Zongyu, Long, Qi, Naik, Mayur, Holmes, John, Mowery, Danielle, Wong, Eric, Parekh, Ravi, Getzen, Emily, Hightower, Jake, and Blase, Jennifer
Subjects: COVID-19, Community challenge, Evaluation, Long COVID, Machine learning, PASC, Humans, COVID-19, Machine Learning, SARS-CoV-2, United States, Algorithms, Post-Acute COVID-19 Syndrome, Cohort Studies, Crowdsourcing
Abstract: BACKGROUND: While many patients seem to recover from SARS-CoV-2 infections, many patients report experiencing SARS-CoV-2 symptoms for weeks or months after their acute COVID-19 ends, even developing new symptoms weeks after infection. These long-term effects are called post-acute sequelae of SARS-CoV-2 (PASC) or, more commonly, Long COVID. The overall prevalence of Long COVID is currently unknown, and tools are needed to help identify patients at risk for developing long COVID. METHODS: A working group of the Rapid Acceleration of Diagnostics-radical (RADx-rad) program, comprised of individuals from various NIH institutes and centers, in collaboration with REsearching COVID to Enhance Recovery (RECOVER) developed and organized the Long COVID Computational Challenge (L3C), a community challenge aimed at incentivizing the broader scientific community to develop interpretable and accurate methods for identifying patients at risk of developing Long COVID. From August 2022 to December 2022, participants developed Long COVID risk prediction algorithms using the National COVID Cohort Collaborative (N3C) data enclave, a harmonized data repository from over 75 healthcare institutions from across the United States (U.S.). FINDINGS: Over the course of the challenge, 74 teams designed and built 35 Long COVID prediction models using the N3C data enclave. The top 10 teams all scored above a 0.80 Area Under the Receiver Operator Curve (AUROC) with the highest scoring model achieving a mean AUROC of 0.895. Included in the top submission was a visualization dashboard that built timelines for each patient, updating the risk of a patient developing Long COVID in response to clinical events. INTERPRETATION: As a result of L3C, federal reviewers identified multiple machine learning models that can be used to identify patients at risk for developing Long COVID. Many of the teams used approaches in their submissions which can be applied to future clinical prediction questions. 
FUNDING: Research reported in this RADx® Rad publication was supported by the National Institutes of Health. Timothy Bergquist, Johanna Loomba, and Emily Pfaff were supported by Axle Subcontract: NCATS-STSS-P00438.
Published: 2024
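The AUROC figures quoted in the abstract above can be interpreted as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `auroc` and the example labels and scores are hypothetical, and this is not the challenge's evaluation code.

```python
def auroc(labels, scores):
    """Compute AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = developed Long COVID) and model risk scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

This quadratic-time form is only meant to show the definition; library implementations typically use a sort-based computation instead.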
