Author: "Goldwasser, Jeremy" - Searchworks@Jio Institute Digital Library Search Results

10 results on '"Goldwasser, Jeremy"'

1. Provably Stable Feature Rankings with SHAP and LIME

Author: Goldwasser, Jeremy and Hooker, Giles
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, the calculation of popular methods for scoring input variables such as SHAP and LIME suffers from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that ensure the most important features are ranked correctly with high probability. Given SHAP estimates from KernelSHAP or Shapley Sampling, we demonstrate how to retrospectively verify the number of stable rankings. Further, we introduce efficient sampling algorithms for SHAP and LIME that guarantee the $K$ highest-ranked features have the proper ordering. Finally, we show how to adapt these local feature attribution methods for the global importance setting.
Published: 2024

2. Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

Author: Yang, Rui, Zeng, Qingcheng, You, Keen, Qiao, Yujie, Huang, Lucas, Hsieh, Chia-Chun, Rosand, Benjamin, Goldwasser, Jeremy, Dave, Amisha D, Keenan, Tiarnan D. L., Chew, Emily Y, Radev, Dragomir, Lu, Zhiyong, Xu, Hua, Chen, Qingyu, and Li, Irene
Subjects: Computer Science - Computation and Language
Abstract: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. The toolkit, its models, and associated data are publicly available via https://github.com/Yale-LILY/MedGen.
Comment: 5 figures, 4 tables
Published: 2023

3. Stabilizing Estimates of Shapley Values with Control Variates

Author: Goldwasser, Jeremy and Hooker, Giles
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Shapley values are among the most popular tools for explaining predictions of blackbox machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
Published: 2023
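The ControlSHAP abstract rests on the classical Monte Carlo technique of control variates. The sketch below illustrates that generic technique only. It is not the authors' ControlSHAP implementation, and it does not compute Shapley values. It estimates E[exp(U)] for U ~ Uniform(0, 1), whose true value is e - 1. U itself serves as the control variate because its mean, 0.5, is known exactly. The function names and the toy target are illustrative assumptions. The estimator subtracts c * (mean(g) - E[g]) from the plain Monte Carlo mean, using the variance-minimizing coefficient c = Cov(f, g) / Var(g).

```python
import math
import random


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Return (plain Monte Carlo estimate, control-variate estimate) of E[f].

    f_vals, g_vals: paired samples f(X_i), g(X_i) computed from the same draws X_i.
    g_mean: the exactly known expectation E[g(X)] of the control variate.
    (Illustrative helper; not taken from any published codebase.)
    """
    n = len(f_vals)
    f_bar = sum(f_vals) / n
    g_bar = sum(g_vals) / n
    # Sample covariance of (f, g) and sample variance of g, with n - 1 denominators
    cov_fg = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / (n - 1)
    var_g = sum((g - g_bar) ** 2 for g in g_vals) / (n - 1)
    # Variance-minimizing coefficient c* = Cov(f, g) / Var(g)
    c = cov_fg / var_g
    # Correct the plain mean by the control's observed deviation from its known mean
    return f_bar, f_bar - c * (g_bar - g_mean)


def spread(estimates):
    """Sample standard deviation of repeated estimates (run-to-run variability)."""
    m = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - m) ** 2 for e in estimates) / (len(estimates) - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    plain, cv = [], []
    for _ in range(200):  # 200 independent repetitions of a 100-sample estimate
        u = [rng.random() for _ in range(100)]
        f = [math.exp(x) for x in u]  # target quantity: E[exp(U)] = e - 1
        p, c = control_variate_estimate(f, u, 0.5)  # control: E[U] = 0.5
        plain.append(p)
        cv.append(c)
    print(f"true value        {math.e - 1:.5f}")
    print(f"plain MC sd       {spread(plain):.5f}")
    print(f"control-variate sd {spread(cv):.5f}")
```

Because exp(U) and U are strongly correlated, the control-variate estimates should vary far less across repetitions than the plain Monte Carlo means. This is the kind of reduction in sampling variability that the abstract describes, here shown on a toy target rather than on Shapley estimates.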

4. Stabilizing Estimates of Shapley Values with Control Variates

Author: Goldwasser, Jeremy and Hooker, Giles
Editors: Longo, Luca, Lapuschkin, Sebastian, and Seifert, Christin
Editorial Board Members: Filipe, Joaquim, Ghosh, Ashish, and Zhou, Lizhu
Published: 2024
Full Text: View/download PDF

5. EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts

Author: Li, Irene, You, Keen, Qiao, Yujie, Huang, Lucas, Hsieh, Chia-Chun, Rosand, Benjamin, Goldwasser, Jeremy, and Radev, Dragomir
Subjects: Computer Science - Computation and Language
Abstract: The Electronic Health Record (EHR) is an essential part of the modern medical system and impacts healthcare delivery, operations, and research. Although EHRs also contain structured information, their unstructured text is attracting much attention and has become an exciting research field. The success of recent neural Natural Language Processing (NLP) methods has opened a new direction for processing unstructured clinical notes. In this work, we create a Python library for clinical texts, EHRKit. This library contains two main parts: MIMIC-III-specific functions and task-specific functions. The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction. The second part integrates many third-party libraries for up to 12 off-the-shelf NLP tasks such as named entity recognition, summarization, and machine translation.
Published: 2022

6. Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review

Author: Li, Irene, Pan, Jessica, Goldwasser, Jeremy, Verma, Neha, Wong, Wai Pan, Nuzumlalı, Muhammed Yavuz, Rosand, Benjamin, Li, Yixin, Zhang, Matthew, Chang, David, Taylor, R. Andrew, Krumholz, Harlan M., and Radev, Dragomir
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, 68T50, I.2.7
Abstract: Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue, multilinguality, interpretability, etc.
Comment: 33 pages, 11 figures
Published: 2021

7. Forest Fire Clustering for Single-cell Sequencing with Iterative Label Propagation and Parallelized Monte Carlo Simulation

Author: Chen, Zhanlin, Goldwasser, Jeremy, Tuckman, Philip, Liu, Jason, Zhang, Jing, and Gerstein, Mark
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In the era of single-cell sequencing, there is a growing need to extract insights from data with clustering methods. Here, we introduce Forest Fire Clustering, an efficient and interpretable method for cell-type discovery from single-cell data. Forest Fire Clustering makes minimal prior assumptions and, different from current approaches, calculates a non-parametric posterior probability that each cell is assigned a cell-type label. These posterior distributions allow for the evaluation of a label confidence for each cell and enable the computation of "label entropies," highlighting transitions along developmental trajectories. Furthermore, we show that Forest Fire Clustering can make robust, inductive inferences in an online-learning context and can readily scale to millions of cells. Finally, we demonstrate that our method outperforms state-of-the-art clustering approaches on diverse benchmarks of simulated and experimental data. Overall, Forest Fire Clustering is a useful tool for rare cell type discovery in large-scale single-cell analysis.
Comment: 30 pages, 6 figures
Published: 2021
Full Text: View/download PDF

8. Neural Natural Language Processing for unstructured data in electronic health records: A review

Author: Li, Irene, Pan, Jessica, Goldwasser, Jeremy, Verma, Neha, Wong, Wai Pan, Nuzumlalı, Muhammed Yavuz, Rosand, Benjamin, Li, Yixin, Zhang, Matthew, Chang, David, Taylor, R. Andrew, Krumholz, Harlan M., and Radev, Dragomir
Published: 2022
Full Text: View/download PDF

9. Forest Fire Clustering for single-cell sequencing combines iterative label propagation with parallelized Monte Carlo simulations

Author: Chen, Zhanlin, Goldwasser, Jeremy, Tuckman, Philip, Liu, Jason, Zhang, Jing, and Gerstein, Mark
Published: 2022
Full Text: View/download PDF

10. Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study.

Author: Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh CC, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, and Li I
Subjects: Humans, Algorithms, Software, Natural Language Processing
Abstract: Background: Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offer generation capabilities, leaving a significant gap in the current offerings.
Objective: This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.
Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. In addition, for the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. Additionally, we conducted a physician validation to assess the quality of generated content beyond automated metrics.
Results: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5).
Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.
(©Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha Dave, Tiarnan Keenan, Yuhe Ke, Chuan Hong, Nan Liu, Emily Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.10.2024.)
Published: 2024
Full Text: View/download PDF
