Author: "Hartvigsen, Thomas" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Hartvigsen, Thomas"' returned 100 results.

1. BendVLM: Test-Time Debiasing of Vision-Language Embeddings

Author: Gerych, Walter, Zhang, Haoran, Hamidieh, Kimia, Pan, Eileen, Sharma, Maanas, Hartvigsen, Thomas, and Ghassemi, Marzyeh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Vision-language model (VLM) embeddings have been shown to encode biases present in their training data, such as societal biases that ascribe negative characteristics to members of various racial and gender identities. VLMs are being quickly adopted for a variety of tasks ranging from few-shot classification to text-guided image generation, making debiasing VLM embeddings crucial. Debiasing approaches that fine-tune the VLM often suffer from catastrophic forgetting. On the other hand, fine-tuning-free methods typically utilize a "one-size-fits-all" approach that assumes that correlation with the spurious attribute can be explained using a single linear direction across all possible inputs. In this work, we propose Bend-VLM, a nonlinear, fine-tuning-free approach for VLM embedding debiasing that tailors the debiasing operation to each unique input. This allows for a more flexible debiasing approach. Additionally, we do not require knowledge of the set of inputs prior to inference time, making our method more appropriate for online, open-set tasks such as retrieval and text-guided image generation.
Published: 2024

2. Identifying Implicit Social Biases in Vision-Language Models

Author: Hamidieh, Kimia, Zhang, Haoran, Gerych, Walter, Hartvigsen, Thomas, and Ghassemi, Marzyeh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases that are present in CLIP, with a focus on the interaction between image and text modalities. We first propose a taxonomy of social biases called So-B-IT, which contains 374 words categorized across ten types of bias. Each type can lead to societal harm if associated with a particular demographic group. Using this taxonomy, we examine images retrieved by CLIP from a facial image dataset using each word as part of a prompt. We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups, such as retrieving mostly pictures of Middle Eastern men when asked to retrieve images of a "terrorist". Finally, we analyze the source of such biases by showing that, for examples of biases we find, the same harmful stereotypes are also present in a large image-text dataset used to train CLIP models. Our findings highlight the importance of evaluating and addressing bias in vision-language models, and suggest the need for transparency and fairness-aware curation of large pre-training datasets.
Published: 2024

3. SkipSNN: Efficiently Classifying Spike Trains with Event-attention

Author: Yin, Hang, Su, Yao, Liu, Liping, Hartvigsen, Thomas, Dai, Xin, and Kong, Xiangnan
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Spike train classification has recently become an important topic in the machine learning community, where each spike train is a binary event sequence with two properties: temporal sparsity of signals of interest, and temporal noise. A promising model for it should follow the design principle of performing intensive computation only when signals of interest appear. Such tasks therefore mainly use Spiking Neural Networks (SNNs), which account for the temporal sparsity of spike trains. However, the basic mechanism of SNNs ignores the temporal noise issue, which makes them computationally expensive and power-hungry when analyzing spike trains on resource-constrained platforms. As an event-driven model, an SNN neuron reacts to any input signal, making it difficult to quickly find signals of interest. In this paper, we introduce an event-attention mechanism that enables SNNs to dynamically highlight useful signals of the original spike trains. To this end, we propose SkipSNN, which extends existing SNN models by learning to mask out noise by skipping membrane potential updates and shortening the effective size of the computational graph. This process is analogous to how people choose to open and close their eyes to filter the information they see. We evaluate SkipSNN on various neuromorphic tasks and demonstrate that it achieves significantly better computational efficiency and classification accuracy than other state-of-the-art SNNs., Comment: Published as a research paper at IEEE BigData 2024
Published: 2024

4. Offline Reinforcement Learning With Combinatorial Action Spaces

Author: Landers, Matthew, Killian, Taylor W., Barnes, Hugo, Hartvigsen, Thomas, and Doryab, Afsaneh
Subjects: Computer Science - Machine Learning
Abstract: Reinforcement learning problems often involve large action spaces arising from the simultaneous execution of multiple sub-actions, resulting in combinatorial action spaces. Learning in combinatorial action spaces is difficult due to the exponential growth in action space size with the number of sub-actions and the dependencies among these sub-actions. In offline settings, this challenge is compounded by limited and suboptimal data. Current methods for offline learning in combinatorial spaces simplify the problem by assuming sub-action independence. We propose Branch Value Estimation (BVE), which effectively captures sub-action dependencies and scales to large combinatorial spaces by learning to evaluate only a small subset of actions at each timestep. Our experiments show that BVE outperforms state-of-the-art methods across a range of action space sizes.
Published: 2024

5. Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing

Author: Guo, Dongliang, Hu, Mengxuan, Guan, Zihan, Guo, Junfeng, Hartvigsen, Thomas, and Li, Sheng
Subjects: Computer Science - Artificial Intelligence
Abstract: Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack (i.e., a backdoor attack) can manipulate the behavior of machine learning models by contaminating their training dataset, posing a significant threat to the real-world application of large pre-trained models, especially customized ones. Therefore, addressing the unique challenges of exploring the vulnerability of pre-trained models is of paramount importance. Through empirical studies on the capability of performing backdoor attacks in large pre-trained models (e.g., ViT), we find the following unique challenges of attacking large pre-trained models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning these models. To address these challenges, we establish new standards for an effective and feasible backdoor attack in the context of large pre-trained models. In line with these standards, we introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method. Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models, which replaces the embedding of the poisoned image with the target image without poisoning the training dataset or training the victim model. Our experiments, conducted across various pre-trained models such as ViT, CLIP, BLIP, and Stable Diffusion, and on downstream tasks including image classification, image captioning, and image generation, demonstrate the effectiveness of our method. Our code is available in the supplementary material.
Published: 2024

6. Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

Author: Christ, Bryan R., Gottesman, Zack, Kropko, Jonathan, and Hartvigsen, Thomas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Math reasoning is a highly active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence. However, few works have explored how math reasoning is encoded within LLM parameters and whether it is a skill that can be isolated within a model. Doing so could allow targeted intervention to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by removing those important for general language tasks. Pruning the parameters MathNeuro identifies deletes an LLM's math reasoning ability without destroying its general language ability. Scaling these parameters by a small constant improves a pretrained or instruction-tuned LLM's performance by 4-17% on GSM8K while leaving non-math behavior unaltered. MathNeuro is also data efficient: most of its effectiveness holds when identifying math-specific parameters using a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters., Comment: 21 pages, 29 figures
Published: 2024
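The abstract states that MathNeuro scores parameter importance from weights and activations and keeps only parameters that matter for math but not for general language. A minimal sketch of that set-difference idea follows, assuming a Wanda-style score |W_ij| * ||X_j||_2 and a fixed top-fraction cutoff per weight matrix; both the score and the cutoff are illustrative assumptions, and the function names are hypothetical rather than the authors' code.

```python
import numpy as np


def importance(weight: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Assumed Wanda-style score |W_ij| * ||X_j||_2.

    weight:      (out_features, in_features)
    activations: (n_tokens, in_features) inputs seen by this layer
    """
    col_norms = np.linalg.norm(activations, axis=0)  # one L2 norm per input feature
    return np.abs(weight) * col_norms[np.newaxis, :]


def top_fraction_mask(scores: np.ndarray, frac: float) -> np.ndarray:
    """Boolean mask of entries scoring in the top `frac` fraction (ties included)."""
    k = max(1, int(round(frac * scores.size)))
    threshold = np.sort(scores, axis=None)[-k]  # k-th largest score overall
    return scores >= threshold


def math_specific_mask(weight, math_acts, general_acts, frac=0.1):
    """Parameters important on math inputs but not on general-language inputs."""
    math_top = top_fraction_mask(importance(weight, math_acts), frac)
    general_top = top_fraction_mask(importance(weight, general_acts), frac)
    return math_top & ~general_top
```

Under these assumptions, pruning corresponds to `weight * ~mask`, and the scaling intervention to something like `weight * np.where(mask, 1.1, 1.0)`; the constant 1.1 is an arbitrary placeholder, not a value reported in the abstract.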

7. Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Author: Chen, Shan, Gao, Mingye, Sasse, Kuleen, Hartvigsen, Thomas, Anthony, Brian, Fan, Lizhou, Aerts, Hugo, Gallifant, Jack, and Bitterman, Danielle
Subjects: Computer Science - Computation and Language
Abstract: Background: Large language models (LLMs) are trained to follow directions, but this makes them vulnerable to blindly complying with user requests even when doing so generates wrong information. In medicine, this could accelerate the generation of misinformation that impacts human well-being. Objectives/Methods: We analyzed compliance with requests to generate misleading content about medications in settings where models know the request is illogical. We investigated whether in-context directions and instruction-tuning of LLMs to prioritize logical reasoning over compliance reduced misinformation risk. Results: While all frontier LLMs complied with misinformation requests, both prompt-based and parameter-based approaches can improve the detection of logic flaws in requests and prevent the dissemination of medical misinformation. Conclusion: Shifting LLMs to prioritize logic over compliance could reduce risks of exploitation for medical misinformation., Comment: Submitted for Review
Published: 2024

8. FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging

Author: Alhamoud, Kumail, Ghunaim, Yasir, Alfarra, Motasem, Hartvigsen, Thomas, Torr, Philip, Ghanem, Bernard, Bibi, Adel, and Ghassemi, Marzyeh
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: For medical imaging AI models to be clinically impactful, they must generalize. However, this goal is hindered by (i) diverse types of distribution shifts, such as temporal, demographic, and label shifts, and (ii) limited diversity in datasets that are siloed within single medical institutions. While these limitations have spurred interest in federated learning, current evaluation benchmarks fail to evaluate different shifts simultaneously. However, in real healthcare settings, multiple types of shifts co-exist, yet their impact on medical imaging performance remains unstudied. In response, we introduce FedMedICL, a unified framework and benchmark to holistically evaluate federated medical imaging challenges, simultaneously capturing label, demographic, and temporal distribution shifts. We comprehensively evaluate several popular methods on six diverse medical imaging datasets (totaling 550 GPU hours). Furthermore, we use FedMedICL to simulate COVID-19 propagation across hospitals and evaluate whether methods can adapt to pandemic changes in disease prevalence. We find that a simple batch balancing technique surpasses advanced methods in average performance across FedMedICL experiments. This finding questions the applicability of results from previous, narrow benchmarks in real-world medical settings., Comment: Accepted at MICCAI 2024. Code is available at: https://github.com/m1k2zoo/FedMedICL
Published: 2024

9. Composable Interventions for Language Models

Author: Kolbeinsson, Arinbjorn, O'Brien, Kyle, Huang, Tianjin, Gao, Shanghua, Liu, Shiwei, Schwarz, Jonathan Richard, Vaidya, Anurag, Mahmood, Faisal, Zitnik, Marinka, Chen, Tianlong, and Hartvigsen, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.
Published: 2024

10. Are Language Models Actually Useful for Time Series Forecasting?

Author: Tan, Mingtian, Merrill, Mike A., Gupta, Vinayak, Althoff, Tim, and Hartvigsen, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) are being applied to time series forecasting. But are language models actually useful for time series? In a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting performance -- in most cases, the results even improve! We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and find that patching and attention structures perform similarly to LLM-based forecasters., Comment: Accepted to NeurIPS 2024 (Spotlight)
Published: 2024

11. Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Author: Gallifant, Jack, Chen, Shan, Moreira, Pedro, Munch, Nikolaj, Gao, Mingye, Pond, Jackson, Celi, Leo Anthony, Aerts, Hugo, Hartvigsen, Thomas, and Bitterman, Danielle
Subjects: Computer Science - Computation and Language
Abstract: Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations. We assess both open-source and API-based LLMs on MedQA and MedMCQA, revealing a consistent performance drop ranging from 1-10%. Furthermore, we identify a potential source of this fragility as the contamination of test data in widely used pre-training datasets. All code is accessible at https://github.com/BittermanLab/RABBITS, and a HuggingFace leaderboard is available at https://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard., Comment: submitted for review, total 15 pages
Published: 2024

12. Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding

Author: Sun, Shenghuan, Schubert, Alexander, Goldgof, Gregory M., Sun, Zhiqing, Hartvigsen, Thomas, Butte, Atul J., and Alaa, Ahmed
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we require VLM outputs not only to be accurate in single interactions but also to be consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. For this purpose, we propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge. These representations are utilized to (i) generate GPT-4-guided visual instruction tuning data at scale, simulating clinician-VLM conversations with demonstrations of clinical reasoning, and (ii) create an automatic reward function that evaluates the clinical validity of VLM generations throughout clinician-VLM interactions. Our algorithm eliminates the need for human involvement in training data generation or reward model construction, reducing costs compared to standard reinforcement learning with human feedback (RLHF). We apply our alignment algorithm to develop Dr-LLaVA, a conversational VLM finetuned for analyzing bone marrow pathology slides, demonstrating strong performance in multi-turn medical conversations., Comment: Code available at: https://github.com/AlaaLab/Dr-LLaVA
Published: 2024

13. PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

Author: Jain, Devansh, Kumar, Priyanshu, Gehman, Samuel, Zhou, Xuhui, Hartvigsen, Thomas, and Sap, Maarten
Subjects: Computer Science - Computation and Language
Abstract: Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research., Comment: Accepted to COLM 2024
Published: 2024

14. TAXI: Evaluating Categorical Knowledge Editing for Language Models

Author: Powell, Derek, Gerych, Walter, and Hartvigsen, Thomas
Subjects: Computer Science - Computation and Language
Abstract: Humans rarely learn one fact in isolation. Instead, learning a new fact induces knowledge of other facts about the world. For example, in learning a korat is a type of cat, you also infer it is a mammal and has claws, ensuring your model of the world is consistent. Knowledge editing aims to inject new facts into language models to improve their factuality, but current benchmarks fail to evaluate consistency, which is critical to ensure efficient, accurate, and generalizable edits. We manually create TAXI, a new benchmark dataset specifically created to evaluate consistency in categorical knowledge edits. TAXI contains 11,120 multiple-choice queries for 976 edits spanning 41 categories (e.g., Dogs), 164 subjects (e.g., Labrador), and 183 properties (e.g., is a mammal). We then use TAXI to evaluate popular editors' categorical consistency, measuring how often editing a subject's category appropriately edits its properties. We find that 1) the editors achieve marginal, yet non-random consistency, 2) their consistency far underperforms human baselines, and 3) consistency is more achievable when editing atypical subjects. Our code and data are available at https://github.com/derekpowell/taxi., Comment: Accepted to ACL 2024 (Findings)
Published: 2024

15. UniTS: A Unified Multi-Task Time Series Model

Author: Gao, Shanghua, Koker, Teddy, Queen, Owen, Hartvigsen, Thomas, Tsiligkaridis, Theodoros, and Zitnik, Marinka
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Advances in time series models are driving a shift from conventional deep learning methods to pre-trained foundational models. While pre-trained transformers and reprogrammed text-based LLMs report state-of-the-art results, the best-performing architectures vary significantly across tasks, and models often have limited scope, such as focusing only on time series forecasting. Models that unify predictive and generative time series tasks under a single framework remain challenging to achieve. We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. UniTS leverages a modified transformer block designed to obtain universal time series representations. This design induces transferability from a heterogeneous, multi-domain pre-training dataset (often with diverse dynamic patterns, sampling rates, and temporal scales) to many downstream datasets, which can also be diverse in task specifications and data domains. Across 38 datasets spanning human activity sensors, healthcare, engineering, and finance domains, UniTS performs favorably against 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including repurposed text-based LLMs. UniTS demonstrates effective few-shot and prompt learning capabilities when evaluated on new data domains and tasks. In the conventional single-task setting, UniTS outperforms strong task-specialized time series models. The source code and datasets are available at https://github.com/mims-harvard/UniTS.
Published: 2024

16. MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations

Author: Christ, Bryan R., Kropko, Jonathan, and Hartvigsen, Thomas
Subjects: Computer Science - Computation and Language
Abstract: Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs; they find that existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find that MATHWELL generates problems that are far more solvable, accurate, and appropriate than those of public models. MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions., Comment: 24 pages, 10 figures. Accepted to EMNLP 2024 (Findings)
Published: 2024

17. Improving Black-box Robustness with In-Context Rewriting

Author: O'Brien, Kyle, Ng, Nathan, Puri, Isha, Mendez, Jorge, Palangi, Hamid, Kim, Yoon, Ghassemi, Marzyeh, and Hartvigsen, Thomas
Subjects: Computer Science - Machine Learning
Abstract: Machine learning models for text classification often excel on in-distribution (ID) data but struggle with unseen out-of-distribution (OOD) inputs. Most techniques for improving OOD robustness are not applicable to settings where the model is effectively a black box, such as when the weights are frozen, retraining is costly, or the model is leveraged via an API. Test-time augmentation (TTA) is a simple post-hoc technique for improving robustness that sidesteps black-box constraints by aggregating predictions across multiple augmentations of the test input. TTA has seen limited use in NLP due to the challenge of generating effective natural language augmentations. In this work, we propose LLM-TTA, which uses LLM-generated augmentations as TTA's augmentation function. LLM-TTA outperforms conventional augmentation functions across sentiment, toxicity, and news classification tasks for BERT and T5 models, with BERT's OOD robustness improving by an average of 4.48 percentage points without regressing average ID performance. We explore selectively augmenting inputs based on prediction entropy to reduce the rate of expensive LLM augmentations, allowing us to maintain performance gains while reducing the average number of generated augmentations by 57.74%. LLM-TTA is agnostic to the task model architecture, does not require OOD labels, and is effective across low and high-resource settings. We share our data, models, and code for reproducibility.
Published: 2024
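Test-time augmentation with entropy-based selective augmentation, as summarized above, can be sketched in a few lines: classify the original input, stop early if the prediction is already confident (low entropy), and otherwise classify several rewrites and average the class probabilities. This is a sketch under assumptions: `classify` and `augment` are hypothetical stand-ins for the black-box task model and the LLM rewriter, and mean-probability aggregation with a fixed entropy threshold is an illustrative choice, not necessarily the paper's.

```python
import math
from typing import Callable, List, Optional, Sequence

Probs = List[float]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tta_predict(
    text: str,
    classify: Callable[[str], Probs],          # hypothetical black-box task model
    augment: Callable[[str, int], List[str]],  # hypothetical LLM rewriter
    n_aug: int = 4,
    entropy_threshold: Optional[float] = None,
) -> Probs:
    """Average class probabilities over the input and its augmentations."""
    base = classify(text)
    # Selective augmentation: confident predictions skip the costly rewriter.
    if entropy_threshold is not None and entropy(base) <= entropy_threshold:
        return base
    preds = [base] + [classify(aug) for aug in augment(text, n_aug)]
    n_classes = len(base)
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
```

Raising `entropy_threshold` skips augmentation for more inputs, which is the cost versus robustness trade-off the abstract describes.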

18. Learning from Time Series under Temporal Label Noise

Author: Nagaraj, Sujay, Gerych, Walter, Tonekaboni, Sana, Goldenberg, Anna, Ustun, Berk, and Hartvigsen, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Many sequential classification tasks are affected by label noise that varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and formalize temporal label noise, an unstudied problem for sequential classification of time series. In this setting, multiple labels are recorded in sequence while being corrupted by a time-dependent noise function. We then demonstrate the importance of modelling the temporal nature of the label noise function and show that existing methods consistently underperform. Next, we propose methods that can train noise-tolerant classifiers by estimating the temporal label noise function directly from data. We show that our methods lead to state-of-the-art performance in the presence of diverse temporal label noise functions using real and synthetic data.
Published: 2024
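One common way to make a classifier noise-tolerant when the noise process is known or estimated is forward loss correction: map the model's clean-label probabilities through a noise transition matrix before scoring the observed labels. The sketch below extends this to a time-dependent matrix T_t, one per time step. It assumes the transition matrices are already given (the methods described above instead estimate the noise function from data), and `linear_flip_noise` is a hypothetical example of a noise function whose flip rate changes linearly over time.

```python
import numpy as np


def linear_flip_noise(n_steps, n_classes, start_rate, end_rate):
    """Hypothetical noise function: symmetric flip rate varying linearly in time.

    Returns (n_steps, n_classes, n_classes) row-stochastic matrices where entry
    [t, i, j] is the probability that clean label i is recorded as j at step t.
    """
    rates = np.linspace(start_rate, end_rate, n_steps)
    mats = np.empty((n_steps, n_classes, n_classes))
    for t, r in enumerate(rates):
        m = np.full((n_classes, n_classes), r / (n_classes - 1))  # spread flips evenly
        np.fill_diagonal(m, 1.0 - r)                               # otherwise keep label
        mats[t] = m
    return mats


def forward_corrected_nll(clean_probs, noisy_labels, transition):
    """Mean negative log-likelihood of observed noisy labels.

    clean_probs:  (T, C) model's predicted clean-label distribution per step
    noisy_labels: (T,)   observed, possibly corrupted, integer labels
    transition:   (T, C, C) noise matrices, one per time step
    """
    # p(noisy = j | x_t) = sum_i p(clean = i | x_t) * T_t[i, j]
    noisy_probs = np.einsum("ti,tij->tj", clean_probs, transition)
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.mean(np.log(picked)))
```

Because noisier steps yield flatter `noisy_probs` rows, observed labels that disagree with the model are penalized less at those steps than under the uncorrected loss.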

19. Machine Learning for Health symposium 2023 -- Findings track

Author: Hegselmann, Stefan, Parziale, Antonio, Shanmugam, Divya, Tang, Shengpu, Asiedu, Mercy Nyamewaa, Chang, Serina, Hartvigsen, Thomas, and Singh, Harvineet
Subjects: Computer Science - Machine Learning, 68Txx, I.2, J.3, I.6, I.4
Abstract: A collection of the accepted Findings papers that were presented at the 3rd Machine Learning for Health symposium (ML4H 2023), which was held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track, and the non-archival Findings track. Proceedings were targeted at mature work with strong technical sophistication and a high impact on health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or could enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process.
Published: 2023

20. Explaining deep multi-class time series classifiers

Author: Doddaiah, Ramesh, Parvatharaju, Prathyush S., Rundensteiner, Elke, and Hartvigsen, Thomas
Published: 2024

21. Multi-State Brain Network Discovery

Author: Yin, Hang, Su, Yao, Liu, Xinyue, Hartvigsen, Thomas, Li, Yanhua, and Kong, Xiangnan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Brain network discovery aims to find nodes and edges from the spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains. Existing methods tend to derive representative or average brain networks, assuming observed signals are generated by only a single brain activity state. However, the human brain usually involves multiple activity states, which jointly determine the brain activities. The brain regions and their connectivity usually exhibit intricate patterns that are difficult to capture with only a single-state network. Recent studies find that brain parcellation and connectivity change according to the brain activity state. We refer to such brain networks as multi-state, and this mixture can help us understand human behavior. Thus, compared to a single-state network, a multi-state network can preserve crucial information about the cognitive brain network. To achieve this, we propose a new model called MNGL (Multi-state Network Graphical Lasso), which successfully models multi-state brain networks by combining CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model). Using both synthetic and real-world ADHD-200 fMRI datasets, we demonstrate that MNGL outperforms recent state-of-the-art alternatives by discovering more explanatory and realistic results., Comment: Published as a regular paper at IEEE BigData 2023
Published: 2023

22. Continuous Time Evidential Distributions for Irregular Time Series

Author: Killian, Taylor W., Zhang, Haoran, Hartvigsen, Thomas, and Amini, Ava P.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Irregular time series, prevalent in many real-world settings such as healthcare, are challenging to make predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could take on a range of values depending on when it was last observed. To characterize this uncertainty we present EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. This distribution enables well-calibrated and flexible inference of partially observed features at any time of interest, while expanding uncertainty temporally for sparse, irregular observations. We demonstrate that EDICT attains competitive performance on challenging time series classification tasks and enables uncertainty-guided inference when encountering noisy data., Comment: ICML 2023 Workshop on Interpretable Machine Learning in Healthcare. Code is available at https://github.com/twkillian/EDICT
Published: 2023

23. Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency

Author: Queen, Owen, Hartvigsen, Thomas, Koker, Teddy, He, Huan, Tsiligkaridis, Theodoros, and Zitnik, Marinka
Subjects: Computer Science - Machine Learning
Abstract: Interpreting time series models is uniquely challenging because it requires identifying both the location of time series signals that drive model predictions and their matching to an interpretable temporal pattern. While explainers from other modalities can be applied to time series, their inductive biases do not transfer well to the inherently challenging interpretation of time series. We present TimeX, a time series consistency model for training explainers. TimeX trains an interpretable surrogate to mimic the behavior of a pretrained time series model. It addresses the issue of model faithfulness by introducing model behavior consistency, a novel formulation that preserves relations in the latent space induced by the pretrained model with relations in the latent space induced by TimeX. TimeX provides discrete attribution maps and, unlike existing interpretability methods, it learns a latent space of explanations that can be used in various ways, such as to provide landmarks to visually aggregate similar explanations and easily recognize temporal patterns. We evaluate TimeX on eight synthetic and real-world datasets and compare its performance against state-of-the-art interpretability methods. We also conduct case studies using physiological time series. Quantitative evaluations demonstrate that TimeX achieves the highest or second-highest performance in every metric compared to baselines across all datasets. Through case studies, we show that the novel components of TimeX show potential for training faithful, interpretable models that capture the behavior of pretrained time series models., Comment: Accepted to NeurIPS 2023 (spotlight)
Published: 2023

24. Demographic bias in misdiagnosis by computational pathology models

Author: Vaidya, Anurag, Chen, Richard J., Williamson, Drew F. K., Song, Andrew H., Jaume, Guillaume, Yang, Yuzhe, Hartvigsen, Thomas, Dyer, Emma C., Lu, Ming Y., Lipkova, Jana, Shaban, Muhammad, Chen, Tiffany Y., and Mahmood, Faisal
Published: 2024

25. Interpretable Unified Language Checking

Author: Zhang, Tianhua, Luo, Hongyin, Chuang, Yung-Sung, Fang, Wei, Gaitskell, Luc, Hartvigsen, Thomas, Wu, Xixin, Fox, Danny, Meng, Helen, and Glass, James
Subjects: Computer Science - Computation and Language
Abstract: Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified language checking (UniLC) method for both human and machine-generated language that aims to check if language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, we find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks with a simple, few-shot, unified set of prompts. With the "1/2-shot" multi-task language checking method proposed in this work, the GPT3.5-turbo model outperforms fully supervised baselines on several language tasks. The simple approach and results suggest that based on strong latent knowledge representations, an LLM can be an adaptive and explainable tool for detecting misinformation, stereotypes, and hate speech., Comment: 10 + 5 pages
Published: 2023

26. Finding Short Signals in Long Irregular Time Series with Continuous-Time Attention Policy Networks

Author: Hartvigsen, Thomas, Thadajarassiri, Jidapa, Kong, Xiangnan, and Rundensteiner, Elke
Subjects: Computer Science - Machine Learning
Abstract: Irregularly-sampled time series (ITS) are native to high-impact domains like healthcare, where measurements are collected over time at uneven intervals. However, for many classification problems, only small portions of long time series are often relevant to the class label. In this case, existing ITS models often fail to classify long series since they rely on careful imputation, which easily over- or under-samples the relevant regions. Using this insight, we then propose CAT, a model that classifies multivariate ITS by explicitly seeking highly-relevant portions of an input series' timeline. CAT achieves this by integrating three components: (1) A Moment Network learns to seek relevant moments in an ITS's continuous timeline using reinforcement learning. (2) A Receptor Network models the temporal dynamics of both observations and their timing localized around predicted moments. (3) A recurrent Transition Model models the sequence of transitions between these moments, cultivating a representation with which the series is classified. Using synthetic and real data, we find that CAT outperforms ten state-of-the-art methods by finding short signals in long irregular time series.
Published: 2023

27. Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors

Author: Hartvigsen, Thomas, Sankaranarayanan, Swami, Palangi, Hamid, Kim, Yoon, and Ghassemi, Marzyeh
Subjects: Computer Science - Machine Learning
Abstract: Deployed language models decay over time due to shifting inputs, changing user needs, or emergent world-knowledge gaps. When such problems are identified, we want to make targeted edits while avoiding expensive retraining. However, current model editors, which modify such behaviors of pre-trained models, degrade model performance quickly across multiple, sequential edits. We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at https://www.github.com/thartvigsen/grace., Comment: Accepted to NeurIPS 2023
Published: 2022

28. Class-Specific Explainability for Deep Time Series Classifiers

Author: Doddaiah, Ramesh, Parvatharaju, Prathyush, Rundensteiner, Elke, and Hartvigsen, Thomas
Subjects: Computer Science - Machine Learning
Abstract: Explainability helps users trust deep learning solutions for time series classification. However, existing explainability methods for multi-class time series classifiers focus on one class at a time, ignoring relationships between the classes. Instead, when a classifier is choosing between many classes, an effective explanation must show what sets the chosen class apart from the rest. We now formalize this notion, studying the open problem of class-specific explainability for deep time series classifiers, a challenging and impactful problem setting. We design a novel explainability method, DEMUX, which learns saliency maps for explaining deep multi-class time series classifiers by adaptively ensuring that its explanation spotlights the regions in an input time series that a model uses specifically for its predicted class. DEMUX adopts a gradient-based approach composed of three interdependent modules that combine to generate consistent, class-specific saliency maps that remain faithful to the classifier's behavior yet are easily understood by end users. Our experimental study demonstrates that DEMUX outperforms nine state-of-the-art alternatives on five popular datasets when explaining two types of deep time series classifiers. Further, through a case study, we demonstrate that DEMUX's explanations indeed highlight what separates the predicted class from the others in the eyes of the classifier. Our code is publicly available at https://github.com/rameshdoddaiah/DEMUX., Comment: This paper is accepted in ICDM 2022
Published: 2022

29. Stop&Hop: Early Classification of Irregular Time Series

Author: Hartvigsen, Thomas, Gerych, Walter, Thadajarassiri, Jidapa, Kong, Xiangnan, and Rundensteiner, Elke
Subjects: Computer Science - Machine Learning
Abstract: Early classification algorithms help users react faster to their machine learning model's predictions. Early warning systems in hospitals, for example, let clinicians improve their patients' outcomes by accurately predicting infections. While early classification systems are advancing rapidly, a major gap remains: existing systems do not consider irregular time series, which have uneven and often-long gaps between their observations. Such series are notoriously pervasive in impactful domains like healthcare. We bridge this gap and study early classification of irregular time series, a new setting for early classifiers that opens doors to more real-world problems. Our solution, Stop&Hop, uses a continuous-time recurrent network to model ongoing irregular time series in real time, while an irregularity-aware halting policy, trained with reinforcement learning, predicts when to stop and classify the streaming series. By taking real-valued step sizes, the halting policy flexibly decides exactly when to stop ongoing series in real time. This way, Stop&Hop seamlessly integrates information contained in the timing of observations, a new and vital source for early classification in this setting, with the time series values to provide early classifications for irregular time series. Using four synthetic and three real-world datasets, we demonstrate that Stop&Hop consistently makes earlier and more accurate predictions than state-of-the-art alternatives adapted to this new problem. Our code is publicly available at https://github.com/thartvigsen/StopAndHop., Comment: This paper was accepted to CIKM'22. Code at https://github.com/thartvigsen/StopAndHop
Published: 2022

30. TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

Author: Hu, Ruofan, Zhang, Dongyu, Tao, Dandan, Hartvigsen, Thomas, Feng, Hao, and Rundensteiner, Elke
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Foodborne illness is a serious but preventable public health problem -- with delays in detecting the associated outbreaks resulting in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID, collected from Twitter, is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling for supporting model development for these tasks. A comprehensive set of results for these tasks, leveraging state-of-the-art single- and multi-task deep learning methods on the TWEET-FID dataset, is provided. This dataset opens opportunities for future research in foodborne outbreak detection., Comment: LREC 2022
Published: 2022

31. The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

Author: Balagopalan, Aparna, Zhang, Haoran, Hamidieh, Kimia, Hartvigsen, Thomas, Rudzicz, Frank, and Ghassemi, Marzyeh
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods where a simple, human-interpretable model imitates the behavior of these blackbox models are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two different blackbox model architectures and four popular explainability methods, we find that the approximation quality of explanation models, also known as the fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating details of non-zero fidelity gaps to users, since a single solution might not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community., Comment: Published in FAccT 2022
Published: 2022

32. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Author: Hartvigsen, Thomas, Gabriel, Saadia, Palangi, Hamid, Sap, Maarten, Ray, Dipankar, and Kamar, Ece
Subjects: Computer Science - Computation and Language
Abstract: Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly-available datasets, we show that finetuning a toxicity classifier on our data improves its performance on human-written data substantially. We also demonstrate that ToxiGen can be used to fight machine-generated toxicity as finetuning improves the classifier significantly on our evaluation subset. Our code and data can be found at https://github.com/microsoft/ToxiGen., Comment: Published as a long paper at ACL 2022. Code: https://github.com/microsoft/TOXIGEN
Published: 2022

33. Dissecting the heterogeneity of “in the wild” stress from multimodal sensor data

Author: Nagaraj, Sujay, Goodday, Sarah, Hartvigsen, Thomas, Boch, Adrien, Garg, Kopal, Gowda, Sindhu, Foschini, Luca, Ghassemi, Marzyeh, Friend, Stephen, and Goldenberg, Anna
Published: 2023

34. MATHWELL: Generating Age-Appropriate Educational Math Word Problems

Author: Christ, Bryan R, Kropko, Jonathan, and Hartvigsen, Thomas
Abstract: Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires domain expertise. We suggest that language models can support K-8 math education by automatically generating problems. To be educational, generated problems must be 1) solvable, 2) accurate, and 3) appropriate. Existing datasets are unlabeled for these criteria, making them ill-suited for training problem generators. To address this gap, we use domain expert annotation to curate a high-quality synthetic training dataset for this task. We show the value of this data by using it to iteratively finetune Llama-2 (70B) to create MATHWELL, a K-8 word problem generator. Domain experts find MATHWELL has a 40% higher share of problems that have executable solutions and meet all criteria than existing open-source models, with 74% of its problems with executable solutions being solvable, accurate, and appropriate. MATHWELL achieves 94.9% of GPT-4 Turbo's performance on this task while outputting problems written at a more appropriate reading level for K-8 students. That MATHWELL performs this well despite being trained only by finetuning highlights the quality of our synthetic data for training age-appropriate word problem generators. We release our model, data, and annotations., Comment: 26 pages, 9 figures
Published: 2024

35. Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data

Author: Hartvigsen, Thomas, Sen, Cansu, Rundensteiner, Elke A., Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Yuan, Junsong, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Cliquet Jr., Alberto, editor, Wiebe, Sheldon, editor, Anderson, Paul, editor, Saggio, Giovanni, editor, Zwiggelaar, Reyer, editor, Gamboa, Hugo, editor, Fred, Ana, editor, and Bermúdez i Badia, Sergi, editor
Published: 2019

36. Stabilizing Adversarial Training for Generative Networks

Author: Gerych, Walter, primary, Hickey, Kevin, additional, Hartvigsen, Thomas, additional, Buquicchio, Luke, additional, Alajaji, Abdulaziz, additional, Chandrasekaran, Kavin, additional, Mansoor, Hamid, additional, Agu, Emmanuel, additional, and Rundensteiner, Elke, additional
Published: 2023

37. Multi-State Brain Network Discovery

Author: Yin, Hang, primary, Su, Yao, additional, Liu, Xinyue, additional, Hartvigsen, Thomas, additional, Li, Yanhua, additional, and Kong, Xiangnan, additional
Published: 2023

38. CREST - Risk Prediction for Clostridium Difficile Infection Using Multimodal Data Mining

Author: Sen, Cansu, Hartvigsen, Thomas, Rundensteiner, Elke, Claypool, Kajal, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Altun, Yasemin, editor, Das, Kamalika, editor, Mielikäinen, Taneli, editor, Malerba, Donato, editor, Stefanowski, Jerzy, editor, Read, Jesse, editor, Žitnik, Marinka, editor, Ceci, Michelangelo, editor, and Džeroski, Sašo, editor
Published: 2017

39. Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data

Author: Hartvigsen, Thomas, primary, Sen, Cansu, additional, and Rundensteiner, Elke A., additional
Published: 2019

40. Algorithmic Fairness in Chest X-ray Diagnosis: A Case Study

Author: Zhang, Haoran, primary, Hartvigsen, Thomas, additional, and Ghassemi, Marzyeh, additional
Published: 2023

41. Explaining Deep Multi-Class Time Series Classifiers

Author: Doddaiah, Ramesh, primary, Parvatharaju, Prathyush, additional, Rundensteiner, Elke, additional, and Hartvigsen, Thomas, additional
Published: 2023

42. Class-Specific Explainability for Deep Time Series Classifiers

Author: Doddaiah, Ramesh, primary, Parvatharaju, Prathyush, additional, Rundensteiner, Elke, additional, and Hartvigsen, Thomas, additional
Published: 2022

43. Robust Recurrent Classifier Chains for Multi-Label Learning with Missing Labels

Author: Gerych, Walter, primary, Hartvigsen, Thomas, additional, Buquicchio, Luke, additional, Agu, Emmanuel, additional, and Rundensteiner, Elke, additional
Published: 2022

44. Stop&Hop: Early Classification of Irregular Time Series

Author: Hartvigsen, Thomas, primary, Gerych, Walter, additional, Thadajarassiri, Jidapa, additional, Kong, Xiangnan, additional, and Rundensteiner, Elke, additional
Published: 2022

45. CREST - Risk Prediction for Clostridium Difficile Infection Using Multimodal Data Mining

Author: Sen, Cansu, primary, Hartvigsen, Thomas, additional, Rundensteiner, Elke, additional, and Claypool, Kajal, additional
Published: 2017

46. Recovering the Propensity Score from Biased Positive Unlabeled Data

Author: Gerych, Walter, primary, Hartvigsen, Thomas, additional, Buquicchio, Luke, additional, Agu, Emmanuel, additional, and Rundensteiner, Elke, additional
Published: 2022

47. The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

Author: Balagopalan, Aparna, primary, Zhang, Haoran, additional, Hamidieh, Kimia, additional, Hartvigsen, Thomas, additional, Rudzicz, Frank, additional, and Ghassemi, Marzyeh, additional
Published: 2022

48. The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

Author: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science, Balagopalan, Aparna, Zhang, Haoran, Hamidieh, Kimia, Hartvigsen, Thomas, Rudzicz, Frank, and Ghassemi, Marzyeh
Published: 2022

49. Stop&Hop: Early Classification of Irregular Time Series

Author: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Hartvigsen, Thomas, Gerych, Walter, Thadajarassiri, Jidapa, Kong, Xiangnan, and Rundensteiner, Elke
Published: 2022

50. Robust Recurrent Classifier Chains For Multi-Label Learning With Missing Labels

Author: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory, Gerych, Walter, Hartvigsen, Thomas, Buquicchio, Luke, Agu, Emmanuel, and Rundensteiner, Elke
Published: 2022
