Author: "Krishna, Kundan" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Krishna, Kundan"' returned 13 results

1. Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Author: Shen, Hua, Knearem, Tiffany, Ghosh, Reshmi, Alkiek, Kenan, Krishna, Kundan, Liu, Yachuan, Ma, Ziqiao, Petridis, Savvas, Peng, Yi-Hao, Qiwei, Li, Rakshit, Sushrita, Si, Chenglei, Xie, Yutong, Bigham, Jeffrey P., Bentley, Frank, Chai, Joyce, Lipton, Zachary, Mei, Qiaozhu, Mihalcea, Rada, Terry, Michael, Yang, Diyi, Morris, Meredith Ringel, Resnick, Paul, and Jurgens, David
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML). We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including literature gaps and trends, human values, and interaction techniques. To pave the way for future studies, we envision three key challenges and give recommendations for future research.
Comment: proposing "bidirectional human-AI alignment" framework after a systematic review of over 400 alignment papers
Published: 2024

2. GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

Author: Krishna, Kundan, Ramprasad, Sanjana, Gupta, Prakhar, Wallace, Byron C., Lipton, Zachary C., and Bigham, Jeffrey P.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains. To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision. We release our tool (GenAudit) and fact-checking model for public use.
Comment: Code and models available at https://genaudit.org
Published: 2024

3. Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

Author: Ramprasad, Sanjana, Krishna, Kundan, Lipton, Zachary C, and Wallace, Byron C
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains
Published: 2024

4. USB: A Unified Summarization Benchmark Across Tasks and Domains

Author: Krishna, Kundan, Gupta, Prakhar, Ramprasad, Sanjana, Wallace, Byron C., Bigham, Jeffrey P., and Lipton, Zachary C.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks: (i) extractive summarization; (ii) abstractive summarization; (iii) topic-based summarization; (iv) compressing selected sentences into a one-line summary; (v) surfacing evidence for a summary sentence; (vi) predicting the factual accuracy of a summary sentence; (vii) identifying unsubstantiated spans in a summary sentence; (viii) correcting factual errors in summaries. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models. For factuality-related tasks, we also evaluate existing heuristics to create training data and find that training on them results in worse performance than training on 20× less human-labeled data. Our articles draw from 6 domains, facilitating cross-domain analysis. On some tasks, the amount of training data matters more than the domain where it comes from, while for other tasks training specifically on data from the target domain, even if limited, is more beneficial.
Comment: EMNLP Findings 2023 Camera Ready
Published: 2023

5. Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Author: Krishna, Kundan, Zhao, Yao, Ren, Jie, Lakshminarayanan, Balaji, Luo, Jiaming, Saleh, Mohammad, and Liu, Peter J.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.
Comment: EMNLP Findings 2023 Camera Ready
Published: 2022

6. Out-of-Distribution Detection and Selective Generation for Conditional Language Models

Author: Ren, Jie, Luo, Jiaming, Zhao, Yao, Krishna, Kundan, Saleh, Mohammad, Lakshminarayanan, Balaji, and Liu, Peter J.
Subjects: Computer Science - Computation and Language
Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. Furthermore, the space of potential low-quality outputs is larger as arbitrary text can be generated and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation (analogous to selective prediction for classification) of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.
Comment: Published in ICLR 2023
Published: 2022

7. Downstream Datasets Make Surprisingly Good Pretraining Corpora

Author: Krishna, Kundan, Garg, Saurabh, Bigham, Jeffrey P., and Lipton, Zachary C.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent these gains are attributable to the massive background corpora employed for pretraining versus to the pretraining objectives themselves. This paper introduces a large-scale study of self-pretraining, where the same (downstream) training data is used for both pretraining and finetuning. In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around 10× to 500× less data), outperforming the latter on 7 and 5 datasets, respectively. Surprisingly, these task-specific pretrained models often perform well on other tasks, including the GLUE benchmark. Besides classification tasks, self-pretraining also provides benefits on structured output prediction tasks such as span based question answering and commonsense inference, often providing more than 50% of the performance boosts provided by pretraining on the BookWiki corpus. Our results hint that in many scenarios, performance gains attributable to pretraining are driven primarily by the pretraining objective itself and are not always attributable to the use of external pretraining data in massive amounts. These findings are especially relevant in light of concerns about intellectual property and offensive content in web-scale pretraining data.
Comment: ACL 2023 Camera Ready
Published: 2022

8. Does Pretraining for Summarization Require Knowledge Transfer?

Author: Krishna, Kundan, Bigham, Jeffrey, and Lipton, Zachary C.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that by pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.
Comment: Camera-ready for Findings of EMNLP 2021
Published: 2021

9. Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances

Author: Krishna, Kundan, Pavel, Amy, Schloss, Benjamin, Bigham, Jeffrey P., and Lipton, Zachary C.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Despite diverse efforts to mine various modalities of medical data, the conversations between physicians and patients at the time of care remain an untapped source of insights. In this paper, we leverage this data to extract structured information that might assist physicians with post-visit documentation in electronic health records, potentially lightening the clerical burden. In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels. We focus on the tasks of recognizing relevant diagnoses and abnormalities in the review of organ systems (RoS). One methodological challenge is that the conversations are long (around 1500 words), making it difficult for modern deep-learning models to use them as input. To address this challenge, we extract noteworthy utterances, that is, parts of the conversation likely to be cited as evidence supporting some summary sentence. We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
Published: 2020

10. Reinforced Rewards Framework for Text Style Transfer

Author: Sancheti, Abhilasha, Krishna, Kundan, Srinivasan, Balaji Vasan, and Natarajan, Anandhavelu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Style transfer deals with algorithms that transfer the stylistic properties of a piece of text into those of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate style transfer models based on content preservation and transfer strength. In this work, we propose a reinforcement learning based framework that directly rewards the framework on these target metrics, yielding a better transfer of the target style. We show the improved performance of our proposed framework based on automatic and human evaluation on three independent tasks, in which we transfer the style of text from formal to informal, from high excitement to low excitement, and from modern English to Shakespearean English, and vice versa in all three cases. Improved performance of the proposed framework over existing state-of-the-art frameworks indicates the viability of the approach.
Comment: ECIR 2020
Published: 2020

11. Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques

Author: Krishna, Kundan, Khosla, Sopan, Bigham, Jeffrey P., and Lipton, Zachary C.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Following each patient visit, physicians draft long semi-structured clinical summaries called SOAP notes. While invaluable to clinicians and researchers, creating digital SOAP notes is burdensome, contributing to physician burnout. In this paper, we introduce the first complete pipelines to leverage deep summarization models to generate these notes based on transcripts of conversations between physicians and patients. After exploring a spectrum of methods across the extractive-abstractive spectrum, we propose Cluster2Sent, an algorithm that (i) extracts important utterances relevant to each summary section; (ii) clusters together related utterances; and then (iii) generates one summary sentence per cluster. Cluster2Sent outperforms its purely abstractive counterpart by 8 ROUGE-1 points, and produces significantly more factual and coherent sentences as assessed by expert human evaluators. For reproducibility, we demonstrate similar benefits on the publicly available AMI dataset. Our results speak to the benefits of structuring summaries into sections and annotating supporting evidence when constructing summarization corpora.
Comment: Published at ACL 2021 Main Conference
Published: 2020

12. Improving generation quality of pointer networks via guided attention

Author: Chawla, Kushal, Krishna, Kundan, and Srinivasan, Balaji Vasan
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Pointer generator networks have been used successfully for abstractive summarization. Along with the capability to generate novel words, this framework also allows the model to copy from the input text to handle out-of-vocabulary words. In this paper, we point out two key shortcomings of the summaries generated with this framework via manual inspection, statistical analysis and human evaluation. The first shortcoming is the extractive nature of the generated summaries, since the network eventually learns to copy from the input article most of the time, affecting the abstractive nature of the generated summaries. The second shortcoming is the factual inaccuracies in the generated text despite grammatical correctness. Our analysis indicates that this arises due to incorrect attention transition between different parts of the article. We propose an initial attempt towards addressing both these shortcomings by externally appending traditional linguistic information parsed from the input text, thereby teaching networks the structure of the underlying text. Results indicate feasibility and potential of such additional cues for improved generation.
Comment: In AAAI-19 Workshop on Network Interpretability for Deep Learning
Published: 2019

13. Normality of the Ehrenfeucht-Mycielski Sequence

Author: Krishna, Kundan and Nandakumar, Satyadev
Subjects: Computer Science - Discrete Mathematics, Mathematics - Combinatorics
Abstract: We study the binary Ehrenfeucht-Mycielski sequence, seeking a balance between the number of occurrences of different binary strings. There have been numerous attempts to prove the balance conjecture of the sequence, which roughly states that 1 and 0 occur equally often in it. Our contribution is twofold. First, we study weaker forms of the conjecture proved in the past and lay out detailed proofs for many lemmas which were stated without proofs. Second, we extend the claim of balance to that of normality and prove a weaker form of simple normality to word length 2.
Published: 2017
