1. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Author
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, and Zhou, Xiren
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench, respectively). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. Phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and is on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and multi-image prompts paired with text.
- Comment
24 pages
- Published
2024
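
The abstract's claim that a 3.8 billion parameter model is small enough to run on a phone can be sanity-checked with simple arithmetic. The sketch below is a rough, assumption-laden estimate of the weight-memory footprint at a few common quantization widths; the bit-widths and the 10% overhead factor are illustrative assumptions, not figures from the report. At 4-bit precision the weights come to roughly 2 GB, which fits comfortably in a modern phone's memory.

```python
# Back-of-envelope weight-memory estimate for a 3.8B-parameter model such as
# phi-3-mini. Bit-widths and the 10% overhead factor are illustrative
# assumptions, not values taken from the Phi-3 report.

PARAMS = 3.8e9  # parameter count quoted in the abstract


def weight_memory_gb(num_params: float, bits_per_param: int, overhead: float = 0.10) -> float:
    """Approximate on-device weight storage in GB at a given quantization width."""
    raw_bytes = num_params * bits_per_param / 8
    return raw_bytes * (1 + overhead) / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```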