Author: "Thakur, Nandan" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Thakur, Nandan"' returned 22 results.

1. MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

Author: Thakur, Nandan, Kazi, Suleman, Luo, Ge, Lin, Jimmy, and Ahmad, Amin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Traditional Retrieval-Augmented Generation (RAG) benchmarks rely on different heuristic-based metrics for evaluation, but these require human preferences as ground truth for reference. In contrast, arena-based benchmarks, where two models compete against each other, require an expensive Large Language Model (LLM) as a judge for a reliable evaluation. We present an easy and efficient technique to get the best of both worlds. The idea is to train a learning-to-rank model as a "surrogate" judge using RAG-based evaluation heuristics as input, to produce a synthetic arena-based leaderboard. Using this idea, we develop MIRAGE-Bench, a standardized arena-based multilingual RAG benchmark for 18 diverse languages on Wikipedia. The benchmark is constructed using MIRACL, a retrieval dataset, and extended for multilingual generation evaluation. MIRAGE-Bench evaluates RAG extensively, coupling both heuristic features and an LLM-as-a-judge evaluator. In our work, we benchmark 19 diverse multilingual-focused LLMs and achieve a high correlation (Kendall's τ = 0.909) between our surrogate judge, learned from heuristic features with pairwise evaluations, and GPT-4o as a teacher on the MIRAGE-Bench leaderboard using the Bradley-Terry framework. We observe that proprietary and large open-source LLMs currently dominate multilingual RAG. MIRAGE-Bench is available at: https://github.com/vectara/mirage-bench.
Published: 2024
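
The abstract derives its leaderboard with the Bradley-Terry framework but does not state how the model is fitted. As a minimal sketch, assuming pairwise outcomes are stored as win counts keyed by (winner, loser), the classic minorization-maximization (MM) update below estimates one strength per model; sorting by strength yields an arena-style leaderboard. The function name and data layout are illustrative assumptions, not MIRAGE-Bench's code.

```python
from collections import defaultdict


def bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry strengths from pairwise win counts (illustrative sketch).

    wins[(a, b)] is the number of comparisons in which model a beat model b.
    Returns {model: strength}, rescaled so that strengths average 1.
    """
    models = sorted({m for pair in wins for m in pair})
    if not models:
        return {}
    total_wins = defaultdict(float)  # W_i: total wins of model i
    n_games = defaultdict(float)     # n_ij: comparisons between i and j (unordered)
    for (a, b), w in wins.items():
        total_wins[a] += w
        n_games[frozenset((a, b))] += w

    strength = {m: 1.0 for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n_games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i and n_games[frozenset((i, j))] > 0
            )
            updated[i] = total_wins[i] / denom if denom > 0 else strength[i]
        scale = len(models) / sum(updated.values())
        strength = {m: s * scale for m, s in updated.items()}
    return strength


wins = {("A", "B"): 8, ("B", "A"): 2, ("B", "C"): 7,
        ("C", "B"): 3, ("A", "C"): 9, ("C", "A"): 1}
scores = bradley_terry(wins)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

With two models only, the fitted strength ratio equals the win ratio, which is a quick sanity check on the update.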

2. Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

Author: Thakur, Nandan, Bonifacio, Luiz, Fröbe, Maik, Bondarenko, Alexander, Kamalloo, Ehsan, Potthast, Martin, Hagen, Matthias, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval
Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark, a combination of different IR evaluation datasets. Interestingly, previous studies found that, particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what makes argument retrieval so "special". To analyze the potential limits of neural retrieval models more deeply, we run a reproducibility study on the Touché 2020 data. In our study, we focus on two experiments: (i) a black-box evaluation (i.e., no model retraining), incorporating a theoretical exploration using retrieval axioms, and (ii) a data denoising evaluation involving post-hoc relevance judgments. Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages from the Touché 2020 data, and we also find that quite a few of the neural models' results are unjudged in the Touché 2020 data. As many of the short Touché passages are not argumentative and thus non-relevant per se, and as the missing judgments complicate fair comparison, we denoise the Touché 2020 data by excluding very short passages (fewer than 20 words) and by augmenting the unjudged data with post-hoc judgments following the Touché guidelines. On the denoised data, the effectiveness of the neural models improves by up to 0.52 in nDCG@10, but BM25 is still more effective. Our code and the augmented Touché 2020 dataset are available at https://github.com/castorini/touche-error-analysis.
Comment: SIGIR 2024 (Resource & Reproducibility Track)
Published: 2024
Full Text: View/download PDF
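
The abstract reports effectiveness in nDCG@10 and stresses that many retrieved passages are unjudged. The sketch below assumes linear gains and a log2 rank discount (the convention of trec_eval's ndcg_cut measure; other tools use 2^rel − 1 gains) and shows why missing judgments matter: an unjudged passage contributes zero gain, exactly like one judged non-relevant.

```python
import math


def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one query (linear gain, log2 discount).

    ranking: document IDs in ranked order.
    qrels: {doc_id: graded relevance}; unjudged IDs are absent and count as 0.
    """
    dcg = sum(
        qrels.get(doc, 0) / math.log2(rank + 2)  # rank is 0-based
        for rank, doc in enumerate(ranking[:k])
    )
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


qrels = {"p1": 2, "p2": 1, "p3": 0}
print(ndcg_at_k(["p1", "p2"], qrels))              # 1.0
print(ndcg_at_k(["unjudged", "p1", "p2"], qrels))  # lower: the unjudged slot earns no gain
```

This is why post-hoc judging of previously unjudged passages can raise a model's score without any change to the model itself.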

3. Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Author: Pradeep, Ronak, Thakur, Nandan, Sharifymoghaddam, Sahel, Zhang, Eric, Nguyen, Ryan, Campos, Daniel, Craswell, Nick, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Did you try out the new Bing Search? Or maybe you fiddled around with Google AI Overviews? These might sound familiar because the modern-day search stack has recently evolved to include retrieval-augmented generation (RAG) systems. They allow searching and incorporating real-time data into large language models (LLMs) to provide a well-informed, attributed, concise summary, in contrast to the traditional search paradigm that relies on displaying a ranked list of documents. Therefore, given these recent advancements, it is crucial to have an arena to build, test, visualize, and systematically evaluate RAG-based search systems. With this in mind, we propose the TREC 2024 RAG Track to foster innovation in evaluating RAG systems. In our work, we lay out the steps we have taken towards making this track a reality: we describe the details of our reusable framework, Ragnarök, explain our choice and curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions that assist the end user. Next, using Ragnarök, we identify and provide key industrial baselines such as OpenAI's GPT-4o and Cohere's Command R+. Further, we introduce a web-based user interface for an interactive arena that allows pairwise benchmarking of RAG systems through crowdsourcing. We open-source our Ragnarök framework and baselines to achieve a unified standard for future RAG systems.
Published: 2024

4. UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

Author: Upadhyay, Shivani, Pradeep, Ronak, Thakur, Nandan, Craswell, Nick, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval
Abstract: Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide human-quality judgments, but unfortunately their study did not yield any reusable software artifacts. Our work presents UMBRELA (a recursive acronym that stands for UMbrela is the Bing RELevance Assessor), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4o model and adds more nuance to the original paper. Across Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with rankings generated by effective multi-stage retrieval systems. Our toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, offering researchers a valuable resource for studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid in relevance assessments, and we envision our toolkit becoming a foundation for further innovation in the field. UMBRELA is available at https://github.com/castorini/umbrela.
Comment: 5 pages, 3 figures
Published: 2024
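
Agreement between two system rankings, for example one induced by human judgments and one by LLM-derived judgments, is commonly summarized with Kendall's τ, the statistic MIRAGE-Bench reports above. This abstract does not name the statistic UMBRELA uses, so the sketch is illustrative only: it computes τ-a for two tie-free rankings of the same systems, with hypothetical system names.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two tie-free rankings of the same items (best first)."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    if (len(pos_a) != len(ranking_a) or len(pos_b) != len(ranking_b)
            or pos_a.keys() != pos_b.keys() or len(pos_a) < 2):
        raise ValueError("need two rankings of the same >= 2 distinct items")
    concordant = discordant = 0
    for x, y in combinations(pos_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(pos_a) * (len(pos_a) - 1) // 2
    return (concordant - discordant) / n_pairs


human = ["sysA", "sysB", "sysC", "sysD"]  # hypothetical ranking from human qrels
llm = ["sysA", "sysB", "sysD", "sysC"]    # hypothetical ranking from LLM qrels
print(kendall_tau(human, llm))  # one swapped pair out of six: (5 - 1) / 6 ≈ 0.667
```

τ = 1 means identical orderings and τ = −1 means fully reversed ones.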

5. NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Author: Thakur, Nandan, Bonifacio, Luiz, Zhang, Xinyu, Ogundepo, Odunayo, Kamalloo, Ehsan, Alfonso-Hermelo, David, Li, Xiaoguang, Liu, Qun, Chen, Boxing, Rezagholizadeh, Mehdi, and Lin, Jimmy
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Retrieval-augmented generation (RAG) grounds large language model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations. However, prior works lack a comprehensive evaluation of different language families, making it challenging to evaluate LLM robustness against errors in external retrieved knowledge. To overcome this, we establish NoMIRACL, a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages. NoMIRACL includes both a non-relevant and a relevant subset. Queries in the non-relevant subset contain only passages judged as non-relevant, whereas queries in the relevant subset include at least one passage judged as relevant. We measure LLM robustness using two metrics: (i) hallucination rate, measuring the model's tendency to hallucinate an answer when the answer is not present in the passages of the non-relevant subset, and (ii) error rate, measuring the model's failure to recognize relevant passages in the relevant subset. In our work, we measure robustness for a wide variety of multilingual-focused LLMs and observe that most of the models struggle to balance the two capacities. Models such as LLAMA-2, Orca-2, and FLAN-T5 exhibit a hallucination rate above 88% on the non-relevant subset, whereas Mistral hallucinates less overall but can reach an error rate of up to 74.9% on the relevant subset. Overall, GPT-4 provides the best tradeoff on both subsets, highlighting the need for future work to improve LLM robustness.
Published: 2023
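
A minimal sketch of the two robustness metrics, assuming each model response has already been reduced to a boolean "the model claims an answer is present" (True) versus "I don't know" (False); the answer-extraction prompt and parsing are not described in the abstract, and the function names are hypothetical.

```python
def hallucination_rate(nonrelevant_claims):
    """Share of non-relevant-subset queries where the model still claims an answer."""
    if not nonrelevant_claims:
        raise ValueError("empty subset")
    return sum(nonrelevant_claims) / len(nonrelevant_claims)


def error_rate(relevant_claims):
    """Share of relevant-subset queries where the model misses the relevant passage."""
    if not relevant_claims:
        raise ValueError("empty subset")
    return sum(not claim for claim in relevant_claims) / len(relevant_claims)


# True = model answered; False = model replied "I don't know".
print(hallucination_rate([True, True, False, True]))  # 0.75
print(error_rate([True, False, True, True]))          # 0.25
```

The two rates pull in opposite directions: a model that always answers "I don't know" scores a 0% hallucination rate but a 100% error rate, which is the balance the abstract says most models struggle with.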

6. Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

Author: Thakur, Nandan, Ni, Jianmo, Ábrego, Gustavo Hernández, Wieting, John, Lin, Jimmy, and Cer, Daniel
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: There has been limited success for dense retrieval models in multilingual retrieval, due to uneven and scarce training data available across multiple languages. Synthetic training data generation is promising (e.g., InPars or Promptagator), but has been investigated only for English. Therefore, to study model capabilities across both cross-lingual and monolingual retrieval tasks, we develop SWIM-IR, a synthetic retrieval training dataset containing 33 (high to very-low resource) languages for fine-tuning multilingual dense retrievers without requiring any human supervision. To construct SWIM-IR, we propose SAP (summarize-then-ask prompting), where the large language model (LLM) generates a textual summary prior to the query generation step. SAP assists the LLM in generating informative queries in the target language. Using SWIM-IR, we explore synthetic fine-tuning of multilingual dense retrieval models and evaluate them robustly on three retrieval benchmarks: XOR-Retrieve (cross-lingual), MIRACL (monolingual) and XTREME-UP (cross-lingual). Our models, called SWIM-X, are competitive with human-supervised dense retrieval models, e.g., mContriever-X, finding that SWIM-IR can cheaply substitute for expensive human-labeled retrieval training data. SWIM-IR dataset and SWIM-X models are available at https://github.com/google-research-datasets/SWIM-IR.
Comment: Accepted at NAACL 2024. Data released at https://github.com/google-research-datasets/swim-ir
Published: 2023
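
Summarize-then-ask prompting can be pictured as two chained LLM calls: first a summary of the passage, then a query conditioned on that summary. The prompt wording, the `llm` callable, and the function name below are hypothetical placeholders, not SWIM-IR's actual prompts.

```python
from typing import Callable


def summarize_then_ask(passage: str, language: str, llm: Callable[[str], str]) -> str:
    """Generate one synthetic query for `passage` in `language` (hypothetical prompts)."""
    # Step 1: ask the LLM to condense the passage before any query is written.
    summary = llm(f"Summarize the key facts of this passage:\n{passage}")
    # Step 2: condition query generation on both the passage and its summary.
    return llm(
        f"Passage: {passage}\nSummary: {summary}\n"
        f"Write a question in {language} that this passage answers:"
    )
```

Each resulting (query, passage) pair would then serve as one synthetic training example for a multilingual dense retriever; `llm` can wrap any text-generation client.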

7. HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

Author: Kamalloo, Ehsan, Jafari, Aref, Zhang, Xinyu, Thakur, Nandan, and Lin, Jimmy
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: The rise of large language models (LLMs) has had a transformative impact on search, ushering in a new era of search engines that are capable of generating search results in natural language text, imbued with citations for supporting sources. Building generative information-seeking models demands openly accessible datasets, which currently remain lacking. In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset), for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations. Unlike recent efforts that focus on human evaluation of black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset. HAGRID is constructed based on human and LLM collaboration. We first automatically collect attributed explanations that follow an in-context citation style using an LLM, i.e., GPT-3.5. Next, we ask human annotators to evaluate the LLM explanations based on two criteria: informativeness and attributability. HAGRID serves as a catalyst for the development of information-seeking models with better attribution capabilities.
Comment: Data released at https://github.com/project-miracl/hagrid
Published: 2023

8. SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Author: Thakur, Nandan, Wang, Kexin, Gurevych, Iryna, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Traditionally, sparse retrieval systems that relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite this success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is that the majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e., on a single dataset: MS MARCO. However, practical retrieval systems require models that generalize well to unseen out-of-domain, i.e., zero-shot, retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document, which is often crucial for its performance gains; the absence of such tokens is a limitation of its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly at https://github.com/thakur-nandan/sprint.
Comment: Accepted at SIGIR 2023 (Resource Track)
Published: 2023
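
The abstract frames a custom sparse model as a term-weighting method; once weights exist, a document is scored by a dot product over the terms it shares with the query. A minimal sketch, assuming weights are plain {term: weight} dictionaries and using whitespace tokenization as a stand-in for the models' subword vocabularies, also measures the share of a representation's terms that do not occur in the original text (the expansion behavior attributed to SPLADEv2). The function names are hypothetical and not SPRINT's API; the weights are toy values, not real model output.

```python
def sparse_dot(query_weights, doc_weights):
    """Score = sum of weight products over terms present in both vectors."""
    if len(query_weights) > len(doc_weights):
        query_weights, doc_weights = doc_weights, query_weights  # iterate the smaller one
    return sum(w * doc_weights.get(term, 0.0) for term, w in query_weights.items())


def expansion_share(weights, original_text):
    """Fraction of non-zero terms that do not appear in the original text."""
    tokens = set(original_text.lower().split())
    active = [term for term, w in weights.items() if w > 0]
    if not active:
        return 0.0
    return sum(term not in tokens for term in active) / len(active)


doc = {"sparse": 1.5, "retrieval": 0.75, "search": 0.25, "index": 0.5}  # toy weights
query = {"sparse": 1.0, "search": 2.0}
print(sparse_dot(query, doc))                            # 2.0
print(expansion_share(doc, "Sparse retrieval methods"))  # 0.5
```

Here "search" contributes to the score only because the document representation was expanded beyond its literal text, which illustrates why expansion terms can drive effectiveness.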

9. Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Author: Kamalloo, Ehsan, Thakur, Nandan, Lassance, Carlos, Ma, Xueguang, Yang, Jheng-Hong, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of a representation learning approach to building retrieval models, typically using pretrained transformers in a supervised setting. This naturally raises the question: How effective are these models when presented with queries and documents that differ from the training data? Examples include searching in different domains (e.g., medical or legal text) and with different types of queries (e.g., keywords vs. well-formed questions). While BEIR was designed to answer these questions, our work addresses two shortcomings that prevent the benchmark from achieving its full potential: First, the sophistication of modern neural methods and the complexity of current software infrastructure create barriers to entry for newcomers. To this end, we provide reproducible reference implementations that cover the two main classes of approaches: learned dense and sparse models. Second, there does not exist a single authoritative nexus for reporting the effectiveness of different models on BEIR, which has led to difficulty in comparing different methods. To remedy this, we present an official self-service BEIR leaderboard that provides fair and consistent comparisons of retrieval models. By addressing both shortcomings, our work facilitates future explorations in a range of interesting research questions that BEIR enables.
Published: 2023

10. Evaluating Embedding APIs for Information Retrieval

Author: Kamalloo, Ehsan, Zhang, Xinyu, Ogundepo, Odunayo, Thakur, Nandan, Alfonso-Hermelo, David, Rezagholizadeh, Mehdi, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost. We hope our work lays the groundwork for evaluating semantic embedding APIs that are critical in search and more broadly, for information access.
Comment: ACL 2023 Industry Track
Published: 2023
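
The abstract does not say how the hybrid of BM25 and embedding-API scores is formed. One common option, shown here purely as an assumption, is to min-max normalize each run and interpolate with a weight `alpha`; the function names and scores are hypothetical.

```python
def min_max(scores):
    """Rescale {doc: score} to [0, 1]; a constant run maps to all ones."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Rank docs by alpha * dense + (1 - alpha) * BM25 after normalization.

    Docs missing from one run get 0 for that component.
    """
    bm25, dense = min_max(bm25_scores), min_max(dense_scores)
    fused = {
        doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * bm25.get(doc, 0.0)
        for doc in bm25.keys() | dense.keys()
    }
    return sorted(fused, key=fused.get, reverse=True)


bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}   # hypothetical BM25 scores
dense = {"a": 0.1, "b": 0.9, "c": 0.5}   # hypothetical embedding similarities
print(hybrid_rank(bm25, dense))  # ['b', 'a', 'c']
```

Setting `alpha=0` recovers the BM25 order and `alpha=1` the embedding order. The extra cost the abstract mentions comes from needing embeddings for the whole corpus, whereas re-ranking only embeds the BM25 top-k candidates per query.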

11. Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

Author: Lin, Jimmy, Alfonso-Hermelo, David, Jeronymo, Vitor, Kamalloo, Ehsan, Lassance, Carlos, Nogueira, Rodrigo, Ogundepo, Odunayo, Rezagholizadeh, Mehdi, Thakur, Nandan, Yang, Jheng-Hong, and Zhang, Xinyu
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another. However, the rapid pace of progress has led to a confusing panoply of methods and reproducibility has lagged behind the state of the art. In this context, our work makes two important contributions: First, we provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold. Second, we implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese. Our efforts are built on a collaboration of the two teams that submitted the most effective runs to the TREC evaluation. These contributions provide a firm foundation for future advances.
Published: 2023

12. Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

Author: Zhang, Xinyu, Thakur, Nandan, Ogundepo, Odunayo, Kamalloo, Ehsan, Alfonso-Hermelo, David, Li, Xiaoguang, Liu, Qun, Rezagholizadeh, Mehdi, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world. These languages have diverse typologies, originate from many different language families, and are associated with varying amounts of available resources -- including what researchers typically characterize as high-resource as well as low-resource languages. Our dataset is designed to support the creation and evaluation of models for monolingual retrieval, where the queries and the corpora are in the same language. In total, we have gathered over 700k high-quality relevance judgments for around 77k queries over Wikipedia in these 18 languages, where all assessments have been performed by native speakers hired by our team. Our goal is to spur research that will improve retrieval across a continuum of languages, thus enhancing information access capabilities for diverse populations around the world, particularly those that have been traditionally underserved. This overview paper describes the dataset and baselines that we share with the community. The MIRACL website is live at http://miracl.ai/.
Published: 2022

13. Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval

Author: Thakur, Nandan, Reimers, Nils, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Dense retrieval overcomes the lexical gap and has shown great success in ad-hoc information retrieval (IR). Despite their success, dense retrievers are expensive to serve across practical use cases. For use cases that require searching millions of documents, the dense index becomes bulky and requires high memory usage for storage. More recently, learning-to-hash (LTH) techniques, e.g., BPR and JPQ, produce binary document vectors, thereby reducing the memory required to store the dense index efficiently. LTH techniques are supervised and fine-tune the retriever using a ranking loss. They outperform their counterparts, i.e., traditional out-of-the-box vector compression techniques such as PCA or PQ. A missing piece from prior work is that existing techniques have been evaluated only in-domain, i.e., on a single dataset such as MS MARCO. In our work, we evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever while maintaining efficiency at inference. Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10 on the BEIR benchmark. To solve this limitation, we propose an easy yet effective solution of injecting domain adaptation with existing supervised LTH techniques. We experiment with two well-known unsupervised domain adaptation techniques: GenQ and GPL. Our domain adaptation injection technique can improve the downstream zero-shot retrieval effectiveness of the BPR and JPQ variants of the TAS-B model by 11.5% and 8.2% nDCG@10 on average, respectively, while maintaining 32× memory efficiency and achieving 14× and 2× speedups, respectively, in CPU retrieval latency on BEIR. All our code, models, and data are publicly available at https://github.com/thakur-nandan/income.
Comment: Accepted at ReNeuIR 2023 Workshop
Published: 2022
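
The 32× memory figure follows from storage arithmetic: a d-dimensional float32 vector needs 4d bytes, while a one-bit-per-dimension code needs about d/8 bytes. The sketch assumes simple sign binarization and Hamming distance for comparing codes; it illustrates only the binary-code case and is not the BPR or JPQ training procedure. The 768-dimension example is an assumption (the size of a DistilBERT-based encoder such as TAS-B).

```python
import math


def binarize(vector):
    """Pack a float vector's sign pattern into an int: bit i is 1 iff vector[i] > 0."""
    code = 0
    for i, x in enumerate(vector):
        if x > 0:
            code |= 1 << i
    return code


def hamming(code_a, code_b):
    """Number of bit positions where two packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def compression_ratio(dim, float_bytes=4):
    """Bytes for one float32 vector divided by bytes for a 1-bit-per-dimension code."""
    return (dim * float_bytes) / math.ceil(dim / 8)


print(compression_ratio(768))  # 3072 bytes vs. 96 bytes -> 32.0
print(hamming(binarize([0.3, -1.2, 0.7]), binarize([0.1, 0.4, -0.2])))  # 2
```

Hamming distance on packed integers reduces to an XOR and a bit count, which is one reason binary indexes can be searched quickly on CPUs.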

14. GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Author: Wang, Kexin, Thakur, Nandan, Reimers, Nils, and Gurevych, Iryna
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021), can be combined with GPL, yielding another average improvement of 1.4 points nDCG@10 across the six tasks. The code and the models are available at https://github.com/UKPLab/gpl.
Comment: Accepted at NAACL 2022
Published: 2021
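
A minimal sketch of the pseudo-labeling step, assuming (as in GPL's MarginMSE setup) that the label for a generated query, a positive passage, and a mined negative is the cross-encoder's score margin, and that the dense retriever is trained to match that margin with its dot-product scores. `cross_encoder` is a hypothetical scoring callable and the vectors are toy lists; a real setup would use a neural cross-encoder and batched tensors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pseudo_label(cross_encoder, query, pos, neg):
    """Teacher margin: how much more relevant the cross-encoder finds pos than neg."""
    return cross_encoder(query, pos) - cross_encoder(query, neg)


def margin_mse(q_vecs, pos_vecs, neg_vecs, teacher_margins):
    """Mean squared error between student (dot-product) margins and teacher margins."""
    errors = [
        (dot(q, p) - dot(q, n) - t) ** 2
        for q, p, n, t in zip(q_vecs, pos_vecs, neg_vecs, teacher_margins)
    ]
    return sum(errors) / len(errors)


# Hypothetical stand-in for a cross-encoder: counts shared words.
def toy_cross_encoder(q, d):
    return len(set(q.split()) & set(d.split()))


label = pseudo_label(toy_cross_encoder, "solar power cost", "solar power cost drops", "wind power")
print(label)  # 2
```

Training on graded margins rather than hard 0/1 labels lets the retriever learn how much better the positive is, which softens the effect of noisy generated queries.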

15. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Author: Thakur, Nandan, Reimers, Nils, Rücklé, Andreas, Srivastava, Abhishek, and Gurevych, Iryna
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems, including lexical, sparse, dense, late-interaction and re-ranking architectures, on the BEIR benchmark. Our results show that BM25 is a robust baseline and that re-ranking and late-interaction-based models on average achieve the best zero-shot performance, however, at high computational costs. In contrast, dense and sparse retrieval models are computationally more efficient but often underperform other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards more robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
Comment: Accepted at NeurIPS 2021 Dataset and Benchmark Track
Published: 2021
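
For reference, BM25, the robust lexical baseline above, scores a document by summing, over query terms, an IDF weight times a saturated, length-normalized term frequency. The sketch uses the Lucene-style IDF, the classic (k1 + 1) numerator, and the common defaults k1 = 1.2 and b = 0.75; these are not necessarily the settings behind BEIR's BM25 runs, and whitespace tokenization is a simplification.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (list of strings) against `query`."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["argument retrieval with bm25", "dense neural retrieval", "cooking recipes"]
print(bm25_scores("bm25 retrieval", docs))
```

The rare term "bm25" receives a higher IDF than the more common "retrieval", so the first document ranks highest.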

16. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

Author: Thakur, Nandan, Reimers, Nils, Daxenberger, Johannes, and Gurevych, Iryna
Subjects: Computer Science - Computation and Language
Abstract: There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance.
Comment: Accepted at NAACL 2021
Published: 2020
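
The augmentation loop can be sketched as follows, assuming a hypothetical `cross_encoder(a, b)` callable that returns a similarity score and a list of unlabeled candidate pairs. How those candidates are sampled, which the abstract calls non-trivial and crucial, is left to the caller; the merged gold plus silver set would then be used to train the bi-encoder.

```python
def augment_with_silver(gold, candidate_pairs, cross_encoder):
    """Return gold (a, b, label) triples plus cross-encoder-labeled silver triples.

    Candidates already in the gold set, or repeated, are skipped so gold labels win.
    """
    seen = {(a, b) for a, b, _ in gold}
    augmented = list(gold)
    for a, b in candidate_pairs:
        if (a, b) in seen:
            continue  # keep the human (gold) label and skip duplicate candidates
        seen.add((a, b))
        augmented.append((a, b, cross_encoder(a, b)))  # silver label from the teacher
    return augmented
```

Keeping gold labels authoritative means the cross-encoder only fills in pairs that humans never labeled.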

17. SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Author: Thakur, Nandan, Wang, Kexin, Gurevych, Iryna, and Lin, Jimmy
Published: 2023
Full Text: View/download PDF

18. MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages

Author: Zhang, Xinyu, Thakur, Nandan, Ogundepo, Odunayo, Kamalloo, Ehsan, Alfonso-Hermelo, David, Li, Xiaoguang, Liu, Qun, Rezagholizadeh, Mehdi, and Lin, Jimmy
Published: 2023
Full Text: View/download PDF

19. Evaluating Embedding APIs for Information Retrieval

Author: Kamalloo, Ehsan, Zhang, Xinyu, Ogundepo, Odunayo, Thakur, Nandan, Alfonso-Hermelo, David, Rezagholizadeh, Mehdi, and Lin, Jimmy
Published: 2023
Full Text: View/download PDF

20. GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Author: Wang, Kexin, Thakur, Nandan, Reimers, Nils, and Gurevych, Iryna
Published: 2022
Full Text: View/download PDF

21. BWS Argument Similarity Corpus

Author: Thakur, Nandan, Daxenberger, Johannes, and Gurevych, Iryna
Subjects: argument pairs, argument similarity, pairwise regression, cross-topic, 409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualization
Published: 2020
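
Assuming "BWS" here refers to Best-Worst Scaling (the entry does not expand the acronym), annotators see small tuples of items, such as argument pairs, and mark the most and least similar one. A standard way to turn these choices into real-valued scores is the counting procedure: (times chosen best − times chosen worst) / times shown. A minimal sketch with hypothetical item IDs:

```python
from collections import Counter


def bws_scores(annotations):
    """annotations: iterable of (tuple_of_items, best_item, worst_item).

    Returns {item: (best_count - worst_count) / times_shown}, in [-1, 1].
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}


annotations = [
    (("p1", "p2", "p3", "p4"), "p1", "p4"),
    (("p1", "p2", "p3", "p4"), "p2", "p4"),
]
print(bws_scores(annotations))  # {'p1': 0.5, 'p2': 0.5, 'p3': 0.0, 'p4': -1.0}
```

Scores of this kind can serve as regression targets, which fits the "pairwise regression" subject tag above.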

22. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

Author: Thakur, Nandan, Reimers, Nils, Daxenberger, Johannes, and Gurevych, Iryna
Published: 2021
Full Text: View/download PDF
