Search

Your search for author "Roth, Dan" returned 1,323 results.

Search Results

1. Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

2. Benchmarking LLM Guardrails in Handling Multilingual Toxicity

3. ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

4. Open Domain Question Answering with Conflicting Contexts

5. GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation

6. Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

7. Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

8. Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

9. MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

10. Knowledge-Aware Reasoning over Multimodal Semi-structured Tables

11. Enhancing Temporal Understanding in LLMs for Semi-structured Tables

12. Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

13. NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

14. On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning

15. Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks

16. Grammar Error Correction in Morphologically Rich Languages: The Case of Russian

17. H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables

18. FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

19. FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

20. A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

21. MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

22. Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

23. Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

24. ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models

25. Devil's Advocate: Anticipatory Reflection for LLM Agents

26. BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models

27. BLINK: Multimodal Large Language Models Can See but Not Perceive

28. Fewer Truncations Improve Language Modeling

29. Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

30. Conceptual and Unbiased Reasoning in Language Models

31. Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

32. From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

33. Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

34. DeAL: Decoding-time Alignment for Large Language Models

35. Code Representation Learning At Scale

36. Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

37. What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception

38. On the Calibration of Multilingual Question Answering LLMs

39. Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets

40. Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

41. ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks

42. CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

43. Causal inference with textual data: A quasi-experimental design assessing the association between author metadata and acceptance among ICLR submissions from 2017 to 2022

44. SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation

45. ExpertQA: Expert-Curated Questions and Attributed Answers

46. Building Interpretable and Reliable Open Information Retriever for New Domains Overnight

47. Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

48. On Regularization and Inference with Label Constraints

49. The Integer Linear Programming Inference Cookbook

50. Towards Open-Domain Topic Classification
