Search

Your search for "Roth, Dan" returned 36 results

Search Constraints

You searched for: Author "Roth, Dan"; Publication Year Range: This year

Search Results

1. Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

2. Benchmarking LLM Guardrails in Handling Multilingual Toxicity

3. ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

4. Open Domain Question Answering with Conflicting Contexts

5. GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation

6. Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

7. Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

8. Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

9. MAPWise: Evaluating Vision-Language Models for Advanced Map Queries

10. Knowledge-Aware Reasoning over Multimodal Semi-structured Tables

11. Enhancing Temporal Understanding in LLMs for Semi-structured Tables

12. Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

13. NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

14. On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning

15. H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables

16. FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

17. FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

18. A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

19. MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

20. Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

21. Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

22. ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models

23. Devil's Advocate: Anticipatory Reflection for LLM Agents

24. BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models

25. BLINK: Multimodal Large Language Models Can See but Not Perceive

26. Fewer Truncations Improve Language Modeling

27. Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

28. Conceptual and Unbiased Reasoning in Language Models

29. Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

30. From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

31. Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

32. DeAL: Decoding-time Alignment for Large Language Models

33. Code Representation Learning At Scale

34. Causal inference with textual data: A quasi-experimental design assessing the association between author metadata and acceptance among ICLR submissions from 2017 to 2022

35. Disparities in seizure outcomes revealed by large language models
