168 results for "Ranjay Krishna"
Search Results
2. Efficient Inference of Vision Instruction-Following Models with Elastic Cache.
3. SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision.
4. ImageInWords: Unlocking Hyper-Detailed Image Descriptions.
5. Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning.
6. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps.
7. BLINK: Multimodal Large Language Models Can See but Not Perceive.
8. m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks.
9. EVE: Enabling Anyone to Train Robots using Augmented Reality.
10. Iterated Learning Improves Compositionality in Large Vision-Language Models.
11. SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World.
12. Holodeck: Language Guided Generation of 3D Embodied AI Environments.
13. Modeling Collaborator: Enabling Subjective Vision Classification with Minimal Human Effort via LLM Tool-Use.
14. Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models.
15. MIMIC: Masked Image Modeling with Image Correspondences.
16. Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos.
17. Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization.
18. Offline Training of Language Model Agents with Functions as Learnable Weights.
19. Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.
20. Selective Visual Representations Improve Convergence and Generalization for Embodied AI.
21. Agile Modeling: From Concept to Classifier in Minutes.
22. TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering.
23. CREPE: Can Vision-Language Foundation Models Reason Compositionally?
24. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.
25. AR2-D2: Training a Robot Without a Robot.
26. Scaling Up LLM Reviews for Google Ads Content Moderation.
27. Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models.
28. Language Model Preference Evaluation with Multiple Weak Evaluators.
29. AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation.
30. NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples.
31. ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition.
32. Coarse Correspondences Elicit 3D Spacetime Understanding in Multimodal Language Model.
33. Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report].
34. Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions.
35. Task Me Anything.
36. RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics.
37. Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models.
38. Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass.
39. Multilingual Diversity Improves Vision-Language Representations.
41. The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better.
42. Manipulate-Anything: Automating Real-World Robots using Vision-Language Models.
43. THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation.
44. Training Language Model Agents without Modifying Language Models.
45. Explanations Can Reduce Overreliance on AI Systems During Decision-Making.
46. VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building.
47. EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions.
48. EQUI-VOCAL Demonstration: Synthesizing Video Queries from User Interactions.
49. Measuring Compositional Consistency for Video Question Answering.
50. SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality.
Discovery Service for Jio Institute Digital Library