Search for author "Manocha, Dinesh" in the arXiv database: 327 results in total

Search Results

1. Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

2. MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

3. Do Audio-Language Models Understand Linguistic Variations?

4. DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

5. PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

6. Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

7. EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

8. ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera

9. MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering

10. Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

11. AIME: AI System Optimization via Multiple LLM Evaluators

12. Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

13. Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments

14. SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

15. CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains

16. BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

17. ImPoster: Text and Frequency Guidance for Subject Driven Action Personalization using Diffusion Models

18. GND: Global Navigation Dataset with Multi-Modal Perception and Multi-Category Traversability in Outdoor Campus Environments

19. ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds

20. VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps

21. 3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance

22. TOPGN: Real-time Transparent Obstacle Detection using Lidar Point Cloud Intensity for Autonomous Robot Navigation

23. TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

24. Improving Zero-Shot ObjectNav with Generative Communication

25. CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

26. Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

27. Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs

28. IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning

29. GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

30. Multi-LLM QA with Embodied Exploration

31. AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

32. MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

33. LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

34. ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

35. Transfer Q Star: Principled Decoding for LLM Alignment

36. EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes

37. GAMEOPT+: Improving Fuel Efficiency in Unregulated Heterogeneous Traffic Intersections via Optimal Multi-agent Cooperative Control

38. Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

39. Text Prompting for Multi-Concept Video Customization by Autoregressive Generation

40. Prompt Mixing in Diffusion Models using the Black Scholes Algorithm

41. LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

42. S-EQA: Tackling Situational Queries in Embodied Question Answering

43. TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes

44. 'Don't forget to put the milk back!' Dataset for Enabling Embodied Agents to Detect Anomalous Situations

45. AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

46. PoCo: Point Context Cluster for RGBD Indoor Place Recognition

47. Do Vision-Language Models Understand Compound Nouns?

48. CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

49. VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

50. CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments
