Search

Author search for "Xiong, Caiming": 883 results

Search Results

1. SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs

2. CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval

3. Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

4. BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

5. CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

6. Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

7. CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

8. JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking

9. Asynchronous Tool Usage for Real-Time Agents

10. PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

11. Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency

12. xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

13. Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage

14. XForecast: Evaluating Natural Language Explanations for Time Series Forecasting

15. Trust but Verify: Programmatic VLM Evaluation in the Wild

16. Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

17. GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation

18. P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

19. Automatic Curriculum Expert Iteration for Reliable LLM Reasoning

20. MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

21. ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

22. FaithEval: Can Your Language Model Stay Faithful to Context, Even If 'The Moon is Made of Marshmallows'

23. Direct Judgement Preference Optimization

24. xLAM: A Family of Large Action Models to Empower AI Agent Systems

25. xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

26. xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

27. Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

28. Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research

29. Personalized Multi-task Training for Recommender System

30. ThinK: Thinner Key Cache by Query-Driven Pruning

31. Shared Imagination: LLMs Hallucinate Alike

32. Consent in Crisis: The Rapid Decline of the AI Data Commons

33. Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

34. Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

35. APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

36. INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness

37. MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

38. MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

39. UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting

40. RLHF Workflow: From Reward Modeling to Online RLHF

41. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

42. What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

43. How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library

45. FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

46. AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

47. AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

48. Text2Data: Low-Resource Data Generation with Textual Control

49. Unified Training of Universal Time Series Forecasting Transformers

50. Causal Layering via Conditional Entropy
