Search

Author search for "Lin, Xudong": 569 results


Search Results

1. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

2. Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

3. BLINK: Multimodal Large Language Models Can See but Not Perceive

4. SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

5. Video Summarization: Towards Entity-Aware Captions

6. InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

7. Non-Sequential Graph Script Induction via Multimedia Grounding

8. Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

9. Supervised Masked Knowledge Distillation for Few-Shot Transformers

10. In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

11. TempCLR: Temporal Alignment Representation with Contrastive Learning

12. Video Event Extraction via Tracking Visual States of Arguments

13. Weakly-Supervised Temporal Article Grounding

14. Learning to Decompose Visual Features with Latent Textual Prompts

15. Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities

16. Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

17. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

18. Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

19. All in One: Exploring Unified Video-Language Pre-training

20. Learning To Recognize Procedural Activities with Distant Supervision

21. CLIP-Event: Connecting Text and Images with Event Structures

24. MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

25. Video-Text Pre-training with Learned Regions

26. Object-aware Video-language Pre-training for Retrieval

28. Joint Multimedia Event Extraction from Video and Article

36. Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos

37. VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

40. Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

41. Towards Train-Test Consistency for Semi-supervised Temporal Action Localization

42. Context-Gated Convolution
