Your search for "Lu Tong" returned 75 results

Search Constraints

Author: "Lu Tong"
Database: arXiv

Search Results

1. Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

2. MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

3. CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

4. EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images

5. MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

6. EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

7. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

8. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

9. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

10. Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

11. Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

12. PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

13. MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

14. Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

15. CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

16. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

17. Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

18. Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based vector-on-matrix regression

19. Multiple Imputation Method for High-Dimensional Neuroimaging Data

20. Deep Video Restoration for Under-Display Camera

21. Memory-and-Anticipation Transformer for Online Action Understanding

22. FB-BEV: BEV Representation from Forward-Backward View Transformations

23. The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

24. AVSegFormer: Audio-Visual Segmentation with Transformer

25. GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

26. VideoLLM: Modeling Video Sequence with Large Language Models

27. Graph Propagation Transformer for Graph Representation Learning

28. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

29. Network method for voxel-pair-level brain connectivity analysis under spatial-contiguity constraints

30. MRSN: Multi-Relation Support Network for Video Action Detection

31. DDP: Diffusion Model for Dense Visual Prediction

32. Champion Solution for the WSDM2023 Toloka VQA Challenge

33. Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method

34. Restoring Vision in Hazy Weather with Hierarchical Contrastive Learning

35. InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

36. Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022

37. Exploring Detection-based Method For Speaker Diarization @ Ego4D Audio-only Diarization Challenge 2022

38. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

39. A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal

40. On Efficient Reinforcement Learning for Full-length Game of StarCraft II

41. Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

42. SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

43. Vision Transformer Adapter for Dense Predictions

44. Uncertainty-based Network for Few-shot Image Classification

45. BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

46. BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

47. Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

48. DCAN: Improving Temporal Action Detection via Dual Context Aggregation

49. FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

50. Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution
