Search for author "Shan, Ying": 2,275 results in total

Search Results

1. Image Inpainting Models are Effective Tools for Instruction-guided Image Editing

2. Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

3. SEED-Story: Multimodal Long Story Generation with Large Language Model

4. How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

5. EA-VTR: Event-Aware Video-Text Retrieval

6. MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

7. Image Conductor: Precision Control for Interactive Video Synthesis

8. VoCo-LLaMA: Towards Vision Compression with Large Language Models

9. GrootVL: Tree Topology is All You Need in State Space Model

10. PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

11. ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

12. CV-VAE: A Compatible Video VAE for Latent Generative Video Models

13. MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

14. Programmable Motion Generation for Open-Set Motion Control Tasks

15. ToonCrafter: Generative Cartoon Interpolation

16. Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

17. ReVideo: Remake a Video with Motion and Content Control

18. Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

19. SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

20. Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality

21. SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

22. SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

23. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

24. ST-LLM: Large Language Models Are Effective Temporal Learners

25. UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling

26. Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

27. SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

28. HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback

29. BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

30. DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos

31. Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

32. DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

33. Advances in 3D Generation: A Survey

34. YOLO-World: Real-Time Open-Vocabulary Object Detection

35. RecDCL: Dual Contrastive Learning for Recommendation

36. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

37. Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

38. Supervised Fine-tuning in turn Improves Visual Foundation Models

39. LLaMA Pro: Progressive LLaMA with Block Expansion

40. VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

41. SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

42. EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

43. AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild

44. Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion

45. PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

46. AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

47. MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

48. MagicStick: Controllable Video Editing via Control Handle Transformations

49. StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

50. HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
