Author search for "Jiang, Yu-Gang": 974 results in total

Search Results

1. CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

2. Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

3. DiffPatch: Generating Customizable Adversarial Patches using Diffusion Model

4. SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

5. LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

6. ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

7. Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

8. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

9. REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents

10. Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning

11. Retrieval Augmented Recipe Generation

12. Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

13. IDEATOR: Jailbreaking Large Vision-Language Models Using Themselves

14. BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

15. Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

16. Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

17. UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

18. EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models

19. EventHallusion: Diagnosing Event Hallucinations in Video LLMs

20. DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation

21. FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

22. GenRec: Unifying Video Generation and Recognition with Diffusion Models

23. Decoder Pre-Training with only Text for Scene Text Recognition

24. ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack

25. Navigating Weight Prediction with Diet Diary

26. EnJa: Ensemble Jailbreak on Large Language Models

27. AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning

28. Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

29. Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

30. Out of Length Text Recognition with Sub-String Matching

31. Infinite Motion: Extended Motion Generation via Long Text Instructions

32. Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

33. PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

34. MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

35. A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

36. Extracting Training Data from Unconditional Diffusion Models

37. V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

38. OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

39. AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

40. Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

41. AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

42. DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

43. AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

44. MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

45. White-box Multimodal Jailbreaks Against Large Vision-Language Models

46. ModelLock: Locking Your Model With a Spell

47. Brain3D: Generating 3D Objects from fMRI

48. Adaptive Retention & Correction for Continual Learning

49. FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning

50. Zero-shot High-fidelity and Pose-controllable Character Animation
