Search for "Computer Science - Multimedia": 17,010 results in total

Search Results

1. LAR-IQA: A Lightweight, Accurate, and Robust No-Reference Image Quality Assessment Model

2. MultiMediate'24: Multi-Domain Engagement Estimation

3. Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing

4. WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

5. Video to Music Moment Retrieval

6. MSLIQA: Enhancing Learning Representations for Image Quality Assessment through Multi-Scale Learning

7. See or Guess: Counterfactually Regularized Image Captioning

8. Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

9. A Simple Baseline with Single-encoder for Referring Image Segmentation

10. SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

11. Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

12. Sec2Sec Co-attention for Video-Based Apparent Affective Prediction

13. Alfie: Democratising RGBA Image Generation With No $$$

14. LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming

15. Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

16. Digital Fingerprinting on Multimedia: A Survey

17. HABD: a houma alliance book ancient handwritten character recognition database

18. SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding

19. PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

20. StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech

21. Localization of Synthetic Manipulations in Western Blot Images

22. Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals

23. SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description

24. An Open, Cross-Platform, Web-Based Metaverse Using WebXR and A-Frame

25. Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval

26. SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

27. VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints

28. Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition

29. Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

30. DreamCinema: Cinematic Transfer with Free Camera and 3D Character

31. Exploring the Role of Audio in Multimodal Misinformation Detection

32. MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

33. Cap2Sum: Learning to Summarize Videos by Generating Captions

34. MultiMed: Massively Multimodal and Multitask Medical Understanding

35. MCDubber: Multimodal Context-Aware Expressive Video Dubbing

36. Let Community Rules Be Reflected in Online Content Moderation

37. AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

38. Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

39. SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

40. Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation

41. Webcam-based Pupil Diameter Prediction Benefits from Upscaling

42. Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images

43. Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

44. Harmonizing Attention: Training-free Texture-aware Geometry Transfer

45. Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

46. SpeechEE: A Novel Benchmark for Speech Event Extraction

47. Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

48. FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model

49. ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement

50. Scaling up Multimodal Pre-training for Sign Language Understanding
