Search

Your search for "Zisserman, A." returned a total of 3,966 results.

Search Constraints

You searched for: Author "Zisserman, A."

Search Results

1. ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval

2. Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

3. Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation

4. VoiceVector: Multimodal Enrolment Vectors for Speaker Separation

5. Scaling 4D Representations

6. New keypoint-based approach for recognising British Sign Language (BSL) from sequences

7. 3D Spine Shape Estimation from Single 2D DXA

8. Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark

9. The Sound of Water: Inferring Physical Properties from Pouring Liquids

10. A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos

11. Automated Spinal MRI Labelling from Reports Using a Large Language Model

12. It's Just Another Day: Unique Video Captioning by Discriminative Prompting

13. Character-aware audio-visual subtitling in context

14. The VoxCeleb Speaker Recognition Challenge: A Retrospective

15. 3D-Aware Instance Segmentation and Tracking in Egocentric Videos

16. Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

17. OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos

18. AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

19. TAPVid-3D: A Benchmark for Tracking Any Point in 3D

20. CountGD: Multi-Modal Open-World Counting

21. Moving Object Segmentation: All You Need is SAM (and Flow)

22. BootsTAP: Bootstrapped Training for Tracking-Any-Point

23. It's Just Another Day: Unique Video Captioning by Discriminative Prompting

24. Character-Aware Audio-Visual Subtitling in Context

25. 3D-Aware Instance Segmentation and Tracking in Egocentric Videos

26. AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

27. Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

28. N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

29. Made to Order: Discovering Monotonic Temporal Changes via Self-supervised Video Ordering

30. Text-Conditioned Resampler For Long Form Video Understanding

31. Appearance-Based Refinement for Object-Centric Motion Segmentation

32. Separating the 'Chirp' from the 'Chat': Self-supervised Visual Grounding of Sound and Language

33. A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

34. Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

35. AutoAD III: The Prequel -- Back to the Pixels

36. Moving Object Segmentation: All You Need Is SAM (and Flow)

37. TIM: A Time Interval Machine for Audio-Visual Action Recognition

38. FlexCap: Describe Anything in Images in Controllable Detail

39. N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

40. A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

41. BootsTAP: Bootstrapped Training for Tracking-Any-Point

42. Synchformer: Efficient Synchronization from Sparse Cues

43. Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

44. The Manga Whisperer: Automatically Generating Transcriptions for Comics

45. Amodal Ground Truth and Completion in the Wild

46. Perception Test 2023: A Summary of the First Challenge And Outcome

47. Text-Conditioned Resampler For Long Form Video Understanding

48. Appearance-Based Refinement for Object-Centric Motion Segmentation

49. A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

50. Learning from One Continuous Video Stream
