Search

Your search for the keyword "Zisserman, A." returned 242 results.

Search Constraints

Author: "Zisserman, A."
Database: arXiv

Search Results

1. ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval

2. Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues

3. Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation

4. VoiceVector: Multimodal Enrolment Vectors for Speaker Separation

5. Scaling 4D Representations

6. New keypoint-based approach for recognising British Sign Language (BSL) from sequences

7. 3D Spine Shape Estimation from Single 2D DXA

8. Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark

9. The Sound of Water: Inferring Physical Properties from Pouring Liquids

10. A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos

11. Automated Spinal MRI Labelling from Reports Using a Large Language Model

12. It's Just Another Day: Unique Video Captioning by Discriminative Prompting

13. Character-aware audio-visual subtitling in context

14. The VoxCeleb Speaker Recognition Challenge: A Retrospective

15. 3D-Aware Instance Segmentation and Tracking in Egocentric Videos

16. Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

17. OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos

18. AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

19. TAPVid-3D: A Benchmark for Tracking Any Point in 3D

20. CountGD: Multi-Modal Open-World Counting

21. Separating the 'Chirp' from the 'Chat': Self-supervised Visual Grounding of Sound and Language

22. A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

23. Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

24. AutoAD III: The Prequel -- Back to the Pixels

25. Moving Object Segmentation: All You Need Is SAM (and Flow)

26. TIM: A Time Interval Machine for Audio-Visual Action Recognition

27. FlexCap: Describe Anything in Images in Controllable Detail

28. N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

29. A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

30. BootsTAP: Bootstrapped Training for Tracking-Any-Point

31. Synchformer: Efficient Synchronization from Sparse Cues

32. Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling

33. The Manga Whisperer: Automatically Generating Transcriptions for Comics

34. Amodal Ground Truth and Completion in the Wild

35. Perception Test 2023: A Summary of the First Challenge And Outcome

36. Text-Conditioned Resampler For Long Form Video Understanding

37. Appearance-Based Refinement for Object-Centric Motion Segmentation

38. A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

39. Learning from One Continuous Video Stream

40. Predicting Spine Geometry and Scoliosis from DXA Scans

41. Show from Tell: Audio-Visual Modelling in Clinical Settings

42. AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

43. A General Protocol to Probe Large Vision Models for 3D Physical Understanding

44. GestSync: Determining who is speaking without a talking head

45. The Making and Breaking of Camouflage

46. The Change You Want to See (Now in 3D)

47. Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

48. OxfordVGG Submission to the EGO4D AV Transcription Challenge

49. TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

50. Multi-Modal Classifiers for Open-Vocabulary Object Detection