Author search for "Arnab, Anurag": 122 results in total

Search Results

1. Mixture of Nested Experts: Adaptive Processing of Visual Tokens

2. Planted: a dataset for planted forest identification from multi-satellite time series

3. Streaming Dense Video Captioning

4. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

5. Time-, Memory- and Parameter-Efficient Visual Adaptation

6. Pixel Aligned Language Models

7. Video Summarization: Towards Entity-Aware Captions

8. UnLoc: A Unified Framework for Video Localization Tasks

9. Does Visual Pretraining Help End-to-End Reasoning?

10. Dense Video Object Captioning from Disjoint Supervision

11. How can objects help action recognition?

12. Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

13. PaLI-X: On Scaling up a Multilingual Vision and Language Model

14. End-to-End Spatio-Temporal Action Localisation with Video Transformers

15. VicTR: Video-conditioned Text Representations for Activity Recognition

16. CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

17. Scaling Vision Transformers to 22 Billion Parameters

18. Adaptive Computation with Elastic Input Sequence

19. Audiovisual Masked Autoencoders

20. Token Turing Machines

21. Dynamic Graph Message Passing Networks for Visual Recognition

22. Beyond Transfer Learning: Co-finetuning for Action Localisation

23. M&M Mix: A Multimodal Multiview Transformer Ensemble

24. Simple Open-Vocabulary Object Detection with Vision Transformers

25. Learning with Neighbor Consistency for Noisy Labels

26. End-to-end Generative Pretraining for Multimodal Video Captioning

27. Multiview Transformers for Video Recognition

28. PolyViT: Co-training Vision Transformers on Images, Videos and Audio

29. The Efficiency Misnomer

30. SCENIC: A JAX Library for Computer Vision Research and Beyond

31. Compressive Visual Representations

32. Attention Bottlenecks for Multimodal Fusion

33. TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

35. ViViT: A Video Vision Transformer

36. Unified Graph Structured Models for Video Understanding

37. Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

38. Dual Graph Convolutional Network for Semantic Segmentation

39. Dynamic Graph Message Passing Networks

40. Exploiting temporal context for 3D human pose estimation in the wild

41. Meta Learning Deep Visual Words for Fast Video Object Segmentation

42. Simple Open-Vocabulary Object Detection

43. Pixel-level scene understanding with deep structured models

44. Weakly- and Semi-Supervised Panoptic Segmentation

45. On the Robustness of Semantic Segmentation Models to Adversarial Attacks

46. Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection

47. Holistic, Instance-Level Human Parsing

48. Pixelwise Instance Segmentation with a Dynamically Instantiated Network

49. A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials

50. Bottom-up Instance Segmentation using Deep Higher-Order CRFs
