Vision-language navigation: a survey and taxonomy.
- Source :
- Neural Computing & Applications. Mar 2024, Vol. 36 Issue 7, p3291-3316. 26p.
- Publication Year :
- 2024
Abstract
- Vision-language navigation (VLN) tasks require an agent to follow language instructions from a human guide to navigate in previously unseen environments using visual observations. This challenging field, involving problems in natural language processing (NLP), computer vision (CV), robotics, etc., has spawned many excellent works focusing on various VLN tasks. This paper provides a comprehensive survey and an insightful taxonomy of these tasks based on the different characteristics of language instructions. Depending on whether navigation instructions are given once or multiple times, we divide the tasks into two categories, i.e., single-turn and multiturn tasks. We subdivide single-turn tasks into goal-oriented and route-oriented tasks based on whether the instructions designate a single goal location or specify a sequence of multiple locations. We subdivide multiturn tasks into interactive and passive tasks based on whether the agent is allowed to ask questions. These tasks require different agent capabilities and entail various model designs. We identify the progress made on these tasks and examine the limitations of the existing VLN models and task settings. Hopefully, a well-designed taxonomy of the task family enables comparisons among different approaches across papers concerning the same tasks and clarifies the advances made in these tasks. Furthermore, we discuss several open issues in this field and some promising directions for future research, including the incorporation of knowledge into VLN models and transferring them to the real physical world. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0941-0643
- Volume :
- 36
- Issue :
- 7
- Database :
- Academic Search Index
- Journal :
- Neural Computing & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 175359171
- Full Text :
- https://doi.org/10.1007/s00521-023-09217-1