Search Results (1,327 results)
2. The U. V. Helava Award – Best Paper Volumes 171-182 (2021).
- Published
- 2022
- Full Text
- View/download PDF
3. Augmented paper maps: Exploring the design space of a mixed reality system
- Author
- Paelke, Volker and Sester, Monika
- Subjects
- MOBILE communication systems, ELECTRONIC equipment, MAPS, HIKING, GLOBAL Positioning System, INFORMATION processing, REAL-time computing
- Abstract
Paper maps and mobile electronic devices have complementary strengths and shortcomings in outdoor use. In many scenarios, like small-craft sailing or cross-country trekking, a complete replacement of maps is neither useful nor desirable. Paper maps are fail-safe, relatively cheap, offer superior resolution, and provide a large-scale overview; in uses like open-water sailing it is therefore mandatory to carry adequate maps/charts. GPS-based mobile devices, on the other hand, offer useful features like automatic positioning and plotting, real-time information updates, and dynamic adaptation to user requirements. While paper maps are now commonly used in combination with mobile GPS devices, there is no meaningful integration between the two, and the combined use leads to a number of interaction problems and potential safety issues. In this paper we explore the design space of augmented paper maps, in which maps are augmented with additional functionality through a mobile device to achieve a meaningful integration between device and map that combines their respective strengths. [Copyright Elsevier]
- Published
- 2010
- Full Text
- View/download PDF
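The integration this abstract calls for ultimately reduces to mapping a GPS fix onto the georeferenced paper map. As a toy illustration (the function name and all values are hypothetical, not from the paper), an affine georeference can convert a longitude/latitude fix into a pixel position on a scanned map:

```python
# Toy sketch: project a GPS fix onto a scanned, georeferenced map image via a
# six-parameter affine transform. Values are illustrative only.
def gps_to_map_pixel(lon, lat, world):
    """world = (a, b, c, d, e, f) with lon = a*col + b*row + c and
    lat = d*col + e*row + f (axis-aligned scans have b = d = 0)."""
    a, b, c, d, e, f = world
    det = a * e - b * d
    col = (e * (lon - c) - b * (lat - f)) / det
    row = (-d * (lon - c) + a * (lat - f)) / det
    return col, row

# map scanned at 0.0001 deg/pixel, top-left corner at 8.0 E, 52.5 N
print(gps_to_map_pixel(8.0123, 52.4871, (1e-4, 0, 8.0, 0, -1e-4, 52.5)))  # ~(123, 129)
```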
4. The U. V. Helava Award – Best Paper Volumes 147-158 (2019).
- Subjects
- AWARDS, BATHYMETRIC maps, REMOTE sensing
- Published
- 2020
- Full Text
- View/download PDF
5. The U. V. Helava Award – Best Paper Volumes 159–170 (2020).
- Author
- Weng, Qihao
- Published
- 2021
- Full Text
- View/download PDF
6. Theme issue “Papers from Geospatial Week 2015”.
- Author
- Paparoditis, Nicolas and Dowman, Ian
- Subjects
- GEOSPATIAL data, REMOTE sensing
- Published
- 2017
- Full Text
- View/download PDF
7. The U. V. Helava Award – Best Paper Volumes 135-146 (2018).
- Subjects
- AWARDS, JURORS, FOURTH of July
- Published
- 2019
- Full Text
- View/download PDF
8. CodeUNet: Autonomous underwater vehicle real visual enhancement via underwater codebook priors.
- Author
- Wang, Linling, Xu, Xiaoyan, An, Shunmin, Han, Bing, and Guo, Yi
- Subjects
- AUTONOMOUS underwater vehicles, IMAGE intensifiers, PRIOR learning, EVALUATION methodology, GENERALIZATION
- Abstract
Vision enhancement for autonomous underwater vehicles (AUVs) has received increasing attention and developed rapidly in recent years. However, existing methods based on prior knowledge struggle to adapt to all scenarios, while learning-based approaches lack paired datasets from real-world scenes, limiting their enhancement capabilities. Consequently, this severely hampers their generalization and application in AUVs. Moreover, existing deep learning-based methods largely overlook the advantages of prior knowledge-based approaches. To address these issues, a novel architecture called CodeUNet is proposed in this paper. Instead of relying on physical scattering models, a real-world scene vision enhancement network based on a codebook prior is considered. First, a VQGAN is pretrained on underwater datasets to obtain a discrete codebook, encapsulating the underwater priors (UPs). The decoder is equipped with a novel feature alignment module that effectively leverages underwater features to generate clean results. Then, the distance between the features and the matches is recalibrated by controllable matching operations, enabling better matching. Extensive experiments demonstrate that CodeUNet outperforms state-of-the-art methods in terms of visual quality and quantitative metrics. The testing results of geometric rotation, SIFT salient point detection, and edge detection applications are shown in this paper, providing strong evidence for the feasibility of CodeUNet in the field of autonomous underwater vehicles. Specifically, on the full-reference dataset, the proposed method outperforms most of the 14 state-of-the-art methods in four evaluation metrics, with an improvement of up to 3.7722 compared to MLLE. On the no-reference dataset, the proposed method achieves excellent results, with an improvement of up to 0.0362 compared to MLLE. Links to the dataset and code for this project can be found at: https://github.com/An-Shunmin/CodeUNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
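The codebook prior described in the CodeUNet abstract is, at its core, vector quantization: encoder features of a degraded image are replaced by their nearest entries in a codebook learned from clean underwater imagery. A minimal, hypothetical PyTorch sketch of that lookup (not the authors' code; shapes and names are assumptions):

```python
import torch

def codebook_lookup(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """features: (N, D) encoder features; codebook: (K, D) learned entries.
    Returns the nearest codebook entry per feature (VQ-style matching)."""
    # squared Euclidean distance between every feature and every code entry
    d = (features.pow(2).sum(1, keepdim=True)
         - 2 * features @ codebook.t()
         + codebook.pow(2).sum(1))
    return codebook[d.argmin(dim=1)]

quantized = codebook_lookup(torch.randn(8, 64), torch.randn(512, 64))
```

The paper's controllable matching operations recalibrate this distance before the argmin; the plain nearest-neighbour form above is only the baseline idea.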
9. The U. V. Helava Award – Best Paper Volumes 123-134 (2017).
- Subjects
- PHOTOGRAMMETRY, REMOTE sensing, IMAGE quality analysis
- Published
- 2018
- Full Text
- View/download PDF
10. Quick calibration of massive urban outdoor surveillance cameras.
- Author
- Shi, Lin, Lan, Xiaoji, Lan, Xin, and Zhang, Tianliang
- Subjects
- VIDEO surveillance, COMPUTER vision, URBAN transportation, SMART cities, CALIBRATION, SPACE vehicles
- Abstract
The wide application of urban outdoor surveillance systems has greatly improved the efficiency of urban management and public safety. However, most existing urban outdoor surveillance cameras lack records of important parameters such as geospatial coordinates, field-of-view angle, and lens distortion, which complicates the unified management and layout optimization of the cameras, the geospatial analysis of video data, and computer vision applications such as trajectory tracking of moving targets. To address this problem, this paper designs a marker with a chessboard pattern and a positioning device. The marker is moved through outdoor space by vehicles and other mobile carriers, and the marker images captured by the surveillance cameras, together with the spatial position information obtained by the positioning device, are used to batch-calibrate the outdoor surveillance cameras and calculate their geospatial coordinates and field-of-view angles. This achieves rapid acquisition of important surveillance camera parameters and provides a new method for the rapid calibration of urban outdoor surveillance cameras, contributing to the information-based management of urban surveillance resources and the spatial analysis of surveillance video data in smart transportation and smart city applications. Taking the outdoor surveillance cameras within 2.5 km² of a city as an example, calibration tests were performed on 295 surveillance cameras in the test area, and the geospatial coordinates, field-of-view angles, and lens parameters of 269 of them were obtained. The average spatial position error was 0.527 m (maximum 1.573 m) and the average field-of-view angle error was 1.63° (maximum 3.4°), verifying the effectiveness and accuracy of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
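The calibration pipeline this abstract describes rests on standard chessboard-target calibration; a hedged OpenCV sketch of that building block (board size, frame names, and the field-of-view step are illustrative assumptions, not the authors' settings):

```python
import cv2
import numpy as np

pattern = (9, 6)  # inner corners per row/column of the chessboard marker
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in ["frame1.png", "frame2.png", "frame3.png"]:  # hypothetical frames
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)

# horizontal field of view from the focal length and image width
fov_x = 2 * np.degrees(np.arctan(gray.shape[1] / (2 * K[0, 0])))
```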
11. The U.V. Helava Award — Best Paper Volume 62 (2007)
- Author
- Vosselman, George
- Published
- 2008
- Full Text
- View/download PDF
12. The U.V. Helava Award—Best Paper 2002
- Published
- 2004
- Full Text
- View/download PDF
13. The U. V. Helava Award – Best Paper Volumes 87–98 (2014).
- Subjects
- PHOTOGRAMMETRY, REMOTE sensing, IMAGE reconstruction
- Published
- 2015
- Full Text
- View/download PDF
14. Photogrammetric Computer Vision 2014 – Best Papers of the ISPRS Technical Commission III Symposium.
- Author
- Schindler, Konrad
- Subjects
- PHOTOGRAMMETRY, COMPUTER vision, CONFERENCES & conventions
- Published
- 2015
- Full Text
- View/download PDF
15. Semantic change detection using a hierarchical semantic graph interaction network from high-resolution remote sensing images.
- Author
- Long, Jiang, Li, Mengmeng, Wang, Xiaoqin, and Stein, Alfred
- Subjects
- REMOTE-sensing images, DESIGN
- Abstract
Current semantic change detection (SCD) methods face challenges in modeling temporal correlations (TCs) between bitemporal semantic features and difference features, which leads to inaccurate detection results, particularly in complex SCD scenarios. This paper presents a hierarchical semantic graph interaction network (HGINet) for SCD from high-resolution remote sensing images. This multitask neural network combines semantic segmentation and change detection tasks. For semantic segmentation, we construct a multilevel perceptual aggregation network with a pyramidal architecture. It extracts semantic features that discriminate between different categories at multiple levels. We model the correlations between bitemporal semantic features using a TC module that enhances the identification of unchanged areas. For change detection, we design a semantic difference interaction module based on a graph convolutional network. It measures the interactions among bitemporal semantic features, their corresponding difference features, and the combination of both. Extensive experiments on four datasets, namely SECOND, HRSCD, Fuzhou, and Xiamen, show that HGINet performs better in identifying changed areas and categories across various scenarios and regions than nine existing methods. Compared with the existing methods applied on the four datasets, it achieves the highest F1_scd values of 59.48%, 64.12%, 64.45%, and 84.93%, and SeK values of 19.34%, 14.55%, 18.28%, and 51.12%, respectively. Moreover, HGINet mitigates the influence of fake changes caused by seasonal effects, producing results with well-delineated boundaries and shapes. Furthermore, HGINet trained on the Fuzhou dataset is successfully transferred to the Xiamen dataset, demonstrating its effectiveness and robustness in identifying changed areas and categories from high-resolution remote sensing images. The code of our paper is accessible at https://github.com/long123524/HGINet-torch. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
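As a rough illustration of the temporal-correlation idea in the abstract above (not the authors' TC module), cosine similarity between bitemporal feature maps can flag likely-unchanged pixels:

```python
import torch
import torch.nn.functional as F

def temporal_correlation(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """f1, f2: (B, C, H, W) bitemporal features -> (B, 1, H, W) similarity."""
    return F.cosine_similarity(f1, f2, dim=1, eps=1e-6).unsqueeze(1)

# high similarity suggests an unchanged pixel; its complement is a change prior
change_prior = 1.0 - temporal_correlation(torch.randn(1, 64, 32, 32),
                                          torch.randn(1, 64, 32, 32))
```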
16. Recognition for SAR deformation military target from a new MiniSAR dataset using multi-view joint transformer approach.
- Author
- Lv, Jiming, Zhu, Daiyin, Geng, Zhe, Han, Shengliang, Wang, Yu, Ye, Zheng, Zhou, Tao, Chen, Hongren, and Huang, Jiawei
- Subjects
- TRANSFORMER models, SYNTHETIC aperture radar, IMAGE denoising, RECOGNITION (Psychology), TARGET acquisition
- Abstract
Accurately detecting ground armored weapons is crucial for gaining the initiative in military operations. Generally, satellite or airborne synthetic aperture radar (SAR) systems face limitations due to their revisit cycles and fixed flight trajectories, resulting in single-view imaging of targets and thereby hampering the recognition of small SAR ground targets. In contrast, MiniSAR can capture multiple views of a target by acquiring images from different azimuth angles. In this research, our team utilizes a self-developed MiniSAR system to generate multi-view SAR images of real ground armored targets and to recognize those targets. However, the recognition of small targets in SAR images encounters two significant difficulties. First, small targets in SAR images are prone to interference from background noise. Second, SAR target deformation arises from variations in depression angles and imaging processes. To tackle these difficulties, this paper proposes a novel SAR ground deformation target recognition approach based on a joint multi-view transformer model. The method first preprocesses SAR images with a low-frequency-prior SAR image denoising method. Next, it obtains multi-view joint information through a self-attention mechanism and feeds the joint features into the transformer structure, whose outputs are jointly updated by a multi-way averaging adaptive loss function to improve the recognition accuracy of deformed targets. The experimental results demonstrate the superiority of the proposed method in SAR ground deformation target recognition, outperforming other representative approaches such as information fusion of target and shadow (IFTS) and Vision Transformer (ViT). The proposed method achieves high recognition accuracies of 98.37% and 93.86% on the moving and stationary target acquisition and recognition (MSTAR) dataset and our own SAR image dataset, respectively. The source code and sample dataset are available at https://github.com/Lvjiming/MJT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
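A hedged sketch of the multi-view joint idea above: per-view feature tokens attend to each other in a transformer encoder and are pooled for classification. Dimensions and depth are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiViewJoint(nn.Module):
    def __init__(self, dim: int = 256, n_classes: int = 10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (V, B, dim), one feature token per azimuth view
        joint = self.encoder(view_feats)       # self-attention across views
        return self.head(joint.mean(dim=0))    # pool views, then classify

logits = MultiViewJoint()(torch.randn(4, 2, 256))  # 4 views, batch of 2
```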
17. WHU-Urban3D: An urban scene LiDAR point cloud dataset for semantic instance segmentation.
- Author
- Han, Xu, Liu, Chong, Zhou, Yuzhou, Tan, Kai, Dong, Zhen, and Yang, Bisheng
- Subjects
- POINT cloud, MACHINE learning, LIDAR, AIRBORNE lasers, CITIES & towns
- Abstract
With the rapid advancement of 3D sensors, there is an increasing demand for 3D scene understanding, and an increasing number of 3D deep learning algorithms have been proposed. However, a large-scale and richly annotated 3D point cloud dataset is critical to understanding complicated road and urban scenes. Motivated by the need to bridge the gap between the rising demand for 3D urban scene understanding and limited LiDAR point cloud datasets, this paper proposes the richly annotated WHU-Urban3D dataset and an effective method for semantic instance segmentation. WHU-Urban3D stands out from existing datasets due to its distinctive features: (1) extensive coverage of both Airborne Laser Scanning and Mobile Laser Scanning point clouds, along with panoramic images; (2) large-scale road and urban scenes in different cities (over 3.2 × 10⁶ m² in area), with rich point-wise semantic instance labels (over 200 million points); (3) inclusion of particular attributes (e.g., reflected intensity, number of returns) in addition to 3D coordinates. This paper also provides the performance of several representative baseline methods and outlines potential future work and challenges for fully exploiting this dataset. The WHU-Urban3D dataset is publicly accessible at https://whu3d.com/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. A novel Building Section Skeleton for compact 3D reconstruction from point clouds: A study of high-density urban scenes.
- Author
- Wu, Yijie, Xue, Fan, Li, Maosu, and Chen, Sou-Han
- Subjects
- POINT cloud, BUILDING repair, SKELETON, ARCHITECTURAL designs, URBAN growth, SPACE
- Abstract
Compact building models are demanded by global smart city applications, while high-definition urban 3D data is increasingly accessible thanks to advanced reality capture technologies. Yet, existing building reconstruction methods encounter crucial bottlenecks with high-definition data of large scale and high complexity, particularly in high-density urban scenes. This paper proposes a Building Section Skeleton (BSS) to reflect architectural design principles of parallelism and symmetry. A BSS atom describes a pair of intrinsically parallel or symmetric points; a BSS segment clusters dense BSS atoms of a pair of symmetric surfaces; the polyhedra of all BSS segments further echo the architectural forms and reconstructability. To prove the concepts of BSS for automatic compact reconstruction, this paper presents a BSS method for building reconstruction that consists of one stage of BSS segment hypothesizing and another stage of BSS segment merging. Experiments and comparisons with four state-of-the-art methods have been conducted on 15 diverse scenes encompassing more than 60 buildings. Results confirmed that the BSS method achieves state-of-the-art compactness, robustness, geometric accuracy, and efficiency simultaneously, especially for high-density urban scenes. On average, the BSS method reconstructed each scene into 623 triangles with a root-mean-square deviation (RMSD) of 0.82 m, completing the process in 110 s. First, the proposed BSS is an expressive 3D feature reflecting architectural designs in high-density cities, and can open new avenues for city modeling and other urban remote sensing and photogrammetry studies. Second, for practitioners in smart city development, the BSS method offers an accurate and efficient approach to compact building and city modeling. The source code and tested scenes are available at https://github.com/eiiijiiiy/sobss.
• Building Section Skeleton (BSS) is proposed with novel definitions of BSS atoms and segments.
• BSS revamps traditional shape skeletons to reflect architectural design principles of parallelism and symmetry.
• A two-stage BSS method is developed for compact building reconstruction from urban point clouds.
• The BSS method was confirmed to be compact, robust, geometrically accurate, and efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Call for papers for Theme Issue: High-Resolution Earth Imaging for Geospatial Information.
- Published
- 2013
- Full Text
- View/download PDF
20. Call for Papers-Theme Issue “Global Land Cover Mapping and Monitoring: Progress, Challenges, and opportunities”
- Published
- 2013
- Full Text
- View/download PDF
21. The U.V. Helava Award – Best Paper Volume 65 (2010)
- Published
- 2011
- Full Text
- View/download PDF
22. Call for papers
- Published
- 2011
- Full Text
- View/download PDF
23. The U.V. Helava Award — Best Paper Volume 64 (2009)
- Author
- Vosselman, George
- Subjects
- AWARDS, PHOTOGRAMMETRY, REMOTE sensing, OCEAN surface topography
- Published
- 2011
- Full Text
- View/download PDF
24. Call for papers
- Published
- 2010
- Full Text
- View/download PDF
25. The U.V. Helava Award — Best Paper Volume 63 (2008)
- Author
- Vosselman, George
- Published
- 2010
- Full Text
- View/download PDF
26. Call for Papers
- Published
- 2009
- Full Text
- View/download PDF
27. The U.V. Helava Award — Best Paper Volume 60 (2005)
- Published
- 2007
- Full Text
- View/download PDF
28. The U.V. Helava Award — Best paper volume 59 (2004)
- Published
- 2007
- Full Text
- View/download PDF
29. Call for Papers
- Published
- 2007
- Full Text
- View/download PDF
30. The U.V. Helava Award — Best Paper 2003
- Published
- 2004
- Full Text
- View/download PDF
31. The U.V. Helava Award—Best Paper 2001
- Published
- 2004
- Full Text
- View/download PDF
32. Call for Papers.
- Published
- 2002
- Full Text
- View/download PDF
33. Call for Papers.
- Published
- 2002
- Full Text
- View/download PDF
34. TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes.
- Author
- Tang, Shengjun, Li, Yusong, Wan, Jiawei, Li, You, Zhou, Baoding, Guo, Renzhong, Wang, Weixi, and Feng, Yuhong
- Subjects
- POSE estimation (Computer vision), PIXELS, TRANSFORMER models, SINGLE-degree-of-freedom systems, COMPUTER vision
- Abstract
Accurate localization in GPS-denied environments has always been a core issue in computer vision and robotics research. In indoor environments, vision-based localization methods are susceptible to changes in lighting conditions, viewing angles, and environmental factors, resulting in localization failures or limited generalization capabilities. In this paper, we propose the TransCNNLoc framework, which consists of an encoding–decoding network designed to learn more robust image features for camera pose estimation. In the image feature encoding stage, a CNN and the Swin Transformer are integrated to construct the image feature encoding module, enabling the network to fully extract global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections while per-pixel feature weight maps are computed. To enhance the framework's robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a multi-level coarse-to-fine iterative optimization recovers the six-degree-of-freedom camera pose. Experiments were conducted on the publicly available 7-Scenes dataset as well as a dataset collected under changing lighting conditions and dynamic scenes for accuracy validation and analysis. The experimental results demonstrate that the proposed TransCNNLoc framework exhibits superior adaptability to dynamic scenes and lighting changes. In static environments from the public datasets, the proposed localization technique attains a precision of up to 5 cm, consistently achieving superior outcomes across the majority of scenarios. Under dynamic scenes and fluctuating illumination, it reaches a precision of up to 3 cm. This represents a substantial refinement from the decimeter scale to the centimeter scale, marking a significant advancement over existing state-of-the-art (SOTA) algorithms. The open-source repository for the method proposed in this paper can be found at: github.com/Geelooo/TransCNNloc. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Satellite remote sensing of vegetation phenology: Progress, challenges, and opportunities.
- Author
- Gong, Zheng, Ge, Wenyan, Guo, Jiaqi, and Liu, Jincheng
- Subjects
- CLIMATE change, EXTREME weather, REMOTE sensing, MACHINE learning, ECOLOGICAL disturbances, PLANT phenology
- Abstract
Vegetation phenology serves as a crucial indicator of ecosystem dynamics and its response to environmental cues. Against the backdrop of global climate warming, it plays a pivotal role in studying global climate change and terrestrial ecosystem dynamics and in guiding agricultural production. Ground-based field observations of vegetation phenology are increasingly challenged by rapid global ecological changes. Since the 1970s, the development and application of remote sensing technology have offered a novel approach to address these challenges. Utilizing satellite remote sensing to acquire phenological parameters has been widely applied in monitoring vegetation phenology, significantly advancing phenological research. This paper describes commonly used vegetation indices, smoothing methods, and extraction techniques in monitoring vegetation phenology using satellite remote sensing. It systematically summarizes the applications and progress of vegetation phenology remote sensing at a global scale in recent years and analyzes its challenges: the need for higher spatiotemporal resolution data to capture vegetation changes, the necessity of comparing remote sensing monitoring methods with direct field observations, the requirement to compare different remote sensing techniques to ensure accuracy, and the importance of incorporating seasonal variations and differences into phenology extraction models. It delves into the key issues in current vegetation phenology remote sensing, including the limitations of existing vegetation indices, the impact of spatiotemporal scale effects on phenology parameter extraction, uncertainties in phenology algorithms and machine learning, and the relationship between vegetation phenology and global climate change. Based on these discussions, it proposes several opportunities and future prospects, including improving the temporal and spatial resolution of data sources, using multiple datasets to monitor vegetation phenology dynamics, quantifying uncertainties in the algorithmic and machine learning processes for phenology parameter extraction, clarifying the adaptive mechanisms of vegetation phenology to environmental changes, focusing on the impact of extreme weather, and establishing an integrated "sky-space-ground" vegetation phenology monitoring network. These developments aim to enhance the accuracy of phenology extraction, explore and understand the mechanisms of surface phenology changes, and impart more biophysical significance to vegetation phenology parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Beyond clouds: Seamless flood mapping using Harmonized Landsat and Sentinel-2 time series imagery and water occurrence data.
- Author
- Li, Zhiwei, Xu, Shaofen, and Weng, Qihao
- Subjects
- MARKOV random fields, BODIES of water, EMERGENCY management, NATURAL disasters, ARTIFICIAL satellites, SYNTHETIC aperture radar
- Abstract
Floods are among the most devastating natural disasters, posing significant risks to life, property, and infrastructure globally. Earth observation satellites provide data for continuous and extensive flood monitoring, yet limitations exist in the spatial completeness of monitoring using optical images due to cloud cover. Recent studies have developed gap-filling methods for reconstructing cloud-covered areas in water maps. However, these methods are not tailored for and validated in cloudy and rainy flooding scenarios with rapid water extent changes and limited clear-sky observations, leaving room for further improvements. This study investigated and developed a novel reconstruction method for time series flood extent mapping, supporting spatially seamless monitoring of flood extents. The proposed method first identified surface water from time series images using a fine-tuned large foundation model. Then, the cloud-covered areas in the water maps were reconstructed, adhering to the introduced submaximal stability assumption, on the basis of the prior water occurrence data in the Global Surface Water dataset. The reconstructed time series water maps were refined through spatiotemporal Markov random field modeling for the final delineation of flooding areas. The effectiveness of the proposed method was evaluated with Harmonized Landsat and Sentinel-2 datasets under varying cloud cover conditions, enabling seamless flood mapping at 2–3-day frequency and 30 m resolution. Experiments at four global sites confirmed the superiority of the proposed method. It achieved higher reconstruction accuracy with average F1-scores of 0.931 during floods and 0.903 before/after floods, outperforming the typical gap-filling method with average F1-scores of 0.871 and 0.772, respectively. Additionally, the maximum flood extent maps and flood duration maps, which were composed on the basis of the reconstructed water maps, were more accurate than those using the original cloud-contaminated water maps. The benefits of synthetic aperture radar images (e.g., Sentinel-1) for enhancing flood mapping under cloud cover conditions were also discussed. The method proposed in this paper provided an effective way for flood monitoring in cloudy and rainy scenarios, supporting emergency response and disaster management. The code and datasets used in this study have been made available online (https://github.com/dr-lizhiwei/SeamlessFloodMapper). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
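A loose numpy sketch of the occurrence-guided gap filling described in the abstract above: cloud pixels are labelled water when their long-term occurrence is at least as high as occurrences observed at clear-sky water pixels in the same scene. This illustrates the general prior only; the authors' submaximal stability rule is more elaborate:

```python
import numpy as np

def fill_clouds(water: np.ndarray, cloud: np.ndarray,
                occurrence: np.ndarray) -> np.ndarray:
    """water: bool water map from clear pixels; cloud: bool cloud mask;
    occurrence: 0-100 long-term water occurrence (Global Surface Water)."""
    clear_water_occ = occurrence[water & ~cloud]
    if clear_water_occ.size == 0:
        return water.copy()
    thresh = clear_water_occ.min()   # lowest occurrence seen wet in this scene
    filled = water.copy()
    filled[cloud] = occurrence[cloud] >= thresh
    return filled
```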
37. Catadioptric omnidirectional thermal odometry in dynamic environment.
- Author
- Wu, Yuzhen, Wang, Lingxue, Zhang, Lian, Han, Xudong, Zheng, Dezhi, Wang, Shuigen, Li, Yanqiu, and Cai, Yi
- Subjects
- CAMERA calibration, SINGLE-degree-of-freedom systems, VISUAL odometry, THERMOGRAPHY, CAMERAS, OMNIRANGE system
- Abstract
This paper presents a catadioptric omnidirectional thermal odometry (COTO) system that estimates the six-degree-of-freedom (DoF) pose of a camera using only omnidirectional thermal images in visually degraded, fast-motion, and dynamic environments. First, we design and fabricate a central hyperbolic catadioptric omnidirectional thermal camera that captures surrounding thermal images with a 360° horizontal field of view (FoV), and improve the omnidirectional camera model and calibration method to obtain high-precision camera intrinsic parameters. Second, we propose an epipolar curve constraint combined with omnidirectional thermal object detection to significantly reduce the interference of moving objects on pose estimation. Third, the implemented COTO pipeline consists of photometric calibration, dynamic region removal, tracking, and mapping to overcome the drawbacks of photometric inconsistency and large distortion in omnidirectional thermal images. Experiments have been conducted on a total of 17 sequences of Lab, Outdoor, and Driving scenes, amounting to more than 60,000 omnidirectional thermal images of real environments. The experimental results indicate that the proposed COTO system offers excellent localization accuracy and unparalleled robustness compared with the current state-of-the-art methods. The average localization accuracy measured by the absolute trajectory error (ATE) is less than 15 cm from the ground truth in both the Lab and Outdoor sequences. In addition, COTO was the only system with complete and successful tracking in all sequences. The system can serve as an innovative localization solution, particularly in challenging environments with changing ambient light, rapid vehicle motion, and moving-object interference, which can be difficult for visual odometry to handle. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
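For reference, the absolute trajectory error (ATE) quoted above is typically reported as the RMSE of position differences after aligning the estimated trajectory to the ground truth; a minimal numpy sketch (alignment omitted for brevity):

```python
import numpy as np

def ate_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """est_xyz, gt_xyz: (N, 3) time-synchronized, aligned trajectories."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-pose position error
    return float(np.sqrt((err ** 2).mean()))
```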
38. Deep neural network based on dynamic attention and layer attention for meteorological data downscaling.
- Author
- Wang, Junkai, Lin, Lianlei, Zhang, Zongwei, Gao, Sheng, and Yu, Hangyi
- Subjects
- ARTIFICIAL neural networks, DOWNSCALING (Climatology), DATA distribution, HOMOSCEDASTICITY
- Abstract
The scale of meteorological data products often does not match the requirements of application scenarios, which limits their use; large-scale reanalysis data must therefore be downscaled before use. Attention mechanisms are key to high-performance downscaling models. However, in different application scenarios and at different locations in the network, an attention mechanism is not always beneficial. In this paper, we propose a dynamic attention module that adaptively generates weights for each branch based on input features, thereby dynamically suppressing unnecessary attention adjustments. We also propose a layer attention module, which independently and adaptively aggregates the feature representations of different network layers. In addition, we design a loss function based on homoscedastic uncertainty, which directly guides the model to learn the pixel-level numerical mapping from low resolution to high resolution and implicitly encourages the model to better reconstruct the data distribution of each meteorological field by learning the distribution differences between fields. Experiments show that our model is more robust in the time dimension, with an average MAE reduction of about 40% compared to VDSR and other methods when downscaling composite meteorological data, and that it reconstructs multivariate high-resolution meteorological fields more accurately. Code is available at https://github.com/HitKarry/SDDN. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
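A hedged sketch of a homoscedasticity-based multi-field loss in the spirit of the abstract above (following Kendall et al.'s uncertainty weighting; the per-field structure is an assumption, not the authors' exact loss):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """One learnable log-variance per meteorological field weights its MSE."""
    def __init__(self, n_fields: int):
        super().__init__()
        self.log_var = nn.Parameter(torch.zeros(n_fields))

    def forward(self, preds, targets):
        # preds/targets: lists of per-field tensors, e.g. [temperature, wind]
        total = 0.0
        for i, (p, t) in enumerate(zip(preds, targets)):
            mse = torch.mean((p - t) ** 2)
            total = total + torch.exp(-self.log_var[i]) * mse + self.log_var[i]
        return total
```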
39. Unifying remote sensing change detection via deep probabilistic change models: From principles, models to applications.
- Author
- Zheng, Zhuo, Zhong, Yanfei, Zhao, Ji, Ma, Ailong, and Zhang, Liangpei
- Subjects
- ARTIFICIAL neural networks, TRANSFORMER models, SURFACE of the earth, ARCHITECTURAL design, REMOTE sensing
- Abstract
Change detection in high-resolution Earth observation is a fundamental Earth vision task for understanding the subtle temporal dynamics of Earth's surface, and it has been significantly promoted by generic vision technologies in recent years. The Vision Transformer is a powerful component for learning spatiotemporal representations but has enormous computational complexity, especially for high-resolution images. Besides, principles are still lacking for designing macro architectures that integrate these advanced vision components for various change detection tasks. In this paper, we present a deep probabilistic change model (DPCM) to provide unified, interpretable, modular probabilistic change process modeling that addresses multiple change detection tasks, including binary change detection, one-to-many semantic change detection, and many-to-many semantic change detection. DPCM describes any complex change process as a probabilistic graphical model, providing theoretical evidence for macro architecture design and generic change detection task modeling. We refer to this probabilistic graphical model as the probabilistic change model (PCM), where DPCM is the PCM parameterized by deep neural networks. For parameterization, the PCM is factorized into many easy-to-solve distributions based on task-specific assumptions, and deep neural modules then parameterize these distributions to solve the change detection problem uniformly. In this way, DPCM has both a theoretical macro architecture from the PCM and the strong representation capability of deep networks. We also present the sparse change Transformer for better parameterization. Inspired by domain knowledge, i.e., the sparsity of change and the local correlation of change, the sparse change Transformer computes self-attention within change regions to model spatiotemporal correlations; its computational complexity is quadratic in the change-region size but independent of image size, significantly reducing the computational overhead of high-resolution image change detection. We refer to this instance of DPCM with the sparse change Transformer as ChangeSparse. The experiments confirm ChangeSparse's superiority in speed and accuracy for multiple real-world application scenarios, such as disaster response and urban development monitoring. The code is available at https://github.com/Z-Zheng/pytorch-change-models. More resources can be found at http://rsidea.whu.edu.cn/resource_sharing.htm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
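A simplified sketch of attention restricted to change regions, the mechanism the abstract above credits for complexity that scales with change-region size rather than image size (shapes and the module are assumptions, not the ChangeSparse implementation):

```python
import torch
import torch.nn as nn

def sparse_change_attention(feat: torch.Tensor, change_mask: torch.Tensor,
                            attn: nn.MultiheadAttention) -> torch.Tensor:
    """feat: (HW, C) flattened features; change_mask: (HW,) bool."""
    idx = change_mask.nonzero(as_tuple=True)[0]
    if idx.numel() == 0:               # nothing changed: features pass through
        return feat
    tokens = feat[idx].unsqueeze(1)    # (M, 1, C), sequence-first
    out, _ = attn(tokens, tokens, tokens)   # attention over M change tokens only
    result = feat.clone()
    result[idx] = out.squeeze(1)
    return result

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4)
feat = torch.randn(1024, 64)                 # a 32x32 feature map, flattened
mask = torch.rand(1024) > 0.9                # ~10% of pixels changed
refined = sparse_change_attention(feat, mask, attn)
```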
40. PriNeRF: Prior constrained Neural Radiance Field for robust novel view synthesis of urban scenes with fewer views.
- Author
- Chen, Kaiqiang, Dong, Bo, Wang, Zhirui, Cheng, Peirui, Yan, Menglong, Sun, Xian, Weinmann, Michael, and Weinmann, Martin
- Subjects
- CITIES & towns, URBAN planning, RADIANCE, CAMERAS, ALGORITHMS
- Abstract
Novel view synthesis (NVS) of urban scenes enables cities to be explored virtually and interactively, which can further be used for urban planning, navigation, digital tourism, etc. However, many current NVS methods require a large number of images from known views as input and are sensitive to intrinsic and extrinsic camera parameters. In this paper, we propose a new unified framework for NVS of urban scenes that requires fewer views, via the integration of scene priors and the joint optimization of camera parameters, under a geometric constraint, along with the NeRF weights. The integration of scene priors makes full use of priors from neighboring reference views to reduce the number of required known views. The joint optimization corrects errors in camera parameters, which are usually derived from algorithms like Structure-from-Motion (SfM), and further improves the quality of the generated novel views. Experiments show that our method achieves about 25.375 dB and 25.512 dB on average in terms of peak signal-to-noise ratio (PSNR) on synthetic and real data, respectively. It outperforms popular state-of-the-art methods (i.e., BungeeNeRF and MegaNeRF) by about 2–4 dB in PSNR. Notably, our method achieves better or competitive results compared with the baseline method while using only one third of the known-view images required by the baseline. The code and dataset are available at https://github.com/Dongber/PriNeRF. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
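For reference, the PSNR figures quoted above follow the standard definition; a minimal numpy sketch for images scaled to [0, 1]:

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```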
41. Incremental multi temporal InSAR analysis via recursive sequential estimator for long-term landslide deformation monitoring.
- Author
- Ao, Meng, Wei, Lianhuan, Liao, Mingsheng, Zhang, Lu, Dong, Jie, and Liu, Shanjun
- Subjects
- GLOBAL Positioning System, TIME series analysis, DEFORMATION of surfaces, BATCH processing, DATA libraries, LANDSLIDES
- Abstract
Distributed Scatterers Interferometry (DS-InSAR) has been widely applied to increase the number of measurement points (MPs) in complex mountainous areas with dense vegetation and complicated topography. However, the DS-InSAR method adopts a batch processing mode: when new observations are acquired, the entire data archive is reprocessed, completely ignoring existing results, which makes it unsuitable for high-performance processing of operational observation data. Current research focuses on automating SAR data acquisition and optimizing processing, but the core time series analysis method remains unchanged. In this paper, building on the traditional Sequential Estimator proposed by Ansari in 2017, a Recursive Sequential Estimator with Flexible Batches (RSEFB) is developed to divide large datasets flexibly, without requirements on the number of images in each subset. This method updates and processes newly acquired SAR data in near real time and obtains long time series results without reprocessing the entire archive, which will be helpful for early warning of landslide disasters. 132 Sentinel-1 and 44 TerraSAR-X SAR images were utilized to invert the line-of-sight (LOS) surface deformation of the Xishancun and Huangnibazi landslides in Li County, Sichuan Province, China. The RSEFB method is applied to retrieve time-series displacements from the Sentinel-1 and TerraSAR-X datasets, respectively. Comparison with the traditional Sequential Estimator and validation against Global Positioning System (GPS) monitoring data proved the effectiveness and reliability of the RSEFB method. The research shows that the Xishancun landslide is in a state of slow and uneven deformation, and the non-sliding part of the Huangnibazi landslide shows an obvious deformation signal, so continuous monitoring is needed to prevent and mitigate possible catastrophic slope failure events. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
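The generic mechanism behind sequential estimators such as the RSEFB above is the recursive least-squares update: each new observation refines the running estimate without reprocessing the archive. A compact numpy sketch in generic notation (not the paper's phase-linking equations):

```python
import numpy as np

def rls_update(x: np.ndarray, P: np.ndarray, a: np.ndarray, y: float):
    """x: (n,) estimate; P: (n, n) covariance; a: (n,) design row; y: scalar."""
    k = P @ a / (1.0 + a @ P @ a)   # gain for the new observation
    x = x + k * (y - a @ x)         # correct the estimate with the innovation
    P = P - np.outer(k, a @ P)      # shrink the covariance accordingly
    return x, P
```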
42. Pano2Geo: An efficient and robust building height estimation model using street-view panoramas.
- Author
- Fan, Kaixuan, Lin, Anqi, Wu, Hao, and Xu, Zhenci
- Subjects
- CONSTRUCTION cost estimates, GRAPHICAL projection, EXHIBITION buildings, SOURCE code, PANORAMAS
- Abstract
Building height serves as a crucial parameter for characterizing urban vertical structure, which has a profound impact on sustainable urban development. The emergence of street-view data offers the opportunity to observe urban 3D scenarios from the human perspective, benefiting the estimation of building height. In this paper, we propose an efficient and robust building height estimation model, which we call the Pano2Geo model, that precisely projects street-view panorama (SVP) coordinates to geospatial coordinates. Firstly, an SVP refinement strategy is designed, incorporating NENO rules for observation quality assessment from four aspects: number of buildings, extent of the buildings, number of nodes, and orthogonal observations, followed by the application of the art gallery theorem to further refine the SVPs. Secondly, the Pano2Geo model is constructed, which provides a pixel-level projection transformation from SVP coordinates to 3D geospatial coordinates for locating the height features of buildings in the SVP. Finally, the valid building height feature points in the SVP are extracted based on a slope mutation test, and the 3D geospatial coordinates of the building height feature points are projected using the Pano2Geo model to obtain the building height. The proposed model was evaluated in the city of Wuhan, China, and the results indicate that the Pano2Geo model can accurately estimate building height, with an average error of 1.85 m. Furthermore, compared with three state-of-the-art methods, the Pano2Geo model shows superior performance, with only 10.2% of buildings having absolute errors exceeding 2 m, compared to the map-image-based (27.2%), corner-based (16.8%), and single-view-based (13.9%) height estimation methods. The SVP refinement method achieves optimal observation quality with less than 50% of the existing SVPs, leading to highly efficient building height estimation, particularly in areas of high building density. Moreover, the Pano2Geo model exhibits robustness in building height estimation, maintaining errors within 2 m even as building shape complexity and occlusion degree increase within the SVP. Our source dataset and code are available at https://github.com/Giser317/Pano2Geo.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
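A geometric sketch of the panorama-to-geospatial idea described above: an equirectangular pixel maps to an azimuth/elevation ray, and the roofline and baseline rows of a facade yield a height estimate. All names and the camera height are illustrative assumptions, not the authors' Pano2Geo model:

```python
import numpy as np

def pixel_to_angles(u, v, width, height):
    """Equirectangular pixel -> (azimuth, elevation) in radians."""
    azimuth = (u / width) * 2 * np.pi - np.pi      # 0 = straight ahead
    elevation = np.pi / 2 - (v / height) * np.pi   # +pi/2 top, -pi/2 bottom
    return azimuth, elevation

def building_height(v_roof, v_base, width, height, cam_height=2.5):
    """Height from roofline/baseline pixel rows of one facade."""
    _, e_roof = pixel_to_angles(0, v_roof, width, height)
    _, e_base = pixel_to_angles(0, v_base, width, height)
    dist = cam_height / np.tan(-e_base)    # facade base lies below the horizon
    return cam_height + dist * np.tan(e_roof)
```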
43. Global Streetscapes — A comprehensive dataset of 10 million street-level images across 688 cities for urban science and analytics.
- Author
- Hou, Yujun, Quintana, Matias, Khomiakov, Maxim, Yap, Winston, Ouyang, Jiani, Ito, Koichi, Wang, Zeyu, Zhao, Tianhong, and Biljecki, Filip
- Subjects
- SPATIAL data infrastructures, COMPUTER vision, CITIES & towns, RESEARCH questions, MULTISENSOR data fusion, DEEP learning
- Abstract
Street view imagery (SVI) is instrumental for sensing urban environments, benefitting numerous domains such as urban morphology, health, greenery, and accessibility. Billions of images worldwide have been made available by commercial services such as Google Street View and crowdsourcing services such as Mapillary and KartaView, where anyone from anywhere can upload imagery while moving. However, while the data tend to be plentiful, have high coverage and quality, and are used to derive rich insights, they remain simple and limited in metadata, as characteristics such as weather, quality, and lighting conditions remain unknown, making it difficult to evaluate the suitability of the images for specific analyses. We introduce Global Streetscapes — a dataset of 10 million crowdsourced and free-to-use SVIs sampled from 688 cities across 210 countries and territories, enriched with more than 300 camera, geographical, temporal, contextual, semantic, and perceptual attributes. The cities included are well balanced and diverse, and are home to about 10% of the world's population. Deep learning models are trained on a subset of manually labelled images for eight visual-contextual attributes pertaining to the usability of SVI — panoramic status, lighting condition, view direction, weather, platform, quality, and the presence of glare and reflections — achieving accuracies ranging from 68.3% to 99.9%, and are used to automatically label the entire dataset. Thanks to its scale and pre-computed standard semantic information, the data can readily benefit existing use cases and unlock new applications, including multi-city comparative studies and longitudinal analyses, as affirmed by a couple of use cases in the paper. Moreover, the automated processes and open-source code facilitate the expansion and updating of the dataset and encourage users to create their own datasets. With its rich manual annotations, some of which are provided for the first time, and the diverse conditions present in the images, the dataset also facilitates assessing the heterogeneous properties of crowdsourced SVIs and provides a benchmark for evaluating future computer vision models. We make the Global Streetscapes dataset and the code to reproduce and use it publicly available at https://github.com/ualsg/global-streetscapes.
• Largest labelled dataset, with 346 attributes that characterise street photos.
• Baseline models and ground truth labels for benchmarking computer vision models.
• Reproducible framework to sample and enrich SVIs from cities all around the world.
• In-depth discussion of how the dataset could drive novel research questions.
• Taking forward the work of Mapillary and KartaView, and their contributors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Explaining the decisions and the functioning of a convolutional spatiotemporal land cover classifier with channel attention and redescription mining.
- Author
- Pelous, Enzo, Méger, Nicolas, Benoit, Alexandre, Atto, Abdourrahmane, Ienco, Dino, Courteille, Hermann, and Lin-Kwong-Chon, Christophe
- Subjects
- CONVOLUTIONAL neural networks, LAND cover, REMOTE-sensing images, TIME series analysis, ARTIFICIAL intelligence
- Abstract
Convolutional neural networks trained with satellite image time series have demonstrated their potential in land cover classification in recent years. Nevertheless, the rationale leading to their decisions remains obscure by nature. Methods for providing relevant and simplified explanations of their decisions as well as methods for understanding their inner functioning have thus emerged. However, both kinds of methods generally work separately and no explicit connection between their findings is made available. This paper presents an innovative method for refining the explanations provided by channel-based attention mechanisms. It consists in identifying correspondence rules between neuronal activation levels and the presence of spatiotemporal patterns in the input data for each channel and target class. These rules provide both class-level and instance-level explanations, as well as an explicit understanding of the network operations. They are extracted using a state-of-the-art redescription mining algorithm. Experiments on the Reunion Island Sentinel-2 dataset show that both correct and incorrect decisions can be explained using convenient spatiotemporal visualizations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Spatio-temporal registration of plants non-rigid 3-D structure.
- Author
- Zhang, Tian, Elnashef, Bashar, and Filin, Sagi
- Subjects
- PLANT growth, POINT cloud, RECORDING & registration, TRIANGULATION, PLANT development, PLANT species
- Abstract
Monitoring plant growth dynamics requires continuously tracing their evolution over time. When using point cloud data, such a process requires associating the individual organs among scans, spatially aligning them, and accounting for their evolution, decay, or splitting. It is common to address this challenge by abstracting the point cloud into its skeletal form and defining point correspondence by Euclidean measures. As the paper demonstrates, standard skeletonization approaches do not capture the actual plant topology, and Euclidean measures do not document its evolving form. To address this alignment challenge, we propose in this paper a registration model that traces high-degree deformations and accommodates the complex plant topology. We develop an embedded deformation graph-based solution and introduce manifold measures to trace the plant's non-isometric development. We demonstrate how a path-seeking strategy and invariant features capture the plant's topological form, and then use a probabilistic linear assignment solution to associate organs across scans. By minimizing deviations from rigidity, our registration formulation maintains elasticity, and by solving locally rigid transformations, regularized by structure-related constraints, we secure smoothness and optimality. We also demonstrate how data arrangement and linear path-finding models make our solution computationally efficient. Our model is applied to high-quality laser triangulation data, commonly tested in 4-D plant registration studies, and is also verified on low-resolution and noisy pointsets reconstructed from a limited number of images by multiview stereo (MVS). Our results demonstrate 0.1 mm levels of accuracy when applied to plant species exhibiting complex geometric structures. They improve on state-of-the-art results by tenfold or more and correctly transform the plants' form. Paper-related resources are available at PLANT4D. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. AdaTreeFormer: Few shot domain adaptation for tree counting from a single high-resolution image.
- Author
- Amirkolaee, Hamed Amini, Shi, Miaojing, He, Lianghua, and Mulligan, Mark
- Subjects
- FOREST management, FOREST density, REMOTE-sensing images, FEATURE extraction, REMOTE sensing
- Abstract
Estimating and counting tree density using only a single aerial or satellite image is a difficult task in the fields of photogrammetry and remote sensing, yet it plays a crucial role in forest management. The huge variety of trees in varied topography severely hinders tree counting models from performing well. The purpose of this paper is to propose a framework that is learnt from a source domain with sufficient labeled trees and is adapted to a target domain with only a limited number of labeled trees. Our method, termed AdaTreeFormer, contains one shared encoder with a hierarchical feature extraction scheme to extract robust features from the source and target domains. It also consists of three subnets: two for extracting self-domain attention maps from the source and target domains respectively, and one for extracting cross-domain attention maps. For the latter, an attention-to-adapt mechanism is introduced to distill relevant information from different domains while generating tree density maps; a hierarchical cross-domain feature alignment scheme is proposed that progressively aligns the features from the source and target domains. We also incorporate adversarial learning into the framework to further reduce the gap between source and target domains. Our AdaTreeFormer is evaluated on six designed domain adaptation tasks using three tree counting datasets, i.e., Jiangsu, Yosemite, and London. Experimental results show that AdaTreeFormer significantly surpasses the state of the art; e.g., in the cross-domain task from the Yosemite to the Jiangsu dataset, it achieves a reduction of 15.9 points in absolute counting error and an increase of 10.8% in the accuracy of detected tree locations. The codes and datasets are available at https://github.com/HAAClassic/AdaTreeFormer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
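A sketch of cross-domain attention in the spirit of the attention-to-adapt mechanism above: queries come from target-domain tokens while keys and values come from the source domain. Dimensions are assumptions, not the AdaTreeFormer configuration:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
src_feat = torch.randn(2, 196, 128)   # source-domain tokens (B, N, C)
tgt_feat = torch.randn(2, 196, 128)   # target-domain tokens

cross, _ = attn(query=tgt_feat, key=src_feat, value=src_feat)
adapted = tgt_feat + cross            # residual fusion of cross-domain cues
```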
47. Legged robot-aided 3D tunnel mapping via residual compensation and anomaly detection.
- Author
- Zhang, Xing, Huang, Zhanpeng, Li, Qingquan, Wang, Ruisheng, and Zhou, Baoding
- Subjects
- TUNNELS, OPTICAL radar, LIDAR, INTRUSION detection systems (Computer security), POINT cloud, ROBOTIC exoskeletons
- Abstract
Three-dimensional (3D) mapping is important for early warning in construction safety and for supporting the long-term safety maintenance of tunnels. However, generating 3D point cloud maps of excavation tunnels, which tend to be deficient in features, have rough lining structures, and suffer from dynamic construction interference, is a challenging task. In this paper, we propose a novel legged robot-aided 3D tunnel mapping method to address these influences on point clouds in the mapping phase. First, a kinematic model construction method that integrates information from both the robot's motors and the inertial measurement unit (IMU) is proposed to correct the motion distortion of point clouds. Then, a residual compensation model for unreliable regions (abbreviated as the URC model) is proposed to eliminate the inherent alignment errors in the 3D structures. The structural regions of a tunnel are divided by reliability using the K-means method, and an inherent alignment metric is compensated based on region residual estimation. The compensated alignment metric is then incorporated into a rotation-guided anomaly consistency detection (RAD) model. An isolation forest-based anomaly consistency indicator is designed to remove anomalous light detection and ranging (LiDAR) points and reduce sensor noise caused by ultralong distances. To verify the proposed method, we conducted numerous experiments in three tunnels: a drilling-and-blasting tunnel, a TBM tunnel, and an underground pedestrian tunnel. According to the experimental results, the proposed method achieves 0.84‰, 0.40‰, and 0.31‰ closure errors (CEs) for the three tunnels, respectively, and the absolute map error (AME) and relative map error (RME) are approximately 1.45 cm and 0.57%, respectively. The trajectory estimation and mapping errors of our method are smaller than those of existing methods such as FAST-LIO2, Faster-LIO, and LiLi-OM. In addition, ablation tests were conducted to further reveal the roles of the different models used in our method for legged robot-aided 3D mapping in tunnels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
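A hedged sketch of isolation-forest-based outlier removal on LiDAR returns, the anomaly-detection ingredient mentioned above; the feature choice (range and intensity) and contamination rate are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def filter_scan(points: np.ndarray, intensity: np.ndarray) -> np.ndarray:
    """points: (N, 3) LiDAR points; intensity: (N,). Returns inlier points."""
    rng = np.linalg.norm(points, axis=1)           # range to the sensor
    X = np.column_stack([rng, intensity])
    labels = IsolationForest(contamination=0.05,
                             random_state=0).fit_predict(X)
    return points[labels == 1]                     # 1 = inlier, -1 = anomaly
```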
48. Light4Mars: A lightweight transformer model for semantic segmentation on unstructured environment like Mars.
- Author
- Xiong, Yonggang, Xiao, Xueming, Yao, Meibao, Cui, Hutao, and Fu, Yuegang
- Subjects
- MARS rovers, IMAGE segmentation, FEATURE extraction, COMPUTATIONAL complexity, GLOBAL method of teaching, MARS (Planet)
- Abstract
Automatic semantic segmentation is important for robots in unstructured and dynamic environments such as planetary surfaces, where ambient conditions cannot be controlled and the scale is larger than that found indoors. Current learning-based methods have achieved breathtaking improvements on this topic. For onboard applications, however, all those methods still suffer from huge computational costs and are difficult to deploy on edge devices. In this paper, unlike previous transformer-based SOTA approaches that rely heavily on complex designs, we propose Light4Mars, a lightweight model with minimal computational complexity that maintains high segmentation accuracy. We design a lightweight squeeze window transformer module that focuses on window-scale feature extraction and is more effective at learning global and local contextual information. An aggregated local attention decoder is utilized to fuse semantic information at different scales, especially for unstructured scenes. Since there are few all-terrain datasets for semantic segmentation of unstructured scenes like Mars, we built a synthetic dataset, SynMars-TW, referencing images collected by the ZhuRong rover on the Tianwen-1 mission and by the Curiosity rover. Extensive experiments on SynMars-TW and the real Mars dataset MarsScapes show that our approach achieves state-of-the-art performance with favorable computational simplicity. To the best of our knowledge, the proposed Light4Mars-T network is the first segmentation model for Mars image segmentation with fewer than 0.1 M parameters. Code and datasets are available at https://github.com/CVIR-Lab/Light4Mars. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
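Window-scale attention modules like the squeeze window transformer above rely on partitioning the feature map into non-overlapping windows so that attention cost stays local; a standard-pattern sketch (sizes are illustrative):

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """x: (B, H, W, C) -> (B * num_windows, ws*ws, C) window tokens."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, ws * ws, C)

tokens = window_partition(torch.randn(1, 32, 32, 96), ws=8)  # (16, 64, 96)
```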
49. Towards assessing the synthetic-to-measured adversarial vulnerability of SAR ATR.
- Author
- Peng, Bowen, Peng, Bo, Xia, Jingyuan, Liu, Tianpeng, Liu, Yongxiang, and Liu, Li
- Subjects
- ARTIFICIAL neural networks, AUTOMATIC target recognition, SYNTHETIC aperture radar, COMPUTER vision, REMOTE sensing
- Abstract
Recently, there has been increasing concern about the vulnerability of deep neural network (DNN)-based synthetic aperture radar (SAR) automatic target recognition (ATR) to adversarial attacks, where a DNN could be easily deceived by clean input with imperceptible but aggressive perturbations. This paper studies the synthetic-to-measured (S2M) transfer setting, where an attacker generates adversarial perturbation based solely on synthetic data and transfers it against victim models trained with measured data. Compared with the current measured-to-measured (M2M) transfer setting, our approach does not need direct access to the victim model or the measured SAR data. We also propose the transferability estimation attack (TEA) to uncover the adversarial risks in this more challenging and practical scenario. The TEA makes full use of the limited similarity between the synthetic and measured data pairs for blind estimation and optimization of S2M transferability, leading to feasible surrogate model enhancement without mastering the victim model and data. Comprehensive evaluations based on the publicly available synthetic and measured paired labeled experiment (SAMPLE) dataset demonstrate that the TEA outperforms state-of-the-art methods and can significantly enhance various attack algorithms in computer vision and remote sensing applications. Codes and data are available at https://github.com/scenarri/S2M-TEA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
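A minimal FGSM-style sketch of the synthetic-to-measured protocol described above: the perturbation is crafted on a surrogate trained with synthetic SAR chips and then evaluated against a victim trained on measured data. TEA itself is more elaborate; this only illustrates the transfer setting:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(surrogate, x, y, eps=0.03):
    """Craft an adversarial example using gradients of the surrogate only."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(surrogate(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

# transfer test: the same x_adv is fed to a victim model never seen by the attacker
# x_adv = fgsm_perturb(surrogate_model, chips, labels)
# victim_acc = (victim_model(x_adv).argmax(1) == labels).float().mean()
```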
50. HRVQA: A Visual Question Answering benchmark for high-resolution aerial images.
- Author
- Li, Kun, Vosselman, George, and Yang, Michael Ying
- Subjects
- COMPUTER vision, URBAN planning, SOURCE code, QUESTION answering systems, SCARCITY, PIXELS
- Abstract
Visual question answering (VQA) is an important and challenging multimodal task in computer vision and photogrammetry. Recently, efforts have been made to bring the VQA task to aerial images, due to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, the development of VQA in this domain is restricted by the huge variation in the appearance, scale, and orientation of the concepts in aerial images, along with the scarcity of well-annotated datasets. In this paper, we introduce a new dataset, HRVQA, which provides a collection of 53,512 aerial images of 1024 × 1024 pixels and 1,070,240 semi-automatically generated QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate recent methods on the HRVQA dataset. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially for the specific attribute-related questions. Our method achieves superior performance in comparison to the previous state-of-the-art approaches. The dataset and the source code are released at https://hrvqa.nl/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF