660 results for "Multi-view Stereo"
Search Results
2. Transformer-guided Feature Pyramid Network for Multi-View Stereo
- Author
-
Wang, Lina, She, Jiangfeng, Zhao, Qiang, Wen, Xiang, and Guan, Yuzheng
- Published
- 2025
- Full Text
- View/download PDF
3. Visibility-Aware Pixelwise View Selection for Multi-View Stereo Matching
- Author
-
Huang, Zhentao, Shi, Yukun, Gong, Minglun, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
- Full Text
- View/download PDF
4. MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
- Author
-
Xu, Wangze, Gao, Huachen, Shen, Shihe, Peng, Rui, Jiao, Jianbo, Wang, Ronggang, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-view Stereo with DIV Loss
- Author
-
Rich, Alex, Stier, Noah, Sen, Pradeep, Höllerer, Tobias, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
6. MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
- Author
-
Liu, Tianqi, Wang, Guangcong, Hu, Shoukang, Shen, Liao, Ye, Xinyi, Zang, Yuhang, Cao, Zhiguo, Li, Wei, Liu, Ziwei, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
7. MVSGS: Gaussian splatting radiation field enhancement using multi-view stereo.
- Author
-
Fei, Teng, Bi, Ligong, Gao, Jieming, Chen, Shuixuan, and Zhang, Guowei
- Abstract
With the advent of 3D Gaussian Splatting (3DGS), new and effective solutions have emerged for 3D reconstruction pipelines and scene representation. However, achieving high-fidelity reconstruction of complex scenes and capturing low-frequency features remain long-standing challenges in the field of visual 3D reconstruction. Relying solely on sparse point inputs and simple optimization criteria often leads to non-robust reconstructions of the radiance field, with reconstruction quality heavily dependent on the proper initialization of inputs. Notably, Multi-View Stereo (MVS) techniques offer a mature and reliable approach for generating structured point cloud data using a limited number of views, camera parameters, and feature matching. In this paper, we propose combining MVS with Gaussian Splatting, along with our newly introduced density optimization strategy, to address these challenges. This approach bridges the gap in scene representation by enhancing explicit geometry radiance fields with MVS, and our experimental results demonstrate its effectiveness. Additionally, we have explored the potential of using Gaussian Splatting for non-face template single-process end-to-end Avatar Reconstruction, yielding promising experimental results. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
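The MVS-to-3DGS coupling described in the abstract above — seeding the Gaussian Splatting radiance field with a structured MVS point cloud rather than sparse points — can be sketched as follows. This is a generic illustration, not the authors' pipeline; the nearest-neighbor scale heuristic and the field names are assumptions.

```python
import numpy as np

def init_gaussians_from_mvs(points, colors, k=3):
    """Initialize isotropic 3D Gaussians from an MVS point cloud.

    points : (N, 3) MVS point positions, used directly as Gaussian means.
    colors : (N, 3) RGB values in [0, 1].
    Scales are set from the mean distance to the k nearest neighbors,
    a common heuristic for densely sampled point clouds.
    """
    n = len(points)
    # Brute-force pairwise distances (fine for a sketch; use a KD-tree at scale).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    knn_mean = d[:, 1:k + 1].mean(axis=1)               # skip self-distance 0
    return {
        "means": points,
        "colors": colors,
        "scales": np.repeat(knn_mean[:, None], 3, axis=1),  # isotropic
        "opacities": np.full(n, 0.1),                   # low initial opacity
    }
```

A denser, metrically consistent initialization is the point of the paper: the subsequent density optimization then has far fewer holes to fill than with a sparse SfM seed.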
8. CT-MVSNet: Curvature-guided multi-view stereo with transformers.
- Author
-
Wang, Liang, Sun, Licheng, and Duan, Fuqing
- Subjects
DEEP learning, TRANSFORMER models, CURVATURE, TEMPLES - Abstract
Multi-view stereo (MVS) can fulfill dense three-dimensional reconstruction from a collection of multi-view images. Although deep learning-based MVS has significantly enhanced the reconstruction performance, the reconstruction accuracy and completeness still require improvements to meet the need of real applications of three-dimensional content generation. So, a new MVS method, the curvature-guided multi-view stereo with transformers, is presented. By exploring inter-view relationships and measuring the size of the receptive field and feature information on the image surface using the surface curvature, the proposed method adapts to various candidate scales of curvatures to extract more detailed features adaptively for precise cost computation. Furthermore, a transformer-based feature-matching network is proposed to identify inter-view similarity better and enhance feature-matching accuracy. Additionally, a similarity measurement module based on feature matching integrates curvature and inter-view similarity measurement tightly to further improve reconstruction accuracy. Experiments on the DTU dataset and Tanks and Temples dataset validate the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
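The surface-curvature measurement that the abstract above uses to adapt feature scales can be illustrated with the textbook mean-curvature formula for a depth map; this is a generic estimate, not the paper's actual module.

```python
import numpy as np

def mean_curvature(z, spacing=1.0):
    """Mean curvature of a depth/height map z(x, y) via finite differences.

    Uses the graph-surface formula
    H = ((1+zy^2) zxx - 2 zx zy zxy + (1+zx^2) zyy) / (2 (1+zx^2+zy^2)^1.5).
    Large |H| flags finely structured regions where a smaller receptive
    field (finer feature scale) is appropriate.
    """
    zy, zx = np.gradient(z, spacing)        # first derivatives (axis 0 = y)
    zyy, _ = np.gradient(zy, spacing)
    zxy, zxx = np.gradient(zx, spacing)
    num = (1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy
    den = 2 * (1 + zx**2 + zy**2) ** 1.5
    return num / den
```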
9. Leveraging Neural Radiance Fields for Large-Scale 3D Reconstruction from Aerial Imagery.
- Author
-
Hermann, Max, Kwak, Hyovin, Ruf, Boitumelo, and Weinmann, Martin
- Subjects
POINT cloud, RADIANCE, CAMERAS, DENSITY - Abstract
Since conventional photogrammetric approaches struggle with low-texture, reflective, and transparent regions, this study explores the application of Neural Radiance Fields (NeRFs) for large-scale 3D reconstruction of outdoor scenes, as NeRF-based methods have recently shown very impressive results in these areas. We evaluate three approaches: Mega-NeRF, Block-NeRF, and Direct Voxel Grid Optimization, focusing on their accuracy and completeness compared to ground truth point clouds. In addition, we analyze the effects of using multiple sub-modules, estimating the visibility by an additional neural network and varying the density threshold for the extraction of the point cloud. For performance evaluation, we use benchmark datasets that correspond to the setting of standard flight campaigns and therefore typically have nadir camera perspective and relatively little image overlap, which can be challenging for NeRF-based approaches that are typically trained with significantly more images and varying camera angles. We show that despite lower quality compared to classic photogrammetric approaches, NeRF-based reconstructions provide visually convincing results in challenging areas. Furthermore, our study shows that in particular increasing the number of sub-modules and predicting the visibility using an additional neural network improves the quality of the resulting reconstructions significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
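Varying the density threshold for point-cloud extraction, one of the factors the study above analyzes, amounts to thresholding the trained density grid; a minimal sketch (function and parameter names are assumptions):

```python
import numpy as np

def extract_points(density, origin, voxel_size, threshold):
    """Turn a NeRF-style density voxel grid into a point cloud.

    Voxels whose density exceeds `threshold` become points at their
    centers; raising the threshold trades completeness for precision,
    which is exactly the trade-off the evaluation varies.
    """
    idx = np.argwhere(density > threshold)        # (M, 3) voxel indices
    return origin + (idx + 0.5) * voxel_size      # voxel centers in world units
```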
10. Selective weighted least square and piecewise bilinear transformation for accurate satellite DSM generation.
- Author
-
Mohammadi, Nazila and Sedaghat, Amin
- Subjects
DIGITAL elevation models, LEAST squares, REMOTE sensing, POINT cloud, GEOMETRIC modeling, PIXELS - Abstract
One of the main products of multi-view stereo (MVS) high-resolution satellite (HRS) images in photogrammetry and remote sensing is the digital surface model (DSM). Producing DSMs from MVS HRS images still faces serious challenges due to various reasons such as complexity of imaging geometry and exterior orientation model in HRS, as well as large dimensions and various geometric and illumination variations. The main motivation for conducting this research is to provide a novel and efficient method that enhances the accuracy and completeness of extracting DSM from HRS images compared to existing recent methods. The proposed method, called Sat-DSM, consists of five main stages. Initially, a very dense set of tie-points is extracted from the images using a tile-based matching method, phase congruency-based feature detectors and descriptors, and a local geometric consistency correspondence method. Then, the process of Rational Polynomial Coefficients (RPC) block adjustment is performed to compensate the RPC bias errors. After that, a dense matching process is performed to generate 3D point clouds for each pair of input HRS images using a new geometric transformation called PWB (piecewise bilinear) and an accurate area-based matching method called SWLSM (selective weighted least square matching). The key innovations of this research include the introduction of SWLSM and PWB methods for an accurate dense matching process. The PWB is a novel and simple piecewise geometric transformation model based on superpixel over-segmentation that has been proposed for accurate registration of each pair of HRS images. The SWLSM matching method is based on phase congruency measure and a selection strategy to improve the well-known LSM (least square matching) performance. After the dense matching process, the final stage is spatial intersection to generate 3D point clouds, followed by elevation interpolation to produce DSM.
To evaluate the Sat-DSM method, 12 sets of MVS-HRS data from IRS-P5, ZY3-1, ZY3-2, and Worldview-3 sensors were selected from areas with different landscapes such as urban, mountainous, and agricultural areas. The results indicate the superiority of the proposed Sat-DSM method over four other methods CATALYST, SGM (Semi-global matching), SS-DSM (structural similarity based DSM extraction), and Sat-MVSF in terms of completeness, RMSE, and MEE. The demo code is available at https://www.researchgate.net/publication/377721674_SatDSM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
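The piecewise bilinear (PWB) idea above — fitting a simple bilinear transform per superpixel by (weighted) least squares — can be sketched as follows; this is an illustrative reconstruction of the transform model, not the released Sat-DSM code.

```python
import numpy as np

def fit_bilinear(src, dst, w=None):
    """Weighted least-squares fit of a bilinear transform for one region.

    Solves dst ~ A(src) with A(x, y) = a0 + a1*x + a2*y + a3*x*y,
    independently for each output coordinate.
    src, dst : (N, 2) matched point coordinates; w : (N,) optional weights.
    Returns a (4, 2) coefficient matrix.
    """
    x, y = src[:, 0], src[:, 1]
    G = np.stack([np.ones_like(x), x, y, x * y], axis=1)   # design matrix
    if w is not None:
        sw = np.sqrt(w)[:, None]                           # weighted LS
        G, dst = G * sw, dst * sw
    coeffs, *_ = np.linalg.lstsq(G, dst, rcond=None)
    return coeffs

def apply_bilinear(coeffs, pts):
    """Map (N, 2) points through the fitted bilinear transform."""
    x, y = pts[:, 0], pts[:, 1]
    G = np.stack([np.ones_like(x), x, y, x * y], axis=1)
    return G @ coeffs
```

In the full method one such transform is fitted per superpixel, so the mapping is piecewise bilinear over the image.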
11. Integration of multiple dense point clouds based on estimated parameters in photogrammetry with QR code for reducing computation time.
- Author
-
Nakamura, Keita, Baba, Keita, Watanobe, Yutaka, Hanari, Toshihide, Matsumoto, Taku, Imabuchi, Takashi, and Kawabata, Kuniaki
- Abstract
This paper describes a method for integrating multiple dense point clouds using a shared landmark to generate a single real-scale integrated result for photogrammetry. It is difficult to integrate high-density point clouds reconstructed by photogrammetry because the scale differs between reconstructions. To solve this problem, this study places a QR code of known size, which serves as a shared landmark, in the reconstruction target environment and divides the environment based on the position of the placed QR code. Then, photogrammetry is performed for each divided environment to obtain a high-density point cloud. Finally, we propose a method of scaling each high-density point cloud based on the size of the QR code and aligning the point clouds into a single high-density point cloud by partial-to-partial registration. To verify the effectiveness of the method, this paper compares the results obtained by applying all images to photogrammetry with those obtained by the proposed method in terms of accuracy and computation time. In this verification, ideal images generated by simulation and images obtained in real environments are applied to photogrammetry. We clarify the relationship between the number of divided environments, the accuracy of the reconstruction result, and the computation time required for the reconstruction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
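Scaling each point cloud to real-world units from the known QR code size, as described above, reduces to a single ratio; a minimal sketch assuming the four reconstructed QR corners are given in scan order:

```python
import numpy as np

def scale_to_metric(points, qr_corners, qr_side_m):
    """Rescale a photogrammetric point cloud to real-world units.

    qr_corners : (4, 3) reconstructed corners of a QR code of known
    physical side length `qr_side_m` (meters), in scan order.
    The scale factor is the known side length divided by the mean
    reconstructed side length.
    """
    sides = np.linalg.norm(qr_corners - np.roll(qr_corners, -1, axis=0), axis=1)
    scale = qr_side_m / sides.mean()
    return points * scale, scale
```

With every divided environment rescaled this way, the partial-to-partial registration step only has to recover rotation and translation, not scale.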
12. MVSGS: Gaussian splatting radiation field enhancement using multi-view stereo
- Author
-
Teng Fei, Ligong Bi, Jieming Gao, Shuixuan Chen, and Guowei Zhang
- Subjects
3D reconstruction, 3D Gaussian splatting, Multi-view stereo, Radiation field, Explicit geometry, Density optimization - Abstract
With the advent of 3D Gaussian Splatting (3DGS), new and effective solutions have emerged for 3D reconstruction pipelines and scene representation. However, achieving high-fidelity reconstruction of complex scenes and capturing low-frequency features remain long-standing challenges in the field of visual 3D reconstruction. Relying solely on sparse point inputs and simple optimization criteria often leads to non-robust reconstructions of the radiance field, with reconstruction quality heavily dependent on the proper initialization of inputs. Notably, Multi-View Stereo (MVS) techniques offer a mature and reliable approach for generating structured point cloud data using a limited number of views, camera parameters, and feature matching. In this paper, we propose combining MVS with Gaussian Splatting, along with our newly introduced density optimization strategy, to address these challenges. This approach bridges the gap in scene representation by enhancing explicit geometry radiance fields with MVS, and our experimental results demonstrate its effectiveness. Additionally, we have explored the potential of using Gaussian Splatting for non-face template single-process end-to-end Avatar Reconstruction, yielding promising experimental results.
- Published
- 2024
- Full Text
- View/download PDF
13. FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery.
- Author
-
Ruf, Boitumelo, Weinmann, Martin, and Hinz, Stefan
- Subjects
DRONE aircraft, POINT cloud, ACCURACY of information, CLOUD computing, MONOCULARS - Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is immediately and incrementally generated as the image data are acquired or being received. FaSS-MVS is composed of a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of large scene depths, such as those inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses from which the actual depth map is extracted by means of a surface-aware semi-global optimization, reducing the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, the pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with the error not even an order of magnitude higher than that of COLMAP. At the same time, however, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing us to perform online and incremental processing of full HD images at 1–2 Hz. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
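The final step described above — reprojecting the depth map into a point cloud and computing per-pixel normals from a local neighborhood — can be sketched with a pinhole back-projection and a cross product of the two image-tangent vectors (a simplification of the confined-neighborhood computation; intrinsics are assumed given):

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Per-pixel surface normals from a depth map.

    Back-projects each pixel through a pinhole model with focal lengths
    (fx, fy) and principal point (cx, cy), then takes the cross product
    of the two in-image tangent vectors of the resulting point cloud.
    Returns an (H, W, 3) array of unit normals.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    P = np.stack([X, Y, depth], axis=-1)      # (H, W, 3) point cloud
    dx = np.gradient(P, axis=1)               # tangent along image columns
    dy = np.gradient(P, axis=0)               # tangent along image rows
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    return n
```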
14. LLR-MVSNet: a lightweight network for low-texture scene reconstruction.
- Author
-
Wang, Lina, She, Jiangfeng, Zhao, Qiang, Wen, Xiang, Wan, Qifeng, and Wu, Shuangpin
- Abstract
In recent years, learning-based MVS methods have achieved excellent performance compared with traditional methods. However, these methods still have notable shortcomings, such as the low efficiency of traditional convolutional networks and simple feature fusion, which lead to incomplete reconstruction. In this research, we propose a lightweight network for low-texture scene reconstruction (LLR-MVSNet). To improve accuracy and efficiency, a lightweight network is proposed, including a multi-scale feature extraction module and a weighted feature fusion module. The multi-scale feature extraction module uses depth-separable convolution and point-wise convolution to replace traditional convolution, which can reduce network parameters and improve the model efficiency. In order to improve the fusion accuracy, a weighted feature fusion module is proposed, which can selectively emphasize features, suppress useless information and improve the fusion accuracy. With rapid computational speed and high performance, our method surpasses the state-of-the-art benchmarks and performs well on the DTU and the Tanks & Temples datasets. The code of our method will be made available at . [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
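The parameter saving from replacing standard convolutions with depth-separable plus point-wise convolutions, as in the module above, is easy to quantify:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1
    convolution -- the substitution used in lightweight extractors."""
    return c_in * k * k + c_in * c_out
```

For a typical 64-channel 3x3 layer this is 36,864 versus 4,672 parameters, roughly an 8x reduction, which is where the "lightweight" claim comes from.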
15. SA-SatMVS: Slope Feature-Aware and Across-Scale Information Integration for Large-Scale Earth Terrain Multi-View Stereo.
- Author
-
Chen, Xiangli, Diao, Wenhui, Zhang, Song, Wei, Zhiwei, and Liu, Chunbo
- Subjects
SURFACE of the earth, DIGITAL elevation models, REMOTE-sensing images, FEATURE extraction, SURFACE reconstruction - Abstract
Satellite multi-view stereo (MVS) is a fundamental task in large-scale Earth surface reconstruction. Recently, learning-based multi-view stereo methods have shown promising results in this field. However, these methods are mainly developed by transferring the general learning-based MVS framework to satellite imagery, which lacks consideration of the specific terrain features of the Earth's surface and results in inadequate accuracy. In addition, mainstream learning-based methods mainly use equal height interval partition, which insufficiently utilizes the height hypothesis surface, resulting in inaccurate height estimation. To address these challenges, we propose an end-to-end terrain feature-aware height estimation network named SA-SatMVS for large-scale Earth surface multi-view stereo, which integrates information across different scales. Firstly, we transform the Sobel operator into slope feature-aware kernels to extract terrain features, and a dual encoder–decoder architecture with residual blocks is applied to incorporate slope information and geometric structural characteristics to guide the reconstruction process. Secondly, we introduce a pixel-wise unequal interval partition method using a Laplacian distribution based on the probability volume obtained from other scales, resulting in more accurate height hypotheses for height estimation. Thirdly, we apply an adaptive spatial feature extraction network to search for the optimal fusion method for feature maps at different scales. Extensive experiments on the WHU-TLC dataset also demonstrate that our proposed model achieves the best MAE metric of 1.875 and an RMSE metric of 3.785, which constitutes a state-of-the-art performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
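Slope extraction from a height map with Sobel gradients, the starting point of the slope feature-aware design above, can be sketched as follows (the kernel normalization and interior-only handling are simplifications, not the paper's learned kernels):

```python
import numpy as np

# Normalized Sobel kernel for the x-gradient; transpose gives the y-gradient.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0

def slope_deg(dsm, cell_size):
    """Terrain slope in degrees from a DSM via Sobel gradients
    (interior cells only, for brevity)."""
    def conv(img, k):
        h, w = img.shape
        out = np.zeros((h - 2, w - 2))
        for i in range(3):                    # 3x3 correlation by slicing
            for j in range(3):
                out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
        return out
    gx = conv(dsm, SOBEL_X) / cell_size
    gy = conv(dsm, SOBEL_X.T) / cell_size
    return np.degrees(np.arctan(np.hypot(gx, gy)))
```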
16. HC-MVSNet: A probability sampling-based multi-view-stereo network with hybrid cascade structure for 3D reconstruction.
- Author
-
Gao, Tianxiang, Hong, Zijian, Tan, Yixing, Sun, Lizhuo, Wei, Yichen, and Ma, Jianwei
- Subjects
CASCADE connections, PROBABILITY theory, DEEP learning - Published
- 2024
- Full Text
- View/download PDF
17. Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume.
- Author
-
Han, Ming, Yin, Hui, Chong, Aixin, and Du, Qianqian
- Subjects
CASCADE connections, REFERENCE sources, PYRAMIDS, INTENTION, COST - Abstract
Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, the recently popular multi-level feature extractor networks overlook the significance of fine-grained structure features for coarse depth inference in the MVS task. Discriminative structure features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with the affluent spatial structure information of shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight design and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between the reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks & Temples benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
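The correlation cost volume above — average similarity between reference features and source features pre-warped to the reference view — has this basic form (a plain mean over views; the paper's adaptive aggregation is more involved):

```python
import numpy as np

def correlation_cost_volume(ref_feat, src_feats):
    """Average-similarity correlation cost volume.

    ref_feat  : (C, D, H, W) reference features broadcast over D depth
                hypotheses (already expanded).
    src_feats : list of (C, D, H, W) source-view features pre-warped to
                the reference view at each depth hypothesis.
    Each pairwise cost is the channel-wise dot product (correlation),
    normalized by C; costs are averaged over source views into one
    single-channel (D, H, W) volume.
    """
    c = ref_feat.shape[0]
    vols = [(ref_feat * s).sum(axis=0) / c for s in src_feats]
    return np.mean(vols, axis=0)
```

A single-channel correlation volume like this is much lighter than the (C, D, H, W) variance volumes of earlier MVS networks, which is the efficiency argument the abstract makes.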
18. DAR-MVSNet: a novel dual attention residual network for multi-view stereo.
- Author
-
Li, Tingshuai, Liang, Hu, Wen, Changchun, Qu, Jiacheng, Zhao, Shengrong, and Zhang, Qingmeng
- Abstract
Learning-based multi-view stereo (MVS) has shown great promise in the field of 3D reconstruction. However, existing MVS methods suffer from fixed receptive field sizes during feature learning. This issue leads to information loss and affects the understanding of the geometric structure of the scene, posing a challenge to the reconstruction quality of regions with complex geometric structures and lighting conditions. Therefore, we propose DAR-MVSNet, which consists of a dual-attention-guided feature pyramid network (DA-FPN) and a 3D residual U-net module (3D-RUM). DA-FPN includes two modules: attention-based context extraction module (ACEM) and self-attention-based module (SAM). ACEM is proposed to dilate the receptive field and preliminarily filter deep features through multiple dilated convolutions, spatial and channel attention. To further eliminate redundant characteristics, SAM is proposed to enhance the representation capability of depth features. Moreover, 3D-RUM is designed to enhance feature transfer and information flow, thereby addressing the problem of severe global feature information loss. This article demonstrates the effectiveness of DAR-MVSNet through an extensive series of experiments. The result on the DTU dataset and Tanks and Temples benchmark in comparison to state-of-the-art MVS methods verify its superior performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. A Super-Resolution and 3D Reconstruction Method Based on OmDF Endoscopic Images.
- Author
-
Sun, Fujia and Song, Wenxuan
- Subjects
IMAGE reconstruction, IMAGE processing, HIGH resolution imaging, TEXTURE mapping, THREE-dimensional imaging - Abstract
In the field of endoscopic imaging, challenges such as low resolution, complex textures, and blurred edges often degrade the quality of 3D reconstructed models. To address these issues, this study introduces an innovative endoscopic image super-resolution and 3D reconstruction technique named Omni-Directional Focus and Scale Resolution (OmDF-SR). This method integrates an Omnidirectional Self-Attention (OSA) mechanism, an Omnidirectional Scale Aggregation Group (OSAG), a Dual-stream Adaptive Focus Mechanism (DAFM), and a Dynamic Edge Adjustment Framework (DEAF) to enhance the accuracy and efficiency of super-resolution processing. Additionally, it employs Structure from Motion (SfM) and Multi-View Stereo (MVS) technologies to achieve high-precision medical 3D models. Experimental results indicate significant improvements in image processing with a PSNR of 38.2902 dB and an SSIM of 0.9746 at a magnification factor of ×2, and a PSNR of 32.1723 dB and an SSIM of 0.9489 at ×4. Furthermore, the method excels in reconstructing detailed 3D models, enhancing point cloud density, mesh quality, and texture mapping richness, thus providing substantial support for clinical diagnosis and surgical planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
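The PSNR figures quoted above follow the standard definition, reproduced here for reference:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak].

    PSNR = 10 * log10(peak^2 / MSE); higher is better, and +inf means
    the images are identical.
    """
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```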
20. Refined equivalent pinhole model for large-scale 3D reconstruction from spaceborne CCD imagery
- Author
-
Danyang Hong, Anzhu Yu, Song Ji, Xuanbei Lu, Wenyue Guo, Xuefeng Cao, and Chunping Qiu
- Subjects
CCD imagery, 3D reconstruction, Multi-view stereo, Rational function model, Rational polynomial coefficients, Image geometric correction - Abstract
Automatic 3D reconstruction from spaceborne charge-coupled device (CCD) optical imagery is still a challenge, as the rational function model (RFM) based reconstruction pipeline has failed to keep pace with the advances of pinhole-based approaches in computer vision and photogrammetry. As a consequence, the accuracy and completeness of the surface reconstructed by RFM-based pipelines have improved only slightly in recent years. Though the perspective camera approximation model has been explored to convert the RFM to a pinhole model, it can hardly guarantee the reconstruction accuracy due to the re-projection error introduced when approximating the linear push-broom camera by a perspective camera. Hence, we present a refined equivalent pinhole model (REPM) for 3D reconstruction from spaceborne CCD imagery. We initially investigated the factors that influence the re-projection error through mathematical induction and discovered that the image size and the height range of the captured area are the two key factors. To ensure the performance of the 3D reconstruction while minimizing the re-projection error, we explored the optimal image size with which to crop large-scale images, while alleviating the height-range effect on the image space by re-projecting the cropped images to be close to the pseudo-image captured by the approximated perspective camera. The above-mentioned improvements are implemented in an image partition module and an image geometric correction module, respectively, and are encompassed in the proposed REPM-based 3D reconstruction pipeline. We conducted extensive experiments on different images covering various areas from different linear-array CCD sensors to verify the proposed approach. The results indicate that our pipeline can achieve higher accuracy and completeness and exhibits great potential. The implementation of the pipeline is available here.
- Published
- 2024
- Full Text
- View/download PDF
21. Context-Aware Multi-view Stereo Network for Efficient Edge-Preserving Depth Estimation
- Author
-
Su, Wanjuan and Tao, Wenbing
- Published
- 2025
- Full Text
- View/download PDF
22. FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction
- Author
-
Li, Yao, Zhou, Yong, Zhao, Jiaqi, Du, Wen-Liang, and Yao, Rui
- Published
- 2024
- Full Text
- View/download PDF
23. PlaneStereo: Plane-aware Multi-view Stereo
- Author
-
Guo, Haoyu, Peng, Sida, Shen, Ting, and Zhou, Xiaowei
- Published
- 2024
- Full Text
- View/download PDF
24. Multi-view stereo algorithms based on deep learning: a survey
- Author
-
Huang, Hongbo, Yan, Xiaoxu, Zheng, Yaolin, He, Jiayu, Xu, Longfei, and Qin, Dechun
- Published
- 2024
- Full Text
- View/download PDF
25. Uanet: uncertainty-aware cost volume aggregation-based multi-view stereo for 3D reconstruction
- Author
-
Lu, Ping, Cai, Youcheng, Yang, Jiale, Wang, Dong, and Wu, Tingting
- Published
- 2024
- Full Text
- View/download PDF
26. Strand-accurate multi-view facial hair reconstruction and tracking.
- Author
-
Li, Hanchao and Liu, Xinguo
- Subjects
EULER method, POINT cloud, ARTIFICIAL satellite tracking, HAIR, BEARDS - Abstract
Accurate modeling of facial hair is crucial not only for reconstructing a clean-shaven face but also for enhancing realism when creating digital human avatars. Previous methods have primarily focused on reconstructing sparse facial hairs from static scans, and more recently, efforts have been made to track facial performances. However, there is still room for improvement in reconstructing thick hairs. In this paper, we address the challenges of facial hair reconstruction and tracking, enhancing the realism and detail of facial hair in digital human avatars. For facial hair reconstruction, we propose a method that combines line-based multi-view stereo with line segment matching to recover a dense hair point cloud. From this point cloud, hair strands are extracted using the forward Euler method. For tracking, we introduce a space-time optimization method that registers the reference facial hair strands to subsequent frames, taking into account both the global shape and the motion between frames. After capturing the facial hair structures, we refine the underlying skin meshes by replacing the noisy hair region points with strand roots. We conducted experiments on various examples captured under different systems, demonstrating that our facial hair capture method outperforms previous methods in terms of accuracy and completeness. Our method provides a comprehensive and accurate solution for capturing and modeling facial hair in various facial performance scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
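Extracting strands from the dense hair point cloud by the forward Euler method, as described above, is integration of a direction field outward from each root; a minimal sketch with an assumed callable field (the paper derives its field from line-based MVS, which is not reproduced here):

```python
import numpy as np

def trace_strand(seed, direction_field, step=0.1, n_steps=50):
    """Grow a hair strand from a root point by forward Euler integration
    of a 3D direction field d(p): p_{k+1} = p_k + step * d(p_k) / |d(p_k)|.

    direction_field : callable mapping a 3-vector to a 3-vector.
    Returns an (n_steps + 1, 3) polyline of strand points.
    """
    pts = [np.asarray(seed, dtype=float)]
    for _ in range(n_steps):
        d = direction_field(pts[-1])
        pts.append(pts[-1] + step * d / (np.linalg.norm(d) + 1e-12))
    return np.array(pts)
```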
27. Optimizing Application of UAV-Based SfM Photogrammetric 3D Mapping in Urban Areas.
- Author
-
Noori, Abbas Mohammed, Al-Saedi, Ali Salah J., and Abed, Fanar M.
- Subjects
CITIES & towns, COMPUTER vision, DATA acquisition systems, FLIGHT planning (Aeronautics), SPATIAL systems, IMAGE processing, DRONE aircraft - Abstract
In recent years, the extensive need for high-quality acquisition platforms for various 3D mapping applications has rapidly increased, especially regarding sensor performance, portability, and low cost. Image-based UAV sensors have overwhelming merits over alternative solutions for their timely and resilient data acquisition and the high-resolution spatial data they can provide through extensive Computer Vision (CV) data processing approaches. However, applying this technique, including the appropriate selection of flight mission and image acquisition parameters, ground settings and targeting, and Structure from Motion and Multi-View Stereo (SfM-MVS) post-processing, must be optimized to the type of study site and feature characteristics. This research focuses on optimizing the application of UAV-SfM photogrammetry in an urban area on the east bank of the Tigris River in the north region of Iraq, following an optimized data capturing plan and SfM-MVS photogrammetric workflow. The research presented the practical application of optimized flight planning, data acquisition, image processing, accuracy analysis, and evaluation based on ground truth targets designed for the proposed optimal routine. This includes investigating the influence of the number and distribution of GCPs, flying heights, and processing parameters on the quality of the produced 3D data. The research showed the potential of low-budget and affordable UAV devices to deliver robust 3D products in a relatively short period by demonstrating the value of UAV-based image techniques when combined with CV algorithms. The results showed powerful outcomes, with validation errors reaching centimeter level from a 100 m flying height when applying the optimized flight plan settings and the appropriate selection of the number and distribution of GCPs.
The study established a streamlined UAV mapping procedure, demonstrated the viability of UAV use for 3D mapping applications, offered suggestions for enhancing future applications, and offered clues as to whether or not UAVs could serve as a viable alternative to conventional ground-based surveying techniques in accurate applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Estimated camera trajectory-based integration among local 3D models sequentially generated from image sequences by SfM–MVS.
- Author
-
Matsumoto, Taku, Hanari, Toshihide, Kawabata, Kuniaki, Nakamura, Keita, and Yashiro, Hiroshi
- Abstract
This paper describes a three-dimensional (3D) modeling method for sequentially and spatially understanding situations in unknown environments from an image sequence acquired by a camera. The proposed method chronologically divides the image sequence into sub-image sequences by the number of images, generates local 3D models from the sub-image sequences by Structure from Motion and Multi-View Stereo (SfM–MVS), and integrates the models. Images in each sub-image sequence partially overlap with the previous and subsequent sub-image sequences. The local 3D models are integrated into a single 3D model using transformation parameters computed from the camera trajectories estimated by SfM–MVS. In our experiment, we quantitatively compared the quality of the integrated models against a 3D model generated from all images in a batch, as well as the computational time to obtain these models, using three real data sets acquired by a camera. Consequently, the proposed method can generate an integrated model whose quality is comparable to that of a 3D model generated from all images in a batch by SfM–MVS, while reducing the computational time. [ABSTRACT FROM AUTHOR]
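As an illustrative aside (not code from the paper): computing the transformation that maps one camera trajectory onto the overlapping part of another is commonly done with Umeyama's similarity alignment (scale, rotation, translation from corresponding camera centres). A minimal numpy sketch, with all function names assumed:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate a similarity transform (s, R, t) mapping src -> dst.

    src, dst: (N, 3) arrays of corresponding camera centres.
    Returns scale s, rotation R (3x3), translation t (3,) such that
    dst ~= s * R @ src_i + t for each point (Umeyama, 1991).
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    # Cross-covariance between the centred trajectories.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Guard against reflections.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Applying the recovered (s, R, t) to every vertex of a local model would then place it in the coordinate frame of the previously integrated model.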
- Published
- 2024
- Full Text
- View/download PDF
29. Research on Cultural Relic Restoration and Digital Presentation Based on 3D Reconstruction MVS Algorithm: A Case Study of Mogao Grottoes’ Cave 285
- Author
-
Gao, Mengyao, Fournier-Viger, Philippe, Series Editor, and Wang, Yulin, editor
- Published
- 2024
- Full Text
- View/download PDF
30. Charting the Landscape of Multi-view Stereo: An In-Depth Exploration of Deep Learning Techniques
- Author
-
Zhou, Zhe, Liu, Xiaozhang, Tang, Xiangyan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tian, Yuan, editor, Ma, Tinghuai, editor, and Khan, Muhammad Khurram, editor
- Published
- 2024
- Full Text
- View/download PDF
31. Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization
- Author
-
Li, Pan, Wu, Suping, Zhang, Xitie, Peng, Yuxin, Zhang, Boyang, Wang, Bin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rudinac, Stevan, editor, Hanjalic, Alan, editor, Liem, Cynthia, editor, Worring, Marcel, editor, Jónsson, Björn Þór, editor, Liu, Bei, editor, and Yamakata, Yoko, editor
- Published
- 2024
- Full Text
- View/download PDF
32. CT-MVSNet: Efficient Multi-view Stereo with Cross-Scale Transformer
- Author
-
Wang, Sicheng, Jiang, Hao, Xiang, Lei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rudinac, Stevan, editor, Hanjalic, Alan, editor, Liem, Cynthia, editor, Worring, Marcel, editor, Jónsson, Björn Þór, editor, Liu, Bei, editor, and Yamakata, Yoko, editor
- Published
- 2024
- Full Text
- View/download PDF
33. PLKA-MVSNet: Parallel Multi-view Stereo with Large Kernel Convolution Attention
- Author
-
Huang, Bingsen, Lu, Jinzheng, Li, Qiang, Liu, Qiyuan, Lin, Maosong, Cheng, Yongqiang, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
34. Multi-view Stereo by Fusing Monocular and a Combination of Depth Representation Methods
- Author
-
Yu, Fanqi, Sun, Xinyang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
35. UseGeo - A UAV-based multi-sensor dataset for geospatial research
- Author
-
F. Nex, E.K. Stathopoulou, F. Remondino, M.Y. Yang, L. Madhuanand, Y. Yogender, B. Alsadik, M. Weinmann, B. Jutzi, and R. Qin
- Subjects
UAV ,LiDAR ,Monocular depth estimation ,Stereo matching ,Multi-view stereo ,3D reconstruction ,Geography (General) ,G1-922 ,Surveying ,TA501-625 - Abstract
3D reconstruction is a long-standing research topic in the photogrammetric and computer vision communities; although a plethora of open-source and commercial solutions for 3D reconstruction have been released in the last few years, several open challenges and limitations still exist. Undoubtedly, deep learning algorithms have demonstrated great potential in several remote sensing tasks, including image-based 3D reconstruction. State-of-the-art monocular and stereo algorithms leverage deep learning techniques and achieve increased performance in depth estimation and 3D reconstruction. However, one limitation of such methods is that they rely heavily on large training sets that are often tedious to obtain; even when available, these typically cover indoor, close-range scenarios and low-resolution images. Especially when considering UAV (Unmanned Aerial Vehicle) scenarios, such data are not available, and domain adaptation is not a trivial challenge. To fill this gap, the UAV-based multi-sensor dataset for geospatial research (UseGeo - https://usegeo.fbk.eu/home) is introduced in this paper. It contains both image and LiDAR data and aims to support relevant research in photogrammetry and computer vision with a useful training set for both stereo and monocular 3D reconstruction algorithms. In this regard, the dataset provides ground truth data for both point clouds and depth maps. In addition, UseGeo can also be a valuable dataset for other tasks such as feature extraction and matching, aerial triangulation, or image and LiDAR co-registration. The paper introduces the UseGeo dataset and validates some state-of-the-art algorithms to assess their usability for both monocular and multi-view 3D reconstruction.
- Published
- 2024
- Full Text
- View/download PDF
36. Shading aware DSM generation from high resolution multi-view satellite images
- Author
-
Zhihua Hu, Pengjie Tao, Xiaoxiang Long, and Haiyan Wang
- Subjects
Shape from Shading (SfS) ,multi-view stereo ,Digital Surface Model (DSM) ,high resolution multi-view satellite images ,Mathematical geography. Cartography ,GA1-1776 ,Geodesy ,QB275-343 - Abstract
In many cases, Digital Surface Models (DSMs) and Digital Elevation Models (DEMs) are obtained with Light Detection and Ranging (LiDAR) or stereo matching. As an active method, LiDAR is very accurate but expensive, which often limits its use to small-scale acquisition. Stereo matching is suitable for large-scale acquisition of terrain information as the number of satellite stereo sensors increases. However, stereo matching often underperforms in textureless areas. Accordingly, this study proposes a Shading Aware DSM GEneration Method (SADGE) for high resolution multi-view satellite images. Considering the complementarity of stereo matching and Shape from Shading (SfS), SADGE combines the advantages of both techniques. First, an improved Semi-Global Matching (SGM) technique is used to generate an initial surface expressed as a DSM; the surface is then refined by optimizing an objective function that models the imaging process with the illumination, surface albedo, and surface normals. Unlike existing shading-based DEM refinement or generation methods, no information about the illumination or the viewing angle is needed, and concave/convex ambiguity is avoided because multi-view images are utilized. Experiments with ZiYuan-3 and GaoFen-7 images show that the proposed method generates a more accurate DSM (12.5–56.3% improvement) with a sound overall shape and a more detailed surface compared with a software solution (SURE) for multi-view stereo.
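As an illustrative aside (not code from the paper): the shading data term that SfS-based refinement optimizes can be sketched by deriving normals from the DSM, rendering Lambertian shading, and comparing it to the observed image. The Lambertian model, the given light direction and albedo, and the function names are assumptions here; the paper's objective additionally estimates illumination and albedo rather than taking them as inputs.

```python
import numpy as np

def dsm_normals(dsm, gsd=1.0):
    """Per-pixel unit normals of a DSM via central differences."""
    dz_dy, dz_dx = np.gradient(dsm, gsd)  # axis 0 = rows (y), axis 1 = cols (x)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(dsm)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def lambertian_shading(dsm, light_dir, albedo=1.0, gsd=1.0):
    """Predicted intensity: albedo * max(0, n . l) per pixel."""
    n = dsm_normals(dsm, gsd)
    l = np.asarray(light_dir, float)
    l = l / np.linalg.norm(l)
    return albedo * np.clip(n @ l, 0.0, None)

def shading_cost(dsm, image, light_dir, albedo=1.0):
    """Data term: sum of squared differences between observed and
    predicted shading; a refinement loop would minimize this over dsm."""
    r = image - lambertian_shading(dsm, light_dir, albedo)
    return float((r ** 2).sum())
```

A refinement method would combine such a data term with a smoothness prior and descend on the DSM heights.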
- Published
- 2024
- Full Text
- View/download PDF
37. BSI-MVS: multi-view stereo network with bidirectional semantic information
- Author
-
Ruiming Jia, Jun Yu, Zhenghui Hu, and Fei Yuan
- Subjects
Multi-view stereo ,Bidirectional-LSTM ,3D reconstruction ,Transformer ,Medicine ,Science - Abstract
Abstract The basic principle of multi-view stereo (MVS) is to perform 3D reconstruction by extracting depth information from multiple views. Most current state-of-the-art MVS networks are based on the Vision Transformer, which usually incurs high computational cost. To reduce computational complexity and improve depth map accuracy, we propose an MVS network with Bidirectional Semantic Information (BSI-MVS). First, we design a Multi-Level Spatial Pyramid module that generates multiple layers of feature maps for extracting multi-scale information. We then propose a 2D Bidirectional-LSTM module to capture bidirectional semantic information at different time steps in the horizontal and vertical directions, which contains abundant depth information. Finally, cost volumes are built from the various levels of feature maps to optimize the final depth map. We conduct experiments on the DTU and BlendedMVS datasets. The results show that our network, in terms of overall metrics, surpasses TransMVSNet, CasMVSNet, CVP-MVSNet, and AACVP-MVSNet by 17.84%, 36.42%, 14.96%, and 4.86% respectively, and also shows a noticeable performance enhancement in objective metrics and visualizations.
- Published
- 2024
- Full Text
- View/download PDF
38. Leveraging Neural Radiance Fields for Large-Scale 3D Reconstruction from Aerial Imagery
- Author
-
Max Hermann, Hyovin Kwak, Boitumelo Ruf, and Martin Weinmann
- Subjects
Neural Radiance Fields ,Multi-View Stereo ,aerial imagery ,3D reconstruction ,large-scale ,Science - Abstract
Since conventional photogrammetric approaches struggle with low-texture, reflective, and transparent regions, this study explores the application of Neural Radiance Fields (NeRFs) for large-scale 3D reconstruction of outdoor scenes, as NeRF-based methods have recently shown very impressive results in these areas. We evaluate three approaches: Mega-NeRF, Block-NeRF, and Direct Voxel Grid Optimization, focusing on their accuracy and completeness compared to ground truth point clouds. In addition, we analyze the effects of using multiple sub-modules, estimating visibility with an additional neural network, and varying the density threshold for point cloud extraction. For performance evaluation, we use benchmark datasets that correspond to the setting of standard flight campaigns and therefore typically have a nadir camera perspective and relatively little image overlap, which can be challenging for NeRF-based approaches that are typically trained with significantly more images and varying camera angles. We show that despite lower quality compared to classic photogrammetric approaches, NeRF-based reconstructions provide visually convincing results in challenging areas. Furthermore, our study shows that, in particular, increasing the number of sub-modules and predicting visibility with an additional neural network significantly improves the quality of the resulting reconstructions.
- Published
- 2024
- Full Text
- View/download PDF
39. BSI-MVS: multi-view stereo network with bidirectional semantic information
- Author
-
Jia, Ruiming, Yu, Jun, Hu, Zhenghui, and Yuan, Fei
- Published
- 2024
- Full Text
- View/download PDF
40. LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction.
- Author
-
Luo, Weiming, Lu, Zongqing, and Liao, Qingmin
- Subjects
- *
DEEP learning , *SHARED virtual environments - Abstract
With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Therefore, multi-view stereo (MVS) 3D reconstruction has been extensively applied across various fields because of its cost-effectiveness and accessibility, which involves multi-view depth estimation and stereo matching algorithms. However, MVS tasks face noise challenges because of natural multiplicative noise and negative gain in algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold true under real-world conditions, while deep learning-based MVS approaches tend to suffer from high noise sensitivity. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Through extensive evaluation of multiple benchmark datasets, LNMVSNet has demonstrated its superior performance, showcasing its ability to improve reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement brings hope for the widespread application of MVS, ranging from precise industrial part inspection to the creation of immersive virtual environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Vision through Obstacles—3D Geometric Reconstruction and Evaluation of Neural Radiance Fields (NeRFs).
- Author
-
Petrovska, Ivana and Jutzi, Boris
- Subjects
- *
RADIANCE , *POINT cloud , *BINOCULAR vision - Abstract
In this contribution we evaluate the 3D geometry reconstructed by Neural Radiance Fields (NeRFs) of an object's occluded parts behind obstacles through a point cloud comparison in 3D space against traditional Multi-View Stereo (MVS), addressing the accuracy and completeness. The key challenge lies in recovering the underlying geometry, completing the occluded parts of the object and investigating if NeRFs can compete against traditional MVS for scenarios where the latter falls short. In addition, we introduce a new "obSTaclE, occLusion and visibiLity constrAints" dataset named STELLA concerning transparent and non-transparent obstacles in real-world scenarios since there is no existing dataset dedicated to this problem setting to date. Considering that the density field represents the 3D geometry of NeRFs and is solely position-dependent, we propose an effective approach for extracting the geometry in the form of a point cloud. We voxelize the whole density field and apply a 3D density-gradient based Canny edge detection filter to better represent the object's geometric features. The qualitative and quantitative results demonstrate NeRFs' ability to capture geometric details of the occluded parts in all scenarios, thus outperforming in completeness, as our voxel-based point cloud extraction approach achieves point coverage up to 93%. However, MVS remains a more accurate image-based 3D reconstruction method, deviating from the ground truth 2.26 mm and 3.36 mm for each obstacle scenario respectively. [ABSTRACT FROM AUTHOR]
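As an illustrative aside (not code from the paper): extracting a point cloud from a voxelized density field by thresholding its gradient magnitude can be sketched as below. This is a crude stand-in for the paper's 3D density-gradient-based Canny filter (no non-maximum suppression or hysteresis), and the grid origin, spacing, and threshold are assumed parameters.

```python
import numpy as np

def density_gradient_points(density, origin, voxel_size, grad_thresh):
    """Extract world-space points where the density field changes sharply.

    density: (X, Y, Z) array of voxelized density samples.
    Returns an (M, 3) array of voxel-centre coordinates whose density
    gradient magnitude exceeds grad_thresh.
    """
    gx, gy, gz = np.gradient(density, voxel_size)
    mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    idx = np.argwhere(mag > grad_thresh)
    # Voxel centres: offset by half a voxel from the grid origin.
    return np.asarray(origin, float) + (idx + 0.5) * voxel_size
```

On a density field that steps from empty to occupied, the extracted points concentrate on the transition surface, which is the behaviour the gradient-based filter exploits.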
- Published
- 2024
- Full Text
- View/download PDF
42. Shading aware DSM generation from high resolution multi-view satellite images.
- Author
-
Hu, Zhihua, Tao, Pengjie, Long, Xiaoxiang, and Wang, Haiyan
- Subjects
REMOTE-sensing images ,OPTICAL radar ,LIDAR ,DIGITAL elevation models ,STEREO vision (Computer science) ,ALBEDO - Abstract
In many cases, Digital Surface Models (DSMs) and Digital Elevation Models (DEMs) are obtained with Light Detection and Ranging (LiDAR) or stereo matching. As an active method, LiDAR is very accurate but expensive, which often limits its use to small-scale acquisition. Stereo matching is suitable for large-scale acquisition of terrain information as the number of satellite stereo sensors increases. However, stereo matching often underperforms in textureless areas. Accordingly, this study proposes a Shading Aware DSM GEneration Method (SADGE) for high resolution multi-view satellite images. Considering the complementarity of stereo matching and Shape from Shading (SfS), SADGE combines the advantages of both techniques. First, an improved Semi-Global Matching (SGM) technique is used to generate an initial surface expressed as a DSM; the surface is then refined by optimizing an objective function that models the imaging process with the illumination, surface albedo, and surface normals. Unlike existing shading-based DEM refinement or generation methods, no information about the illumination or the viewing angle is needed, and concave/convex ambiguity is avoided because multi-view images are utilized. Experiments with ZiYuan-3 and GaoFen-7 images show that the proposed method generates a more accurate DSM (12.5–56.3% improvement) with a sound overall shape and a more detailed surface compared with a software solution (SURE) for multi-view stereo. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. MVP-Stereo: A Parallel Multi-View Patchmatch Stereo Method with Dilation Matching for Photogrammetric Application.
- Author
-
Yan, Qingsong, Kang, Junhua, Xiao, Teng, Liu, Haibing, and Deng, Fei
- Subjects
- *
DISASTER relief , *ENVIRONMENTAL monitoring , *COMPUTATIONAL complexity , *POINT cloud - Abstract
Multi-view stereo plays an important role in 3D reconstruction but suffers from low reconstruction efficiency and has difficulty reconstructing areas with low or repeated texture. To address this, we propose MVP-Stereo, a novel multi-view parallel patchmatch stereo method. MVP-Stereo employs two key techniques. First, it uses multi-view dilated ZNCC to handle low and repeated texture by dynamically adjusting the matching window size based on image variance and using only a portion of the pixels to calculate matching costs, without increasing computational complexity. Second, it leverages multi-scale parallel patchmatch to reconstruct the depth map for each image highly efficiently, implemented in CUDA with random initialization, multi-scale parallel spatial propagation, random refinement, and a coarse-to-fine strategy. Experiments on the Strecha dataset, the ETH3D benchmark, and the UAV dataset demonstrate that MVP-Stereo achieves reconstruction quality competitive with state-of-the-art methods at the highest reconstruction efficiency. For example, MVP-Stereo outperforms COLMAP in reconstruction quality while taking only around 30% of its reconstruction time, and achieves around 90% of the quality of ACMMP and SD-MVS in only around 20% of the time. In summary, MVP-Stereo can efficiently reconstruct high-quality point clouds and meet the requirements of several photogrammetric applications, such as emergency relief, infrastructure inspection, and environmental monitoring. [ABSTRACT FROM AUTHOR]
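As an illustrative aside (not code from the paper): the dilated-ZNCC idea, computing zero-mean normalized cross-correlation over a strided window so the support grows in low-texture regions without adding samples, can be sketched as below. The window size, dilation policy, and function names are assumptions; the paper adjusts them from image variance.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps
    return float((a * b).sum() / denom)

def dilated_patch(img, y, x, half, dilation):
    """Sample a (2*half+1)^2 window around (y, x) with the given pixel
    stride, so the support grows without increasing the sample count."""
    ys = y + dilation * np.arange(-half, half + 1)
    xs = x + dilation * np.arange(-half, half + 1)
    return img[np.ix_(ys, xs)]

def dilated_zncc_cost(ref, src, y, x, half=2, dilation=1):
    """Matching cost in [0, 2]: 1 - ZNCC of the dilated windows."""
    return 1.0 - zncc(dilated_patch(ref, y, x, half, dilation),
                      dilated_patch(src, y, x, half, dilation))
```

A patchmatch loop would evaluate this cost for each depth/normal hypothesis, with `src` sampled through the corresponding homography.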
- Published
- 2024
- Full Text
- View/download PDF
44. U-ETMVSNet: Uncertainty-Epipolar Transformer Multi-View Stereo Network for Object Stereo Reconstruction.
- Author
-
Zhao, Ning, Wang, Heng, Cui, Quanlong, and Wu, Lan
- Subjects
TRANSFORMER models ,VISUAL fields - Abstract
The Multi-View Stereo (MVS) model, which uses 2D images from multiple perspectives for 3D reconstruction, is a crucial technique in the field of 3D vision. To address the poor correlation between 2D features and 3D space in existing MVS models, as well as the high sampling rate required by static sampling, we propose U-ETMVSNet in this paper. Initially, we employ an integrated epipolar transformer module (ET) to establish 3D spatial correlations along epipolar lines, thereby enhancing the reliability of aggregated cost volumes. Subsequently, we devise a sampling module based on probability volume uncertainty to dynamically adjust the depth sampling range for the next stage. Finally, we utilize a multi-stage joint learning method based on multi-depth value classification to evaluate and optimize the model. Experimental results demonstrate that on the DTU dataset, our method achieves a relative performance improvement of 27.01% and 11.27% in completeness error and overall error, respectively, compared to CasMVSNet, even at lower depth sampling rates. Moreover, our method achieves a score of 58.60 on the Tanks & Temples dataset, highlighting its robustness and generalization capability. [ABSTRACT FROM AUTHOR]
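As an illustrative aside (not code from the paper): uncertainty-based dynamic depth sampling is often implemented by treating the probability volume as a per-pixel distribution over depth hypotheses and setting the next stage's range from its mean and standard deviation. A minimal numpy sketch, with the +/- k-sigma rule and function name assumed:

```python
import numpy as np

def uncertainty_depth_range(prob_volume, depth_values, k=1.0):
    """Per-pixel depth range for the next stage from a probability volume.

    prob_volume: (D, H, W) probabilities over D depth hypotheses
                 (summing to 1 along axis 0).
    depth_values: (D,) hypothesized depths.
    Returns (depth_min, depth_max), each (H, W): the expected depth
    plus/minus k standard deviations of the per-pixel distribution.
    """
    d = depth_values[:, None, None]
    mean = (prob_volume * d).sum(0)
    var = (prob_volume * (d - mean) ** 2).sum(0)
    std = np.sqrt(var)
    return mean - k * std, mean + k * std
```

Confident pixels (peaked distributions) thus get a narrow refinement range, while uncertain pixels keep a wide one.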
- Published
- 2024
- Full Text
- View/download PDF
45. Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction.
- Author
-
Fu, Yuanhao, Zheng, Maoteng, Chen, Peiyu, and Liu, Xiuguo
- Subjects
- *
MONOCULARS , *POINT cloud , *FORECASTING , *CONSUMPTION (Economics) - Abstract
The learning‐based multi‐view stereo (MVS) methods have made remarkable progress in recent years. However, these methods exhibit limited robustness when faced with occlusion, weak or repetitive texture regions in the image. These factors often lead to holes in the final point cloud model due to excessive pixel‐matching errors. To address these challenges, we propose a novel MVS network assisted by monocular prediction for 3D reconstruction. Our approach combines the strengths of both monocular and multi‐view branches, leveraging the internal semantic information extracted from a single image through monocular prediction, along with the strict geometric relationships between multiple images. Moreover, we adopt a coarse‐to‐fine strategy to gradually reduce the number of assumed depth planes and minimise the interval between them as the resolution of the input images increases during the network iteration. This strategy can achieve a balance between the computational resource consumption and the effectiveness of the model. Experiments on the DTU, Tanks and Temples, and BlendedMVS datasets demonstrate that our method achieves outstanding results, particularly in textureless regions. [ABSTRACT FROM AUTHOR]
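As an illustrative aside (not code from the paper): the coarse-to-fine strategy of fewer depth planes over a shrinking interval can be sketched as a simple schedule. The shrink factor and the fixed (rather than per-pixel re-centred) range are assumptions for illustration.

```python
import numpy as np

def coarse_to_fine_schedule(d_min, d_max, planes_per_stage):
    """Depth hypotheses per stage: fewer planes over a narrower range.

    Each stage here keeps a range centred on the previous one and
    quarters its width; real networks instead re-centre the range on
    the previous stage's per-pixel depth estimate.
    """
    stages = []
    lo, hi = float(d_min), float(d_max)
    for n in planes_per_stage:
        stages.append(np.linspace(lo, hi, n))
        centre = 0.5 * (lo + hi)
        half_range = 0.125 * (hi - lo)  # next range is 1/4 of this one
        lo, hi = centre - half_range, centre + half_range
    return stages
```

With, say, `planes_per_stage=[48, 32, 8]`, both the plane count and the spacing between adjacent planes decrease stage by stage, trading breadth of search for precision as resolution increases.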
- Published
- 2024
- Full Text
- View/download PDF
46. High completeness multi-view stereo for dense reconstruction of large-scale urban scenes.
- Author
-
Liao, Yongjian, Zhang, Xuexi, Huang, Nan, Fu, Chuanyu, Huang, Zijie, Cao, Qiku, Xu, Zexi, Xiong, Xiaoming, and Cai, Shuting
- Subjects
- *
OPTICAL flow , *SOURCE code , *PYRAMIDS , *PROBLEM solving , *MOTION - Abstract
Multi-View Stereo (MVS) algorithms still struggle to reconstruct 3D models with high completeness, due to the difficulty of recovering weakly textured regions and detailed parts of large-scale urban scenes. Although the Image Pyramid Structure is a popular approach for dealing with weakly textured regions, it also leads to the loss of detailed information. The proposed method solves these problems with three new strategies: (1) We propose optical flow consistency for recovering details. Optical flow consistency improves the sensitivity of the image pyramid structure to details by estimating the motion vector of each pixel, and we propose a novel detail restorer based on it that improves the link between adjacent scales in the image pyramid structure. (2) Geometric consistency based on epipolar line constraints is proposed to recover weakly textured regions. The proposed epipolar line constraints improve the robustness of traditional geometric consistency, avoiding mismatches in weakly textured regions. (3) A depth-filling strategy is used to fill in the missing depth information of weakly textured regions. The image gradient is used to fill gaps in the depth information, and the filled result serves as prior information to smooth the depth of weakly textured regions. Experimental results on the ETH3D, UDD5 and SenseFly benchmark datasets demonstrate that the proposed method outperforms three state-of-the-art methods (ACMMP, EPNet, DeepC-MVS), significantly improving the completeness of the 3D models. The source code of the developed method is available at https://github.com/Liaoyongjian1/HC-MVS.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Neural Radiance Field-Inspired Depth Map Refinement for Accurate Multi-View Stereo †.
- Author
-
Ito, Shintaro, Miura, Kanta, Ito, Koichi, and Aoki, Takafumi
- Subjects
DEPTH maps (Digital image processing) ,ARTIFICIAL neural networks - Abstract
In this paper, we propose a method to refine the depth maps obtained by Multi-View Stereo (MVS) through iterative optimization of the Neural Radiance Field (NeRF). MVS accurately estimates the depths on object surfaces, and NeRF accurately estimates the depths at object boundaries. The key ideas of the proposed method are to combine MVS and NeRF to utilize the advantages of both in depth map estimation and to use NeRF for depth map refinement. We also introduce a Huber loss into the NeRF optimization to improve the accuracy of the depth map refinement, where the Huber loss reduces the estimation error in the radiance fields by placing constraints on errors larger than a threshold. Through a set of experiments using the Redwood-3dscan dataset and the DTU dataset, which are public datasets consisting of multi-view images, we demonstrate the effectiveness of the proposed method compared to conventional methods: COLMAP, NeRF, and DS-NeRF. [ABSTRACT FROM AUTHOR]
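As an illustrative aside (not code from the paper): the Huber loss used to bound the influence of large errors is standard, quadratic for small residuals and linear beyond a threshold. A minimal numpy sketch, with `delta` an assumed threshold:

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond, so large
    photometric errors contribute bounded gradients during optimization."""
    r = np.abs(residual)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)
```

Compared with a plain squared loss, outlier residuals (for example at object boundaries) no longer dominate the objective.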
- Published
- 2024
- Full Text
- View/download PDF
48. A Light Multi-View Stereo Method with Patch-Uncertainty Awareness.
- Author
-
Liu, Zhen, Wu, Guangzheng, Xie, Tao, Li, Shilong, Wu, Chao, Zhang, Zhiming, and Zhou, Jiali
- Subjects
- *
FEATURE extraction , *POINT cloud , *AWARENESS , *CONSTRUCTION costs - Abstract
Multi-view stereo methods utilize image sequences from different views to generate a 3D point cloud model of the scene. However, existing approaches often overlook coarse-stage features, impacting the final reconstruction accuracy. Moreover, using a fixed range for all the pixels during inverse depth sampling can adversely affect depth estimation. To address these challenges, we present a novel learning-based multi-view stereo method incorporating attention mechanisms and an adaptive depth sampling strategy. Firstly, we propose a lightweight, coarse-feature-enhanced feature pyramid network in the feature extraction stage, augmented by a coarse-feature-enhanced module. This module integrates features with channel and spatial attention, enriching the contextual features that are crucial for the initial depth estimation. Secondly, we introduce a novel patch-uncertainty-based depth sampling strategy for depth refinement, dynamically configuring depth sampling ranges within the GRU-based optimization process. Furthermore, we incorporate an edge detection operator to extract edge features from the reference image's feature map. These edge features are additionally integrated into the iterative cost volume construction, enhancing the reconstruction accuracy. Lastly, our method is rigorously evaluated on the DTU and Tanks and Temples benchmark datasets, revealing its low GPU memory consumption and competitive reconstruction quality compared to other learning-based MVS methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Multi-View Jujube Tree Trunks Stereo Reconstruction Based on UAV Remote Sensing Imaging Acquisition System.
- Author
-
Ling, Shunkang, Li, Jingbin, Ding, Longpeng, and Wang, Nianyi
- Subjects
TREE trunks ,REMOTE sensing ,STEREO vision (Computer science) ,IMAGING systems ,DEEP learning ,JUJUBE (Plant) ,FEATURE extraction ,TRACKING radar ,DRONE aircraft - Abstract
High-quality agricultural multi-view stereo reconstruction technology is key to precision and informatization in agriculture. Multi-view stereo reconstruction methods are an important part of 3D vision technology. In deep learning-based multi-view stereo 3D reconstruction, the quality of feature extraction directly affects the accuracy of the reconstruction. Addressing practical problems in orchard fruit tree reconstruction, this paper designs an improved multi-view stereo structure that combines remote sensing and artificial intelligence to achieve accurate reconstruction of jujube tree trunks. Firstly, an automatic key frame extraction method is proposed for the DSST target tracking algorithm to quickly recognize and extract high-quality data. Secondly, a composite U-Net feature extraction network is designed to enhance reconstruction accuracy, while a DRE-Net feature extraction enhancement network, improved with a parallel self-attention mechanism, enhances reconstruction completeness. Comparison tests show different levels of improvement on the Technical University of Denmark (DTU) dataset compared to other deep learning-based methods. In ablation tests on the self-constructed dataset, the MVSNet + Co U-Net + DRE-Net_SA method proposed in this paper improves Accuracy by 20.4%, Completion by 12.8%, and Overall by 16.8% compared to the base model, which verifies the effectiveness of the scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Dense Point Cloud Reconstruction Network Based on Adaptive Aggregation and Recurrence.
- Author
-
王江安, 黄乐, 庞大为, 秦林珍, and 梁温茜
- Abstract
Copyright of Journal of Graphics is the property of Journal of Graphics Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF