74 results for "Mattoccia, Stefano"
Search Results
2. Guest Editorial: Special Issue on Traditional Computer Vision in the Age of Deep Learning
- Author
- Poggi, Matteo, Arrigoni, Federica, Fusiello, Andrea, Mattoccia, Stefano, Bartoli, Adrien, Sattler, Torsten, and Pajdla, Tomas
- Published
- 2024
3. Depth super-resolution from explicit and implicit high-frequency features
- Author
- Qiao, Xin, Ge, Chenyang, Zhang, Youmin, Zhou, Yanhui, Tosi, Fabio, Poggi, Matteo, and Mattoccia, Stefano
- Published
- 2023
4. Self-supervised depth super-resolution with contrastive multiview pre-training
- Author
- Qiao, Xin, Ge, Chenyang, Zhao, Chaoqiang, Tosi, Fabio, Poggi, Matteo, and Mattoccia, Stefano
- Published
- 2023
5. A computer vision approach based on deep learning for the detection of dairy cows in free stall barn
- Author
- Tassinari, Patrizia, Bovo, Marco, Benni, Stefano, Franzoni, Simone, Poggi, Matteo, Mammi, Ludovica Maria Eugenia, Mattoccia, Stefano, Di Stefano, Luigi, Bonora, Filippo, Barbaresi, Alberto, Santolini, Enrica, and Torreggiani, Daniele
- Published
- 2021
6. Learning a confidence measure in the disparity domain from O(1) features
- Author
- Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Published
- 2020
7. Guest Editorial: Special Issue on Embedded Computer Vision
- Author
- Mattoccia, Stefano, Kisačanin, Branislav, Gelautz, Margrit, Chai, Sek, Belbachir, Ahmed Nabil, Dedeoglu, Goksel, and Stein, Fridtjof
- Published
- 2018
8. Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces
- Author
- Ramirez, Pierluigi Zama, Costanzino, Alex, Tosi, Fabio, Poggi, Matteo, Salti, Samuele, Mattoccia, Stefano, and Di Stefano, Luigi
- Subjects
- FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes; each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We divide the dataset into a training set and two testing sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks respectively, to highlight the open challenges and future research directions in this field. Extension of the paper "Open Challenges in Deep Stereo: the Booster Dataset" presented at CVPR 2022.
- Published
- 2023
9. A computer vision approach based on deep learning for the detection of dairy cows in free stall barn
- Author
- Tassinari, Patrizia, Bovo, Marco, Benni, Stefano, Franzoni, Simone, Poggi, Matteo, Mammi, Ludovica Maria Eugenia, Mattoccia, Stefano, Di Stefano, Luigi, Bonora, Filippo, Barbaresi, Alberto, Santolini, Enrica, and Torreggiani, Daniele
- Subjects
- Dairy cow, Precision livestock farming, Computer vision, Deep learning, Herd management
- Abstract
Precision Livestock Farming relies on several technological approaches to acquire precise and up-to-date data concerning individual animals in the most efficient way. In dairy farming, particular attention is paid to automatic cow detection and tracking, as such information is closely related to animal welfare and thus to possible health issues. Computer vision represents a suitable and promising method for this purpose. This paper describes the first step in the development of a computer vision system, based on deep learning, aiming to recognize individual cows in real time, detect their positions, actions and movements, and record the time-history outputs for each animal. Specifically, a neural network based on deep learning techniques has been trained and validated on a case study farm for the automatic recognition of individual cows in videos recorded in the barn. Four cows were selected to train and validate a YOLO neural network able to recognize a cow starting from the coat pattern. Then, precision-recall curves for the identification of individual cows were elaborated, both for the specific target classes and for the whole dataset, in order to assess the performance of the network. By means of data augmentation techniques, an enlarged dataset was created and considered in order to improve the performance of the network and to provide indications for increasing detection efficiency in those cases where data acquisition cannot easily be carried out for long periods. The mean average precision of the detection, ranging from 0.64 to 0.66, showed that it is possible to properly identify individual cows based on their morphological appearance and that the piebald spotting pattern of a cow's coat represents a clearly distinguishable object for a computer vision network.
The results also provided indications about the quantity and the characteristics of the images to be used for network training in order to achieve efficient detection in applications involving animals.
- Published
- 2021
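The precision-recall analysis in the entry above hinges on matching predicted bounding boxes to ground-truth boxes via Intersection-over-Union (IoU). A minimal sketch of that matching step (the paper's exact evaluation protocol is not given here; the box coordinates and the 0.5 threshold are illustrative):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(predictions, ground_truth, thr=0.5):
    # Greedy matching: each ground-truth box may be claimed at most once.
    matched, tp = set(), 0
    for p in predictions:
        for i, g in enumerate(ground_truth):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Sweeping the detector's confidence threshold and recomputing these two numbers traces out the precision-recall curve the abstract refers to.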
10. Real-Time Self-Supervised Monocular Depth Estimation Without GPU.
- Author
- Poggi, Matteo, Tosi, Fabio, Aleotti, Filippo, and Mattoccia, Stefano
- Abstract
Single-image depth estimation represents a longstanding challenge in computer vision and, although it is an ill-posed problem, deep learning has enabled astonishing results leveraging both supervised and self-supervised training paradigms. State-of-the-art solutions achieve remarkably accurate depth estimation from a single image by deploying huge deep architectures, requiring powerful dedicated hardware to run in a reasonable amount of time. This overly demanding complexity makes them unsuited for a broad category of applications requiring devices with constrained resources or memory consumption. To tackle this issue, in this paper a family of compact yet effective CNNs for monocular depth estimation is proposed, leveraging self-supervision from a binocular stereo rig. Our lightweight architectures, namely PyD-Net and PyD-Net2, compared to complex state-of-the-art networks, trade a small drop in accuracy to drastically reduce the runtime and memory requirements by a factor ranging from 2× to 100×. Moreover, our networks can run real-time monocular depth estimation on a broad set of embedded or consumer devices, even those not equipped with a GPU, by early stopping the inference with negligible (or no) loss in accuracy, making them ideally suited for real applications with strict constraints on hardware resources or power consumption.
- Published
- 2022
11. Continual Adaptation for Deep Stereo.
- Author
- Poggi, Matteo, Tonioni, Alessio, Tosi, Fabio, Mattoccia, Stefano, and Di Stefano, Luigi
- Subjects
- CONVOLUTIONAL neural networks, STEREO vision (Computer science), STEREOPHONIC sound systems, STEREO image, PHYSIOLOGICAL adaptation, BINOCULAR vision, DATA distribution
- Abstract
Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. As for most tasks, this is possible if large amounts of labelled samples are available for training, possibly covering the whole data distribution encountered at deployment time. Since such an assumption is systematically unmet in real applications, the capacity of adapting to any unseen setting becomes of paramount importance. Purposely, we propose a continual adaptation paradigm for deep stereo networks designed to deal with challenging and ever-changing environments. We design a lightweight and modular architecture, Modularly ADaptive Network (MADNet), and formulate Modular ADaptation algorithms (MAD, MAD++) which permit efficient optimization of independent sub-portions of the entire network. In our paradigm, the learning signals needed to continuously adapt models online can be sourced from self-supervision via right-to-left image warping or from traditional stereo algorithms. With both sources, no data other than the input images gathered at deployment time are needed. Thus, our network architecture and adaptation algorithms realize the first real-time self-adaptive deep stereo system and pave the way for a new paradigm that can facilitate practical deployment of end-to-end architectures for dense disparity regression.
- Published
- 2022
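The self-supervision signal mentioned in the MADNet abstract comes from warping the right image according to the predicted disparity and comparing the reconstruction against the left image. A minimal 1D sketch of that idea (nearest-neighbor sampling and the function names are illustrative, not the paper's implementation, which uses differentiable bilinear sampling over full images):

```python
def warp_right_to_left(right_row, disparity_row):
    # For each left-image pixel x, sample the right image at x - d(x)
    # (standard rectified-stereo convention). Out-of-bounds samples
    # fall back to 0; real systems mask them out of the loss.
    w = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        src = x - int(round(d))
        warped.append(right_row[src] if 0 <= src < w else 0.0)
    return warped

def photometric_loss(left_row, warped_row):
    # Mean absolute difference between the left image and its
    # reconstruction: the self-supervised training signal.
    return sum(abs(l - r) for l, r in zip(left_row, warped_row)) / len(left_row)
```

A correct disparity yields a reconstruction that matches the left image, so minimizing this loss drives the network toward correct disparities without ground truth.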
12. On the Synergies Between Machine Learning and Binocular Stereo for Depth Estimation From Images: A Survey.
- Author
- Poggi, Matteo, Tosi, Fabio, Batsos, Konstantinos, Mordohai, Philippos, and Mattoccia, Stefano
- Subjects
- MACHINE learning, DEEP learning, COMPUTER vision, OPTICAL radar, MONOCULARS
- Abstract
Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decisions to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state-of-the-art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far and the open challenges the community is going to face in the immediate future.
- Published
- 2022
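As background to the survey above: once stereo matching yields a disparity d for a rectified pair, depth follows from triangulation as Z = f·B/d, with focal length f in pixels and baseline B in meters. A small worked example (the numbers are illustrative):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    # Rectified-stereo triangulation: Z = f * B / d.
    # Larger disparity means the point is closer to the cameras.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

With f = 1000 px and B = 0.1 m, a 50-pixel disparity corresponds to a point 2 m away; halving the disparity doubles the depth, which is why depth error grows quickly for distant (low-disparity) points.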
13. On the Confidence of Stereo Matching in a Deep-Learning Era: A Quantitative Evaluation.
- Author
- Poggi, Matteo, Kim, Seungryong, Tosi, Fabio, Kim, Sunok, Aleotti, Filippo, Min, Dongbo, Sohn, Kwanghoon, and Mattoccia, Stefano
- Subjects
- STEREO vision (Computer science), CONFIDENCE, DEEP learning, ESTIMATION theory, SCIENTIFIC community, VOLUME measurements
- Abstract
Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels on two synchronized and rectified images. Alongside the development of more accurate algorithms, the research community focused on finding good strategies to estimate the reliability, i.e., the confidence, of estimated disparity maps. This information proves to be a powerful cue to naively find wrong matches as well as to improve the overall effectiveness of a variety of stereo algorithms according to different strategies. In this paper, we review more than ten years of developments in the field of confidence estimation for stereo matching. We extensively discuss and evaluate existing confidence measures and their variants, from hand-crafted ones to the most recent, state-of-the-art learning-based methods. We study the different behaviors of each measure when applied to a pool of different stereo algorithms and, for the first time in the literature, when paired with a state-of-the-art deep stereo network. Our experiments, carried out on five different standard datasets, provide a comprehensive overview of the field, highlighting in particular both strengths and limitations of learning-based strategies.
- Published
- 2022
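Among the hand-crafted confidence measures evaluated in surveys such as the one above, the left-right consistency check is a classic example: a disparity is trusted only if the left and right disparity maps agree at corresponding pixels. A minimal 1D sketch (not the paper's code; the binary output and the 1-pixel threshold are illustrative simplifications):

```python
def left_right_consistency(disp_left, disp_right, tau=1.0):
    # A left-image pixel x with disparity d maps to right-image pixel
    # x - d; the match is confident if the right map's disparity there
    # agrees within tau pixels.
    w = len(disp_left)
    conf = []
    for x, d in enumerate(disp_left):
        xr = x - int(round(d))
        if 0 <= xr < w and abs(d - disp_right[xr]) <= tau:
            conf.append(1.0)
        else:
            conf.append(0.0)
    return conf
```

Occluded regions typically fail this test (no consistent match exists in the other view), which is exactly why it is a useful outlier detector.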
14. Monocular Depth Perception on Microcontrollers for Edge Applications.
- Author
- Peluso, Valentino, Cipolletta, Antonio, Calimera, Andrea, Poggi, Matteo, Tosi, Fabio, Aleotti, Filippo, and Mattoccia, Stefano
- Subjects
- DEPTH perception, MONOCULARS, COMPUTER vision, MICROCONTROLLERS, CONVOLUTIONAL neural networks, DISTRIBUTED sensors, SENSOR networks
- Abstract
Depth estimation is crucial in several computer vision applications, and a recent trend in this field aims at inferring such a cue from a single camera. Unfortunately, despite the compelling results achieved, state-of-the-art monocular depth estimation methods are computationally demanding, thus precluding their practical deployment in several application contexts characterized by low-power constraints. Therefore, in this paper, we propose a lightweight Convolutional Neural Network based on a shallow pyramidal architecture, referred to as μPyD-Net, enabling monocular depth estimation on microcontrollers. The network is trained in a peculiar self-supervised manner leveraging proxy labels obtained through a traditional stereo algorithm. Moreover, we propose optimization strategies aimed at performing computations with quantized 8-bit data and map the high-level description of the network to low-level layers optimized for the target microcontroller architecture. Exhaustive experimental results on standard datasets and an in-depth evaluation with a device belonging to the popular Arm Cortex-M family confirm that obtaining sufficiently accurate monocular depth estimation on microcontrollers is feasible. To the best of our knowledge, our proposal is the first one enabling such remarkable achievement, paving the way for the deployment of monocular depth cues onto the tiny end-nodes of distributed sensor networks.
- Published
- 2022
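The 8-bit quantization mentioned in the abstract above is commonly realized as an affine mapping x ≈ scale · (q − zero_point) between floats and unsigned 8-bit integers. A generic sketch of that scheme (a standard formulation, not the paper's specific pipeline):

```python
def quantize_params(xmin, xmax, bits=8):
    # Affine (asymmetric) quantization parameters for range [xmin, xmax].
    qmax = (1 << bits) - 1
    scale = (xmax - xmin) / qmax
    zero_point = round(-xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, bits=8):
    # Float -> integer code, clamped to the representable range.
    qmax = (1 << bits) - 1
    q = round(x / scale) + zero_point
    return max(0, min(qmax, q))

def dequantize(q, scale, zero_point):
    # Integer code -> approximate float.
    return scale * (q - zero_point)
```

The round-trip error is bounded by half the scale, which is what makes 8-bit inference viable when activation ranges are calibrated carefully.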
15. On the Deployment of Out-of-the-Box Embedded Devices for Self-Powered River Surface Flow Velocity Monitoring at the Edge.
- Author
- Livoroi, Arsal-Hanif, Conti, Andrea, Foianesi, Luca, Tosi, Fabio, Aleotti, Filippo, Poggi, Matteo, Tauro, Flavia, Toth, Elena, Grimaldi, Salvatore, and Mattoccia, Stefano
- Subjects
- FLOW velocity, STREAMFLOW, ALGORITHMS, VELOCIMETRY, EDGES (Geometry), TRACKING algorithms
- Abstract
As reported in the recent image velocimetry literature, tracking the motion of sparse feature points floating on the river surface, as done by the Optical Tracking Velocimetry (OTV) algorithm, is a promising strategy to address surface flow monitoring. Moreover, the lightweight nature of OTV coupled with computational optimizations makes it suited even for deployment in situ to perform measurements at the edge with cheap embedded devices, without the need to perform offload processing. Despite these notable achievements, the actual practical deployment of OTV in remote environments would require cheap and self-powered systems enabling continuous measurements without the need for cumbersome and expensive infrastructures rarely found in situ. Purposely, in this paper, we propose an additional simplification of the OTV algorithm to further reduce its computational requirements, and we analyze self-powered off-the-shelf setups for in situ deployment. We assess the performance of such setups from different perspectives to determine the optimal solution to design a cost-effective self-powered measurement node.
- Published
- 2021
16. Good Cues to Learn From Scratch a Confidence Measure for Passive Depth Sensors.
- Author
- Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Abstract
As reported in the stereo literature, confidence estimation represents a powerful cue to detect outliers as well as to improve depth accuracy. Purposely, we proposed a strategy enabling us to achieve state-of-the-art results by learning a confidence measure in the disparity domain only with a CNN. Since this method does not require the cost volume, it is very appealing because it is potentially suited to any depth-sensing technology, including, for instance, those based on deep networks. Following this intuition, in this paper we deeply investigate the performance of confidence estimation methods, both known in the literature and newly proposed in this paper, that neglect the use of the cost volume. Specifically, we estimate confidence measures from scratch by feeding deep networks with raw depth estimates and, optionally, images, and assess their performance deploying three datasets and three stereo algorithms. We also investigate, for the first time, their performance with disparity maps inferred by deep stereo end-to-end architectures. Moreover, we move beyond the stereo matching context, estimating confidence from depth maps generated by a monocular network. Our extensive experiments with different architectures highlight that inferring confidence from the raw reference disparity only, as proposed in our previous work, is not only the most versatile solution but also the most effective one in most cases.
- Published
- 2020
17. Learning from scratch a confidence measure
- Author
- Poggi, Matteo and Mattoccia, Stefano
- Subjects
- Confidence measure, stereo vision, deep-learning, CNN, 3D
- Abstract
Stereo vision is a popular technique to infer depth from two or more images. In this field, confidence measures, typically obtained from the analysis of the cost volume, aim at detecting uncertain disparity assignments. As recently proved, multiple confidence measures combined with hand-crafted features extracted from the cost volume can also be used for other purposes, in particular to improve the overall disparity accuracy leveraging machine learning techniques. In this paper, starting from the observation that recurrent local patterns occurring in the disparity maps can tell a correct assignment from a wrong one, we follow a completely different methodology to infer a novel confidence measure from scratch. Specifically, leveraging Convolutional Neural Networks, we pose the confidence formulation as a regression problem by analyzing the disparity map provided by a stereo vision system. Once trained on a subset of the KITTI 2012 dataset with the disparity maps provided by the simple block-matching algorithm, our confidence measure outperforms the state-of-the-art on two datasets (KITTI 2015 and Middlebury 2014) as well as with two stereo algorithms. The experimental evaluation reported clearly highlights that our approach is capable of generalizing its behavior across different circumstances better than the state-of-the-art. Finally, not being based on cost volume analysis, our proposal is also potentially suited for out-of-the-box depth generation devices, which usually do not expose the cues required by top-performing approaches.
- Published
- 2016
18. Confidence Estimation for ToF and Stereo Sensors and Its Application to Depth Data Fusion.
- Author
- Poggi, Matteo, Agresti, Gianluca, Tosi, Fabio, Zanuttigh, Pietro, and Mattoccia, Stefano
- Abstract
Time-of-Flight (ToF) sensors and stereo vision systems are two widely used technologies for depth estimation. Due to their rather complementary strengths and limitations, the two sensors are often combined to infer more accurate depth maps. A key research issue in this field is how to estimate the reliability of the sensed depth data. While this problem has been widely studied for stereo systems, it has seldom been considered for ToF sensors. Therefore, starting from the work done for stereo data, in this paper we first introduce novel confidence estimation techniques for ToF data. Moreover, we also show how learning-based confidence metrics jointly trained on the two sensors yield better performance. Finally, deploying different fusion frameworks, we show how confidence estimation can be exploited to guide the fusion of depth data from the two sensors. Experimental results show how accurate confidence cues allow outperforming state-of-the-art data fusion schemes even with the simplest fusion strategies known in the literature.
- Published
- 2020
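A simple way to exploit confidence for the ToF-stereo fusion described above is a per-pixel convex combination of the two depth maps weighted by their confidences; the paper's actual fusion frameworks are more elaborate, so treat this as an illustrative baseline:

```python
def fuse_depth(depth_tof, conf_tof, depth_stereo, conf_stereo):
    # Per-pixel confidence-weighted average of two depth maps
    # (flattened to 1D lists here for simplicity). When one sensor's
    # confidence is zero, the result falls back to the other sensor.
    fused = []
    for dt, ct, ds, cs in zip(depth_tof, conf_tof, depth_stereo, conf_stereo):
        total = ct + cs
        fused.append((ct * dt + cs * ds) / total if total > 0 else 0.0)
    return fused
```

Even this naive weighting already shows the abstract's point: accurate confidence cues let simple fusion strategies discard each sensor's unreliable pixels.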
19. Learning a General-Purpose Confidence Measure Based on O(1) Features and a Smarter Aggregation Strategy for Semi Global Matching.
- Author
- Poggi, Matteo and Mattoccia, Stefano
- Published
- 2016
20. Deep Stereo Fusion: Combining Multiple Disparity Hypotheses with Deep-Learning.
- Author
- Poggi, Matteo and Mattoccia, Stefano
- Published
- 2016
21. Reliable Fusion of ToF and Stereo Depth Driven by Confidence Measures.
- Author
- Marin, Giulio, Zanuttigh, Pietro, and Mattoccia, Stefano
- Published
- 2016
22. A passive RGBD sensor for accurate and real-time depth sensing self-contained into an FPGA.
- Author
- Mattoccia, Stefano and Poggi, Matteo
- Published
- 2015
23. 3D Glasses as Mobility Aid for Visually Impaired People.
- Author
- Mattoccia, Stefano and Macrì, Paolo
- Published
- 2015
24. Crosswalk Recognition Through Point-Cloud Processing and Deep-Learning Suited to a Wearable Mobility Aid for the Visually Impaired.
- Author
- Poggi, Matteo, Nanni, Luca, and Mattoccia, Stefano
- Published
- 2015
25. Modeling and Simulation of Very High Spatial Resolution UXOs and Landmines in a Hyperspectral Scene for UAV Survey.
- Author
- Bajić Jr., Milan, Bajić, Milan, Kaniewski, Piotr, Pasternak, Mateusz, and Mattoccia, Stefano
- Subjects
- LAND mines, SPECTRAL imaging, SIMULATION methods & models, CONFIDENCE intervals
- Abstract
This paper presents methods for the modeling and simulation of explosive target placement in terrain spectral images (i.e., real hyperspectral 90-channel VNIR data), considering unexploded ordnances, landmines, and improvised explosive devices. The models used for landmine detection operate at sub-pixel levels. The presented research uses very fine spatial resolutions, 0.945 × 0.945 mm for targets and 1.868 × 1.868 cm for the scene, where the number of target pixels ranges from 52 to 116. While previous research has used the mean spectral value of the target, it is omitted in this paper. The model considers the probability of detection and its confidence intervals, which are derived and used in the analysis of the considered explosive targets. The detection results are better when decreased target endmembers are used to match the scene resolution, rather than using endmembers at the full resolution of the target. Unmanned aerial vehicles, as carriers of snapshot hyperspectral cameras, enable flexible target resolution selection and good area coverage.
- Published
- 2021
26. The Analysis and Modelling of the Quality of Information Acquired from Weather Station Sensors.
- Author
- Stawowy, Marek, Olchowik, Wiktor, Rosiński, Adam, Dąbrowski, Tadeusz, and Mattoccia, Stefano
- Subjects
- METEOROLOGICAL stations, INFORMATION modeling, DETECTORS, MISSING data (Statistics), DATA analysis
- Abstract
This article explores the quality of information acquired from weather station sensors. A review of literature in this field concludes that most publications concern the analysis of data acquired from weather station sensors and their characteristic properties, estimating the missing values from the data, and assessing the quality of weather information. Despite the large collection of studies devoted to these issues, there is no comprehensive approach that would consider the modelling of information uncertainty. Therefore, the article presents a proprietary method of analysing and modelling the uncertainty of the weather station sensors' information quality. For this purpose, the structure of a real meteorological station and the measurement data obtained from it were analysed. Next, an information quality model was developed using the certainty factor (CF) of hypothesis calculation. The developed method was verified on an exemplary real meteorological station. It was found that this method enables the improvement of the quality of information obtained and processed in a multi-sensor system. This becomes practical when the influence of individual measurement system elements on the information quality reaching the recipient is determined. An example is furnished by a demonstration of the usage of two sensors to improve the information quality.
- Published
- 2021
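The certainty factor (CF) calculus referenced in the abstract combines evidence from multiple sources; in the classic MYCIN-style formulation, two confirming sources with factors CF1 and CF2 combine as CF = CF1 + CF2·(1 − CF1). A minimal sketch of that combination rule (the paper's exact model may differ, so this is only the standard textbook form):

```python
def combine_cf(cf1, cf2):
    # MYCIN-style combination of two positive (confirming) certainty
    # factors in [0, 1]: the second source can only reduce the
    # remaining uncertainty (1 - cf1), never exceed certainty 1.
    return cf1 + cf2 * (1.0 - cf1)
```

This illustrates the abstract's two-sensor example: a second sensor with any positive certainty raises the combined certainty above either sensor alone.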
27. Real-Time Single Image Depth Perception in the Wild with Handheld Devices.
- Author
- Aleotti, Filippo, Zaccaroni, Giulio, Bartolomei, Luca, Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
- AUGMENTED reality, MONOCULARS, IMAGE, DEEP learning
- Abstract
Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.
- Published
- 2021
28. Enabling Image-Based Streamflow Monitoring at the Edge.
- Author
- Tosi, Fabio, Rocca, Matteo, Aleotti, Filippo, Poggi, Matteo, Mattoccia, Stefano, Tauro, Flavia, Toth, Elena, and Grimaldi, Salvatore
- Subjects
- STREAMFLOW velocity, INTERNET access, IMAGE processing, BODIES of water, INDUSTRIAL engineering, STREAMFLOW
- Abstract
Monitoring streamflow velocity is of paramount importance for water resources management and in engineering practice. To this aim, image-based approaches have proved to be reliable systems to non-intrusively monitor water bodies in remote places at variable flow regimes. Nonetheless, to tackle their computational and energy requirements, offload processing and high-speed internet connections are mandatory in the monitored environments, which are often difficult to access, hence limiting the effective deployment of such techniques in several relevant circumstances. In this paper, we advance and simplify streamflow velocity monitoring by directly processing the image stream in situ with a low-power embedded system. By leveraging its standard parallel processing capability and exploiting functional simplifications, we achieve an accuracy comparable to state-of-the-art algorithms that typically require expensive computing devices and infrastructures. The advantage of monitoring streamflow velocity in situ with a lightweight and cost-effective embedded processing device is threefold. First, it circumvents the need for wideband internet connections, which are expensive and impractical in remote environments. Second, it massively reduces the overall energy consumption, bandwidth and deployment cost. Third, when monitoring more than one river section, processing "at the very edge" of the system improves scalability by a large margin compared to offload solutions based on remote or cloud processing. Therefore, enabling streamflow velocity monitoring in situ with low-cost embedded devices would foster the widespread diffusion of gauge cameras even in developing countries where appropriate infrastructure might be unavailable or too expensive.
- Published
- 2020
29. Optical Tracking Velocimetry (OTV): Leveraging Optical Flow and Trajectory-Based Filtering for Surface Streamflow Observations.
- Author
- Tauro, Flavia, Tosi, Fabio, Mattoccia, Stefano, Toth, Elena, Piscopia, Rodolfo, and Grimaldi, Salvatore
- Subjects
- VELOCIMETRY, HYDROLOGY, COMPUTER algorithms, REMOTE sensing, KALMAN filtering
- Abstract
Nonintrusive image-based methods have the potential to advance hydrological streamflow observations by providing spatially distributed data at high temporal resolution. Due to their simplicity, correlation-based approaches have until recently been preferred to alternative image-based approaches, such as optical flow, for camera-based surface flow velocity estimation. In this work, we introduce a novel optical flow scheme, optical tracking velocimetry (OTV), that entails automated feature detection, tracking through the differential sparse Lucas-Kanade algorithm, and then a posteriori filtering to retain only realistic trajectories that pertain to the transit of actual objects in the field of view. The method requires minimal input on the flow direction and camera orientation. Tested on two image data sets collected in diverse natural conditions, the approach proved suitable for rapid and accurate surface flow velocity estimations. Five different feature detectors were compared, and the features from accelerated segment test (FAST) detector provided the best balance between the number of features identified and successfully tracked as well as computational efficiency. OTV was relatively insensitive to reduced image resolution but was impacted by acquisition frequencies lower than 7–8 Hz. Compared to traditional correlation-based techniques, OTV was less affected by noise and surface seeding. In addition, the scheme is foreseen to be applicable to real-time gauge-cam implementations.
- Published
- 2018
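The a posteriori trajectory filtering step described for OTV can be illustrated by discarding tracked trajectories that are too short or that deviate from the expected flow direction; this sketch is a simplified stand-in for the paper's filtering rules (all thresholds and names are hypothetical):

```python
import math

def filter_trajectories(trajectories, flow_direction_deg,
                        tol_deg=30.0, min_length=2.0):
    # Keep trajectories whose net displacement is long enough to be a
    # real floating object and roughly aligned with the expected flow
    # direction (given in degrees, image coordinates).
    kept = []
    for traj in trajectories:  # each traj: list of (x, y) points
        dx = traj[-1][0] - traj[0][0]
        dy = traj[-1][1] - traj[0][1]
        length = math.hypot(dx, dy)
        if length < min_length:
            continue  # jitter or a stationary feature, not transit
        angle = math.degrees(math.atan2(dy, dx))
        # Smallest signed angular difference, wrapped to (-180, 180].
        diff = abs((angle - flow_direction_deg + 180) % 360 - 180)
        if diff <= tol_deg:
            kept.append(traj)
    return kept
```

The displacements of the surviving trajectories, divided by the frame interval and scaled by the camera calibration, then yield the surface velocity estimates.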
30. Open Challenges in Deep Stereo: the Booster Dataset
- Author
- Ramirez, Pierluigi Zama, Tosi, Fabio, Poggi, Matteo, Salti, Samuele, Mattoccia, Stefano, and Di Stefano, Luigi
- Subjects
- Datasets and evaluation, FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 3D from multi-view and sensors
- Abstract
We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failures for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework which allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample includes a high-resolution pair (12 Mpx) as well as an unbalanced pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks based on our dataset, highlighting their limitations in addressing the open challenges in stereo and drawing hints for future research. CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/booster-web/
- Published
- 2022
31. Monocular Depth Perception on Microcontrollers for Edge Applications
- Author
-
Peluso, Valentino, Cipolletta, Antonio, Calimera, Andrea, Poggi, Matteo, Tosi, Fabio, Aleotti, Filippo, and Mattoccia, Stefano
- Subjects
IoT ,Monocular ,Computer vision, depth estimation, deep learning, optimization methods, edge computing, IoT, micro-controllers ,Computer science ,business.industry ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,deep learning ,micro-controllers ,Convolutional neural network ,Field (computer science) ,ARM architecture ,edge computing ,Media Technology ,depth estimation ,optimization methods ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Depth perception ,Wireless sensor network ,Edge computing - Abstract
Depth estimation is crucial in several computer vision applications, and a recent trend in this field aims at inferring such a cue from a single camera. Unfortunately, despite the compelling results achieved, state-of-the-art monocular depth estimation methods are computationally demanding, thus precluding their practical deployment in several application contexts characterized by low-power constraints. Therefore, in this paper, we propose a lightweight Convolutional Neural Network based on a shallow pyramidal architecture, referred to as μPyD-Net, enabling monocular depth estimation on microcontrollers. The network is trained in a peculiar self-supervised manner leveraging proxy labels obtained through a traditional stereo algorithm. Moreover, we propose optimization strategies aimed at performing computations with quantized 8-bit data and map the high-level description of the network to low-level layers optimized for the target microcontroller architecture. Exhaustive experimental results on standard datasets and an in-depth evaluation with a device belonging to the popular Arm Cortex-M family confirm that obtaining sufficiently accurate monocular depth estimation on microcontrollers is feasible. To the best of our knowledge, our proposal is the first one enabling such remarkable achievement, paving the way for the deployment of monocular depth cues onto the tiny end-nodes of distributed sensor networks.
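The 8-bit computation mentioned in the abstract is typically realized with the standard affine quantization scheme q = round(w / scale) + zero_point. The sketch below is a generic illustration of that scheme only; the function names are hypothetical and this is not the paper's actual tool flow for Cortex-M targets.

```python
import numpy as np

def quantize_affine_8bit(w):
    """Map a float tensor to uint8 with an affine scheme
    q = round(w / scale) + zero_point (generic sketch)."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0      # avoid zero scale for constant tensors
    zp = int(round(-lo / scale))          # zero point aligning lo with q = 0
    q = np.clip(np.round(w / scale) + zp, 0, 255).astype(np.uint8)
    return q, scale, zp

def dequantize(q, scale, zp):
    """Recover an approximation of the original float values."""
    return (q.astype(np.float32) - zp) * scale
```

The reconstruction error of this scheme is bounded by half the quantization step, which is what makes 8-bit inference viable for compact networks.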
- Published
- 2022
32. Real-Time Self-Supervised Monocular Depth Estimation Without GPU
- Author
-
Poggi, Matteo, Tosi, Fabio, Aleotti, Filippo, and Mattoccia, Stefano
- Subjects
Computer vision, monocular depth estimation, deep learning, deep architectures, unsupervised learning ,Mechanical Engineering ,Automotive Engineering ,Computer Science Applications - Abstract
Single-image depth estimation represents a longstanding challenge in computer vision and although it is an ill-posed problem, deep learning enabled astonishing results leveraging both supervised and self-supervised training paradigms. State-of-the-art solutions achieve remarkably accurate depth estimation from a single image deploying huge deep architectures, requiring powerful dedicated hardware to run in a reasonable amount of time. This overly demanding complexity makes them unsuited for a broad category of applications requiring devices with constrained resources or memory consumption. To tackle this issue, in this paper a family of compact, yet effective CNNs for monocular depth estimation is proposed, by leveraging self-supervision from a binocular stereo rig. Our lightweight architectures, namely PyD-Net and PyD-Net2, compared to complex state-of-the-art solutions trade a small drop in accuracy to drastically reduce the runtime and memory requirements by a factor ranging from 2× to 100×. Moreover, our networks can run real-time monocular depth estimation on a broad set of embedded or consumer devices, even not equipped with a GPU, by early stopping the inference with negligible (or no) loss in accuracy, making them ideally suited for real applications with strict constraints on hardware resources or power consumption.
- Published
- 2022
33. Monitoring Social Distancing With Single Image Depth Estimation
- Author
-
Mingozzi, Alessio, Conti, Andrea, Aleotti, Filippo, Poggi, Matteo, and Mattoccia, Stefano
- Subjects
FOS: Computer and information sciences ,Computational Mathematics ,Control and Optimization ,Artificial Intelligence ,Computer vision, deep learning, depth estimation,monocular depth estimation, social distancing ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science Applications - Abstract
The recent pandemic emergency raised many challenges regarding the countermeasures aimed at containing the virus spread, and constraining the minimum distance between people resulted in one of the most effective strategies. Thus, the implementation of autonomous systems capable of monitoring the so-called social distance gained much interest. In this paper, we aim to address this task leveraging a single RGB frame without additional depth sensors. In contrast to existing single-image alternatives failing when ground localization is not available, we rely on single image depth estimation to perceive the 3D structure of the observed scene and estimate the distance between people. During the setup phase, a straightforward calibration procedure, leveraging a scale-aware SLAM algorithm available even on consumer smartphones, allows us to address the scale ambiguity affecting single image depth estimation. We validate our approach through indoor and outdoor images employing a calibrated LiDAR + RGB camera asset. Experimental results highlight that our proposal enables sufficiently reliable estimation of the inter-personal distance to monitor social distancing effectively. This fact confirms that despite its intrinsic ambiguity, if appropriately driven single image depth estimation can be a viable alternative to other depth perception techniques, more expensive and not always feasible in practical applications. Our evaluation also highlights that our framework can run reasonably fast and comparably to competitors, even on pure CPU systems. Moreover, its practical deployment on low-power systems is around the corner., Accepted for pubblication on IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)
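The core geometric step of such a system, measuring inter-personal distance from a metric depth map, amounts to back-projecting each detected person through the pinhole camera model and taking the 3D Euclidean distance. A simplified sketch, assuming known intrinsics and one representative pixel per person (the paper's per-person localization is more involved, and the function names here are illustrative):

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth -> 3D point in camera coordinates
    under the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def interpersonal_distance(p1, p2, intrinsics):
    """Euclidean 3D distance between two people, each given as a
    (u, v, depth) tuple; intrinsics = (fx, fy, cx, cy)."""
    a = backproject(*p1, *intrinsics)
    b = backproject(*p2, *intrinsics)
    return float(np.linalg.norm(a - b))
```

For example, two people imaged at the principal point but at depths 2 m and 4 m are 2 m apart, which is why resolving the scale ambiguity of monocular depth is the critical calibration step.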
- Published
- 2022
34. RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation
- Author
-
Tosi, Fabio, Ramirez, Pierluigi Zama, Poggi, Matteo, Salti, Samuele, Mattoccia, Stefano, and Di Stefano, Luigi
- Subjects
FOS: Computer and information sciences ,Datasets and evaluation ,Multispectral imaging ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,3D from multi-view and sensor - Abstract
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences. Purposely, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity maps. To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera, required only during training data acquisition. In this setup, we can conveniently learn cross-modal matching in the absence of ground-truth labels by distilling knowledge from an easier RGB-RGB matching task based on a collection of about 11K unlabeled image triplets. Experiments show that the proposed pipeline sets a good performance bar (1.16 pixels average registration error) for future research on this novel, challenging task., Comment: CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/rgb-ms-web/
- Published
- 2022
35. Real-Time Single Image Depth Perception in the Wild with Handheld Devices
- Author
-
Aleotti, Filippo, Zaccaroni, Giulio, Bartolomei, Luca, Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
FOS: Computer and information sciences ,Computer science ,Reliability (computer networking) ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,monocular depth estimation ,02 engineering and technology ,lcsh:Chemical technology ,smartphone ,01 natural sciences ,Biochemistry ,Article ,Analytical Chemistry ,mobile system ,Computer Science - Graphics ,Human–computer interaction ,0202 electrical engineering, electronic engineering, information engineering ,lcsh:TP1-1185 ,Electrical and Electronic Engineering ,Instrumentation ,mobile systems ,Monocular ,business.industry ,Deep learning ,010401 analytical chemistry ,deep learning ,Atomic and Molecular Physics, and Optics ,Graphics (cs.GR) ,0104 chemical sciences ,Feature (computer vision) ,020201 artificial intelligence & image processing ,Augmented reality ,Artificial intelligence ,Depth perception ,business ,Mobile device - Abstract
Depth perception is paramount to tackle real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit its practical deployment: i) the low reliability when deployed in-the-wild and ii) the demanding resource requirements to achieve real-time performance, often not compatible with such devices. Therefore, in this paper, we deeply investigate these issues showing how they are both addressable adopting appropriate network design and training strategies -- also outlining how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time depth-aware augmented reality and image blurring with smartphones in-the-wild., Comment: 11 pages, 9 figures
- Published
- 2021
36. Enabling Image-Based Streamflow Monitoring at the Edge
- Author
-
Tosi, Fabio, Rocca, Matteo, Aleotti, Filippo, Poggi, Matteo, Mattoccia, Stefano, Tauro, Flavia, Toth, Elena, and Grimaldi, Salvatore
- Subjects
010504 meteorology & atmospheric sciences ,Computer science ,Science ,0208 environmental biotechnology ,Real-time computing ,Optical Tracking Velocimetry ,02 engineering and technology ,Energy consumption ,01 natural sciences ,computer vision ,020801 environmental engineering ,Water resources ,embedded system ,Parallel processing (DSP implementation) ,Streamflow ,Scalability ,streamflow velocity ,optical tracking velocimetry ,General Earth and Planetary Sciences ,Enhanced Data Rates for GSM Evolution ,0105 earth and related environmental sciences - Abstract
Monitoring streamflow velocity is of paramount importance for water resources management and in engineering practice. To this aim, image-based approaches have proved to be reliable systems to non-intrusively monitor water bodies in remote places at variable flow regimes. Nonetheless, to tackle their computational and energy requirements, offloading processing through high-speed internet connections in the monitored environments, which are often difficult to access, is mandatory, hence limiting the effective deployment of such techniques in several relevant circumstances. In this paper, we advance and simplify streamflow velocity monitoring by directly processing the image stream in situ with a low-power embedded system. By leveraging its standard parallel processing capability and exploiting functional simplifications, we achieve an accuracy comparable to state-of-the-art algorithms that typically require expensive computing devices and infrastructures. The advantage of monitoring streamflow velocity in situ with a lightweight and cost-effective embedded processing device is threefold. First, it circumvents the need for wideband internet connections, which are expensive and impractical in remote environments. Second, it massively reduces the overall energy consumption, bandwidth and deployment cost. Third, when monitoring more than one river section, processing “at the very edge” of the system improves scalability by a large margin, compared to offload solutions based on remote or cloud processing. Therefore, enabling streamflow velocity monitoring in situ with low-cost embedded devices would foster the widespread diffusion of gauge cameras even in developing countries where appropriate infrastructure might be not available or too expensive.
- Published
- 2020
- Full Text
- View/download PDF
37. Learning a confidence measure in the disparity domain from O(1) features
- Author
-
Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
Measure (data warehouse) ,business.industry ,Computer science ,Volume (computing) ,Contrast (statistics) ,020207 software engineering ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,Domain (software engineering) ,Random forest ,Stereopsis ,confidence measure, stereo vision, machine-learning, depth perception, uncertainty estimation, random-forest, disparity ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Depth perception ,business ,computer ,Software - Abstract
Depth sensing is of paramount importance for countless applications and stereo represents a popular, effective and cheap solution for this purpose. As highlighted by recent works concerned with stereo, uncertainty estimation can be a powerful cue to improve accuracy in stereo. Most confidence measures rely on features, mainly extracted from the cost volume, fed to a random forest or a convolutional neural network trained to estimate match uncertainty. In contrast, we propose a novel strategy for confidence estimation based on features computed in the disparity domain, making our proposal suited for any stereo system including COTS devices, and in constant time. We exhaustively assess the performance of our proposals, referred to as O1 and O2, on KITTI and Middlebury datasets with three popular and different stereo algorithms (CENSUS, MC-CNN and SGM), as well as a deep stereo network (PSM-Net). We also evaluate how well confidence measures generalize to different environments/datasets.
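Features computed purely in the disparity domain need only the disparity map itself and a fixed-size neighborhood, which is what makes constant-time confidence estimation possible on COTS devices. As a hedged illustration (not the paper's exact O1/O2 feature definitions), the sketch below scores each pixel by how much its local window agrees with the center disparity:

```python
import numpy as np

def disparity_agreement(disp, win=5):
    """Fraction of pixels in a win x win window whose disparity lies
    within 1 pixel of the center disparity -- one illustrative
    disparity-domain cue (higher = more confident)."""
    h, w = disp.shape
    r = win // 2
    padded = np.pad(disp, r, mode="edge")      # replicate borders
    conf = np.zeros_like(disp, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            neigh = padded[r + dy : r + dy + h, r + dx : r + dx + w]
            conf += (np.abs(neigh - disp) <= 1.0)
    return conf / (win * win)
```

On a smooth disparity surface the score is close to 1, while an isolated outlier (a typical wrong match) scores near 1/win², without ever touching the cost volume.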
- Published
- 2020
38. Unsupervised Domain Adaptation for Depth Prediction from Images
- Author
-
Tonioni, Alessio, Poggi, Matteo, Mattoccia, Stefano, and Di Stefano, Luigi
- Subjects
FOS: Computer and information sciences ,Contextual image classification ,Computer science ,business.industry ,depth from stereo, depth from mono, domain adaptation, 3D ,Applied Mathematics ,Deep learning ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Estimator ,Pattern recognition ,Context (language use) ,02 engineering and technology ,Domain (software engineering) ,Image (mathematics) ,Computational Theory and Mathematics ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Unsupervised learning ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
State-of-the-art approaches to infer dense depth measurements from images rely on CNNs trained end-to-end on a vast amount of data. However, these approaches suffer a drastic drop in accuracy when dealing with environments much different in appearance and/or context from those observed at training time. This domain shift issue is usually addressed by fine-tuning on smaller sets of images from the target domain annotated with depth labels. Unfortunately, relying on such supervised labeling is seldom feasible in most practical settings. Therefore, we propose an unsupervised domain adaptation technique which does not require groundtruth labels. Our method relies only on image pairs and leverages classical stereo algorithms to produce disparity measurements alongside confidence estimators to assess their reliability. We propose to fine-tune both depth-from-stereo as well as depth-from-mono architectures by a novel confidence-guided loss function that handles the measured disparities as noisy labels weighted according to the estimated confidence. Extensive experimental results based on standard datasets and evaluation protocols prove that our technique can effectively address the domain shift issue with both stereo and monocular depth prediction architectures and outperforms other state-of-the-art unsupervised loss functions that may be alternatively deployed to pursue domain adaptation., Comment: 14 pages, 7 figures. Accepted to TPAMI
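A confidence-guided loss of the kind described treats measured disparities as noisy labels and weights the per-pixel error by the estimated confidence. A minimal illustrative form (the paper's exact formulation may differ; the threshold tau is an assumption for the sketch):

```python
import numpy as np

def confidence_guided_loss(pred, proxy, conf, tau=0.5):
    """Confidence-weighted L1 error against noisy proxy disparities,
    discarding pixels whose confidence falls below tau."""
    mask = conf > tau
    if not mask.any():
        return 0.0
    weighted = conf[mask] * np.abs(pred[mask] - proxy[mask])
    return float(np.sum(weighted) / np.sum(conf[mask]))
```

In this way, unreliable proxy labels contribute little or nothing to fine-tuning, which is the key to adapting without ground truth.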
- Published
- 2020
39. Optical Tracking Velocimetry (OTV): Leveraging Optical Flow and Trajectory-Based Filtering for Surface Streamflow Observations
- Author
-
Tauro, Flavia, Tosi, Fabio, Mattoccia, Stefano, Toth, Elena, Piscopia, Rodolfo, and Grimaldi, Salvatore
- Subjects
large scale particle image velocimetry ,Computer science ,Science ,0208 environmental biotechnology ,Optical flow ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Field of view ,02 engineering and technology ,optical flow ,optical tracking velocimetry (OTV) ,streamflow ,Lucas-Kanade ,FAST ,feature detection ,feature tracking ,particle tracking velocimetry ,gauge-cam ,Lucas–Kanade method ,Particle tracking velocimetry ,Computer vision ,Image resolution ,Orientation (computer vision) ,business.industry ,Velocimetry ,020801 environmental engineering ,General Earth and Planetary Sciences ,Artificial intelligence ,Noise (video) ,business - Abstract
Nonintrusive image-based methods have the potential to advance hydrological streamflow observations by providing spatially distributed data at high temporal resolution. Due to their simplicity, correlation-based approaches have until recently been preferred to alternative image-based approaches, such as optical flow, for camera-based surface flow velocity estimation. In this work, we introduce a novel optical flow scheme, optical tracking velocimetry (OTV), that entails automated feature detection, tracking through the differential sparse Lucas-Kanade algorithm, and then a posteriori filtering to retain only realistic trajectories that pertain to the transit of actual objects in the field of view. The method requires minimal input on the flow direction and camera orientation. Tested on two image data sets collected in diverse natural conditions, the approach proved suitable for rapid and accurate surface flow velocity estimations. Five different feature detectors were compared and the features from accelerated segment test (FAST) resulted in the best balance between the number of features identified and successfully tracked as well as computational efficiency. OTV was relatively insensitive to reduced image resolution but was impacted by acquisition frequencies lower than 7–8 Hz. Compared to traditional correlation-based techniques, OTV was less affected by noise and surface seeding. In addition, the scheme is foreseen to be applicable to real-time gauge-cam implementations.
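The OTV pipeline hinges on the a posteriori filtering step: after FAST detection and sparse Lucas-Kanade tracking, only trajectories compatible with the expected flow are kept. Below is a minimal sketch of such a filter; the function name and thresholds are illustrative, not the paper's actual criteria:

```python
import numpy as np

def filter_trajectories(trajs, flow_dir_deg=0.0, max_angle_deg=30.0, min_length=5.0):
    """Retain trajectories whose net displacement is long enough and
    roughly aligned with the expected flow direction (simplified sketch)."""
    kept = []
    flow = np.deg2rad(flow_dir_deg)
    for t in trajs:                      # t: (N, 2) array of (x, y) points
        d = t[-1] - t[0]                 # net displacement over the trajectory
        length = np.hypot(*d)
        if length < min_length:          # too short: likely noise, not a real tracer
            continue
        angle = np.arctan2(d[1], d[0])
        # wrap the angular difference into [-180, 180] degrees
        diff = np.rad2deg(np.abs(np.angle(np.exp(1j * (angle - flow)))))
        if diff <= max_angle_deg:
            kept.append(t)
    return kept
```

This is what makes the method robust to spurious matches: tracks of glare, raindrops or birds rarely follow the stream direction for their whole lifetime.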
- Published
- 2018
- Full Text
- View/download PDF
40. Learning monocular depth estimation with unsupervised trinocular assumptions
- Author
-
Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
FOS: Computer and information sciences ,Monocular ,Computer science ,business.industry ,monocular depth estimation, deep-learning, CNN, 3D,self-supervised ,Computer Vision and Pattern Recognition (cs.CV) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science - Computer Vision and Pattern Recognition ,020207 software engineering ,02 engineering and technology ,Iterative reconstruction ,Field (computer science) ,Domain (software engineering) ,Image (mathematics) ,Binocular stereo ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Single image ,business - Abstract
Obtaining accurate depth measurements out of a single image represents a fascinating solution to 3D sensing. CNNs led to considerable improvements in this field, and recent trends replaced the need for ground-truth labels with geometry-guided image reconstruction signals enabling unsupervised training. Currently, for this purpose, state-of-the-art techniques rely on images acquired with a binocular stereo rig to predict inverse depth (i.e., disparity) according to the aforementioned supervision principle. However, these methods suffer from well-known problems near occlusions, the left image border, etc., inherited from the stereo setup. Therefore, in this paper, we tackle these issues by moving to a trinocular domain for training. Assuming the central image as the reference, we train a CNN to infer disparity representations pairing such image with frames on its left and right side. This strategy allows obtaining depth maps not affected by typical stereo artifacts. Moreover, since trinocular datasets are seldom available, we introduce a novel interleaved training procedure enabling us to enforce the trinocular assumption starting from currently available binocular datasets. Exhaustive experimental results on the KITTI dataset confirm that our proposal outperforms state-of-the-art methods for unsupervised monocular depth estimation trained on binocular stereo pairs as well as any known methods relying on other cues., 14 pages, 7 figures, 4 tables. Accepted to 3DV 2018
- Published
- 2018
41. Learning to Predict Stereo Reliability Enforcing Local Consistency of Confidence Maps
- Author
-
Poggi, Matteo, and Mattoccia, Stefano
- Subjects
business.industry ,Deep learning ,010401 analytical chemistry ,Feature extraction ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,Measure (mathematics) ,0104 chemical sciences ,Stereopsis ,Confidence measures ,0202 electrical engineering, electronic engineering, information engineering ,Local consistency ,020201 artificial intelligence & image processing ,Confidence measure, deep-learning, machine-learning, CNN, stereo vision ,Pattern matching ,Artificial intelligence ,Data mining ,business ,computer ,Reliability (statistics) ,Mathematics - Abstract
Confidence measures estimate unreliable disparity assignments performed by a stereo matching algorithm and, as recently proved, can be used for several purposes. This paper aims at increasing, by means of a deep network, the effectiveness of state-of-the-art confidence measures exploiting the local consistency assumption. We exhaustively evaluated our proposal on 23 confidence measures, including 5 top-performing ones based on random-forests and CNNs, training our networks with two popular stereo algorithms and a small subset (25 out of 194 frames) of the KITTI 2012 dataset. Experimental results show that our approach dramatically increases the effectiveness of all the 23 confidence measures on the remaining frames. Moreover, without re-training, we report a further cross-evaluation on KITTI 2015 and Middlebury 2014 confirming that our proposal provides remarkable improvements for each confidence measure even when dealing with significantly different input data. To the best of our knowledge, this is the first method to move beyond conventional pixel-wise confidence estimation.
- Published
- 2017
42. Quantitative Evaluation of Confidence Measures in a Machine Learning World
- Author
-
Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
Computer science ,business.industry ,Deep learning ,010401 analytical chemistry ,Feature extraction ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,Field (computer science) ,0104 chemical sciences ,Confidence measures ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,confidence measure, machine-learning, CNN, stereo, depth ,computer ,Reliability (statistics) - Abstract
Confidence measures aim at detecting unreliable depth measurements and play an important role for many purposes and in particular, as recently shown, to improve stereo accuracy. This topic has been thoroughly investigated by Hu and Mordohai in 2010 (and 2012) considering 17 confidence measures and two local algorithms on the two datasets available at that time. However, since then major breakthroughs happened in this field: the availability of much larger and challenging datasets, novel and more effective stereo algorithms including ones based on deep learning, and confidence measures leveraging machine learning techniques. Therefore, this paper aims at providing an exhaustive and updated review and quantitative evaluation of 52 (actually, 76 considering variants) state-of-the-art confidence measures - focusing on recent ones mostly based on random-forests and deep learning - with three algorithms on the challenging datasets available today. Moreover, we deal with problems inherently induced by learning-based confidence measures. How are these methods able to generalize to new data? How does a specific training improve their effectiveness? How can more effective confidence measures actually improve the overall stereo accuracy?
- Published
- 2017
43. Even More Confident Predictions with Deep Machine-Learning
- Author
-
Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
business.industry ,Computer science ,010401 analytical chemistry ,Feature extraction ,Pattern recognition ,02 engineering and technology ,confidence measure, deep-learning, machine-learning, stereo vision, CNN ,Machine learning ,computer.software_genre ,01 natural sciences ,Convolutional neural network ,0104 chemical sciences ,Stereopsis ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Classifier (UML) - Abstract
Confidence measures aim at discriminating unreliable disparities inferred by a stereo vision system from reliable ones. A common and effective strategy adopted by most top-performing approaches consists in combining multiple confidence measures by means of an appropriately trained random-forest classifier. In this paper, we propose a novel approach by training an n-channel convolutional neural network on a set of feature maps, each one encoding the outcome of a single confidence measure. This strategy enables moving the confidence prediction problem from the conventional 1D feature maps domain, adopted by approaches based on random-forests, to a more distinctive 3D domain, going beyond single pixel analysis. This fact, coupled with a deep network appropriately trained on a small subset of images, allows us to outperform top-performing approaches based on random-forests.
- Published
- 2017
44. Efficient confidence measures for embedded stereo
- Author
-
Poggi, Matteo, Tosi, Fabio, and Mattoccia, Stefano
- Subjects
Stereo cameras ,Computer science ,Reliability (computer networking) ,010401 analytical chemistry ,Volume analysis ,Context (language use) ,02 engineering and technology ,01 natural sciences ,0104 chemical sciences ,Computer engineering ,Confidence measures ,0202 electrical engineering, electronic engineering, information engineering ,Memory footprint ,confidence measure, stereo vision, embedded vision, FPGA, embedded system ,020201 artificial intelligence & image processing ,Field-programmable gate array ,Focus (optics) - Abstract
The advent of embedded stereo cameras based on low-power and compact devices such as FPGAs (Field Programmable Gate Arrays) has enabled to effectively address several computer vision problems. However, since the depth data generated by stereo algorithms are affected by errors, reliable strategies to detect wrong disparity assignments by means of confidence measures are desirable. Recent works proved that confidence measures are also a powerful cue to improve the overall accuracy of stereo. Most approaches aimed at predicting match reliability rely on cost volume analysis, an information seldom available as output of most embedded depth sensors. Therefore, in this paper we analyze and evaluate strategies compatible with the constraints of embedded stereo cameras. In particular, we focus our attention on methods to infer match reliability inside depth sensors based on highly constrained computing architectures such as FPGAs. We quantitatively assess, on Middlebury 2014 and KITTI 2015 datasets, the impact of different design strategies for 16 confidence measures from the literature, suited for implementation on such embedded systems. Our evaluation shows that, compared to the confidence measures typically deployed in this context and based on storing intermediate results, other approaches yield much more accurate predictions with negligible computing requirements and memory footprint. This enables their implementation even on highly constrained architectures.
- Published
- 2017
45. Real-Time Image Distortion Correction: Analysis and Evaluation of FPGA-Compatible Algorithms
- Author
-
Di Febbo, Paolo, Mattoccia, Stefano, and Dal Mutto, Carlo
- Subjects
FOS: Computer and information sciences ,Distortion correction ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,05 social sciences ,Computer vision and image processing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science - Computer Vision and Pattern Recognition ,050301 education ,02 engineering and technology ,Porting ,Software implementation ,computer vision ,Computer Networks and Communication ,Robustness (computer science) ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Preprocessor ,020201 artificial intelligence & image processing ,Field-programmable gate array ,0503 education ,Implementation ,Algorithm ,FPGA - Abstract
Image distortion correction is a critical pre-processing step for a variety of computer vision and image processing algorithms. Standard real-time software implementations are generally not suited for direct hardware porting, so appropriate versions need to be designed in order to obtain implementations deployable on FPGAs. In this paper, hardware-compatible techniques for image distortion correction are introduced and analyzed in detail. The considered solutions are compared in terms of output quality by using a geometrical-error-based approach, with particular emphasis on robustness with respect to increasing lens distortion. The required amount of hardware resources is also estimated for each considered approach., To be published in Proceedings of the International Conference on Reconfigurable Computing and FPGAs, Cancun, Mexico, 30 November, 2 December 2016
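Remap-based undistortion is the usual starting point for such hardware designs: for each pixel of the corrected output, compute its source location under the forward distortion model and sample the distorted image there. A minimal single-coefficient radial sketch (an assumption for illustration; real pipelines use more distortion terms and bilinear interpolation):

```python
import numpy as np

def undistort_nearest(img, k1, fx, fy, cx, cy):
    """Correct one-parameter radial distortion by inverse mapping with
    nearest-neighbor sampling (simplified sketch of a remap scheme)."""
    h, w = img.shape[:2]
    v, u = np.mgrid[0:h, 0:w].astype(float)
    x = (u - cx) / fx                    # normalized ideal coordinates
    y = (v - cy) / fy
    r2 = x * x + y * y
    xd = x * (1 + k1 * r2)               # forward radial distortion model
    yd = y * (1 + k1 * r2)
    us = np.clip(np.round(xd * fx + cx), 0, w - 1).astype(int)
    vs = np.clip(np.round(yd * fy + cy), 0, h - 1).astype(int)
    return img[vs, us]                   # sample the distorted source image
```

On FPGAs, the trade-offs studied in papers like this one revolve around how the per-pixel source coordinates (here computed on the fly) are generated or stored, and how interpolation is approximated.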
- Published
- 2016
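The software baseline that FPGA-compatible variants approximate is inverse mapping: for each output pixel, compute its position in the distorted input and interpolate. A minimal CPU reference sketch, assuming the common Brown radial model (coefficients `k1`, `k2`); this is not the paper's notation or its hardware formulation:

```python
import numpy as np

def undistort(img, fx, fy, cx, cy, k1, k2=0.0):
    """Correct radial lens distortion by inverse mapping.

    For every pixel of the output (undistorted) image, compute where it
    lands in the distorted input and sample it bilinearly.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # normalized coordinates in the ideal (undistorted) image
    xn, yn = (xs - cx) / fx, (ys - cy) / fy
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2   # radial distortion factor
    # corresponding (sub-pixel) position in the distorted input image
    xd, yd = xn * scale * fx + cx, yn * scale * fy + cy
    x0 = np.clip(np.floor(xd).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(yd).astype(int), 0, h - 2)
    ax, ay = np.clip(xd - x0, 0.0, 1.0), np.clip(yd - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - ax) + img[y0, x0 + 1] * ax
    bot = img[y0 + 1, x0] * (1 - ax) + img[y0 + 1, x0 + 1] * ax
    return top * (1 - ay) + bot * ay
```

Hardware-friendly variants typically replace the per-pixel trigonometry-free model evaluation with precomputed or approximated lookup maps in fixed point, which is exactly the design space the paper's geometrical-error analysis compares.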
46. Evaluation of variants of the SGM algorithm aimed at implementation on embedded or reconfigurable devices
- Author
-
Matteo Poggi, Stefano Mattoccia, Poggi, Matteo, and Mattoccia, Stefano
- Subjects
Matching (statistics) ,Stereo cameras ,Computer science ,Pipeline (computing) ,Reconfigurable device ,Real-time computing ,Embedded vision ,02 engineering and technology ,Frame rate ,Stereo vision ,Computer Graphics and Computer-Aided Design ,Field (computer science) ,020202 computer hardware & architecture ,Stereopsis ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Media Technology ,020201 artificial intelligence & image processing ,Field-programmable gate array ,Algorithm ,Real-time ,FPGA ,3D - Abstract
Inferring dense depth from stereo is crucial for several computer vision applications, and stereo cameras based on embedded systems and/or reconfigurable devices such as FPGAs have become quite popular in recent years. In this field, Semi-Global Matching (SGM) is, in most cases, the preferred algorithm due to its good trade-off between accuracy and computation requirements. Nevertheless, a careful design of the processing pipeline enables significant improvements in terms of disparity map accuracy, hardware resources and frame rate. In particular, factors such as the number of matching costs and parameters, including the number and selection of scanlines, have a great impact on the overall resource requirements. In this paper we evaluate different variants of the SGM algorithm suited for implementation on embedded or reconfigurable devices, looking for the best compromise in terms of resource requirements, accuracy of the disparity estimation and running time. To quantitatively assess the effectiveness of the considered variants we adopt the KITTI 2015 training dataset, a challenging and standard benchmark with ground truth containing several realistic scenes.
- Published
- 2016
47. Deep stereo fusion: Combining multiple disparity hypotheses with deep-learning
- Author
-
Stefano Mattoccia, Matteo Poggi, Poggi, Matteo, and Mattoccia, Stefano
- Subjects
Stereo cameras ,business.industry ,Computer science ,3D vision, stereo vision, computer vision, Machine learning ,Deep learning ,010401 analytical chemistry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,01 natural sciences ,Convolutional neural network ,0104 chemical sciences ,Stereopsis ,Feature (computer vision) ,Encoding (memory) ,0202 electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Computer stereo vision - Abstract
Stereo matching is a popular technique to infer depth from two or more images, and a wealth of methods have been proposed to deal with this problem. Despite these efforts, finding accurate stereo correspondences is still an open problem. The strengths and weaknesses of existing methods are often complementary and, motivated by recent trends in this field, in this paper we exploit this fact by proposing Deep Stereo Fusion, a Convolutional Neural Network capable of combining the output of multiple stereo algorithms in order to obtain results more accurate than each input disparity map. Deep Stereo Fusion processes a 3D feature vector, encoding both spatial and cross-algorithm information, in order to select the best disparity hypothesis among those proposed by the individual stereo matchers. To the best of our knowledge, our proposal is the first i) to leverage deep learning and ii) able to predict the optimal disparity assignments taking only disparity maps as input cue. This second feature makes our method suitable for deployment even when other cues (e.g., confidence) are not available, such as when dealing with disparity maps provided by off-the-shelf 3D sensors. We thoroughly evaluate our proposal on the KITTI stereo benchmark with respect to the state-of-the-art in this field.
- Published
- 2016
48. Reliable fusion of ToF and stereo depth driven by confidence measures
- Author
-
Giulio Marin, Stefano Mattoccia, Pietro Zanuttigh, Leibe, B, Matas, J, Sebe, N, Welling, M, Marin, Giulio, Zanuttigh, Pietro, and Mattoccia, Stefano
- Subjects
Time-of-Flight ,Fusion ,Matching (graph theory) ,business.industry ,Computer science ,010401 analytical chemistry ,Confidence metrics ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,02 engineering and technology ,Image segmentation ,Confidence metrics, Data fusion, Stereo vision, Time-of-Flight ,Data fusion ,Sensor fusion ,01 natural sciences ,Stereo vision ,0104 chemical sciences ,Stereopsis ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Bilateral filter ,Artificial intelligence ,business - Abstract
In this paper we propose a framework for the fusion of depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. Initially, depth data acquired by the ToF camera are upsampled by an ad-hoc algorithm based on image segmentation and bilateral filtering. In parallel, a dense disparity map is obtained using the Semi-Global Matching stereo algorithm. Reliable confidence measures are extracted for both the ToF and stereo depth data. In particular, the ToF confidence also accounts for the mixed-pixel effect, while the stereo confidence accounts for the relationship between the pointwise matching costs and the cost obtained by the semi-global optimization. Finally, the two depth maps are synergistically fused by enforcing the local consistency of depth data, accounting for the confidence of the two data sources at each location. Experimental results clearly show that the proposed method produces accurate high-resolution depth maps and outperforms the compared fusion algorithms.
- Published
- 2016
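At its core, confidence-driven fusion lets each sensor contribute in proportion to how much it is trusted at every pixel. A minimal per-pixel sketch of that idea (the paper additionally enforces local consistency across neighborhoods, which this deliberately omits; function and parameter names are assumptions):

```python
import numpy as np

def fuse_depth(depth_tof, conf_tof, depth_stereo, conf_stereo, eps=1e-6):
    """Per-pixel confidence-weighted average of two depth maps.

    Each source contributes proportionally to its confidence; where both
    confidences are ~0 the result falls back to the ToF measurement.
    """
    w = conf_tof + conf_stereo
    fused = (conf_tof * depth_tof + conf_stereo * depth_stereo) \
        / np.maximum(w, eps)
    return np.where(w > eps, fused, depth_tof)
```

With this weighting, a pixel flagged as mixed (low ToF confidence) is dominated by stereo, and an ambiguous stereo match (low stereo confidence) is dominated by ToF, which is the qualitative behavior the abstract describes.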
49. Learning a general-purpose confidence measure based on O(1) features and a smarter aggregation strategy for semi global matching
- Author
-
Matteo Poggi, Stefano Mattoccia, Poggi, Matteo, and Mattoccia, Stefano
- Subjects
Matching (statistics) ,business.industry ,Computer science ,Reliability (computer networking) ,010401 analytical chemistry ,Feature extraction ,02 engineering and technology ,Machine learning ,computer.software_genre ,01 natural sciences ,Scan line ,0104 chemical sciences ,stereo vision, machine learning, confidence measure, 3D ,Footprint ,Stereopsis ,0202 electrical engineering, electronic engineering, information engineering ,Memory footprint ,Overhead (computing) ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,business ,computer - Abstract
Inferring dense depth from stereo is crucial for several computer vision applications, and Semi-Global Matching (SGM) is often the preferred choice due to its good trade-off between accuracy and computation requirements. Nevertheless, it suffers from two major issues: streaking artifacts, caused by the Scanline Optimization (SO) approach at the core of this algorithm, may lead to inaccurate results, and its high memory footprint may become prohibitive with high-resolution images or devices with constrained resources. In this paper, we propose a smart scanline aggregation approach for SGM aimed at dealing with both issues. In particular, the contribution of this paper is threefold: i) leveraging machine learning, it proposes a novel general-purpose confidence measure, suited for any stereo algorithm and based on O(1) features, that outperforms the state-of-the-art; ii) taking advantage of this confidence measure, it proposes a smart aggregation strategy for SGM enabling significant improvements with a very small overhead; iii) the overall strategy drastically reduces the memory footprint of SGM and, at the same time, improves its effectiveness and execution time. We provide extensive experimental results, including a cross-validation across multiple datasets (KITTI 2012, KITTI 2015 and Middlebury 2014).
- Published
- 2016
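The O(1) features in question are statistics computed directly in the disparity domain. One illustrative example of this family, not the paper's actual feature set, is the deviation of each pixel from its local median disparity, which tends to flag streaking and other outliers (the median is computed naively here for clarity; constant-time median filters make it O(1) per pixel):

```python
import numpy as np

def median_deviation(disp, radius=1):
    """Disparity-domain feature: absolute deviation of each pixel from
    the median disparity of its (2r+1)x(2r+1) neighborhood.

    Large deviations typically mark unreliable assignments, so this kind
    of map can feed a learned confidence measure without a cost volume.
    """
    h, w = disp.shape
    pad = np.pad(disp, radius, mode='edge')
    # gather every pixel's neighborhood as a stack of shifted maps
    stack = np.stack([pad[dy:dy + h, dx:dx + w]
                      for dy in range(2 * radius + 1)
                      for dx in range(2 * radius + 1)])
    return np.abs(disp - np.median(stack, axis=0))
```

A set of such per-pixel maps, concatenated, forms the feature vector a classifier (e.g. a random forest or small network) can map to a confidence score.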
50. Neural Disparity Refinement.
- Author
-
Tosi F, Aleotti F, Ramirez PZ, Poggi M, Salti S, Mattoccia S, and Stefano LD
- Abstract
We propose a framework that combines traditional, hand-crafted algorithms and recent advances in deep learning to obtain high-quality, high-resolution disparity maps from stereo images. By casting the refinement process as a continuous feature sampling strategy, our neural disparity refinement network can estimate an enhanced disparity map at any output resolution. Our solution can process any disparity map produced by classical stereo algorithms, as well as those predicted by modern stereo networks or even different depth-from-images approaches, such as the COLMAP structure-from-motion pipeline. Nonetheless, when deployed in the former configuration, our framework performs at its best in terms of zero-shot generalization from synthetic to real images. Moreover, its continuous formulation allows for easily handling the unbalanced stereo setups widespread in mobile phones.
- Published
- 2024
- Full Text
- View/download PDF