30 results for "Guido Borghi"
Search Results
2. Depth-based 3D human pose refinement: Evaluating the RefiNet framework
- Author
-
Andrea D’Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software - Published
- 2023
- Full Text
- View/download PDF
3. Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps
- Author
-
Alessandro Simoni, Stefano Pini, Guido Borghi, and Roberto Vezzani
- Subjects
Human-Computer Interaction, Robotics (cs.RO), Control and Optimization, Artificial Intelligence, Control and Systems Engineering, Mechanical Engineering, Computer Vision and Pattern Recognition (cs.CV), Biomedical Engineering, Computer Science Applications - Abstract
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the detection of unsafe situations or the study of mutual interactions for statistical and social purposes. In this paper, we propose a non-invasive and light-invariant framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera. The method can be applied to any robot without requiring hardware access to its internal states. We introduce a novel representation of the predicted pose, namely Semi-Perspective Decoupled Heatmaps (SPDH), to accurately compute 3D joint locations in world coordinates, adapting efficient deep networks designed for 2D Human Pose Estimation. The proposed approach, which takes as input a depth representation based on XYZ coordinates, can be trained on synthetic depth data and applied to real-world settings without the need for domain adaptation techniques. To this end, we present the SimBa dataset, based on both synthetic and real depth images, and use it for the experimental evaluation. Results show that the proposed approach, made of a specific depth map representation and the SPDH, overcomes the current state of the art. (IROS 2022 and IEEE Robotics and Automation Letters (RA-L); accepted June 2022. An illustrative sketch of the XYZ input representation follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
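As an editorial illustration of the XYZ-based depth representation mentioned in the entry above, the following minimal Python sketch back-projects a depth map into a three-channel XYZ image with the pinhole camera model. It is not the authors' code; the intrinsics, shapes, and names are invented for the example.

```python
import numpy as np

def depth_to_xyz(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W), in meters, into an XYZ image (H, W, 3).

    A pixel (u, v) with depth z maps to camera coordinates
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = z,
    where fx, fy, cx, cy are the pinhole intrinsics of the depth sensor.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Hypothetical usage with made-up intrinsics and a random depth map.
depth = np.random.uniform(0.5, 5.0, size=(480, 640))
xyz = depth_to_xyz(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(xyz.shape)  # (480, 640, 3)
```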
4. Anomaly Detection, Localization and Classification for Railway Inspection
- Author
-
Riccardo Gasparini, Andrea D'Eusanio, Guido Borghi, Stefano Pini, Giuseppe Scaglione, Simone Calderara, Eugenio Fedeli, and Rita Cucchiara
- Subjects
Computer science, Inference, Railway inspection, Anomaly detection, Drone, Artificial intelligence, Data mining - Abstract
The ability to detect, localize and classify objects that are anomalies is a challenging task in the computer vision community. In this paper, we tackle these tasks by developing a framework to automatically inspect the railway during the night. Specifically, it is able to predict the presence, the image coordinates and the class of obstacles. To deal with the low-light environment, the framework is based on thermal images and consists of three different modules that address the problems of detecting anomalies, predicting their image coordinates and classifying them. Moreover, due to the absolute lack of publicly-released datasets collected in the railway context for anomaly detection, we introduce a new multi-modal dataset, acquired from a rail drone, used to evaluate the proposed framework. Experimental results confirm the accuracy of the framework and its suitability, in terms of computational load, performance, and inference time, to be implemented on a self-powered inspection system.
- Published
- 2021
5. Improving Car Model Classification through Vehicle Keypoint Localization
- Author
-
Alessandro Simoni, Andrea D'Eusanio, Stefano Pini, Guido Borghi, and Roberto Vezzani
- Subjects
Computer science, Car model, Car classification, Computer vision, Artificial intelligence - Abstract
In this paper, we present a novel multi-task framework which aims to improve the performance of car model classification by leveraging visual features and pose information extracted from single RGB images. In particular, we merge the visual features obtained through an image classification network with the features computed by a model able to predict the pose in terms of 2D car keypoints. We show that this approach considerably improves the performance on the model classification task by testing our framework on a subset of the Pascal3D dataset containing the car classes. Finally, we conduct an ablation study to demonstrate the performance improvement obtained with respect to a single visual classifier network. (A sketch of the feature-fusion idea follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
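To make the feature-merging idea of the entry above concrete, here is a toy PyTorch sketch of a two-branch classifier that concatenates a visual embedding with a keypoint embedding before the final car-model predictor. It is a sketch under assumed sizes (the tiny backbone, 12 keypoints, and 10 classes are all invented), not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Toy two-branch model: visual features + keypoint features -> car model class."""
    def __init__(self, visual_dim=512, keypoint_dim=128, n_classes=10):
        super().__init__()
        # Stand-ins for a real image backbone and a keypoint-prediction branch.
        self.visual_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, visual_dim))
        self.keypoint_branch = nn.Sequential(
            nn.Linear(2 * 12, keypoint_dim), nn.ReLU())  # 12 2D keypoints, flattened
        self.classifier = nn.Linear(visual_dim + keypoint_dim, n_classes)

    def forward(self, image, keypoints):
        v = self.visual_branch(image)                     # (B, visual_dim)
        k = self.keypoint_branch(keypoints.flatten(1))    # (B, keypoint_dim)
        return self.classifier(torch.cat([v, k], dim=1))  # fused logits

model = FusionClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 12, 2))
print(logits.shape)  # torch.Size([2, 10])
```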
6. Multimodal Hand Gesture Classification for the Human–Car Interaction
- Author
-
Andrea D'Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer Networks and Communications, Computer science, Automotive industry, Convolutional neural network, Natural user interfaces, Computer vision, Infrared images, Communication, Deep learning, Depth maps, Hand gesture recognition, Human-Computer Interaction, RGB color model, Automotive, Artificial intelligence, User interface, Gesture - Abstract
The recent spread of low-cost and high-quality RGB-D and infrared sensors has supported the development of Natural User Interfaces (NUIs), in which the interaction is carried out without physical devices such as keyboards and mice. In this paper, we propose a NUI based on dynamic hand gestures, acquired with RGB, depth and infrared sensors. The system is developed for the challenging automotive context, aiming at reducing the driver's distraction during the driving activity. Specifically, the proposed framework is based on a multimodal combination of Convolutional Neural Networks whose input is represented by depth and infrared images, achieving a good level of light invariance, a key element in vision-based in-car systems. We test our system on a recent multimodal dataset collected in a realistic automotive setting, placing the sensors in an innovative point of view, i.e., in the tunnel console looking upwards. The dataset consists of a large amount of labelled frames containing 12 dynamic gestures performed by multiple subjects, making it suitable for deep learning-based approaches. In addition, we test the system on a different well-known public dataset, created for the interaction between the driver and the car. Experimental results on both datasets reveal the efficacy and the real-time performance of the proposed method. (A minimal late-fusion sketch follows this entry.)
- Published
- 2020
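The multimodal combination described in the entry above can be illustrated with a minimal late-fusion scheme: two small single-channel CNNs process depth and infrared inputs and their class scores are averaged. This is an assumed simplification for illustration, not the published network.

```python
import torch
import torch.nn as nn

def tiny_cnn(n_classes=12):
    # Placeholder single-channel classifier standing in for a real gesture CNN.
    return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, n_classes))

class LateFusion(nn.Module):
    """Average the softmax scores of a depth branch and an infrared branch."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.depth_net, self.ir_net = tiny_cnn(n_classes), tiny_cnn(n_classes)

    def forward(self, depth, infrared):
        p_depth = self.depth_net(depth).softmax(dim=1)
        p_ir = self.ir_net(infrared).softmax(dim=1)
        return (p_depth + p_ir) / 2  # fused gesture probabilities

model = LateFusion()
probs = model(torch.randn(4, 1, 112, 112), torch.randn(4, 1, 112, 112))
print(probs.shape)  # torch.Size([4, 12])
```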
7. A Double Siamese Framework for Differential Morphing Attack Detection
- Author
-
Guido Borghi, Emanuele Pancisi, Matteo Ferrara, and Davide Maltoni
- Subjects
Computer science, Face morphing, Facial recognition system, Siamese networks, Computer vision, Deep learning, Single-image morph attack detection, Differential morph attack detection, Morphing attack detection, Artificial intelligence - Abstract
Face morphing and related morphing attacks have emerged as a serious security threat for automatic face recognition systems and a challenging research field. Therefore, the availability of effective and reliable morphing attack detectors is strongly needed. In this paper, we proposed a framework based on a double Siamese architecture to tackle the morphing attack detection task in the differential scenario, in which two images, a trusted live acquired image and a probe image (morphed or bona fide), are given as input to the system. In particular, the presented framework aimed to merge the information computed by two different modules to predict the final score. The first one was designed to extract information about the identity of the input faces, while the second module was focused on the detection of artifacts related to the morphing process. Experimental results were obtained through several rigorous cross-dataset tests, exploiting three well-known datasets, namely PMDB, MorphDB, and AMSL, containing automatic and manually refined facial morphed images, showing that the proposed framework was able to achieve satisfying results.
- Published
- 2021
8. Learn to See by Events: Color Frame Synthesis from Event and RGB Cameras
- Author
-
Stefano Pini, Guido Borghi, and Roberto Vezzani
- Subjects
Data processing, Brightness, Computer science, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Asynchronous communication, RGB color model, Computer vision, Artificial intelligence, Event cameras - Abstract
Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to traditional cameras, their use is partially prevented by the limited applicability of traditional data processing and vision algorithms. To this aim, we present a framework which exploits the output stream of event cameras to synthesize RGB frames, relying on an initial or a periodic set of color key-frames and the sequence of intermediate events. Differently from existing work, we propose a deep learning-based frame synthesis method, consisting of an adversarial architecture combined with a recurrent module. Qualitative results and quantitative per-pixel, perceptual, and semantic evaluations on four public datasets confirm the quality of the synthesized images. (Accepted as a full oral at the 15th International Conference on Computer Vision Theory and Applications (VISAPP) 2020. A sketch of the standard event-accumulation step follows this entry.)
- Published
- 2020
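A common preprocessing step behind works like the one above is the accumulation of asynchronous events into frame-like tensors. The sketch below shows this standard event-to-frame binning in NumPy; it is illustrative only and is not the paper's adversarial-recurrent synthesis model.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate asynchronous events into a signed event frame.

    `events` is an (N, 4) array of (x, y, t, polarity) with polarity in {-1, +1}.
    Positive and negative brightness changes are summed per pixel, a common
    way to feed event data to frame-based networks.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    np.add.at(frame, (y, x), events[:, 3])  # unbuffered per-pixel accumulation
    return frame

# Hypothetical stream: 1000 random events on a 180x240 sensor.
rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 240, 1000),      # x
                      rng.integers(0, 180, 1000),      # y
                      np.sort(rng.random(1000)),       # timestamps
                      rng.choice([-1.0, 1.0], 1000)])  # polarity
print(events_to_frame(ev, 180, 240).shape)  # (180, 240)
```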
9. Face-from-Depth for Head Pose Estimation on Depth Images
- Author
-
Guido Borghi, Matteo Fabbri, Roberto Vezzani, Simone Calderara, and Rita Cucchiara
- Subjects
Computer science, Computer Vision and Pattern Recognition (cs.CV), Automated Facial Recognition, Convolutional neural network, Artificial Intelligence, Computer vision, Pose, Head pose estimation, Depth cameras, Depth frames, GAN, CNN, Frame rate, Face, RGB color model, Software - Abstract
Depth cameras make it possible to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives as input three types of images and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image. We empirically demonstrate that this positively impacts the system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method overcomes several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second. (Submitted to IEEE Transactions on PAMI, updated version, second round.)
- Published
- 2020
10. Mercury: A Vision-Based Framework for Driver Monitoring
- Author
-
Guido Borghi, Stefano Pini, Roberto Vezzani, and Rita Cucchiara
- Subjects
Pixel, Vision based, Computer science, Deep learning, Real-time computing, Automotive industry, Monitoring system, Convolutional neural network, Driver monitoring, Artificial intelligence - Abstract
In this paper, we propose a complete framework, namely Mercury, that combines Computer Vision and Deep Learning algorithms to continuously monitor the driver during the driving activity. The proposed solution complies with the requirements imposed by the challenging automotive context. First, light invariance is needed, in order to have a system able to work regardless of the time of day and the weather conditions; therefore, infrared-based images, i.e. depth maps (in which each pixel corresponds to the distance between the sensor and that point in the scene), have been exploited in conjunction with traditional intensity images. Second, the non-invasivity of the system is required, since the driver's movements must not be impeded during the driving activity: in this context, the use of cameras and vision-based algorithms is one of the best solutions. Finally, real-time performance is needed, since a monitoring system must react immediately as soon as a situation of potential danger is detected.
- Published
- 2020
- Full Text
- View/download PDF
11. Anomaly Detection for Vision-based Railway Inspection
- Author
-
Riccardo Gasparini, Stefano Pini, Guido Borghi, Giuseppe Scaglione, Simone Calderara, Eugenio Fedeli, and Rita Cucchiara
- Subjects
Vision based, Computer science, Deep learning, Anomaly detection, Railway inspection, Drone, Computer vision, Self-powered drone, RGB color model, Artificial intelligence - Abstract
The automatic inspection of railways for the detection of obstacles is a fundamental activity to guarantee the safety of train transport. Therefore, in this paper, we propose a vision-based framework that is able to detect obstacles during the night, when train circulation is usually suspended, using RGB or thermal images. Acquisition cameras and external light sources are placed in the frontal part of a rail drone, and a new dataset is collected. Experiments show the accuracy of the proposed approach and its suitability, in terms of computational load, to be implemented on a self-powered drone.
- Published
- 2020
12. A Transformer-Based Network for Dynamic Hand Gesture Recognition
- Author
-
Andrea D'Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Artificial neural network, Computer science, Dynamic hand gesture recognition, Depth maps, Feature extraction, Pattern recognition, Visualization, Gesture recognition, Artificial intelligence, Transformer (machine learning model), Gesture - Abstract
Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that the use of a single active depth sensor, specifically depth maps and the surface normals estimated from them, achieves state-of-the-art results, overcoming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system. (A toy transformer-encoder classifier sketch follows this entry.)
- Published
- 2020
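As a rough illustration of a transformer encoder applied to a gesture clip, the toy model below runs self-attention over a sequence of per-frame feature vectors and mean-pools them for classification. The dimensions, pooling choice and class count are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class GestureTransformer(nn.Module):
    """Toy classifier: a transformer encoder over a sequence of per-frame features."""
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2, n_classes=25):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, frame_features):          # (B, T, feat_dim)
        encoded = self.encoder(frame_features)  # self-attention across time
        return self.head(encoded.mean(dim=1))   # pool over frames -> logits

model = GestureTransformer()
logits = model(torch.randn(2, 40, 256))  # e.g. 40 feature vectors per clip
print(logits.shape)  # torch.Size([2, 25])
```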
13. Driver Face Verification with Depth Maps
- Author
-
Guido Borghi, Stefano Pini, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Fully-convolutional network, Computer vision, Deep learning, Depth maps, Driver face verification, Siamese model, Automotive, Artificial intelligence - Abstract
Face verification is the task of checking whether two provided images contain the face of the same person or not. In this work, we propose a fully-convolutional Siamese architecture to tackle this task, achieving state-of-the-art results on three publicly-released datasets, namely Pandora, High-Resolution Range-based Face Database (HRRFaceD), and CurtinFaces. The proposed method takes depth maps as the input, since depth cameras have been proven to be more reliable in different illumination conditions. Thus, the system is able to work even in the case of the total or partial absence of external light sources, which is a key feature for automotive applications. From the algorithmic point of view, we propose a fully-convolutional architecture with a limited number of parameters, capable of dealing with the small amount of depth data available for training and able to run in real time even on a CPU and embedded boards. The experimental results show accuracy acceptable for exploitation in real-world applications with in-board cameras. Finally, exploiting the presence of faces occluded by various head garments and extreme head poses available in the Pandora dataset, we also successfully test the proposed system under strong visual occlusions. The excellent results obtained confirm the efficacy of the proposed method. (A minimal Siamese verification sketch follows this entry.)
- Published
- 2019
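The weight-sharing idea behind a Siamese verifier can be sketched as follows: one encoder embeds both depth crops and a cosine similarity is thresholded to decide "same person or not". This is a hedged toy version; the encoder layout and the threshold value are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseVerifier(nn.Module):
    """Toy Siamese verifier: shared conv encoder, similarity between embeddings."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, emb_dim))

    def forward(self, depth_a, depth_b):
        # The same weights embed both depth crops (weight sharing = "Siamese").
        za = F.normalize(self.encoder(depth_a), dim=1)
        zb = F.normalize(self.encoder(depth_b), dim=1)
        return (za * zb).sum(dim=1)  # cosine similarity in [-1, 1]

model = SiameseVerifier()
sim = model(torch.randn(3, 1, 100, 100), torch.randn(3, 1, 100, 100))
same_person = sim > 0.5  # the decision threshold is a tunable assumption
print(sim.shape, same_person.shape)
```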
14. Automated Artifact Retouching in Morphed Images with Attention Maps
- Author
-
Guido Borghi, Annalisa Franco, Gabriele Graffieti, and Davide Maltoni
- Subjects
Computer science, Feature extraction, Face morphing, Facial recognition system, Conditional generative adversarial networks, Computer vision, Visual artifact, Automated artifact retouching, Deep neural networks, Morphing, Single-image morphing attack detection, Artificial intelligence - Abstract
The morphing attack is an important security threat for automatic face recognition systems. High-quality morphed images, i.e. images without significant visual artifacts such as ghosts, noise, and blurring, exhibit higher chances of success, being able to fool both human examiners and commercial face verification algorithms. Therefore, the availability of large sets of high-quality morphs is fundamental for training and testing robust morphing attack detection algorithms. However, producing a high-quality morphed image is an expensive and time-consuming task, since manual post-processing is generally required to remove the typical artifacts generated by landmark-based morphing techniques. This work describes an approach based on the Conditional Generative Adversarial Network paradigm for automated morphing artifact retouching and the use of Attention Maps to guide the generation process and limit the retouch to specific areas. In order to work with high-resolution images, the framework is applied on different facial crops, which, once processed and retouched, are accurately blended to reconstruct the whole morphed face. Specifically, we focus on four different squared face regions, i.e. the right and left eyes, the nose, and the mouth, that are frequently affected by artifacts. Several qualitative and quantitative experimental evaluations have been conducted to confirm the effectiveness of the proposal in terms of, among others, pixel-wise metrics, identity preservation, and human observer analysis. Results confirm the feasibility and the accuracy of the proposed framework.
- Published
- 2021
15. A Systematic Comparison of Depth Map Representations for Face Recognition
- Author
-
Stefano Pini, Guido Borghi, Roberto Vezzani, Davide Maltoni, and Rita Cucchiara
- Subjects
Computer science, Point cloud, Convolutional neural network, Face recognition, Voxels, Depth sensors, Depth map representations, Depth maps, Surface normals, Dataset, Pattern recognition, Artificial intelligence - Abstract
Nowadays, we are witnessing the wide diffusion of active depth sensors. However, the generalization capabilities and performance of the deep face recognition approaches that are based on depth data are hindered by the different sensor technologies and the currently available depth-based datasets, which are limited in size and acquired through the same device. In this paper, we present an analysis of the use of depth maps, as obtained by active depth sensors, and deep neural architectures for the face recognition task. We compare different depth data representations (depth and normal images, voxels, point clouds), deep models (two-dimensional and three-dimensional Convolutional Neural Networks, PointNet-based networks), and pre-processing and normalization techniques in order to determine the configuration that maximizes the recognition accuracy and is capable of generalizing better on unseen data and novel acquisition settings. Extensive intra- and cross-dataset experiments, which were performed on four public databases, suggest that representations and methods that are based on normal images and point clouds perform and generalize better than other 2D and 3D alternatives. Moreover, we propose a novel challenging dataset, namely MultiSFace, in order to specifically analyze the influence of the depth map quality and the acquisition distance on the face recognition accuracy. (A sketch of the normal-image and voxel representations follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
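Two of the representations compared in the entry above, normal images and voxels, can be derived from depth data with a few lines of NumPy, as in the illustrative sketch below. The grid size and the gradient-based normal estimation are simplifying assumptions, not the paper's exact pipeline.

```python
import numpy as np

def depth_to_normals(depth):
    """Estimate a surface-normal image from a depth map via finite differences."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def voxelize(points, grid=32):
    """Quantize an (N, 3) point cloud into a binary occupancy grid."""
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-9) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox

depth = np.random.uniform(0.4, 1.2, size=(120, 160))
print(depth_to_normals(depth).shape)            # (120, 160, 3)
print(voxelize(np.random.rand(5000, 3)).shape)  # (32, 32, 32)
```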
16. Hand Gestures for the Human-Car Interaction: The Briareo Dataset
- Author
-
Fabio Manganaro, Stefano Pini, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Deep learning, Natural User Interfaces, Gesture recognition, RGB color model, Computer vision, Artificial intelligence, User interface, Gesture - Abstract
Natural User Interfaces can be an effective way to reduce the driver's inattention during the driving activity. To this end, in this paper we propose a new dataset, called Briareo, specifically collected for the hand gesture recognition task in the automotive context. The dataset is acquired from an innovative point of view, exploiting different kinds of cameras, i.e. RGB, infrared stereo, and depth, which provide various types of images and 3D hand joints. Moreover, the dataset contains a significant amount of hand gesture samples, performed by several subjects, allowing the use of deep learning-based approaches. Finally, a framework for hand gesture segmentation and classification is presented and used to assess the quality of the proposed dataset.
- Published
- 2019
- Full Text
- View/download PDF
17. Learning to Generate Facial Depth Maps
- Author
-
Stefano Pini, Filippo Grazioli, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Monocular, Facial depth map estimation, Computer science, Computer Vision and Pattern Recognition (cs.CV), Supervised learning, Pattern recognition, Visual appearance, Depth map, Face verification, Artificial intelligence - Abstract
In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. By following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely Biwi database and Pandora dataset, are exploited to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model is capable of predicting distinctive facial details by testing the generated depth maps through a deep model trained on authentic depth maps for the face verification task.
- Published
- 2018
18. Fully Convolutional Network for Head Detection with Depth Images
- Author
-
Diego Ballotta, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Head detection, Depth maps, Computer science, Deep learning, Detector, RGB color model, Computer vision, Artificial intelligence - Abstract
Head detection and localization are among the most investigated and demanding tasks of the Computer Vision community. They are also a key element for many disciplines, like Human Computer Interaction, Human Behavior Understanding, Face Analysis and Video Surveillance. In the last decades, many efforts have been made to develop accurate and reliable head or face detectors on standard RGB images, but only few solutions concern other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system overcomes the classic sliding-window approach, which is often the main computational bottleneck of many object detectors, through a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are exploited to train and test the proposed network. Experimental results confirm the effectiveness of the method, which is able to exceed all the state-of-the-art works based on depth images and to run with real-time performance.
- Published
- 2018
19. Hands on the wheel: a Dataset for Driver Hand Detection and Tracking
- Author
-
Guido Borghi, Elia Frigieri, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Hand detection, Automotive, Dataset, Interaction systems, Steering wheel, Tracking, Leap Motion, Computer vision, Artificial intelligence, Gesture - Abstract
The ability to detect, localize and track the hands is crucial in many applications requiring the understanding of a person's behavior, attitude and interactions. In particular, this is true for the automotive context, in which hand analysis allows predicting preparatory movements for maneuvers or investigating the driver's attention level. Moreover, due to the recent diffusion of cameras inside new car cockpits, it is feasible to use hand gestures to develop new Human-Car Interaction systems that are more user-friendly and safe. In this paper, we propose a new dataset, called Turms, that consists of infrared images of the driver's hands, collected from the back of the steering wheel, an innovative point of view. The Leap Motion device has been selected for the recordings, thanks to its stereo capabilities and wide viewing angle. Besides, we introduce a method to detect the presence and the location of the driver's hands on the steering wheel during driving activity tasks.
- Published
- 2018
20. Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face
- Author
-
Matteo Fabbri, Guido Borghi, Fabio Lanzi, Roberto Vezzani, Simone Calderara, and Rita Cucchiara
- Subjects
Computer science, Computer Vision and Pattern Recognition (cs.CV), Pattern recognition, Image translation, Face analysis, Luminance, RGB color model, Artificial intelligence - Abstract
Can faces acquired by low-cost depth sensors be useful to catch some characteristic details of the face? Typically the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper, we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for a face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features of unknown individual faces, it is capable of reconstructing plausible faces; their appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with some Perceptual Probes, such as face aspect classification or landmark detection. Depth faces can be used in place of the corresponding RGB images, which often are not available due to difficult luminance conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications. (Accepted at ICPR 2018.)
- Published
- 2019
- Full Text
- View/download PDF
21. Head Detection with Depth Images in the Wild
- Author
-
Diego Ballotta, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Convolutional neural network, Depth maps, Head detection, Head localization, Deep learning, Binary classification, RGB color model, Computer vision, Artificial intelligence - Abstract
Head detection and localization is a demanding task and a key element for many computer vision applications, like video surveillance, Human Computer Interaction and face analysis. The stunning amount of work done for detecting faces on RGB images, together with the availability of huge face datasets, has made it possible to set up very effective systems on that domain. However, due to illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. In addition to reducing the dependency on external illumination, depth images implicitly embed useful information to deal with the scale of the target objects. Two public datasets have been exploited: the first one, called Pandora, is used to train a deep binary classifier with face and non-face images. The second one, collected by Cornell University, is used to perform a cross-dataset test during daily activities in unconstrained environments. Experimental results show that the proposed method overcomes the performance of state-of-the-art methods working on depth images. (Accepted as a full paper, oral, at VISAPP 2018.)
- Published
- 2018
22. Deep Head Pose Estimation from Depth Data for In-car Automotive Applications
- Author
-
Marco Venturelli, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Deep learning, Automotive industry, 3D pose estimation, Convolutional neural network, Head pose estimation, Depth maps, Computer vision, Artificial intelligence, Pose - Abstract
Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation through a Convolutional Neural Network (CNN). Differently from other proposals in the literature, the described system works directly and only on raw depth data. Moreover, the head pose estimation is solved as a regression problem and does not rely on visual facial features like facial landmarks. We tested our system on a well-known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and is able to meet real-time performance requirements. (2nd International Workshop on Understanding Human Activities through 3D Sensors, ICPR 2016. A toy depth-to-angles regression sketch follows this entry.)
- Published
- 2018
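Treating head pose estimation as regression, as in the entry above, amounts to mapping a depth crop to three continuous angles and training with a regression loss. The following toy PyTorch model shows the idea; the layer sizes and the L1 objective are assumptions, not the paper's CNN.

```python
import torch
import torch.nn as nn

class HeadPoseCNN(nn.Module):
    """Toy regressor: a small CNN maps a depth crop to (yaw, pitch, roll)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.regressor = nn.Linear(32 * 4 * 4, 3)  # three continuous angles

    def forward(self, depth_crop):
        return self.regressor(self.features(depth_crop))

model = HeadPoseCNN()
angles = model(torch.randn(8, 1, 64, 64))  # batch of head-centered depth crops
loss = nn.functional.l1_loss(angles, torch.zeros(8, 3))  # regression objective
print(angles.shape)  # torch.Size([8, 3])
```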
23. Learning to Map Vehicles into Bird's Eye View
- Author
-
Andrea Palazzi, Guido Borghi, Davide Abati, Simone Calderara, and Rita Cucchiara
- Subjects
Occupancy, Computer science, Advanced driver assistance systems, Transformation, Computer vision, Dashboard, Artificial intelligence - Abstract
Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems, and is gaining importance for both academia and car companies. This paper presents a way to learn a semantic-aware transformation which maps detections from a dashboard camera view onto a broader bird's eye occupancy map of the scene. To this end, a huge synthetic dataset featuring 1M couples of frames, taken from both car dashboard and bird's eye view, has been collected and automatically annotated. A deep network is then trained to warp detections from the first to the second view. We demonstrate the effectiveness of our model against several baselines and observe that it is able to generalize on real-world data despite having been trained solely on synthetic ones. (Accepted at the International Conference on Image Analysis and Processing (ICIAP) 2017. A classical planar-homography baseline is sketched after this entry.)
- Published
- 2017
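As a point of comparison for the learned mapping described above, a classical baseline projects image points through a fixed road-plane homography. The sketch below applies such a homography, with entirely made-up coefficients, to the bottom-center points of detected vehicles; the paper's contribution is precisely to learn a semantic-aware replacement for this rigid transformation.

```python
import numpy as np

# A fixed (invented) homography mapping road-plane points in the dashboard
# view to bird's-eye map coordinates; in practice it comes from calibration.
H = np.array([[0.05, -0.30,  120.0],
              [0.00, -0.60,  400.0],
              [0.00, -0.002,   1.0]])

def to_birds_eye(points_uv, H):
    """Project (N, 2) image points through homography H (projective division)."""
    pts = np.hstack([points_uv, np.ones((len(points_uv), 1))])  # homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Bottom-center points of two detected vehicle boxes, in pixels.
feet = np.array([[320.0, 420.0], [500.0, 390.0]])
print(to_birds_eye(feet, H))  # approximate map coordinates
```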
24. Embedded Recurrent Network for Head Pose Estimation in Car
- Author
-
Guido Borghi, Riccardo Gasparini, Roberto Vezzani, and Rita Cucchiara
- Subjects
Engineering, Automotive industry, 3D pose estimation, Recurrent neural network, Head pose estimation, Depth maps, Computer vision, Artificial intelligence, Pose - Abstract
An accurate and fast estimation of the driver's head pose is a rich source of information, in particular in the automotive context. Head pose is a key element for the investigation of driver behavior, pose analysis and attention monitoring, and also a useful component to improve the efficacy of Human-Car Interaction systems. In this paper, a Recurrent Neural Network is exploited to tackle the problem of driver head pose estimation, directly and only working on depth images, to be more reliable in the presence of varying or insufficient illumination. Experimental results, obtained on two public datasets, namely Biwi Kinect Head Pose and the ICT-3DHP Database, prove the efficacy of the proposed method, which overcomes state-of-the-art works. Besides, the entire system is implemented and tested on two embedded boards with real-time performance.
- Published
- 2017
25. Fast and Accurate Facial Landmark Localization in Depth Images for In-car Applications
- Author
-
Elia Frigieri, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Landmark, Computer science, Automotive industry, Convolutional neural network, Facial landmarks, Depth maps, Computer vision, Artificial intelligence - Abstract
A correct and reliable localization of facial landmarks enables several applications in many fields, ranging from Human Computer Interaction to video surveillance. For instance, it can provide a valuable input to monitor the driver's physical state and attention level in the automotive context. In this paper, we tackle the problem of facial landmark localization through a deep approach. The developed system runs in real time and, in particular, is more reliable than state-of-the-art competitors, especially in the presence of light changes and poor illumination, thanks to the use of depth images as input. We also collected and shared a new realistic dataset acquired inside a car, called MotorMark, to train and test the system. In addition, we exploited the public Eurecom Kinect Face Dataset for the evaluation phase, achieving promising results both in terms of accuracy and computational speed.
- Published
- 2017
26. From Depth Data to Head Pose Estimation: a Siamese approach
- Author
-
Marco Venturelli, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Deep learning, Automotive industry, Convolutional neural network, Computer vision, Artificial intelligence, Head pose estimation, Depth maps, Pose - Abstract
The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in automotive for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or the nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and we propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, Biwi Kinect Head Pose and the ICT-3DHP database. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework. (VISAPP 2017. An illustrative Siamese regression loss is sketched after this entry.)
- Published
- 2017
- Full Text
- View/download PDF
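The exact loss proposed in the entry above is not reproduced here; the sketch below shows one plausible form a Siamese regression loss can take, combining per-branch errors with a term on the predicted pose difference between the two branches. Both the formulation and the weighting factor are assumptions, not the paper's loss.

```python
import torch

def siamese_pose_loss(pred_a, pred_b, gt_a, gt_b, alpha=0.5):
    """Illustrative Siamese regression loss (not the paper's exact formulation).

    Each branch regresses its own pose; an extra term asks the *difference*
    between the two predictions to match the ground-truth pose difference.
    """
    direct = (torch.nn.functional.mse_loss(pred_a, gt_a)
              + torch.nn.functional.mse_loss(pred_b, gt_b))
    relative = torch.nn.functional.mse_loss(pred_a - pred_b, gt_a - gt_b)
    return direct + alpha * relative

pa = torch.randn(4, 3, requires_grad=True)  # branch A: (yaw, pitch, roll)
pb = torch.randn(4, 3, requires_grad=True)  # branch B
ga, gb = torch.randn(4, 3), torch.randn(4, 3)
loss = siamese_pose_loss(pa, pb, ga, gb)
loss.backward()
print(float(loss))
```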
27. POSEidon: Face-from-Depth for Driver Pose Estimation
- Author
-
Guido Borghi, Marco Venturelli, Roberto Vezzani, and Rita Cucchiara
- Subjects
Artificial neural network, Computer science, Orientation, Deep learning, Frame rate, Computer vision, Artificial intelligence, Head pose estimation, Depth maps, Pose - Abstract
Fast and accurate upper-body and head pose estimation is a key task for the automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding the pose by depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for learning image faces from depth. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method overcomes all recent state-of-the-art works, running in real time at more than 30 frames per second. (Accepted at Computer Vision and Pattern Recognition, CVPR 2017.)
- Published
- 2016
- Full Text
- View/download PDF
28. Fast gesture recognition with Multiple Stream Discrete HMMs on 3D Skeletons
- Author
-
Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Feature extraction, Pattern recognition, Machine learning, Gesture recognition, HMM, Action recognition, Segmentation, Artificial intelligence, Hidden Markov model, Gesture - Abstract
HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They perform well even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach high performance while maintaining all the advantages listed above. The approach allows both to quickly classify pre-segmented gestures (offline classification) and to perform temporal segmentation on streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created for Human Computer Interaction (HCI). We obtain state-of-the-art performance on all of them. (Accepted at ICPR 2016. A minimal HMM forward-algorithm classifier is sketched after this entry.)
- Published
- 2016
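For reference, the core computation behind discrete-HMM gesture classification is the forward algorithm: each gesture class gets its own HMM, and a quantized observation sequence is assigned to the class with the highest log-likelihood. The NumPy sketch below implements the scaled forward recursion with randomly generated toy models (all parameters are invented, and the multi-stream and two-stage aspects of the paper are not reproduced).

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi: (S,) initial state probabilities; A: (S, S) transition matrix;
    B: (S, V) emission probabilities over V observation symbols.
    Uses the scaled forward algorithm for numerical stability.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Toy setup: classify a quantized skeleton-feature sequence, one HMM per gesture.
rng = np.random.default_rng(1)
def random_hmm(S=4, V=8):
    pi = rng.dirichlet(np.ones(S))
    A = rng.dirichlet(np.ones(S), size=S)
    B = rng.dirichlet(np.ones(V), size=S)
    return pi, A, B

models = [random_hmm() for _ in range(3)]   # three gesture classes
sequence = rng.integers(0, 8, size=20)      # quantized observation symbols
scores = [forward_loglik(sequence, *m) for m in models]
print("predicted gesture:", int(np.argmax(scores)))
```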
29. Video synthesis from Intensity and Event Frames
- Author
-
Stefano Pini, Guido Borghi, Roberto Vezzani, and Rita Cucchiara
- Subjects
Computer science, Deep learning, Automotive industry, Usability, Grayscale, Neuromorphic engineering, Event frames, Computer vision, Artificial intelligence - Abstract
Event cameras, neuromorphic devices that naturally respond to brightness changes, have multiple advantages with respect to traditional cameras. However, the difficulty of applying traditional computer vision algorithms on event data limits their usability. Therefore, in this paper we investigate the use of a deep learning-based architecture that combines an initial grayscale frame and a series of event data to estimate the following intensity frames. In particular, a fully-convolutional encoder-decoder network is employed and evaluated for the frame synthesis task on an automotive event-based dataset. Performance obtained with pixel-wise metrics confirms the quality of the images synthesized by the proposed architecture.
30. Video Frame Synthesis combining Conventional and Event Cameras
- Author
-
Stefano Pini, Guido Borghi, and Roberto Vezzani
- Subjects
Brightness, Computer science, Object detection, Automotive, Event cameras, Event frames, Semantic segmentation, Simulated event frames, Video synthesis, Artificial Intelligence, Asynchronous communication, Computer vision, Software - Abstract
Event cameras are biologically-inspired sensors that gather the temporal evolution of the scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages with respect to conventional cameras, their use is limited due to the scarce compatibility of asynchronous event streams with traditional data processing and vision algorithms. In this regard, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or a periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted into event frames two additional well-known datasets, namely Kitti and Cityscapes, in order to present semantic results, in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality and the capability of the proposed approach of synthesizing frame sequences from color key-frames and sequences of intermediate events.