48 results for "Depth video"
Search Results
2. Dual-View Spatio-Temporal Interactive Network for Video Human Action Recognition
- Author
-
Wu, Hanbo, Ma, Xin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lan, Xuguang, editor, Mei, Xuesong, editor, Jiang, Caigui, editor, Zhao, Fei, editor, and Tian, Zhiqiang, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Strategies for enhancing deep video encoding efficiency using the Convolutional Neural Network in a hyperautomation mechanism
- Author
-
Xiaolan Wang
- Subjects
Variable resolution, Convolutional Neural Network, Depth video, Video coding efficiency, Hyperautomation mechanism, Medicine, Science - Abstract
With ongoing social progress, three-dimensional (3D) video is becoming increasingly prevalent in everyday life. As a key component of 3D video technology, depth video plays a crucial role by providing information about the distance and spatial distribution of objects within a scene. This study focuses on depth video encoding and proposes an efficient encoding method that integrates a Convolutional Neural Network (CNN) with a hyperautomation mechanism. First, an overview of the principles underlying CNNs and the concept of hyperautomation is presented, and the application of CNNs in the intra-frame prediction module of video encoding is explored. By incorporating the hyperautomation mechanism, this study emphasizes the potential of Artificial Intelligence to enhance encoding efficiency. Next, a CNN-based method for variable-resolution intra-frame prediction of depth video is proposed. This method utilizes a multi-level feature fusion network to reconstruct coding units. The effectiveness of the proposed variable-resolution coding technique is then evaluated by comparing its performance against the original method on the high-efficiency video coding (HEVC) test platform. The results demonstrate that, compared to the original test platform method (HTM-16.2), the proposed method achieves an average Bjøntegaard delta bit rate (BDBR) savings of 8.12% across all tested video sequences, indicating a significant improvement in coding efficiency. Furthermore, the viewpoint BDBR loss of the variable-resolution coding method is only 0.15%, which falls within an acceptable margin of error. This suggests that the method is both stable and reliable in viewpoint coding, and it performs well across a broad range of quantization parameter settings. Additionally, compared to other encoding methods, the proposed approach exhibits superior peak signal-to-noise ratio, structural similarity index, and perceptual quality metrics. This study introduces a novel and efficient approach to 3D video compression, and the integration of CNNs with hyperautomation provides valuable insights for future innovations in video encoding.
- Published
- 2025
- Full Text
- View/download PDF
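The BDBR figure cited in this entry is the standard Bjøntegaard delta bit rate. For reference, a minimal sketch of the usual BD-BR computation follows; the rate-distortion points are hypothetical, not taken from the paper.

```python
# Minimal sketch of the Bjontegaard delta bit rate (BD-BR); RD points are
# hypothetical. Negative output means bit-rate savings at equal quality.
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Fit cubic polynomials of log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Average the log-rate difference over the overlapping PSNR interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    return (np.exp((int_t - int_a) / (hi - lo)) - 1) * 100

# Hypothetical anchor and test RD points: (kbps, dB).
print(bd_rate([800, 1200, 2000, 3500], [38.0, 39.5, 41.0, 42.5],
              [740, 1110, 1860, 3260], [38.1, 39.6, 41.1, 42.6]))
```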
4. A validation study demonstrating portable motion capture cameras accurately characterize gait metrics when compared to a pressure-sensitive walkway
- Author
-
Kevin A. Mazurek, Leland Barnard, Hugo Botha, Teresa Christianson, Jonathan Graff-Radford, Ronald Petersen, Prashanthi Vemuri, B. Gwen Windham, David T. Jones, and Farwa Ali
- Subjects
Depth video, Gait analysis, Motion capture, Portable cameras, Point-cloud, Medicine, Science - Abstract
Digital quantification of gait can be used to measure aging- and disease-related decline in mobility. Gait performance also predicts prognosis, disease progression, and response to therapies. Most gait analysis systems require large amounts of space, resources, and expertise to implement and are not widely accessible. Thus, there is a need for a portable system that accurately characterizes gait. Here, depth video from two portable cameras accurately reconstructed gait metrics comparable to those reported by a pressure-sensitive walkway. 392 research participants walked across a four-meter pressure-sensitive walkway while depth video was recorded. Gait speed, cadence, and step and stride durations and lengths strongly correlated (r > 0.9) between modalities, with root-mean-squared errors (RMSE) of 0.04 m/s, 2.3 steps/min, 0.03 s, and 0.05–0.08 m for speed, cadence, step/stride duration, and step/stride length, respectively. Step, stance, and double support durations (as gait cycle percentages) significantly correlated (r > 0.6) between modalities, with 5% RMSE for step and stance and 10% RMSE for double support. In an exploratory analysis, gait speed from both modalities significantly related to healthy, mild, moderate, or severe categorizations of the Charlson Comorbidity Index (ANOVA, Tukey's HSD, p < 0.0125). These findings demonstrate the viability of using depth video to expand access to quantitative gait assessments.
- Published
- 2024
- Full Text
- View/download PDF
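The agreement statistics in the entry above (Pearson r and RMSE between camera and walkway measurements) reduce to a short computation. A minimal sketch on synthetic data follows; the noise level is chosen to mimic the reported 0.04 m/s speed RMSE, and none of the numbers are the study's data.

```python
# Synthetic re-creation of the agreement statistics above: Pearson r and RMSE
# between walkway and camera gait speeds. Data are simulated, not the study's.
import numpy as np

rng = np.random.default_rng(0)
walkway = rng.uniform(0.6, 1.6, size=392)           # gait speed (m/s)
camera = walkway + rng.normal(0.0, 0.04, size=392)  # camera estimate + noise

r = np.corrcoef(walkway, camera)[0, 1]
rmse = np.sqrt(np.mean((walkway - camera) ** 2))
print(f"r = {r:.3f}, RMSE = {rmse:.3f} m/s")        # expect r > 0.9, ~0.04 m/s
```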
5. A validation study demonstrating portable motion capture cameras accurately characterize gait metrics when compared to a pressure-sensitive walkway.
- Author
-
Mazurek, Kevin A., Barnard, Leland, Botha, Hugo, Christianson, Teresa, Graff-Radford, Jonathan, Petersen, Ronald, Vemuri, Prashanthi, Windham, B. Gwen, Jones, David T., and Ali, Farwa
- Subjects
MOTION capture (Human mechanics), WALKING speed, MOTION analysis, VIDEO recording, DISEASE progression - Abstract
Digital quantification of gait can be used to measure aging- and disease-related decline in mobility. Gait performance also predicts prognosis, disease progression, and response to therapies. Most gait analysis systems require large amounts of space, resources, and expertise to implement and are not widely accessible. Thus, there is a need for a portable system that accurately characterizes gait. Here, depth video from two portable cameras accurately reconstructed gait metrics comparable to those reported by a pressure-sensitive walkway. 392 research participants walked across a four-meter pressure-sensitive walkway while depth video was recorded. Gait speed, cadence, and step and stride durations and lengths strongly correlated (r > 0.9) between modalities, with root-mean-squared errors (RMSE) of 0.04 m/s, 2.3 steps/min, 0.03 s, and 0.05–0.08 m for speed, cadence, step/stride duration, and step/stride length, respectively. Step, stance, and double support durations (as gait cycle percentages) significantly correlated (r > 0.6) between modalities, with 5% RMSE for step and stance and 10% RMSE for double support. In an exploratory analysis, gait speed from both modalities significantly related to healthy, mild, moderate, or severe categorizations of the Charlson Comorbidity Index (ANOVA, Tukey's HSD, p < 0.0125). These findings demonstrate the viability of using depth video to expand access to quantitative gait assessments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Human Fall Detection in Depth-Videos Using Temporal Templates and Convolutional Neural Networks
- Author
-
Paul Ijjina, Earnest, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, and Esposito, Anna, editor
- Published
- 2022
- Full Text
- View/download PDF
7. Fast Depth Map Coding Based on Bayesian Decision Theorem for 3D-HEVC
- Author
-
Dongyao Zou, Pu Dai, and Qiuwen Zhang
- Subjects
3D-HEVC, depth video, complexity, Bayesian, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
The depth map compression of 3D High-Efficiency Video Coding (3D-HEVC) inherits the prediction structure adopted by HEVC and adds supplementary coding modes to better represent depth images. These new coding modes, combined with the existing technology, achieve high coding efficiency but lead to a very large increase in coding time. This article proposes a fast coding method for depth maps in 3D-HEVC. The proposed scheme utilizes the Bayesian decision rule and the correlations between the corresponding texture video and spatially adjacent treeblocks to analyze the treeblock features of the depth map. Based on this analysis, we propose two approaches: early SKIP/Merge mode selection and adaptive CU pruning termination. Simulation results show that the proposed method saves 51.2% of the coding complexity while maintaining rate-distortion (RD) performance.
- Published
- 2022
- Full Text
- View/download PDF
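The early SKIP/Merge decision in the entry above is a Bayesian two-class decision on treeblock features. A hedged sketch follows; the Gaussian likelihoods, the prior, and the scalar feature are illustrative stand-ins, not the statistics the paper derives from texture and neighboring treeblocks.

```python
# Hedged sketch of a Bayesian early SKIP/Merge decision: stop the full mode
# search when the posterior favors SKIP. Likelihoods and prior below are
# illustrative, not the statistics trained in the paper.
import math

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def early_skip(feature, p_skip=0.7, mu_skip=1.0, mu_other=2.5, sigma=0.6):
    # Bayes rule: choose the class with the larger posterior probability.
    post_skip = gaussian(feature, mu_skip, sigma) * p_skip
    post_other = gaussian(feature, mu_other, sigma) * (1 - p_skip)
    return post_skip > post_other  # True => terminate mode search early

# A low RD-cost feature suggests SKIP; a high one keeps the full search.
print(early_skip(1.1), early_skip(2.8))
```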
8. Zoom motion estimation for color and depth videos using depth information
- Author
-
Soon-kak Kwon and Dong-seok Lee
- Subjects
Zoom motion estimation, Inter prediction, Depth video, Depth video coding, Electronics, TK7800-8360 - Abstract
In this paper, two methods of zoom motion estimation for color and depth videos using depth information are proposed. Color and depth videos are estimated for zoom motion independently. Zoom in color video is scaled in the spatial domain, while depth video is scaled in both the spatial and depth domains. For color video, instead of applying all possible zoom ratios to a current block as in existing zoom motion estimation methods, the proposed method determines the zoom ratio as the ratio of the average depth values of the current and reference blocks. The reference block is then resized by the zoom ratio and mapped to the current block. For depth video, the reference block is first scaled in the spatial direction by the same methodology used for color video and then scaled by the ratio of the distances from the camera to the objects. Compared to conventional motion estimation, the proposed method reduces MSE by up to about 30% for color video and up to about 85% for depth video.
- Published
- 2020
- Full Text
- View/download PDF
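The core idea of the entry above, determining the zoom ratio from the mean depths of the current and reference blocks and resizing the reference block accordingly, can be sketched as follows; the crop-and-pad mapping is a simplification of the paper's block mapping, and all block contents are synthetic.

```python
# Sketch of the zoom-ratio idea above: scale the reference block by the ratio
# of mean depths before matching. The crop/pad mapping is a simplification.
import numpy as np
from scipy.ndimage import zoom

def zoom_compensated_block(ref_block, depth_cur, depth_ref):
    ratio = depth_ref.mean() / depth_cur.mean()  # object closer => enlarge
    scaled = zoom(ref_block, ratio, order=1)
    out = np.zeros_like(ref_block)               # map back to block size
    h = min(out.shape[0], scaled.shape[0])
    w = min(out.shape[1], scaled.shape[1])
    out[:h, :w] = scaled[:h, :w]
    return out

block = np.random.default_rng(0).random((16, 16))
d_cur = np.full((16, 16), 2.0)                   # now 2.0 m away
d_ref = np.full((16, 16), 2.5)                   # was 2.5 m away
print(zoom_compensated_block(block, d_cur, d_ref).shape)  # (16, 16)
```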
9. Temporal Activity Segmentation for Depth Cameras Using Joint Angle Variance Features
- Author
-
Jafar, Syed, Singh, Pranav Kumar, Bhavsar, Arnav, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Chaudhuri, Bidyut B., editor, Kankanhalli, Mohan S., editor, and Raman, Balasubramanian, editor
- Published
- 2018
- Full Text
- View/download PDF
10. S3D-CNN: skeleton-based 3D consecutive-low-pooling neural network for fall detection.
- Author
-
Xiong, Xin, Min, Weidong, Zheng, Wei-Shi, Liao, Pin, Yang, Hao, and Wang, Shuai
- Subjects
MOVEMENT sequences, AUTUMN, HUMAN behavior, ALGORITHMS - Abstract
Most existing deep-learning-based fall detection methods use either 2D neural networks that ignore movement representation sequences, or whole sequences rather than only those within the fall period. These characteristics result in inaccurate extraction of human action features and in falls being missed due to background interference or activity representations beyond the fall period. To alleviate these problems, a skeleton-based 3D consecutive-low-pooling neural network (S3D-CNN) for fall detection is proposed in this paper. In the S3D-CNN, an activity feature clustering selector is designed to extract the skeleton representation from depth videos using a pose estimation algorithm and to form an optimized skeleton sequence for the fall period. A 3D consecutive-low-pooling (3D-CLP) neural network is proposed to process these representation sequences, improving the network in terms of number of layers, pooling kernel size, and number of single input frames. The proposed method is evaluated on public and self-collected datasets, outperforming existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. Human Height Estimation by Color Deep Learning and Depth 3D Conversion.
- Author
-
Lee, Dong-seok, Kim, Jong-soo, Jeong, Seok Chan, and Kwon, Soon-kak
- Subjects
COLOR image processing, DEEP learning, HUMAN body, CONVOLUTIONAL neural networks, ALTITUDES, EUCLIDEAN distance - Abstract
In this study, an estimation method for human height is proposed using color and depth information. Color images are used for deep learning with Mask R-CNN to detect the human body and the human head separately. If color images are unavailable for extracting the human body region due to a low-light environment, the body region is instead extracted by comparing the current frame of the depth video with a pre-stored background depth image. The topmost point of the head region is taken as the top of the head and the bottommost point of the body region as the bottom of the foot. The depth value of the head-top point is corrected to a pixel value with high similarity to its neighboring pixels. The position of the body-bottom point is corrected by calculating the depth gradient between vertically adjacent pixels. The head-top and foot-bottom points are converted into 3D real-world coordinates using the depth information, and the human height is estimated as the Euclidean distance between the two real-world coordinates. Estimation errors are reduced by averaging the accumulated heights. Experimental results show that the estimation errors for a standing person are 0.7% and 2.2% when the human body region is extracted by Mask R-CNN and by the background depth image, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
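The depth-to-3D conversion and Euclidean height measurement described above follow the standard pinhole camera model. A minimal sketch follows; the intrinsics and pixel coordinates are hypothetical, not values from the paper.

```python
# Sketch of the height measurement above via the standard pinhole model.
# Intrinsics and pixel coordinates are hypothetical, not the paper's.
import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5      # assumed depth intrinsics

def to_world(u, v, depth_m):
    """Back-project pixel (u, v) with depth in meters to 3D camera space."""
    return np.array([(u - CX) * depth_m / FX,
                     (v - CY) * depth_m / FY,
                     depth_m])

head_top = to_world(320, 120, 3.0)     # topmost head pixel (hypothetical)
foot_bottom = to_world(318, 420, 3.0)  # bottommost body pixel (hypothetical)
print(f"height ~ {np.linalg.norm(head_top - foot_bottom):.2f} m")  # ~1.71
```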
12. Fast Depth Video Coding with Intra Prediction on VVC.
- Author
-
Hongan Wei, Binqian Zhou, Ying Fang, Yiwen Xu, and Tiesong Zhao
- Subjects
VIDEO coding, FORECASTING, ALGORITHMS - Abstract
In stereoscopic or multiview displays, the depth video describes the visual distance between objects and the camera. To improve the computational efficiency of depth video encoders, we study the intra prediction of depth videos under Versatile Video Coding (VVC) and observe a diverse distribution of intra prediction modes across coding unit (CU) sizes. We propose a hybrid scheme to further speed up depth video coding. In the first stage, we adaptively predict the HADamard (HAD) costs of intra prediction modes and initialize a candidate list according to the HAD costs. The candidate list is then refined by considering the probability distribution of candidate modes for different CU sizes. Finally, early termination of CU splitting is performed at each CU depth level based on the Bayesian theorem. Our proposed method is incorporated into VVC intra prediction for fast coding of depth videos. Experiments with 7 standard sequences and 4 quantization parameters (QPs) validate the efficiency of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
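The HAD cost used above to rank intra modes is the sum of absolute transformed differences (SATD) under a Hadamard transform. A minimal sketch on a synthetic 4×4 residual follows; the normalization is a simple illustrative choice, and encoders use block-size-specific scaling.

```python
# Sketch of the HADamard (HAD) cost used to rank intra modes above: the sum
# of absolute transformed differences (SATD) of a prediction residual.
import numpy as np
from scipy.linalg import hadamard

def had_cost(residual):
    n = residual.shape[0]                 # n must be a power of two
    h = hadamard(n)
    return np.abs(h @ residual @ h.T).sum() / n

residual = np.arange(16, dtype=float).reshape(4, 4) - 7.5  # synthetic block
print(had_cost(residual))
```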
13. Zoom motion estimation for color and depth videos using depth information.
- Author
-
Kwon, Soon-kak and Lee, Dong-seok
- Subjects
MOTION, VIDEOS, IMAGE stabilization, COLORS, MOTION analysis, STREAMING video & television, VIDEO coding, REFERENCE values - Abstract
In this paper, two methods of zoom motion estimation for color and depth videos using depth information are proposed. Color and depth videos are estimated for zoom motion independently. Zoom in color video is scaled in the spatial domain, while depth video is scaled in both the spatial and depth domains. For color video, instead of applying all possible zoom ratios to a current block as in existing zoom motion estimation methods, the proposed method determines the zoom ratio as the ratio of the average depth values of the current and reference blocks. The reference block is then resized by the zoom ratio and mapped to the current block. For depth video, the reference block is first scaled in the spatial direction by the same methodology used for color video and then scaled by the ratio of the distances from the camera to the objects. Compared to conventional motion estimation, the proposed method reduces MSE by up to about 30% for color video and up to about 85% for depth video. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
14. First Approach to Automatic Measurement of Frontal Plane Projection Angle During Single Leg Landing Based on Depth Video
- Author
-
Bailon, Carlos, Damas, Miguel, Pomares, Hector, Banos, Oresti, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, García, Carmelo R., editor, Caballero-Gil, Pino, editor, Burmester, Mike, editor, and Quesada-Arencibia, Alexis, editor
- Published
- 2016
- Full Text
- View/download PDF
15. A Low Complexity Compressed Sensing-Based Codec for Consumer Depth Video Sensors.
- Author
-
Wang, Shengwei, Yu, Li, and Xiang, Sen
- Subjects
-
VIDEO coding, POWER resources, COMPRESSED sensing, ENERGY consumption, CONSUMPTION (Economics), VIDEOS - Abstract
In 3-D applications, a high-quality depth video codec with low complexity is in great demand for consumer devices, which generally have limited computational resources and energy. Based on compressed sensing theory, we propose a low-complexity depth video codec that compresses depth videos effectively. The proposed codec decomposes depth blocks via an adaptive wavelet decomposition algorithm following the principle of local entropy minimization. The decomposition reduces the amount of local data and precisely separates depth blocks by frequency. To reduce temporal redundancy, a fast motion estimation scheme based on block average values is also designed. Further, a joint optimization method is proposed to select the best combination of quantization parameter and measurement rate for each block. The experimental results demonstrate that, compared with H.265 and H.264, the proposed codec achieves BD-PSNR improvements of up to 1.34 dB and 4.28 dB, respectively, in "AllIntra" mode. In "IPPP" mode, it achieves average BD-PSNR improvements of 0.62 dB and 1.79 dB, respectively. Moreover, compared with other methods, the complexity and energy consumption of the proposed codec are much lower, making it well suited to consumer devices. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
16. Moving Human Detection Based on Depth Interframe Difference
- Author
-
Xu, Hongwei, Liu, Jie, Ming, Yue, Kacprzyk, Janusz, Series editor, Patnaik, Srikanta, editor, and Li, Xiaolong, editor
- Published
- 2014
- Full Text
- View/download PDF
17. Quantifying Progression of Multiple Sclerosis via Classification of Depth Videos
- Author
-
Kontschieder, Peter, Dorn, Jonas F., Morrison, Cecily, Corish, Robert, Zikic, Darko, Sellen, Abigail, D’Souza, Marcus, Kamm, Christian P., Burggraaff, Jessica, Tewarie, Prejaas, Vogel, Thomas, Azzarito, Michela, Glocker, Ben, Chin, Peter, Dahlke, Frank, Polman, Chris, Kappos, Ludwig, Uitdehaag, Bernard, Criminisi, Antonio, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Golland, Polina, editor, Hata, Nobuhiko, editor, Barillot, Christian, editor, Hornegger, Joachim, editor, and Howe, Robert, editor
- Published
- 2014
- Full Text
- View/download PDF
18. Human Height Estimation by Color Deep Learning and Depth 3D Conversion
- Author
-
Dong-seok Lee, Jong-soo Kim, Seok Chan Jeong, and Soon-kak Kwon
- Subjects
human-height estimation, depth video, depth 3D conversion, artificial intelligence, convolutional neural networks, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999 - Abstract
In this study, an estimation method for human height is proposed using color and depth information. Color images are used for deep learning with Mask R-CNN to detect the human body and the human head separately. If color images are unavailable for extracting the human body region due to a low-light environment, the body region is instead extracted by comparing the current frame of the depth video with a pre-stored background depth image. The topmost point of the head region is taken as the top of the head and the bottommost point of the body region as the bottom of the foot. The depth value of the head-top point is corrected to a pixel value with high similarity to its neighboring pixels. The position of the body-bottom point is corrected by calculating the depth gradient between vertically adjacent pixels. The head-top and foot-bottom points are converted into 3D real-world coordinates using the depth information, and the human height is estimated as the Euclidean distance between the two real-world coordinates. Estimation errors are reduced by averaging the accumulated heights. Experimental results show that the estimation errors for a standing person are 0.7% and 2.2% when the human body region is extracted by Mask R-CNN and by the background depth image, respectively.
- Published
- 2020
- Full Text
- View/download PDF
19. Unsupervised Learning Spatio-temporal Features for Human Activity Recognition from RGB-D Video Data
- Author
-
Chen, Guang, Zhang, Feihu, Giuliani, Manuel, Buckl, Christian, Knoll, Alois, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Herrmann, Guido, editor, Pearson, Martin J., editor, Lenz, Alexander, editor, Bremner, Paul, editor, Spiers, Adam, editor, and Leonards, Ute, editor
- Published
- 2013
- Full Text
- View/download PDF
20. A Monitoring System for Home-Based Physiotherapy Exercises
- Author
-
Ar, Ilktan, Akgul, Yusuf Sinan, Gelenbe, Erol, editor, and Lent, Ricardo, editor
- Published
- 2013
- Full Text
- View/download PDF
21. Action recognition by fusing depth video and skeletal data information.
- Author
-
Kapsouras, Ioannis and Nikolaidis, Nikos
- Subjects
BAG-of-words model (Computer science), KINECT (Motion sensor), VIDEOS, COMPUTER vision, DIGITAL image processing - Abstract
Two action recognition approaches that utilize depth videos and skeletal information are proposed in this paper. Dense trajectories are used to represent the depth video data. Skeletal data are represented by vectors of skeleton joint positions and their forward differences at various temporal scales. The extracted features are encoded using either the Bag of Words (BoW) or the Vector of Locally Aggregated Descriptors (VLAD) approach. Finally, a Support Vector Machine (SVM) is used for classification. Experiments were performed on three datasets, namely MSR Action3D, MSR Action Pairs, and Florence3D, to measure the performance of the methods. The proposed approaches outperform all state-of-the-art action recognition methods that operate on depth video/skeletal data in the most challenging and fair experimental setup of the MSR Action3D dataset. Moreover, they achieve 100% correct recognition on the MSR Action Pairs dataset and the highest classification rate among all compared methods on the Florence3D dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
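The skeleton representation described in the entry above (joint positions plus forward differences at several temporal scales) can be sketched in a few lines; the joint trajectory below is synthetic, and the BoW/VLAD encoding and SVM stages are omitted.

```python
# Sketch of the skeletal representation above: joint positions concatenated
# with forward differences at several temporal scales. Trajectory is synthetic.
import numpy as np

def skeleton_features(joints, scales=(1, 2, 4)):
    """joints: (T, J, 3) joint positions over T frames."""
    T = joints.shape[0]
    feats = [joints.reshape(T, -1)]
    for s in scales:
        diff = np.zeros_like(joints)
        diff[:-s] = joints[s:] - joints[:-s]   # forward difference, scale s
        feats.append(diff.reshape(T, -1))
    return np.concatenate(feats, axis=1)

joints = np.random.default_rng(1).normal(size=(30, 20, 3))  # 20 joints
print(skeleton_features(joints).shape)  # (30, 240)
```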
22. Cross-View Multi-Lateral Filter for Compressed Multi-View Depth Video.
- Author
-
Yang, You, Liu, Qiong, He, Xin, and Liu, Zhen
- Subjects
-
VIDEO recording, ERRORS, VIRTUAL reality, SEMANTICS, IMAGING systems - Abstract
Multi-view depth is crucial for describing positioning information in 3D space for virtual reality, free-viewpoint video, and other interaction- and remote-oriented applications. However, under lossy compression for bandwidth-limited remote applications, the quality of multi-view depth video suffers from quantization errors, leading to obvious artifacts in subsequent virtual view rendering during interactions. Considerable effort must be made to properly address these artifacts. In this paper, we propose a cross-view multi-lateral filtering scheme to improve the quality of compressed depth maps/videos within the framework of asymmetric multi-view video plus depth compression. In this scheme, a distorted depth map is enhanced via non-local candidates selected from the current and neighboring viewpoints at different time slots. Specifically, these candidates are clustered into a macro superpixel denoting the physical and semantic cross-relationships of the cross-view, spatial, and temporal priors. The experimental results show gains on static depth maps and dynamic depth videos in terms of PSNR and SSIM, respectively. In subjective evaluations, even object contours are recovered from a compressed depth video. We also verify our method in several practical applications: artifacts on object contours are properly managed for interactive video, and discontinuous object surfaces are restored for 3D modeling. Our results suggest that the proposed filter outperforms state-of-the-art filters and is suitable for multi-view color plus depth-based interaction- and remote-oriented applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
23. Sign Language Recognition Based on Color-Depth Video and CLDS.
- Author
-
张淑军, 彭中, and 王传旭
- Published
- 2019
- Full Text
- View/download PDF
24. Fall detection without people: A simulation approach tackling video data scarcity.
- Author
-
Mastorakis, Georgios, Ellis, Tim, and Makris, Dimitrios
- Subjects
-
ACCIDENTAL falls, HAUSDORFF spaces, VELOCITY, MORPHOLOGY, COHORT analysis - Abstract
We propose an intelligent system to detect human fall events using a physics-based myoskeletal simulation, detecting falls by comparing the simulation with a fall velocity profile using the Hausdorff distance. Previous fall detection methods are trained on recordings of acted falls, which are limited in number, body variability, and type of fall, and can be unrepresentative of real falls. The paper demonstrates that fall recordings are unnecessary for modelling the fall, as the simulation engine can produce a variety of fall events customised to an individual's physical characteristics using myoskeletal models of different morphology, without prior knowledge of the falling behaviour. To validate this methodological approach, the simulation is customised by the person's height, modelling a rigid fall type. This approach allows the detection to be tailored to cohorts in the population (such as the elderly or the infirm) that are not represented in existing fall datasets. The method has been evaluated on several publicly available datasets, and the results show that it outperforms previously reported fall detection research. Finally, our approach is demonstrated to be robust to occlusions that hide up to 50% of a fall, which increases the applicability of automatic fall detection in real-world environments such as the home. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Phase-based frame rate up-conversion for depth video.
- Author
-
Zhou, Lunan, Sun, Rui, Tian, Xiang, and Chen, Yaowu
- Subjects
-
VIDEO processing, INTERPOLATION, THREE-dimensional display systems, OPTICAL flow, PIXELS - Abstract
Due to the frame-rate limits of depth sensing (such as time-of-flight sampling) and the transmission bandwidth constraints of depth video, depth frame rate up-conversion is imperative. Most approaches interpolate depth frames using the motion information of the texture images, which requires well-rectified images; they usually find dense correspondences to synthesize an in-between frame. Estimating these correspondences is sensitive to occlusion, disocclusion, and color or luminance changes. This paper proposes a phase-based method to interpolate depth frames without explicitly finding correspondences. Combining top-down phase-difference preservation and bottom-up phase-difference correction, varying amounts of motion are well modeled in a complex steerable pyramid decomposition. The pixel amplitude is also shifted and interpolated based on the phase information. Finally, a depth value correction is employed to correct inaccurate depth values. Experimental results show that depth video interpolated by the proposed approach provides better subjective and objective quality. ©2018 SPIE and IS&T [DOI: 10.1117/1.JEI.27.4.043036] [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
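A toy illustration of the phase-based idea in the entry above, reduced to a 1D Fourier example rather than the paper's complex steerable pyramid: the in-between frame is synthesized by interpolating phase and amplitude instead of finding explicit correspondences. The signals below are synthetic and the paper's corrections are omitted.

```python
# 1D Fourier toy of the phase-based idea above: synthesize an in-between
# frame by interpolating phase and amplitude, with no explicit matching.
import numpy as np

def phase_interpolate(f0, f1, t=0.5):
    F0, F1 = np.fft.fft(f0), np.fft.fft(f1)
    dphi = np.angle(F1 * np.conj(F0))            # wrapped phase difference
    amp = (1 - t) * np.abs(F0) + t * np.abs(F1)  # blend amplitudes
    return np.real(np.fft.ifft(amp * np.exp(1j * (np.angle(F0) + t * dphi))))

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
mid = phase_interpolate(np.sin(x), np.sin(x - 0.4))
print(np.abs(mid - np.sin(x - 0.2)).max())       # small interpolation error
```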
26. Region Adaptive R-λ Model-Based Rate Control for Depth Maps Coding.
- Author
-
Lei, Jianjun, He, Xiaoxu, Yuan, Hui, Wu, Feng, Ling, Nam, and Hou, Chunping
- Subjects
-
DEPTH maps (Digital image processing), ALGORITHMS, IMAGE processing, ARTIFICIAL intelligence, VIDEO codecs - Abstract
In this paper, a novel rate-control algorithm based on a region-adaptive R-λ model is proposed for depth map coding. First, to obtain accurate rate control for depth map coding, a modified frame-level bit allocation method based on the statistical distribution of depth map coding bits is proposed. Second, considering that different areas of a depth map affect virtual view rendering differently, the blocks of the depth map are divided into two types: blocks of interest for virtual view rendering (IBV) and blocks not of interest for virtual view rendering (NIBV). Two different R-λ models are then derived for IBV and NIBV, respectively, and the optimal bitrates for IBV and NIBV are determined by solving an optimization problem. After that, based on the regional R-λ models, the optimal Lagrange multipliers are calculated for both IBV and NIBV. Finally, largest coding unit (LCU) level rate control is performed by adaptively adjusting the Lagrange multiplier to avoid blocking artifacts and smooth the coding quality. Experimental results demonstrate that the proposed method achieves considerable BD-PSNR gains in rendered virtual view quality compared with the unified rate-quantization model and conventional R-λ model-based algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
27. Modern Technology for Image processing and Computer vision -A Review.
- Author
-
Al Smadi Takialddin and Sallam Al Zoubi
- Subjects
IMAGE processing, COMPUTER vision, VIDEO processing, AUGMENTED reality, BAYESIAN analysis - Abstract
This survey outlines the use of computer vision in image and video processing across multidisciplinary applications, in both academia and industry. The scope of this paper covers the theoretical and practical aspects of image and video processing as well as computer vision, from essential research to the evolution of applications. Various subjects of image processing and computer vision are demonstrated, spanning the evolution of mobile augmented reality (MAR) applications, augmented reality for 3D modeling and real-time depth imaging, and video processing algorithms for higher depth video compression. On the mobile platform side, an automatic computer vision system for citrus fruit has been implemented, and Bayesian classification with boundary growing is used to detect text in video scenes. The paper also illustrates the usability of a hand-based interactive method for portable projectors based on augmented reality. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
28. Action recognition for depth video using multi-view dynamic images.
- Author
-
Xiao, Yang, Chen, Jun, Wang, Yancheng, Cao, Zhiguo, Tianyi Zhou, Joey, and Bai, Xiang
- Subjects
-
THREE-dimensional imaging, ARTIFICIAL neural networks, COMPUTER simulation, ACCURACY of information, MATHEMATICAL optimization - Abstract
Highlights: • Multi-view dynamic images are proposed to capture the 3D motion characteristics of actions. • A novel CNN model is proposed to boost performance. • An action proposal approach is proposed to resist the effect of spatial variation. • Multi-view dynamic images achieve state-of-the-art performance. Abstract: Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Dynamic images are then extracted from the obtained multi-view depth videos, and the multi-view dynamic images are constructed from them, so that more view-tolerant visual cues can be involved. A novel CNN model is then proposed to perform feature learning on the multi-view dynamic images. In particular, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers; this aims to enhance the tuning effectiveness of the shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as spatial variation in where an action occurs may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach achieves state-of-the-art performance on three challenging datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
29. 3D Mesh Construction from Depth Images with Occlusion
- Author
-
Park, Jeung-Chul, Kim, Seung-Man, Lee, Kwan-Heng, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Zhuang, Yueting, editor, Yang, Shi-Qiang, editor, Rui, Yong, editor, and He, Qinming, editor
- Published
- 2006
- Full Text
- View/download PDF
30. Natural-Textured Mesh Stream Modeling from Depth Image-Based Representation
- Author
-
Kim, Seung Man, Park, Jeung Chul, Lee, Kwan H., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Gavrilova, Marina, editor, Gervasi, Osvaldo, editor, Kumar, Vipin, editor, Tan, C. J. Kenneth, editor, Taniar, David, editor, Laganá, Antonio, editor, Mun, Youngsong, editor, and Choo, Hyunseung, editor
- Published
- 2006
- Full Text
- View/download PDF
31. A Robust Approach for Human Activity Recognition Using 3-D Body Joint Motion Features with Deep Belief Network.
- Author
-
Uddin, Md. Zia and Jaehyoun Kim
- Subjects
MOTION detectors, DETECTORS, KINETIC energy, FORCE & energy, DYNAMICS - Abstract
Computer vision-based human activity recognition (HAR) has attracted wide attention due to its applications in fields such as smart home healthcare for elderly people. A video-based activity recognition system aims, among other goals, to react to people's behavior so that the system can proactively assist them with their tasks. A novel approach is proposed in this work for depth video based human activity recognition using joint-based motion features of depth body shapes and a Deep Belief Network (DBN). From depth video, the different body parts in human activities are first segmented by means of a trained random forest. Motion features representing the magnitude and direction of each joint in the next frame are then extracted. Finally, the features are used to train a DBN, which is later used for recognition. The proposed HAR approach showed superior performance over conventional approaches on private and public datasets, indicating a promising approach for practical applications in smartly controlled environments. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
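The motion features described in the entry above (the magnitude and direction of each joint's displacement to the next frame) can be sketched directly; the trajectory below is synthetic, and the random-forest segmentation and DBN stages are omitted.

```python
# Sketch of the joint motion features above: per-joint displacement magnitude
# and unit direction to the next frame. Trajectory is synthetic.
import numpy as np

def joint_motion_features(joints):
    """joints: (T, J, 3) -> (T-1, J, 4): magnitude + unit direction."""
    disp = joints[1:] - joints[:-1]
    mag = np.linalg.norm(disp, axis=-1, keepdims=True)
    direction = disp / np.maximum(mag, 1e-9)     # avoid divide-by-zero
    return np.concatenate([mag, direction], axis=-1)

joints = np.random.default_rng(5).normal(size=(30, 15, 3))
print(joint_motion_features(joints).shape)       # (29, 15, 4)
```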
32. Region-based bit allocation and rate control for depth video in HEVC.
- Author
-
Lei, Jianjun, Li, Zhenzhen, Zhu, Tao, He, Xiaoxu, You, Lei, and Hou, Chunping
- Subjects
BIT allocation analysis, BIT rate, VIDEO coding, RATE distortion theory, SIGNAL quantization - Abstract
In this paper, we present a novel rate control method with optimized region-based bit allocation for depth video coding. First, a synthetic-view-distortion-oriented segmentation method is proposed to divide the depth video into different regions, including texture areas and smooth areas. Then, expanded exponential distortion-rate (D-R) models and power quantization-parameter-rate (QP-R) models are investigated to model the rate-distortion (R-D) characteristics of the different regions. Finally, an optimal bit allocation scheme is developed to adaptively allocate the target bits across the regions. Experimental results on various video sequences demonstrate that the proposed algorithm achieves excellent R-D efficiency and bit rate accuracy compared with benchmark algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
33. A depth video processing algorithm based on cluster dependent and corner-ware filtering.
- Author
-
Peng, Zongju, Guo, Mingsong, Chen, Fen, Jiang, Gangyi, Yu, Mei, and Shao, Feng
- Subjects
-
STREAMING video & television, FUZZY clustering technique, FEATURE extraction, SIGNAL processing, INFORMATION filtering - Abstract
In a free viewpoint video system, color video and the associated depth video are used to synthesize arbitrary virtual viewpoints; hence, high-quality depth video is a necessity for virtual view rendering. However, depth video produced by depth estimation software is inconsistent and inaccurate, which decreases the quality of the virtual views. To solve this problem, a depth video processing algorithm is proposed in this paper. Firstly, the depth video is divided adaptively into five clusters by the fuzzy C-means clustering method; meanwhile, the edges of the depth video are detected and expanded into 8×8 blocks. Secondly, for pixels in non-edge regions of the depth video, a cluster-dependent filtering method is adopted according to the features of each cluster. Finally, corners in the corresponding texture video are detected, and for pixels in edge regions of the depth video, a corner-aware filtering method is used. Experimental results show that the proposed algorithm enhances the depth video and significantly improves the quality of the virtual views. The peak signal-to-noise ratio of virtual views rendered using the processed depth video is 0.43 dB higher than that of views rendered using the original depth video. The proposed algorithm also outperforms Martin's algorithm in terms of virtual view rendering performance. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
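The first step in the entry above, adaptively dividing depth values into five clusters with fuzzy C-means, is sketched below using the textbook FCM update loop; the edge expansion and the cluster-dependent and corner-aware filtering stages of the paper are not reproduced.

```python
# Sketch of the first step above: textbook fuzzy C-means on depth values,
# five clusters. Depth data below are synthetic.
import numpy as np

def fcm_1d(x, c=5, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=c, replace=False).astype(float)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9  # (N, c) distances
        u = 1.0 / d ** (2.0 / (m - 1.0))                  # inverse distances
        u /= u.sum(axis=1, keepdims=True)                 # fuzzy memberships
        centers = (u.T ** m @ x) / (u ** m).sum(axis=0)   # weighted means
    return centers, u

depth = np.random.default_rng(2).integers(0, 256, size=5000).astype(float)
centers, _ = fcm_1d(depth)
print(np.sort(centers))
```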
34. Enhancement of dynamic depth scenes by upsampling for precise super-resolution (UP-SR).
- Author
-
Al Ismaeil, Kassem, Aouada, Djamila, Mirbach, Bruno, and Ottersten, Björn
- Subjects
HIGH resolution imaging, DEPTH of field, VIDEO excerpts, ARCHAEOLOGY methodology, ACCURACY - Abstract
Multi-frame super-resolution is the process of recovering a high-resolution image or video from a set of captured low-resolution images. Super-resolution approaches have been explored extensively in 2-D imaging. However, their extension to depth videos is not straightforward due to the textureless nature of depth data and to their high-frequency contents coupled with fast motion artifacts. The few recent attempts have addressed only the super-resolution of static depth scenes. In this work, we propose to enhance the resolution of dynamic depth videos with non-rigidly moving objects. The proposed approach is based on a new data model that uses densely upsampled, cumulatively registered versions of the observed low-resolution depth frames. We show the impact of upsampling in increasing the sub-pixel accuracy and reducing the rounding error of the motion vectors. Furthermore, with the proposed cumulative motion estimation, high registration accuracy is achieved between non-successive upsampled frames with relatively large motions. A statistical performance analysis is derived in terms of mean square error, explaining the effect of the number of observed frames and of the super-resolution factor at a given noise level. We evaluate the accuracy of the proposed algorithm theoretically and experimentally as a function of the SR factor and the level of noise contamination. Experimental results on both real and synthetic data show the effectiveness of the proposed algorithm on dynamic depth videos compared to state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
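A hedged sketch of the UP-SR data model described in the entry above: densely upsample each low-resolution frame, register it to a reference, and fuse robustly. The integer-shift search below is a crude stand-in for the paper's cumulative motion estimation, and the frames are synthetic.

```python
# Hedged sketch of the UP-SR data model above: upsample, register, fuse.
import numpy as np
from scipy.ndimage import shift, zoom

def upsr_fuse(lr_frames, factor=4, search=3):
    up = [zoom(f, factor, order=0) for f in lr_frames]    # dense upsampling
    ref, registered = up[0], [up[0]]
    for f in up[1:]:
        best, best_err = f, np.inf
        for dy in range(-search, search + 1):             # shift search
            for dx in range(-search, search + 1):
                cand = shift(f, (dy, dx), order=0, mode='nearest')
                err = np.abs(cand - ref).mean()
                if err < best_err:
                    best, best_err = cand, err
        registered.append(best)
    return np.median(registered, axis=0)                  # robust fusion

frames = [np.random.default_rng(i).random((16, 16)) for i in range(4)]
print(upsr_fuse(frames).shape)  # (64, 64)
```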
35. A Local Feature-Based Robust Approach for Facial Expression Recognition from Depth Video.
- Author
-
Uddin, Md. Zia and Jaehyoun Kim
- Subjects
HUMAN facial recognition software, FACIAL expression, VIDEO processing, FEATURE extraction, ROBUST control, PRINCIPAL components analysis - Abstract
Facial expression recognition (FER) plays a very significant role in computer vision, pattern recognition, and image processing applications such as human-computer interaction, as it provides rich information about people's emotions. For video-based facial expression recognition, depth cameras can be better candidates than RGB cameras: since a person's face cannot be easily recognized from distance-based depth video, depth cameras also resolve some privacy issues that arise with RGB faces. A good FER system relies heavily on the extraction of robust features as well as on the recognition engine. In this work, an efficient novel approach is proposed to recognize facial expressions from time-sequential depth videos. First, efficient Local Binary Pattern (LBP) features are obtained from the time-sequential depth faces and further processed by Generalized Discriminant Analysis (GDA) to make them more robust; finally, the LBP-GDA features are fed into Hidden Markov Models (HMMs) to train and recognize different facial expressions. The proposed depth-based facial expression recognition approach is compared to conventional approaches such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and outperforms them with better recognition rates. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
36. Continuous Action Recognition in RGB-D Video.
- Author
-
白栋天, 张磊, and 黄华
- Published
- 2016
37. Depth video spatial and temporal correlation enhancement algorithm based on just noticeable rendering distortion model.
- Author
-
Peng, Zongju, Chen, Fen, Jiang, Gangyi, Yu, Mei, Shao, Feng, and Ho, Yo-Song
- Subjects
-
VIDEOS, DEPTH perception, SPATIAL analysis (Statistics), ENCODING, GAUSSIAN processes, RENDERING (Computer graphics), BIT rate - Abstract
Spatial and temporal inconsistency of depth video deteriorates encoding efficiency in three-dimensional video systems. A depth video processing algorithm based on human perception is presented. Firstly, a just noticeable rendering distortion (JNRD) model is formulated by combining an analysis of the influence of depth distortion on virtual view rendering with human visual perception characteristics. Then, the depth video is processed based on the JNRD model from two aspects: spatial and temporal correlation enhancement. During spatial correlation enhancement, the depth video is segmented into edge, foreground, and background regions and smoothed by Gaussian and mean filters. The temporal correlation enhancement operations include temporal-spatial transpose (TST), temporal smoothing filtering, and inverse TST. Finally, encoding and virtual view rendering experiments are conducted to evaluate the proposed algorithm. Experimental results show that the proposed algorithm greatly reduces the bit rate while maintaining the quality of the virtual view. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
38. Depth Error Elimination for RGB-D Cameras.
- Author
-
GAO, YUE, YANG, YOU, ZHEN, YI, and DAI, QIONGHAI
- Subjects
-
DEPTH maps (Digital image processing), ERROR analysis in mathematics, VIDEO recording, COLOR image processing, IMAGE quality analysis - Abstract
The rapid spread of RGB-D cameras has led to wide application of 3D videos in both academia and industry, such as 3D entertainment and 3D visual understanding. Under these circumstances, extensive research efforts have been dedicated to RGB-D camera oriented topics. Among these topics, quality promotion of depth videos with the temporal characteristic is emerging and important. Due to the limited exposure time of RGB-D cameras, object movement can easily lead to motion blur in the intensity images, which can further result in obvious artifacts (holes or fake boundaries) in the corresponding depth frames. With regard to this problem, we propose a depth error elimination method based on time series analysis to remove the artifacts in depth images. In this method, we first locate the regions with erroneous depths by detecting motion blur in the intensity images using a time series analysis model; this is based on the fact that the depth image is calculated from intensity (color) images captured synchronously by the RGB-D camera. Then the artifacts, such as holes or fake boundaries, are fixed by a depth error elimination method. To evaluate the performance of the proposed method, we conducted experiments on 250 images. Experimental results demonstrate that the proposed method can locate the error regions correctly and eliminate the artifacts effectively, and that the quality of depth video can be improved significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
39. A depth video-based facial expression recognition system using radon transform, generalized discriminant analysis, and hidden Markov model.
- Author
-
Uddin, Md. and Hassan, Mohammad
- Subjects
HUMAN facial recognition software, VIDEOS, RADON transforms, HIDDEN Markov models, CAMERAS, FEATURE extraction - Abstract
In this paper, a novel depth camera-based method is proposed to recognize several facial expressions from depth video. First, a Radon Transform (RT) is applied to extract features from the time-sequential depth faces; these are further improved by Generalized Discriminant Analysis (GDA) to generate more robust features, and then Hidden Markov Models (HMMs) are applied to train and recognize the different facial expressions. The performance of the proposed facial expression recognition approach shows its superiority over conventional RGB camera-based approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
40. Content-adaptive temporal consistency enhancement for depth video.
- Author
-
Zeng, Huanqiang and Ma, Kai-Kuang
- Abstract
The video plus depth format, which is composed of the texture video and the depth video, has been widely used for free viewpoint TV. However, the temporal inconsistency is often encountered in the depth video due to the error incurred in the estimation of the depth values. This will inevitably deteriorate the coding efficiency of depth video and the visual quality of synthesized view. To address this problem, a content-adaptive temporal consistency enhancement (CTCE) algorithm for the depth video is proposed in this paper, which consists of two sequential stages: 1) classification of stationary and non-stationary regions based on the texture video, and 2) adaptive temporal consistency filtering on the depth video. The result of the first stage is used to steer the second stage so that the filtering process will be conducted in an adaptive manner. Extensive experimental results have shown that the proposed CTCE algorithm can effectively mitigate the temporal inconsistency in the original depth video and consequently improve the coding efficiency of depth video and the visual quality of synthesized view. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
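The two-stage CTCE idea in the entry above (classify stationary regions from the texture video, then filter the depth video only there) admits a short sketch; the threshold and filter weight below are illustrative, not the paper's content-adaptive values.

```python
# Sketch of the two-stage CTCE idea above: detect stationary pixels from the
# texture video, then recursively smooth the depth video only there.
import numpy as np

def ctce_like(texture, depth, tex_thresh=5.0, alpha=0.5):
    """texture, depth: (T, H, W) arrays; returns temporally filtered depth."""
    out = depth.astype(float).copy()
    for t in range(1, out.shape[0]):
        stationary = np.abs(texture[t] - texture[t - 1]) < tex_thresh
        out[t][stationary] = (alpha * out[t - 1][stationary]
                              + (1 - alpha) * out[t][stationary])
    return out

rng = np.random.default_rng(3)
tex = rng.random((10, 32, 32)) * 255.0
dep = rng.random((10, 32, 32)) * 255.0
print(ctce_like(tex, dep).shape)  # (10, 32, 32)
```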
41. Body Surface Context: A New Robust Feature for Action Recognition From Depth Videos.
- Author
-
Song, Yan, Tang, Jinhui, Liu, Fan, and Yan, Shuicheng
- Subjects
-
ROBUST control, FEATURE extraction, BODY surface mapping, PATTERN recognition systems, INFORMATION processing - Abstract
Human action recognition in videos is useful for many applications, yet huge challenges remain in real applications due to variations in the appearance, lighting conditions, and viewing angles of the subjects. In this respect, depth data have advantages over red, green, blue (RGB) data because of the spatial information they carry about the distance between object and viewpoint. Unlike existing works, we utilize the 3-D point cloud, which contains points in the 3-D real-world coordinate system, to represent the external surface of the human body. Specifically, we propose a new robust feature, the body surface context (BSC), which describes the distribution of the relative locations of the neighbors of a reference point in the point cloud in a compact and descriptive way. The BSC encodes the cylindrical angle of the difference vector based on the characteristics of the human body, which increases the descriptiveness and discriminability of the feature. As the BSC is an approximately object-centered feature, it is robust to transformations including translations and rotations, which are very common in real applications. Furthermore, we propose three schemes to represent human actions based on the new feature: a skeleton-based scheme, a random-reference-point scheme, and a spatial-temporal scheme. In addition, to evaluate the proposed feature, we construct a human action dataset with a depth camera. Experiments on three datasets demonstrate that the proposed feature outperforms RGB-based features and other existing depth-based features, validating that the BSC feature is promising in the field of human action recognition. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
42. Depth Motion Detection—A Novel RS-Trigger Temporal Logic based Method.
- Author
-
Can Wang, Hong Liu, and Liqian Ma
- Subjects
MOTION analysis, DEPTH maps (Digital image processing), VIDEO processing, IMAGE segmentation, IMAGE denoising - Abstract
Recently, depth data have been widely used in computer vision applications such as detection and tracking, showing great promise in complicated environments due to their complementary nature to RGB data. However, previous works mostly use depth as an auxiliary cue to RGB data and overlook its inherent advantage for motion detection. Intrinsically different from RGB data, points in a depth map essentially represent 3-D positions in the world, so depth video represents the variation of these "positions," which is motion. Motivated by this, we propose a novel motion detection scheme based on RS-trigger temporal logic that best fits the nature of depth data for motion detection. The proposed algorithm can quickly detect motion regions in a scene without background statistics or prior knowledge of the objects to detect. In the subsequent refinement modules, a depth-invariant density-constant projection is proposed, which enables fast spatial clustering and accurate segmentation: it transforms the dense 3-D point cloud into a depth-invariant 2-D map with constant density. Not only does this overcome the depth-dependent sampling of the depth sensor, it also overcomes the common "scale problem" in 2-D image analysis, making it easy to set system parameters to de-noise and pop out motion regions. Experimental results validate its effectiveness and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
43. A Bayesian framework for dense depth estimation based on spatial–temporal correlation
- Author
-
Liu, Qiong, Yang, You, Gao, Yue, Ji, Rongrong, and Yu, Li
- Subjects
-
THREE-dimensional display systems, MATHEMATICAL optimization, BAYESIAN analysis, FEATURE extraction, PROBABILITY theory, STATISTICAL correlation, ALGORITHMS, GRAPH theory - Abstract
Abstract: Depth video is vital to the representation of dynamic 3-D video, which is fundamental to the rapidly growing range of 3-D video applications. Accuracy and temporal consistency are the main concerns of research on depth videos. In previous works, stereo matching methods with global optimization can generate accurate and dense depth videos. However, global optimization is computationally intensive, and temporal consistency is obtained with difficulty in the optimization procedure. In this paper, a Bayesian framework is proposed to generate accurate and temporally consistent dense depth videos in an efficient way. Firstly, the spatial and temporal correlations in 3-D videos from different viewpoints are used to generate candidate depth values, and these correlations are further measured by extracted features. These features are adopted to estimate the risk of the initial depth in our Bayesian framework. Then, a two-stage decision method is designed to select the candidate depth value with the minimum risk probability for the initial depth map. Finally, depth videos are refined by an improved graph cuts algorithm with global optimization; an improved graph construction method is applied to reduce the number of nodes in the graph and thus the complexity of the global optimization. The experimental results demonstrate that the proposed algorithm achieves accurate depth videos with up to 68.14% higher efficiency than traditional methods. [Copyright Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
44. Enhancement of dynamic depth scenes by upsampling for precise super-resolution (UP-SR)
- Author
-
Djamila Aouada, Bjorn Ottersten, Kassem Al Ismaeil, and Bruno Mirbach
- Subjects
Mean squared error, non-rigid motion, synthetic data, upsampling, motion estimation, computer vision, moving objects, signal processing, depth video, super-resolution, cumulative motion, noise (video), round-off error, artificial intelligence - Abstract
Multi-frame super-resolution is the process of recovering a high-resolution image or video from a set of captured low-resolution images. Super-resolution approaches have been extensively explored in 2-D imaging. However, their extension to depth videos is not straightforward due to the textureless nature of depth data and to its high-frequency content coupled with fast motion artifacts. Recently, a few attempts have been introduced, but they address only the super-resolution of static depth scenes. In this work, we propose to enhance the resolution of dynamic depth videos containing non-rigidly moving objects. The proposed approach is based on a new data model that uses densely upsampled and cumulatively registered versions of the observed low-resolution depth frames. We show the impact of upsampling in increasing the sub-pixel accuracy and reducing the rounding error of the motion vectors. Furthermore, with the proposed cumulative motion estimation, high registration accuracy is achieved between non-successive upsampled frames with relatively large motions. A statistical performance analysis is derived in terms of mean squared error, explaining the effects of the number of observed frames and of the super-resolution factor at a given noise level. We evaluate the accuracy of the proposed algorithm theoretically and experimentally as a function of the SR factor and the level of noise contamination. Experimental results on both real and synthetic data show the effectiveness of the proposed algorithm on dynamic depth videos compared to state-of-the-art methods.
- Published
- 2016
- Full Text
- View/download PDF
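The upsample-register-fuse pipeline at the heart of UP-SR can be sketched as below. This is a deliberately simplified illustration: bicubic upsampling stands in for the paper's dense upsampling, a global phase-correlation shift stands in for its cumulative (non-rigid) motion estimation, and a per-pixel median is one robust fusion choice.

    import cv2
    import numpy as np

    def upsr_fuse(frames, factor=4):
        """Upsample each low-resolution depth frame, register it to the
        first frame, and fuse by a per-pixel median (robust to outliers)."""
        up = [cv2.resize(f, None, fx=factor, fy=factor,
                         interpolation=cv2.INTER_CUBIC) for f in frames]
        ref = up[0].astype(np.float32)
        registered = [ref]
        for f in up[1:]:
            f = f.astype(np.float32)
            (dx, dy), _ = cv2.phaseCorrelate(ref, f)   # global shift estimate
            M = np.float32([[1, 0, -dx], [0, 1, -dy]]) # undo the shift
            registered.append(cv2.warpAffine(f, M, (f.shape[1], f.shape[0])))
        return np.median(np.stack(registered), axis=0)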
45. Action recognition in depth videos
- Author
-
Ubalde, Sebastián and Mejail, Marta Estela
- Subjects
CITATION-KNN, DEPTH VIDEO, MULTIPLE INSTANCE LEARNING, EDIT DISTANCE ON REAL SEQUENCE, INSTANCE-TO-CLASS - Abstract
The problem of automatically identifying an action performed in a video is receiving a great deal of attention in the computer vision community, with applications ranging from people recognition to human-computer interaction. We can think of the human body as an articulated system of rigid segments connected by joints, and of human motion as a continuous transformation of the spatial arrangement of those segments. The arrival of low-cost depth cameras has made possible the development of an accurate and efficient human body tracking algorithm that computes the 3D location of several skeleton joints in real time. This thesis presents contributions concerning the modeling of the skeletons' temporal evolution.
Modeling the temporal evolution of skeleton descriptors is a challenging task. First, the estimated locations of the 3D joints are usually inaccurate. Second, human actions have large intra-class variability. This variability may be found not only in the spatial configuration of individual skeletons (for example, the same action involves different configurations for right-handed and left-handed people) but also in the action dynamics: different people have different execution speeds; actions with periodic movements (like clapping) may involve different numbers of repetitions; two videos of the same action may be temporally misaligned; etc. Finally, different actions may involve similar skeletal configurations, as well as similar movements, effectively yielding large inter-class similarity. We explore two approaches to the problem that aim at tackling these difficulties.
In the first approach, we present an extension to the Edit Distance on Real sequence (EDR), a robust and accurate similarity measure between time series. We introduce two key improvements to EDR: a weighted matching scheme for the points in the series and a modified aligning algorithm based on the concept of Instance-to-Class distance. The resulting distance function takes into account temporal ordering, requires no learning of parameters, and is highly tolerant to noise and temporal misalignment. Furthermore, it improves the results of non-parametric sequence classification methods, especially in cases of large intra-class variability and small training sets.
In the second approach, we explicitly acknowledge that the number of discriminative skeletons in a sequence might be low. The rest of the skeletons might be noisy or too person-specific, have a configuration common to several actions (for example, a sit-still configuration), or occur at uncommon frames. Thus, the problem can be naturally treated as a Multiple Instance Learning (MIL) problem. In MIL, training instances are organized into bags. A bag from a given class contains some instances that are characteristic of that class, but might (and most probably will) contain instances that are not. Following this idea, we represent videos as bags of time-stamped skeleton descriptors, and we propose a new MIL framework for action recognition from skeleton sequences. We found that our approach is highly tolerant to noise, intra-class variability and inter-class similarity. The proposed framework is simple and provides a clear way of regulating tolerance to noise, temporal misalignment and variations in execution speed.
We evaluate the proposed approaches on four publicly available challenging datasets captured by depth cameras, and we show that they compare favorably against other state-of-the-art methods. Fil: Ubalde, Sebastián. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina.
- Published
- 2016
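The standard EDR dynamic program that the first approach above extends can be sketched as follows. The tolerance eps and the Euclidean match test are illustrative choices; the thesis's weighted matching scheme and Instance-to-Class alignment are not reproduced here.

    import numpy as np

    def edr(a, b, eps=0.25):
        """Edit Distance on Real sequence between time series a (n, d) and
        b (m, d): minimum number of edit operations, where two samples
        'match' (cost 0) if they lie within eps of each other."""
        n, m = len(a), len(b)
        D = np.zeros((n + 1, m + 1))
        D[:, 0] = np.arange(n + 1)   # cost of deleting all of a
        D[0, :] = np.arange(m + 1)   # cost of inserting all of b
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = 0 if np.linalg.norm(a[i-1] - b[j-1]) <= eps else 1
                D[i, j] = min(D[i-1, j-1] + match,  # match / substitute
                              D[i-1, j] + 1,        # delete from a
                              D[i, j-1] + 1)        # insert from b
        return D[n, m]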
46. Incorporation of Depth Feed for Prototype Rover Video and Other Usability Improvements
- Author
-
Dal Santo, Joseph
- Subjects
- Teleoperation, Augmented Reality, Depth Video, Direct Control
- Abstract
NASA has been mandated by the US government to return to the Moon within the next five years, meaning that functional technology for lunar exploration must be developed and tested in order to reach that goal. While most lunar exploration vehicles can be operated using a supervisory control scheme, utilizing a direct human control scheme would allow missions and navigation to be more robust to unforeseen events and obstacles. Because any human-teleoperated lunar mission would involve significant time delay, low video resolution, and low frame rate, video feeds need to be augmented to give human navigators additional information about the rover's surroundings. This work proposes two new video feed options for an existing prototype lunar rover that use an Intel D435 depth camera to provide a depth image stream of the rover's surroundings and an augmented reality (AR) depth overlay on a mono RGB video stream. The video feeds proposed in this work have been shown to function within the rover's current architecture, and can correctly simulate the conditions a navigating user would experience when attempting to teleoperate the rover at lunar distance. This work also proposes usability improvements to the User Heads-Up Display, based on previous test feedback, to reduce on-screen clutter for navigating users, as well as a data visualization program to reduce data analysis time for testers and increase clarity when presenting data.
- Published
- 2019
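The kind of AR depth overlay described above can be sketched as blending a colorized depth map onto an RGB frame. The colormap, blending weight, and depth range below are illustrative assumptions, not the rover's actual rendering pipeline.

    import cv2
    import numpy as np

    def depth_overlay(rgb, depth_m, max_range_m=5.0, alpha=0.4):
        """Blend a colorized depth map onto a uint8 BGR frame so nearby
        obstacles stand out. depth_m: float32 depth in meters."""
        scaled = np.clip(depth_m / max_range_m, 0.0, 1.0)
        colored = cv2.applyColorMap((255 * (1.0 - scaled)).astype(np.uint8),
                                    cv2.COLORMAP_JET)   # near = hot colors
        valid = (depth_m > 0)[..., None]    # keep plain RGB where depth failed
        blended = cv2.addWeighted(rgb, 1 - alpha, colored, alpha, 0)
        return np.where(valid, blended, rgb)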
47. Enhancement of dynamic depth scenes by upsampling for precise super-resolution (UP-SR)
- Author
-
Ismaeil, Kassem Al, Aouada, Djamila, Mirbach, Bruno, and Ottersten, Björn
- Abstract
Multi-frame super-resolution is the process of recovering a high-resolution image or video from a set of captured low-resolution images. Super-resolution approaches have been extensively explored in 2-D imaging. However, their extension to depth videos is not straightforward due to the textureless nature of depth data and to its high-frequency content coupled with fast motion artifacts. Recently, a few attempts have been introduced, but they address only the super-resolution of static depth scenes. In this work, we propose to enhance the resolution of dynamic depth videos containing non-rigidly moving objects. The proposed approach is based on a new data model that uses densely upsampled and cumulatively registered versions of the observed low-resolution depth frames. We show the impact of upsampling in increasing the sub-pixel accuracy and reducing the rounding error of the motion vectors. Furthermore, with the proposed cumulative motion estimation, high registration accuracy is achieved between non-successive upsampled frames with relatively large motions. A statistical performance analysis is derived in terms of mean squared error, explaining the effects of the number of observed frames and of the super-resolution factor at a given noise level. We evaluate the accuracy of the proposed algorithm theoretically and experimentally as a function of the SR factor and the level of noise contamination. Experimental results on both real and synthetic data show the effectiveness of the proposed algorithm on dynamic depth videos compared to state-of-the-art methods.
- Published
- 2016
- Full Text
- View/download PDF
48. 2D-plus-depth based resolution and frame-rate up-conversion technique for depth video.
- Author
-
Choi, Jinwook, Min, Dongbo, and Sohn, Kwanghoon
- Subjects
- *
CHARGE coupled devices, *INTERPOLATION, *VIDEO processing, *TIME-domain analysis, *COMPUTER vision, *INFORMATION theory - Abstract
We propose a novel framework for up-conversion of depth video resolution in both the spatial and temporal domains, taking spatial and temporal coherence into account. Although the Time-of-Flight (TOF) sensor widely used in computer vision provides depth video in real time, it delivers only low-resolution, low-frame-rate depth video. We propose a cheaper solution that enhances depth video obtained from a TOF sensor by combining it with a Charge-Coupled Device (CCD) camera for 3D content in the 2D-plus-depth format. Temporal fluctuation problems are also addressed to achieve temporally consistent frame-rate up-conversion. Maintaining temporal coherence in depth video is important because temporal fluctuations may cause eye fatigue and increase bit rates in video coding. We propose a Motion Compensated Frame Interpolation (MCFI) that uses reliable and rich motion information from the CCD camera, and a 3-dimensional Joint Bilateral Up-sampling (3D JBU) extended into the temporal domain of depth video. Experimental results show that depth video obtained by the proposed method provides satisfactory quality. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
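The spatial core of joint bilateral upsampling, which 3D JBU extends into the temporal domain, can be sketched as follows: each high-resolution depth pixel averages nearby low-resolution depth samples, weighted by spatial distance and by color similarity in the high-resolution guide image. The sigma values and window radius are illustrative, and this naive loop version favors clarity over speed.

    import numpy as np

    def jbu(depth_lo, guide_hi, factor, sigma_s=1.0, sigma_r=10.0, radius=2):
        """Joint bilateral upsampling of depth_lo (h, w) to the resolution
        of guide_hi (H, W, 3), an aligned high-resolution color image."""
        H, W = guide_hi.shape[:2]
        h, w = depth_lo.shape
        out = np.zeros((H, W), np.float32)
        g = guide_hi.astype(np.float32)
        for y in range(H):
            for x in range(W):
                cy, cx = y / factor, x / factor   # position on low-res grid
                acc = wsum = 0.0
                for j in range(int(cy) - radius, int(cy) + radius + 1):
                    for i in range(int(cx) - radius, int(cx) + radius + 1):
                        if 0 <= j < h and 0 <= i < w:
                            # spatial weight: distance on the low-res grid
                            ws = np.exp(-((j - cy) ** 2 + (i - cx) ** 2)
                                        / (2 * sigma_s ** 2))
                            # range weight: color similarity in the guide
                            gy = min(int(j * factor), H - 1)
                            gx = min(int(i * factor), W - 1)
                            wr = np.exp(-np.sum((g[y, x] - g[gy, gx]) ** 2)
                                        / (2 * sigma_r ** 2))
                            acc += ws * wr * depth_lo[j, i]
                            wsum += ws * wr
                out[y, x] = acc / wsum if wsum > 0 else 0.0
        return out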