46 results for "Chenglei Wu"
Search Results
2. Pattern-Based Cloth Registration and Sparse-View Animation
- Author
-
Oshri Halimi, Tuur Stuyck, Donglai Xiang, Timur Bagautdinov, He Wen, Ron Kimmel, Takaaki Shiratori, Chenglei Wu, Yaser Sheikh, and Fabian Prada
- Subjects
Computer Graphics and Computer-Aided Design - Abstract
We propose a novel multi-view camera pipeline for the reconstruction and registration of dynamic clothing. Our proposed method relies on a specifically designed pattern that allows for precise video tracking in each camera view. We triangulate the tracked points and register the cloth surface at a fine geometric resolution with low localization error. Compared to state-of-the-art methods, our registration exhibits stable correspondence, tracking the same points on the deforming cloth surface along the temporal sequence. As an application, we demonstrate how the use of our registration pipeline greatly improves state-of-the-art pose-based drivable cloth models. Furthermore, we propose a novel model, Garment Avatar, for driving cloth from a dense tracking signal which is obtained from two opposing camera views. The method produces realistic reconstructions which are faithful to the actual geometry of the deforming cloth. In this setting, the user wears a garment with our custom pattern which enables our driving model to reconstruct the geometry. Our code and data are available at https://github.com/HalimiOshri/Pattern-Based-Cloth-Registration-and-Sparse-View-Animation. The released data includes our pattern and registered mesh sequences containing four different subjects and 15k frames in total.
- Published
- 2022
3. Learning Tailored Adaptive Bitrate Algorithms to Heterogeneous Network Conditions: A Domain-Specific Priors and Meta-Reinforcement Learning Approach
- Author
-
Tianchi Huang, Chao Zhou, Rui-Xiao Zhang, Chenglei Wu, and Lifeng Sun
- Subjects
Computer Networks and Communications, Electrical and Electronic Engineering - Published
- 2022
4. AggCast: Practical Cost-effective Scheduling for Large-scale Cloud-edge Crowdsourced Live Streaming
- Author
-
Rui-Xiao Zhang, Changpeng Yang, Xiaochan Wang, Tianchi Huang, Chenglei Wu, Jiangchuan Liu, and Lifeng Sun
- Published
- 2022
5. Constraining dense hand surface tracking with elasticity
- Author
-
Chenglei Wu, Yaser Sheikh, He Wen, Patrick Peluse, Jessica K. Hodgins, Takaaki Shiratori, and Breannan Smith
- Subjects
Fist, Computer science, Tracking, Computer Graphics and Computer-Aided Design, Hand surface, Elasticity, Computer vision, Artificial intelligence, Graphics, Physically based animation, Gesture - Abstract
Many of the actions that we take with our hands involve self-contact and occlusion: shaking hands, making a fist, or interlacing our fingers while thinking. This use of our hands illustrates the importance of tracking hands through self-contact and occlusion for many applications in computer vision and graphics, but existing methods for tracking hands and faces are not designed to treat the extreme amounts of self-contact and self-occlusion exhibited by common hand gestures. By extending recent advances in vision-based tracking and physically based animation, we present the first algorithm capable of tracking high-fidelity hand deformations through highly self-contacting and self-occluding hand gestures, for both single hands and two hands. By constraining a vision-based tracking algorithm with a physically based deformable model, we obtain an algorithm that is robust to the ubiquitous self-interactions and massive self-occlusions exhibited by common hand gestures, allowing us to track two-hand interactions and some of the most difficult possible configurations of a human hand.
- Published
- 2020
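To make the constraint idea concrete: a toy version of coupling a vision-based data term with an elastic prior can be written as a single energy minimization. Everything below is an invented miniature, not the paper's physically based deformable model; simple edge-length preservation stands in for the elasticity term, and the mesh, observations, and weight `lam` are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Toy mesh: rest-pose vertices and connectivity.
rest = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
edges = [(0, 1), (1, 2), (2, 0)]
# "Tracked" points: noisy observations of the deformed surface.
observed = rest + np.random.normal(scale=0.05, size=rest.shape)
lam = 10.0  # elasticity weight (placeholder)

def energy(x):
    v = x.reshape(-1, 3)
    data = np.sum((v - observed) ** 2)           # vision-based data term
    elastic = sum((np.linalg.norm(v[i] - v[j])   # edge-length preservation,
                   - np.linalg.norm(rest[i] - rest[j])) ** 2  # a crude elastic prior
                  for i, j in edges)
    return data + lam * elastic

res = minimize(energy, rest.ravel(), method="L-BFGS-B")
print(res.x.reshape(-1, 3))
```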
6. Quality-Aware Neural Adaptive Video Streaming With Lifelong Imitation Learning
- Author
-
Lifeng Sun, Chao Zhou, Rui-Xiao Zhang, Tianchi Huang, Chenglei Wu, Bing Yu, and Xin Yao
- Subjects
Speedup, Computer Networks and Communications, Computer science, Machine learning, Video quality, Quality of experience, Artificial intelligence, Electrical and Electronic Engineering, Internet video - Abstract
Existing Adaptive Bitrate (ABR) algorithms pick future video chunks' bitrates via fixed rules or offline trained models to ensure good quality of experience (QoE) for Internet video. Nevertheless, data analysis demonstrates that a good ABR algorithm must be updated continually and quickly to adapt to time-varying network conditions. Therefore, we propose Comyco, a video quality-aware learning-based ABR approach that enormously improves recent schemes by i) picking the chunk with higher perceptual video quality rather than higher video bitrate; ii) training the policy via imitating expert trajectories given by the expert strategy; iii) employing a lifelong learning method to continually train the model on fresh traces collected from users. To achieve this, we develop a complete quality-aware lifelong imitation learning-based ABR system, construct a quality-based neural network architecture, collect a quality-driven video dataset, and estimate QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate that Comyco achieves a 1700-fold reduction in the number of samples required and a 16-fold speedup in training time compared with prior work. Meanwhile, Comyco outperforms existing methods, improving average QoE by 7.5%-16.79%. Moreover, experimental results on continual training also illustrate that lifelong learning helps Comyco further improve average QoE by 1.07%-9.81% in comparison to the offline trained model.
- Published
- 2020
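The imitation-learning core described above can be sketched as plain behavioral cloning: a policy network is trained with cross-entropy against actions given by the expert. This is a generic sketch, not Comyco's actual architecture; STATE_DIM, N_BITRATES, and the random tensors are stand-ins for real network/quality features and expert labels.

```python
import torch
import torch.nn as nn

STATE_DIM, N_BITRATES = 10, 6                    # placeholder sizes
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, N_BITRATES))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                          # a lifelong loop would never stop
    state = torch.randn(32, STATE_DIM)           # stand-in network/quality features
    expert_action = torch.randint(0, N_BITRATES, (32,))  # from the expert strategy
    loss = loss_fn(policy(state), expert_action)  # imitate the expert's chunk choice
    opt.zero_grad()
    loss.backward()
    opt.step()
```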
7. A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video
- Author
-
Zhi Wang, Chenglei Wu, Lifeng Sun, and Rui-Xiao Zhang
- Subjects
Viewport, Computer science, Feature extraction, Convolutional neural network, Convolution, Recurrent neural network, Distortion, Projection, Computer vision, Artificial intelligence - Abstract
Viewport prediction for 360 video forecasts a viewer's viewport when they watch a 360 video with a head-mounted display, which benefits many VR/AR applications such as 360 video streaming and mobile cloud VR. Existing studies based on planar convolutional neural networks (CNNs) suffer from the image distortion and splits caused by the sphere-to-plane projection. In this paper, we start by proposing a spherical convolution based feature extraction network to distill spatial-temporal 360 information. We provide a solution for training such a network without a dedicated 360 image or video classification dataset. Departing from previous methods, which base their predictions on image pixel-level information, we propose a semantic content and preference based viewport prediction scheme. We adopt a recurrent neural network (RNN) to extract a user's personal preference for 360 video content from minutes of embedded viewing histories. We utilize this semantic preference as spatial attention to help the network find the "interested" regions in a future video. We further design a tailored mixture density network (MDN) based viewport prediction scheme, including viewport modeling, a tailored loss function, etc., to improve efficiency and accuracy. Our extensive experiments demonstrate the rationality and performance of our method, which outperforms state-of-the-art methods, especially in long-term prediction.
- Published
- 2020
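The mixture-density-network head mentioned above can be sketched generically: the model emits mixture weights, means, and scales for the future viewport position and is trained with the mixture negative log-likelihood. This is a plain 2D diagonal-Gaussian MDN, not the paper's tailored spherical formulation; the feature and target tensors are placeholders.

```python
import math
import torch
import torch.nn as nn

K = 5                                # mixture components (placeholder)
head = nn.Linear(128, K * 5)         # per component: weight, mean(2), log-scale(2)

def mdn_nll(feat, target):
    p = head(feat)
    log_w = torch.log_softmax(p[:, :K], dim=-1)
    mu = p[:, K:3 * K].reshape(-1, K, 2)
    log_s = p[:, 3 * K:].reshape(-1, K, 2)
    d = (target.unsqueeze(1) - mu) / log_s.exp()          # standardized residuals
    comp = -0.5 * (d ** 2).sum(-1) - log_s.sum(-1) - math.log(2 * math.pi)
    return -torch.logsumexp(log_w + comp, dim=-1).mean()  # mixture NLL

feat = torch.randn(8, 128)   # features from the spherical CNN + preference RNN
target = torch.rand(8, 2)    # future viewport centre, normalized coordinates
print(mdn_nll(feat, target))
```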
8. A Practical Learning-based Approach for Viewer Scheduling in the Crowdsourced Live Streaming
- Author
-
Chenglei Wu, Xin Yao, Lifeng Sun, Tianchi Huang, Ming Ma, Rui-Xiao Zhang, and Haitian Pang
- Subjects
Feature engineering, Schedule, Computer Networks and Communications, Computer science, Decision tree, Content delivery network, Machine learning, Live streaming, Scheduling, Hardware and Architecture, Reinforcement learning, Artificial intelligence, Quality of experience - Abstract
Scheduling viewers effectively among different Content Delivery Network (CDN) providers is challenging owing to the extreme diversity in crowdsourced live streaming (CLS) scenarios. Abundant algorithms have been proposed in recent years, which, however, suffer from a critical limitation: due to their inaccurate feature engineering or naive rules, they cannot optimally schedule viewers. To address this concern, we put forward LTS (Learn to Schedule), a novel scheduling algorithm that can adapt to the dynamics of both viewer traffic and CDN performance. In detail, we first propose LTS-RL, an approach that schedules CLS viewers based on deep reinforcement learning (DRL). Since LTS-RL is trained in an end-to-end way, it can automatically learn scheduling algorithms without any pre-programmed models or assumptions about the environment dynamics. At the same time, to practically deploy LTS-RL, we then use decision trees and imitation learning to convert LTS-RL into a lighter-weight and interpretable model, which we denote Fast-LTS. After extensive evaluation on real data from a leading CLS platform in China, we demonstrate that our proposed models (both LTS-RL and Fast-LTS) improve average quality of experience (QoE) over state-of-the-art approaches by 8.71%-15.63%. At the same time, we also demonstrate that Fast-LTS can faithfully convert the complicated LTS-RL with slight performance degradation (< 2%), while significantly reducing the decision time (7-10×).
- Published
- 2020
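The LTS-RL to Fast-LTS conversion rests on a simple recipe: query the trained RL scheduler for actions on many states, then fit a small decision tree to imitate it. A minimal sketch follows, with rl_policy as an invented stand-in for the trained DRL model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rl_policy(state):                  # invented stand-in for the DRL scheduler
    return int(state[0] > state[1])    # e.g. send the viewer to CDN 0 or CDN 1

states = np.random.rand(10000, 4)      # viewer-traffic and CDN-performance features
actions = np.array([rl_policy(s) for s in states])

tree = DecisionTreeClassifier(max_depth=6)   # small, fast, human-readable
tree.fit(states, actions)
print("agreement with RL policy:", tree.score(states, actions))
```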
9. Neural Strands: Learning Hair Geometry and Appearance from Multi-view Images
- Author
-
Radu Alexandru Rosu, Shunsuke Saito, Ziyan Wang, Chenglei Wu, Sven Behnke, and Giljoo Nam
- Published
- 2022
10. A Spherical Mixture Model Approach for 360 Video Virtual Cinematography
- Author
-
Chenglei Wu, Zhi Wang, and Lifeng Sun
- Subjects
Computer science, Computer graphics (images), Virtual cinematography, Mixture model - Published
- 2021
11. PAAS
- Author
-
Lifeng Sun, Chenglei Wu, and Zhi Wang
- Subjects
Prefetch, Quality management, Generalization, Computer science, Stability, Machine learning, Adaptive bitrate streaming, Overhead, Reinforcement learning, Artificial intelligence - Abstract
Conventional tile-based 360° video streaming methods, including deep reinforcement learning (DRL) based ones, ignore the interactive nature of 360° video streaming and download tiles in a fixed sequential order, thus failing to respond to the user's head motion changes. We show that these existing solutions suffer a drop in either prefetch accuracy or playback stability. Furthermore, these methods are constrained to serve only one fixed streaming preference, causing extra training overhead and a lack of generalization to unseen preferences. In this paper, we propose a dual-queue streaming framework, with accuracy and stability purposes respectively, to enable the DRL agent to determine and change the tile download order without incurring overhead. We also design a preference-aware DRL algorithm to incentivize the agent to learn preference-dependent ABR decisions efficiently. Compared with state-of-the-art DRL baselines, our method not only significantly improves streaming quality, e.g., increasing the average streaming quality by 13.6% on a public dataset, but also demonstrates better performance and generalization under dynamic preferences, e.g., an average quality improvement of 19.9% on unseen preferences.
- Published
- 2021
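A rough rendering of the dual-queue idea: near-term tiles live in an accuracy queue (re-decided late, so they follow the freshest head motion), far-ahead tiles in a stability queue (they keep playback smooth), and a policy picks which queue to serve next. The rule below is invented for illustration; in the paper a DRL agent makes this choice.

```python
from collections import deque

accuracy_q = deque(["tile_t+1_a", "tile_t+1_b"])    # near playback, motion-sensitive
stability_q = deque(["tile_t+4_a", "tile_t+4_b"])   # far ahead, keeps playback smooth

def next_download(buffer_s, low_buffer=2.0):
    # Stand-in policy: protect stability when the buffer runs low,
    # otherwise spend bandwidth on accuracy. A DRL agent replaces this rule.
    q = stability_q if (buffer_s < low_buffer or not accuracy_q) else accuracy_q
    return q.popleft() if q else None

print(next_download(buffer_s=1.5))   # -> a stability-queue tile
print(next_download(buffer_s=6.0))   # -> an accuracy-queue tile
```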
12. Deep incremental learning for efficient high-fidelity face tracking
- Author
-
Takaaki Shiratori, Yaser Sheikh, and Chenglei Wu
- Subjects
Facial motion capture, Computer science, Deep learning, Tracking, Computer Graphics and Computer-Aided Design, Autoencoder, Computer vision, Artificial intelligence, Texture mapping - Abstract
In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between the modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and the tracking step, which takes the geometry and texture our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry and facial landmark errors. Our Geo-Tex VAE model extends the convolutional variational autoencoder to face tracking, and jointly learns and represents deformations and variations in geometry and texture from tracked meshes and texture maps. To accurately model variations in facial geometry and texture, we introduce a decomposition layer in the Geo-Tex VAE architecture which decomposes the facial deformation into global and local components. We train the global deformation with a fully-connected network and the local deformations with convolutional layers. Despite running this model on each frame independently - thereby enabling a high degree of parallelization - we validate that our framework achieves sub-millimeter accuracy on synthetic data and outperforms existing methods. We also qualitatively demonstrate high-fidelity, long-duration facial performance tracking on several actors.
- Published
- 2018
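The decomposition layer described above can be sketched as two decoder branches over a latent code: a fully-connected branch for the global deformation plus a convolutional branch for local detail, summed in a UV layout. All dimensions below are invented; this illustrates the split, not the Geo-Tex VAE itself.

```python
import torch
import torch.nn as nn

class DecomposedDecoder(nn.Module):
    def __init__(self, z_dim=128, uv=32):
        super().__init__()
        self.uv = uv
        self.global_fc = nn.Linear(z_dim, 3 * uv * uv)      # coarse global shape
        self.local_conv = nn.Sequential(                    # fine local detail
            nn.Linear(z_dim, 16 * uv * uv),
            nn.Unflatten(1, (16, uv, uv)),
            nn.Conv2d(16, 3, kernel_size=3, padding=1))

    def forward(self, z):
        g = self.global_fc(z).view(-1, 3, self.uv, self.uv)
        return g + self.local_conv(z)                       # geometry in UV space

print(DecomposedDecoder()(torch.randn(2, 128)).shape)  # torch.Size([2, 3, 32, 32])
```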
13. Modeling Clothing as a Separate Layer for an Animatable Human Avatar
- Author
-
He Wen, Timur Bagautdinov, Donglai Xiang, Chenglei Wu, Weipeng Xu, Fabian Prada, Yuan Dong, and Jessica K. Hodgins
- Subjects
Computer science, Computer Vision and Pattern Recognition (cs.CV), Animation, Texture, Clothing, Computer Graphics and Computer-Aided Design, Autoencoder, Graphics (cs.GR), Codec, Computer vision, Artificial intelligence, Avatar - Abstract
We have recently seen great progress in building photorealistic animatable full-body codec avatars, but generating high-fidelity animation of clothing is still difficult. To address these difficulties, we propose a method to build an animatable clothed body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos. We use a two-layer mesh representation to register each 3D scan separately with the body and clothing templates. In order to improve the photometric correspondence across different frames, texture alignment is then performed through inverse rendering of the clothing geometry and texture predicted by a variational autoencoder. We then train a new two-layer codec avatar with separate modeling of the upper clothing and the inner body layer. To learn the interaction between the body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code based on a sequence of input skeletal poses. We show photorealistic animation output for three different actors, and demonstrate the advantage of our clothed-body avatars over the single-layer avatars used in previous work. We also show the benefit of an explicit clothing model that allows the clothing texture to be edited in the animation output. (Comment: Camera ready for SIGGRAPH Asia 2021 Technical Papers. https://research.fb.com/publications/modeling-clothing-as-a-separate-layer-for-an-animatable-human-avatar/)
- Published
- 2021
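The pose-to-clothing driving step can be sketched as a small temporal convolution network mapping a window of skeletal poses to a clothing latent code, which the clothing decoder would then turn into geometry. POSE_DIM, LATENT_DIM, and WINDOW are placeholder sizes, not the paper's.

```python
import torch
import torch.nn as nn

POSE_DIM, LATENT_DIM, WINDOW = 63, 128, 16     # placeholder sizes
tcn = nn.Sequential(
    nn.Conv1d(POSE_DIM, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(256, LATENT_DIM))

poses = torch.randn(4, POSE_DIM, WINDOW)   # batch of skeletal pose windows
clothing_z = tcn(poses)                    # latent code for the clothing decoder
print(clothing_z.shape)                    # torch.Size([4, 128])
```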
14. Stick: A Harmonious Fusion of Buffer-based and Learning-based Approach for Adaptive Streaming
- Author
-
Xin Yao, Tianchi Huang, Chao Zhou, Chenglei Wu, Lifeng Sun, and Rui-Xiao Zhang
- Subjects
Emulation, Artificial neural network, Computer science, Deep learning, Real-time computing, Bit rate, Quality of experience, Artificial intelligence - Abstract
Off-the-shelf buffer-based approaches leverage a simple yet effective buffer bound to control the adaptive bitrate (ABR) streaming system. Nevertheless, such approaches with standard parameters fail to consistently provide high quality of experience (QoE) video streaming services under all considered network conditions. Meanwhile, the state-of-the-art learning-based ABR approach Pensieve outperforms existing schemes but is impractical to deploy. Therefore, how to harmoniously fuse the buffer-based and learning-based approaches has become a key challenge for further enhancing ABR methods. In this paper, we propose Stick, an ABR algorithm that fuses the deep learning method and the traditional buffer-based method. Stick utilizes the deep reinforcement learning (DRL) method to train a neural network which outputs the buffer bound to control the buffer-based approach for maximizing the QoE metric with different parameters. Trace-driven emulation illustrates that Stick betters Pensieve by 3.5%-9.41% with an overhead reduction of 88%. Moreover, aiming to further reduce the computational costs while preserving the performance, we propose Trigger, a lightweight neural network that determines whether the buffer bound should be adjusted. Experimental results show that Stick+Trigger rivals or outperforms existing schemes in average QoE by 1.7%-28%, and significantly reduces Stick's computational overhead by 24%-61%. Meanwhile, we show that Trigger also helps other ABR schemes mitigate the overhead. Extensive results on real-world evaluation demonstrate the superiority of Stick over existing state-of-the-art approaches.
- Published
- 2020
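The fusion described above boils down to a learned outer loop steering a classic inner loop: the network outputs a buffer bound, and a buffer-based rule maps the current buffer occupancy into the bitrate ladder. A toy version, with an invented bitrate ladder and a stand-in for the learned bound:

```python
BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]   # invented ladder

def buffer_based_bitrate(buffer_s, lower_s, upper_s):
    """Classic buffer-based rule: map occupancy between two bounds to a bitrate."""
    if buffer_s <= lower_s:
        return BITRATES_KBPS[0]
    if buffer_s >= upper_s:
        return BITRATES_KBPS[-1]
    frac = (buffer_s - lower_s) / (upper_s - lower_s)
    return BITRATES_KBPS[int(frac * (len(BITRATES_KBPS) - 1))]

upper = 8.0   # pretend the DRL network predicted this bound from the state
print(buffer_based_bitrate(buffer_s=6.5, lower_s=5.0, upper_s=upper))  # 1200
```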
15. Generalizing Rate Control Strategies for Realtime Video Streaming via Learning from Deep Learning
- Author
-
Xin Yao, Tianchi Huang, Bing Yu, Chenglei Wu, Rui-Xiao Zhang, Chao Zhou, and Lifeng Sun
- Subjects
Computer science, Deep learning, Rate control, Machine learning, Overhead, Video streaming, Artificial intelligence - Abstract
The leading learning-based rate control method, i.e., QARC, achieves state-of-the-art performance but fails to interpret the fundamental principles behind it, and thus lacks the ability to improve itself further efficiently. In this paper, we propose EQARC (Explainable QARC) via reconstructing QARC's modules, aiming to demystify how QARC works. In detail, we first utilize a novel hybrid attention-based CNN+GRU model to re-characterize the original quality prediction network and reasonably replace QARC's 1D-CNN layers with 2D-CNN layers. Using trace-driven experiments, we demonstrate the superiority of EQARC over existing state-of-the-art approaches. Next, we collect useful information from each interpretable module and learn the insights of EQARC. Following this step, we further propose AQARC (Advanced QARC), a lightweight version of QARC. Experimental results show that AQARC achieves the same performance as QARC with an overhead reduction of 90%. In short, through learning from deep learning, we generalize a rate control method which both reaches high performance and reduces computation cost.
- Published
- 2019
16. USP38 Couples Histone Ubiquitination and Methylation via KDM5B to Resolve Inflammation
- Author
-
Junjiu Huang, Ling Ma, Shuai Yang, Zexiong Su, Jun Cui, Xiya Zhang, Puping Liang, Zhiyao Zhao, Yaoxing Wu, Junyan Feng, Di Liu, and Chenglei Wu
- Subjects
Science, General Chemical Engineering, General Physics and Astronomy, Medicine (miscellaneous), ubiquitination, Biochemistry, Genetics and Molecular Biology (miscellaneous), Chromatin remodeling, histone modification, General Materials Science, Transcription factor, Histone ubiquitination, Chemistry, General Engineering, Methylation, USP38, Chromatin, Cell biology, deubiquitinase, Histone, inflammation, Acetylation, KDM5B, Demethylase - Abstract
Chromatin modifications, such as histone acetylation, ubiquitination, and methylation, play fundamental roles in maintaining chromatin architecture and regulating gene transcription. Although their crosstalk in chromatin remodeling has been gradually uncovered, the functional relationship between histone ubiquitination and methylation in regulating immunity and inflammation remains unclear. Here, it is reported that USP38 is a novel histone deubiquitinase that works together with the histone H3K4 modifier KDM5B to orchestrate inflammatory responses. USP38 specifically removes the monoubiquitin on H2B at lysine 120, which functions as a prerequisite for the subsequent recruitment of demethylase KDM5B to the promoters of proinflammatory cytokines Il6 and Il23a during LPS stimulation. KDM5B in turn inhibits the binding of NF‐κB transcription factors to the Il6 and Il23a promoters by reducing H3K4 trimethylation. Furthermore, USP38 can bind to KDM5B and prevent it from proteasomal degradation, which further enhances the function of KDM5B in the regulation of inflammation‐related genes. Loss of Usp38 in mice markedly enhances susceptibility to endotoxin shock and acute colitis, and these mice display a more severe inflammatory phenotype compared to wild‐type mice. The studies identify USP38‐KDM5B as a distinct chromatin modification complex that restrains inflammatory responses through manipulating the crosstalk of histone ubiquitination and methylation. USP38 is a novel histone deubiquitinase of H2B that couples the regulation of histone ubiquitination (H2Bub) and methylation (H3K4me3) with KDM5B to selectively inhibit the transcription of the proinflammatory cytokines and prevent excessive inflammation. USP38 deficiency in mice enhances the inflammatory response, and renders animals more susceptible to acute inflammation and dextran sulfate sodium‐induced acute colitis.
- Published
- 2021
17. Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning
- Author
-
Chenglei Wu, Xin Yao, Lifeng Sun, Chao Zhou, Tianchi Huang, and Rui-Xiao Zhang
- Subjects
Computer science, Video quality, Machine learning, Perception, Networking and Internet Architecture (cs.NI), Imitation learning, Multimedia (cs.MM), Artificial Intelligence (cs.AI), Artificial intelligence - Abstract
Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn outstanding strategies without any presumptions, have become one of the research hotspots for adaptive streaming. However, they typically suffer from several issues, i.e., low sample efficiency and lack of awareness of video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that enormously improves learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by an instant solver, which can not only avoid redundant exploration but also make better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video quality rather than higher video bitrate. To achieve this, we construct Comyco's neural network architecture, video datasets and QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements in Comyco's sample efficiency in comparison to prior work, with a 1700x reduction in the number of samples required and a 16x reduction in training time. Moreover, results illustrate that Comyco outperforms previously proposed methods, with improvements in average QoE of 7.5%-16.79%. In particular, Comyco also surpasses the state-of-the-art approach Pensieve by 7.37% in average video quality under the same rebuffering time. (ACM Multimedia 2019)
- Published
- 2019
18. Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning
- Author
-
Rui-Xiao Zhang, Chenglei Wu, Tianchi Huang, Lifeng Sun, and Xin Yao
- Subjects
Computer science, Cognitive Neuroscience, Computer Vision and Pattern Recognition (cs.CV), Lifelong learning, Machine Learning (cs.LG), Machine learning, Regularization, Forgetting, Artificial neural network, Artificial intelligence, Neural Networks - Abstract
Human beings are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed Catastrophic Forgetting, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts, e.g., Lifelong or Continual learning algorithms, have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require storing an excessive amount of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this paper, we focus on the incremental multi-task image classification scenario. Inspired by the learning process of human students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and the high-level semantic features serve as soft targets and guide the training process in multiple stages, which provides sufficient supervised information about the old tasks and helps to reduce forgetting. Due to the knowledge distillation and regularization effects, the proposed method gains even better performance than fine-tuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.
- Published
- 2019
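The soft-target training described above can be condensed into one loss: learn the new task while aligning the new model's low-level features and high-level logits with a frozen copy of the old model. Plain L2 alignment stands in here for the paper's adversarial discriminator; all tensors below are placeholders.

```python
import torch
import torch.nn.functional as F

def lifelong_loss(new_feat, old_feat, new_logits, old_logits,
                  task_logits, labels, alpha=1.0, beta=1.0):
    task = F.cross_entropy(task_logits, labels)        # learn the new task
    feat_align = F.mse_loss(new_feat, old_feat)        # low-level visual features
    logit_align = F.mse_loss(new_logits, old_logits)   # high-level semantics
    return task + alpha * feat_align + beta * logit_align

loss = lifelong_loss(torch.randn(8, 256), torch.randn(8, 256),     # features
                     torch.randn(8, 10), torch.randn(8, 10),       # old-task logits
                     torch.randn(8, 5), torch.randint(0, 5, (8,)))  # new task
print(loss)
```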
19. Towards Faster and Better Federated Learning: A Feature Fusion Approach
- Author
-
Rui-Xiao Zhang, Chenglei Wu, Lifeng Sun, Tianchi Huang, and Xin Yao
- Subjects
Feature fusion, Speedup, Distributed database, Computer science, Distributed computing, Feature extraction, Initialization, Federated learning, Data modeling, Server - Abstract
Federated learning enables on-device training over distributed networks consisting of a massive number of modern smart devices, such as smartphones and IoT devices. However, the leading optimization algorithm in such settings, i.e., federated averaging, suffers from heavy communication costs and an inevitable performance drop, especially when the local data is distributed in a non-IID way. In this paper, we propose a feature fusion method to address this problem. By aggregating the features from both the local and global models, we achieve higher accuracy at less communication cost. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular federated learning scenarios show that our federated learning algorithm with the feature fusion mechanism outperforms baselines in both accuracy and generalization ability while reducing the number of communication rounds by more than 60%.
- Published
- 2019
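For reference, the federated averaging baseline the paper builds on is a weighted average of client model weights; the proposed feature-fusion module would additionally combine local- and global-model features on each client. A minimal sketch of the aggregation step, with toy clients:

```python
import copy
import torch

def fedavg(global_model, client_states, client_sizes):
    """Weighted average of client weights, proportional to local dataset size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total)
                       for s, n in zip(client_states, client_sizes))
    global_model.load_state_dict(avg)
    return global_model

net = torch.nn.Linear(10, 2)
clients = [{k: v + torch.randn_like(v) for k, v in net.state_dict().items()}
           for _ in range(3)]                     # toy locally-trained weights
fedavg(net, clients, client_sizes=[100, 50, 150])
```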
20. Being more Effective and Interpretable
- Author
-
Rui-Xiao Zhang, Tianchi Huang, Xin Yao, Lifeng Sun, and Chenglei Wu
- Subjects
Bridging, Computer science, Heuristic, Heuristics, Artificial intelligence - Abstract
In this poster, we propose several novel ABR approaches, namely BBA+ and MPC+, which are the fusion of heuristics and AI-based schemes. Results indicate that the proposed methods perform better than recent heuristic ABR methods. Meanwhile, such methods have also become more interpretable compared with AI-based schemes.
- Published
- 2019
21. Enhancing the crowdsourced live streaming
- Author
-
Xin Yao, Ming Ma, Rui-Xiao Zhang, Tianchi Huang, Lifeng Sun, Chenglei Wu, and Haitian Pang
- Subjects
Feature engineering, Multimedia, Computer science, Content delivery network, Live streaming, Scheduling, Reinforcement learning, Quality of experience - Abstract
With the growing demand for crowdsourced live streaming (CLS), how to schedule large-scale dynamic viewers effectively among different Content Delivery Network (CDN) providers has become one of the most significant challenges for CLS platforms. Although abundant algorithms have been proposed in recent years, they suffer from a critical limitation: due to their inaccurate feature engineering or naive rules, they cannot optimally schedule viewers. To address this concern, we propose LTS (Learn to Schedule), a deep reinforcement learning (DRL) based scheduling approach that can dynamically adapt to the variation of both viewer traffic and CDN performance. After extensive evaluation on real data from a leading CLS platform in China, we demonstrate that LTS improves average quality of experience (QoE) over the state-of-the-art approach by 8.71%-15.63%.
- Published
- 2019
22. Model-based teeth reconstruction
- Author
-
Chenglei Wu, Michael Zollhöfer, Christian Theobalt, Pablo Garrido, Derek Bradley, Thabo Beeler, and Markus Gross
- Subjects
Facial expression, Computer science, Computer Graphics and Computer-Aided Design, Photogrammetry, Computer vision, Artificial intelligence - Abstract
In recent years, sophisticated image-based reconstruction methods for the human face have been developed. These methods capture highly detailed static and dynamic geometry of the whole face, or specific models of face regions, such as hair, eyes or eyelids. Unfortunately, image-based methods to capture the mouth cavity in general, and the teeth in particular, have received very little attention. The accurate rendering of teeth, however, is crucial for the realistic display of facial expressions, and currently high quality face animations resort to tooth row models created by tedious manual work. In dentistry, special intra-oral scanners for teeth were developed, but they are invasive, expensive, cumbersome to use, and not readily available. In this paper, we therefore present the first approach for non-invasive reconstruction of an entire person-specific tooth row from just a sparse set of photographs of the mouth region. The basis of our approach is a new parametric tooth row prior learned from high quality dental scans. A new model-based reconstruction approach fits teeth to the photographs such that visible teeth are accurately matched and occluded teeth plausibly synthesized. Our approach seamlessly integrates into photogrammetric multi-camera reconstruction setups for entire faces, but also enables high quality teeth modeling from normal uncalibrated photographs and even short videos captured with a mobile phone.
- Published
- 2016
23. Corrective 3D reconstruction of lips from monocular video
- Author
-
Christian Theobalt, Pablo Garrido, Patrick Pérez, Derek Bradley, Thabo Beeler, Chenglei Wu, and Michael Zollhöfer
- Subjects
Monocular, Computer science, 3D reconstruction, Monocular video, Computer Graphics and Computer-Aided Design, Motion capture, RGB color model, Computer vision, Artificial intelligence, Computer facial animation - Abstract
In facial animation, the accurate shape and motion of the lips of virtual humans is of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach, and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
- Published
- 2016
24. LRRC14 attenuates Toll-like receptor-mediated NF-κB signaling through disruption of IKK complex
- Author
-
Jun Cui, Chenglei Wu, Jiayu Ou, Yexin Yang, Wei Zhao, and Liang Zhu
- Subjects
Inflammation, IκB kinase, Biology, Leucine-Rich Repeat Proteins, Cell Line, Proinflammatory cytokine, Mice, Animals, Humans, RNA Messenger, Phosphorylation, Toll-like receptor, Innate immune system, NF-κB, Cell Biology, Cancer research, Signal transduction, Protein Binding - Abstract
Activation of NF-κB signaling plays pivotal roles in innate immune responses against pathogens. It requires strict control to avert inflammatory diseases. However, the mechanisms underlying this tight regulation are not completely understood. Here, we identified LRRC14, a novel member of the LRR (leucine-rich repeat) protein family, as a negative regulator of TLR signaling. Expression of LRRC14 resulted in decreased activation of NF-κB, whereas knockdown of LRRC14 enhanced NF-κB activation as well as the production of inflammatory cytokines. Mechanistically, LRRC14 bound to the HLH domain of IKKβ to block its interaction with NEMO, thereby inhibiting the phosphorylation of IKKβ and NF-κB activation. In addition, our data showed that TLR signaling led to lower expression of LRRC14. Together, LRRC14 may function as a checkpoint to prevent overzealous inflammation.
- Published
- 2016
25. An anatomically-constrained local deformation model for monocular face capture
- Author
-
Derek Bradley, Thabo Beeler, Markus Gross, and Chenglei Wu
- Subjects
Monocular, Computer science, Computer Graphics and Computer-Aided Design, Linear subspace, Motion capture, Computer vision, Artificial intelligence - Abstract
We present a new anatomically-constrained local face model and fitting approach for tracking 3D faces from 2D motion data in very high quality. In contrast to traditional global face models, often built from a large set of blendshapes, we propose a local deformation model composed of many small subspaces spatially distributed over the face. Our local model offers far more flexibility and expressiveness than global blendshape models, even with a much smaller model size. This flexibility would typically come at the cost of reduced robustness, in particular during the under-constrained task of monocular reconstruction. However, a key contribution of this work is that we consider the face anatomy and introduce subspace skin thickness constraints into our model, which constrain the face to only valid expressions and helps counteract depth ambiguities in monocular tracking. Given our new model, we present a novel fitting optimization that allows 3D facial performance reconstruction from a single view at extremely high quality, far beyond previous fitting approaches. Our model is flexible, and can be applied also when only sparse motion data is available, for example with marker-based motion capture or even face posing from artistic sketches. Furthermore, by incorporating anatomical constraints we can automatically estimate the rigid motion of the skull, obtaining a rigid stabilization of the performance for free. We demonstrate our model and single-view fitting method on a number of examples, including, for the first time, extreme local skin deformation caused by external forces such as wind, captured from a single high-speed camera.
- Published
- 2016
26. Inflammatory Response: USP38 Couples Histone Ubiquitination and Methylation via KDM5B to Resolve Inflammation (Adv. Sci. 22/2020)
- Author
-
Xiya Zhang, Jun Cui, Shuai Yang, Chenglei Wu, Zexiong Su, Di Liu, Junjiu Huang, Yaoxing Wu, Puping Liang, Zhiyao Zhao, Ling Ma, and Junyan Feng
- Subjects
Histone ubiquitination, Chemistry, General Chemical Engineering, Inflammatory response, General Engineering, General Physics and Astronomy, Medicine (miscellaneous), Inflammation, Methylation, Biochemistry, Genetics and Molecular Biology (miscellaneous), Cover Picture, Cancer research, General Materials Science - Abstract
In article number 2002680, Jun Cui and co‐workers identify a new histone modifier, USP38, which modulates histone ubiquitination and methylation to resolve the inflammatory response. USP38 acts as a "safety machine" that shuts down inflammation by de‐ubiquitinating histone H2B and stabilizing KDM5B to de‐methylate H3K4, which blocks the transcription of pro‐inflammatory genes. The USP38‐KDM5B complex might be a potential clinical target for therapy against inflammatory diseases.
- Published
- 2020
27. Modeling Facial Geometry Using Compositional VAEs
- Author
-
Yaser Sheikh, Timur Bagautdinov, Jason Saragih, Pascal Fua, and Chenglei Wu
- Subjects
Computer science, face modeling, variational methods, deep learning, Pattern recognition, Autoencoder, computer vision, Face, Artificial intelligence - Abstract
We propose a method for learning non-linear face geometry representations using deep generative models. Our model is a variational autoencoder with multiple levels of hidden variables, where lower layers capture global geometry and higher ones encode more local deformations. Based on this, we propose a new parameterization of facial geometry that naturally decomposes the structure of the human face into a set of semantically meaningful levels of detail. This parameterization enables us to do model fitting while capturing varying levels of detail under different types of geometric constraints.
- Published
- 2018
28. Learning Patch Reconstructability for Accelerating Multi-view Stereo
- Author
-
Yaser Sheikh, Alex Poms, Shoou-I Yu, and Chenglei Wu
- Subjects
Artificial neural network, Computer science, Stereo matching, Iterative reconstruction, Computer vision, Artificial intelligence, Specular reflection, Surface reconstruction - Abstract
We present an approach to accelerate multi-view stereo (MVS) by prioritizing computation on image patches that are likely to produce accurate 3D surface reconstructions. Our key insight is that the accuracy of the surface reconstruction from a given image patch can be predicted significantly faster than performing the actual stereo matching. The intuition is that non-specular, fronto-parallel, in-focus patches are more likely to produce accurate surface reconstructions than highly specular, slanted, blurry patches - and that these properties can be reliably predicted from the image itself. By prioritizing stereo matching on a subset of patches that are highly reconstructable and also cover the 3D surface, we are able to accelerate MVS with minimal reduction in accuracy and completeness. To predict the reconstructability score of an image patch from a single view, we train an image-to-reconstructability neural network: the I2RNet. This reconstructability score enables us to efficiently identify image patches that are likely to provide the most accurate surface estimates before performing stereo matching. We demonstrate that the I2RNet, when trained on the ScanNet dataset, generalizes to the DTU and Tanks & Temples MVS datasets. By using our I2RNet with an existing MVS implementation, we show that our method can achieve more than a 30× speed-up over the baseline with only a minimal loss in completeness.
- Published
- 2018
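The acceleration recipe is straightforward to sketch: score every patch with a cheap predictor and run expensive stereo matching only on the top-scoring subset. Here patch variance is an invented stand-in for the trained I2RNet score:

```python
import numpy as np

def score_patch(patch):
    # Invented stand-in: well-textured patches score high. I2RNet replaces this
    # with a learned single-view reconstructability prediction.
    return float(np.var(patch))

def select_reconstructable(patches, keep_frac=0.3):
    scores = np.array([score_patch(p) for p in patches])
    k = max(1, int(len(patches) * keep_frac))
    return np.argsort(scores)[::-1][:k]    # indices to pass to stereo matching

patches = [np.random.rand(16, 16) for _ in range(100)]
print(select_reconstructable(patches))
```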
29. Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming
- Author
-
Chenglei Wu, Rui-Xiao Zhang, Zhangyuan Pang, Xin Yao, Lifeng Sun, and Tianchi Huang
- Subjects
Sequence, Computer science, Estimation theory, Testbed, Throughput, Machine learning, Multimedia (cs.MM), Reinforcement learning, Quality of experience, Artificial intelligence - Abstract
Existing reinforcement learning (RL)-based adaptive bitrate (ABR) approaches outperform the previous fixed-control-rule based methods by improving the Quality of Experience (QoE) score, but the QoE metric can hardly provide clear guidance for optimization, finally resulting in unexpected strategies. In this paper, we propose Tiyuntsong, a self-play reinforcement learning approach with a generative adversarial network (GAN)-based method for ABR video streaming. Tiyuntsong learns strategies automatically by training two agents who compete against each other. Note that the competition results are determined by a set of rules rather than a numerical QoE score, which allows clearer optimization objectives. Meanwhile, we propose a GAN Enhancement Module to extract hidden features from the past status, preserving information without the limitations of sequence lengths. Using testbed experiments, we show that the utilization of GAN significantly improves Tiyuntsong's performance. By comparing the performance of ABRs, we observe that Tiyuntsong also betters existing ABR algorithms in the underlying metrics. (Published in ICME 2019)
- Published
- 2018
30. DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs
- Author
-
Shi Yan, Liang An, Yebin Liu, Kaiwen Guo, Lizhen Wang, Feng Xu, and Chenglei Wu
- Subjects
Computer science, Color image, Noise reduction, Convolutional neural network, Rendering equation, Depth map, Unsupervised learning, Computer vision, Artificial intelligence - Abstract
Consumer depth sensors are more and more popular and have come into our daily lives, marked by their recent integration in the iPhone X. However, they still suffer from heavy noise, which limits their applications. Although much progress has been made to reduce the noise and boost geometric details, due to the inherent ill-posedness of the problem and the real-time requirement, it is still far from being solved. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high quality color image through a joint training strategy. The rendering equation is exploited in our network in an unsupervised manner. In detail, we impose an unsupervised loss based on light transport to extract the high-frequency geometry. Experimental results indicate that our network achieves real-time single depth enhancement on various categories of scenes. Thanks to the good decoupling of the low and high frequency information in the cascaded network, we achieve superior performance over the state-of-the-art techniques.
- Published
- 2018
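The unsupervised rendering-equation term can be sketched in a simplified Lambertian form: shade the refined normals under low-order spherical-harmonic lighting, multiply by albedo, and penalize the difference from the observed intensity. Everything below (first-order SH, per-image albedo and lighting tensors) is a toy stand-in for DDRNet's actual loss:

```python
import torch
import torch.nn.functional as F

def sh_shading(normals, sh):                 # normals: (B,3,H,W), sh: (B,4)
    ones = torch.ones_like(normals[:, :1])
    basis = torch.cat([ones, normals], dim=1)            # SH basis [1, nx, ny, nz]
    return (basis * sh[:, :, None, None]).sum(1, keepdim=True)

def shading_loss(normals, sh, albedo, intensity):
    rendered = albedo * sh_shading(normals, sh)          # Lambertian rendering
    return torch.mean((rendered - intensity) ** 2)       # match the observed image

n = F.normalize(torch.randn(2, 3, 8, 8), dim=1)          # refined normals (toy)
print(shading_loss(n, torch.randn(2, 4), torch.rand(2, 1, 8, 8),
                   torch.rand(2, 1, 8, 8)))
```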
31. NLRP11 attenuates Toll-like receptor signalling by targeting TRAF6 for degradation via the ubiquitin ligase RNF19A
- Author
-
Jiayu Ou, Chenglei Wu, Jun Cui, Wei Zhao, Zexiong Su, Meng Lin, and Rongfu Wang
- Subjects
Lipopolysaccharides, THP-1 Cells, T-Lymphocytes, Ubiquitin-Protein Ligases, Science, Gene Expression, General Physics and Astronomy, General Biochemistry, Genetics and Molecular Biology, Cell Line, Proinflammatory cytokine, Gene Knockout Techniques, Ubiquitin, Animals, Humans, TNF Receptor-Associated Factor 6, B-Lymphocytes, Toll-like receptor, Multidisciplinary, Chemistry, HEK293 Cells, Intracellular Signaling Peptides and Proteins, NF-κB, Ubiquitination, Signal transducing adaptor protein, General Chemistry, NFKB1, Ubiquitin ligase, Cell biology, Mutagenesis Site-Directed, Cytokines, Signal transduction - Abstract
The adaptor protein TRAF6 has a central function in Toll-like receptor (TLR) signalling, yet the molecular mechanisms controlling its activity and stability are unclear. Here we show that NLRP11, a primate specific gene, inhibits TLR signalling by targeting TRAF6 for degradation. NLRP11 recruits the ubiquitin ligase RNF19A to catalyze K48-linked ubiquitination of TRAF6 at multiple sites, thereby leading to the degradation of TRAF6. Furthermore, deficiency in either NLRP11 or RNF19A abrogates K48-linked ubiquitination and degradation of TRAF6, which promotes activation of NF-κB and MAPK signalling and increases the production of proinflammatory cytokines. Therefore, our findings identify NLRP11 as a conserved negative regulator of TLR signalling in primate cells and reveal a mechanism by which the NLRP11-RNF19A axis targets TRAF6 for degradation. NLRP11 is a primate-specific NOD-like receptor with unclear function. Here the authors show NLRP11 is an inhibitory protein that targets TRAF6 for K48 ubiquitination-mediated proteasomal degradation to limit inflammatory responses.
- Published
- 2017
32. Shading-based refinement on volumetric signed distance functions
- Author
-
Michael Zollhöfer, Christian Theobalt, Chenglei Wu, Matthias Innmann, Matthias Nießner, Angela Dai, and Marc Stamminger
- Subjects
3D reconstruction, Hash function, Signed distance function, Solver, Grid, Computer Graphics and Computer-Aided Design, Voxel, RGB color model, Computer vision, Artificial intelligence, Distance transform, Mathematics - Abstract
We present a novel method to obtain fine-scale detail in 3D reconstructions generated with low-budget RGB-D cameras or other commodity scanning devices. As the depth data of these sensors is noisy, truncated signed distance fields are typically used to regularize out the noise, which unfortunately leads to over-smoothed results. In our approach, we leverage RGB data to refine these reconstructions through shading cues, as color input is typically of much higher resolution than the depth data. As a result, we obtain reconstructions with high geometric detail, far beyond the depth resolution of the camera itself. Our core contribution is shading-based refinement directly on the implicit surface representation, which is generated from globally-aligned RGB-D images. We formulate the inverse shading problem on the volumetric distance field, and present a novel objective function which jointly optimizes for fine-scale surface geometry and spatially-varying surface reflectance. In order to enable the efficient reconstruction of sub-millimeter detail, we store and process our surface using a sparse voxel hashing scheme which we augment by introducing a grid hierarchy. A tailored GPU-based Gauss-Newton solver enables us to refine large shape models to previously unseen resolution within only a few seconds.
- Published
- 2015
33. Comparison of user satisfaction and image quality of fixed and mobile camera systems for 3-dimensional image capture of edentulous patients: A pilot clinical study
- Author
-
Thabo Beeler, Shiming Liu, Marcel Lancelle, Murali Srinivasan, Vincent Fehmer, Markus Gross, Chenglei Wu, Irena Sailer, and Roland Mörzinger
- Subjects
Male, Computer science, Image quality, Pilot Projects, Imaging, Clinical study, Imaging Three-Dimensional, Photography, Participant perceptions, Humans, Computer vision, Mobile camera, Aged, Mouth, Facial expression, User satisfaction, Consumer Behavior, Image capture, Patient Satisfaction, Dental Photography, Female, Artificial intelligence, Oral Surgery, Mouth Edentulous, Mobile device - Abstract
Statement of problem: An evaluation of user satisfaction and image quality of a novel handheld purpose-built mobile camera system for 3-dimensional (3D) facial acquisition is lacking. Purpose: The purpose of this pilot clinical study was to assess and compare the effectiveness of a handheld mobile camera system designed for facial acquisition and a fixed static camera arrangement by comparing the time effectiveness and the operator and participant preference for the 2 techniques of image capture. Material and methods: Completely edentulous participants (n=12: women=7, men=5; mean age: 74.6 years) were included in this pilot study. Images were captured with and without the prostheses in situ while maintaining "serious" and "full-smile" facial expressions. Images were captured using a mobile and a static system. The working times for the participant installation and image captures were recorded. Operator and participant perceptions of the entire experience were recorded by using visual analog scale questionnaires. Nonparametric tests were used for statistical analyses (α=.05). Results: The installation time was significantly shorter for the mobile system (static: 24 ±13 seconds; mobile: 10 ±10 seconds), but the differences in the image capture times were not statistically significant (static: 29 ±5 seconds; mobile: 40 ±18 seconds). Operator preference was in favor of the mobile system with regard to working time (P=.002), difficulty of use (installation: P=.002; handling: P=.045), and camera weight (P=.002); however, they preferred the static arrangement for image quality (P=.003) and comfort (P=.013). The participants rated the entire photographic experience favorably, and 10 of 12 participants preferred the static camera over the mobile one. Conclusions: Despite the complexity of the installation, the static system was rated better for image quality; the mobile system was easier in installation and handling. The operators preferred the mobile system, and the participants preferred the static system.
- Published
- 2017
34. Real-time non-rigid reconstruction using an RGB-D camera
- Author
-
Christian Theobalt, Christoph Rehmann, Shahram Izadi, Michael Zollhöfer, Matthias Nießner, Chenglei Wu, Charles Loop, Matthew Fisher, Christopher Zach, Andrew Fitzgibbon, and Marc Stamminger
- Subjects
Computer science ,business.industry ,Graphics hardware ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,02 engineering and technology ,Kinematics ,Computer Graphics and Computer-Aided Design ,Motion capture ,Computer graphics ,Robustness (computer science) ,Computer Science::Computer Vision and Pattern Recognition ,Computer graphics (images) ,0202 electrical engineering, electronic engineering, information engineering ,RGB color model ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Stereo camera ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS ,Parametric statistics - Abstract
We present a combined hardware and software solution for markerless reconstruction of non-rigidly deforming physical objects with arbitrary shape in real time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm estimates real-time RGB-D data. We start by scanning a smooth template model of the subject as they move rigidly. This geometric surface prior avoids strong scene assumptions, such as a kinematic human skeleton or a parametric shape model. Next, a novel GPU pipeline performs non-rigid registration of live RGB-D data to the smooth template using an extended non-linear as-rigid-as-possible (ARAP) framework. High-frequency details are fused onto the final mesh using a linear deformation model. The system is an order of magnitude faster than state-of-the-art methods, while matching the quality and robustness of many offline algorithms. We show precise real-time reconstructions of diverse scenes, including: large deformations of users' heads, hands, and upper bodies; fine-scale wrinkles and folds of skin and clothing; and non-rigid interactions performed by users on flexible objects such as toys. We demonstrate how acquired models can be used for many interactive scenarios, including re-texturing, online performance capture and preview, and real-time shape and motion re-targeting.
- Published
- 2014
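A minimal sketch of the as-rigid-as-possible energy that such a GPU registration pipeline extends, assuming a plain numpy mesh given as vertex arrays plus a neighbor dictionary; the per-vertex rotations are fit with a standard Kabsch/Procrustes step rather than the paper's solver.

```python
# Minimal sketch: classic ARAP energy over a mesh (numpy, not a GPU solver).
import numpy as np

def best_rotation(P, Q):
    """Rotation aligning local edge sets P -> Q (Kabsch/Procrustes)."""
    U, _, Vt = np.linalg.svd(P.T @ Q)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R

def arap_energy(rest, deformed, neighbors):
    """Sum over vertices i of || (v_j' - v_i') - R_i (v_j - v_i) ||^2.
    rest, deformed: (V, 3) vertex arrays; neighbors: {i: [j, ...]}."""
    E = 0.0
    for i, nbrs in neighbors.items():
        P = rest[nbrs] - rest[i]          # rest-pose edges around vertex i
        Q = deformed[nbrs] - deformed[i]  # corresponding deformed edges
        R = best_rotation(P, Q)           # best local rigid fit
        E += np.sum((Q - P @ R.T) ** 2)   # deviation from rigidity
    return E
```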
35. Crowdsourced Live Streaming over Aggregated Edge Networks
- Author
-
Chenglei Wu, Jiangchuan Liu, Shiqiang Yang, and Zhi Wang
- Subjects
Upload ,business.industry ,Computer science ,Server ,Telecommunications link ,Cellular network ,Cloud computing ,Network interface ,business ,Live streaming ,Computer network - Abstract
Recent years have witnessed a dramatic increase of user-generated video services. In such user-generated video services, crowdsourced live streaming (e.g., Periscope, Twitch) has significantly challenged today's content delivery infrastructure: today's edge networks (e.g., 4G, Wi-Fi) have limited uplink capacity support, making high-bitrate live streaming over such links fundamentally impossible. In this paper, we propose to let broadcasters (i.e., users who generate the video) upload crowdsourced video streams using aggregated network resources from multiple edge networks. There are several challenges in the proposal: First, how to design a framework that aggregates bandwidth from multiple edge networks? Second, how to make this framework transparent to today's crowdsourced live streaming services? Third, how to maximize the streaming quality for the whole system? We design a multi-objective and deployable bandwidth aggregation system BASS to address these challenges: (1) We propose an aggregation framework transparent to today's crowdsourced live streaming services, using an edge proxy box and aggregation cloud paradigm; (2) We dynamically allocate geo-distributed cloud aggregation servers to enable MPTCP (i.e., multi-path TCP), according to location and network characteristics of both broadcasters and the original streaming servers; (3) We maximize the overall performance gain for the whole system, by matching streams with the best aggregation paths.
- Published
- 2016
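A minimal sketch of the final stream-to-path matching step, assuming hypothetical per-path aggregated-uplink estimates and a simple greedy rule; the function name and inputs are illustrative, since BASS's real scheduler also weighs broadcaster and server locations and MPTCP behaviour.

```python
# Minimal sketch: greedy matching of live streams to aggregation paths
# (hypothetical capacities in Mbps; not BASS's actual scheduling logic).
def match_streams_to_paths(streams, paths):
    """Give each stream (largest demanded bitrate first) the feasible
    aggregation path with the most remaining aggregated uplink."""
    capacity = dict(paths)                      # path -> aggregated Mbps left
    assignment = {}
    for stream, bitrate in sorted(streams.items(), key=lambda kv: -kv[1]):
        path = max(capacity, key=capacity.get)  # best remaining path
        if capacity[path] >= bitrate:
            assignment[stream] = path
            capacity[path] -= bitrate
    return assignment

# e.g. two broadcasters sharing a 4G+Wi-Fi aggregate and a Wi-Fi-only path:
print(match_streams_to_paths({"s1": 6.0, "s2": 3.0},
                             {"4g+wifi": 8.0, "wifi": 4.0}))
# -> {'s1': '4g+wifi', 's2': 'wifi'}
```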
36. On-set performance capture of multiple actors with a stereo camera
- Author
-
Levi Valgaerts, Christian Theobalt, Carsten Stoll, and Chenglei Wu
- Subjects
Computer science ,business.industry ,Photography ,Principal (computer security) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Graphics and Computer-Aided Design ,Motion capture ,Motion (physics) ,Video editing ,Computer graphics (images) ,Computer vision ,Segmentation ,Artificial intelligence ,Set (psychology) ,business ,Stereo camera ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
State-of-the-art marker-less performance capture algorithms reconstruct detailed human skeletal motion and space-time coherent surface geometry. Despite being a big improvement over marker-based motion capture methods, they are still rarely applied in practical VFX productions as they require ten or more cameras and a studio with controlled lighting or a green screen background. If one were able to capture performances directly on a general set using only the primary stereo camera used for principal photography, many possibilities would open up in virtual production and previsualization, the creation of virtual actors, and video editing during post-production. We describe a new algorithm which works towards this goal. It is able to track skeletal motion and detailed surface geometry of one or more actors from footage recorded with a stereo rig that is allowed to move. It succeeds in general sets with uncontrolled background and uncontrolled illumination, and scenes in which actors strike non-frontal poses. It is one of the first performance capture methods to exploit detailed BRDF information and scene illumination for accurate pose tracking and surface refinement in general scenes. It also relies on a new foreground segmentation approach that combines appearance, stereo, and pose tracking results to segment out actors from the background. Appearance, segmentation, and motion cues are combined in a new pose optimization framework that is robust under uncontrolled lighting, uncontrolled background and very sparse camera views.
- Published
- 2013
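A minimal sketch of how such a pose optimization might combine its three cues into a single weighted objective; the cue functions and weights here are hypothetical stand-ins, not the paper's actual appearance, segmentation, or motion terms.

```python
# Minimal sketch: weighted multi-cue pose objective (hypothetical cue
# callbacks returning per-element residual arrays for a candidate pose).
import numpy as np

def pose_energy(pose, cues, weights=(1.0, 0.5, 0.5)):
    """Sum of squared residuals from three cue callbacks, each mapping a
    pose vector to a residual array; lower is a better pose fit."""
    e_app = np.sum(cues["appearance"](pose) ** 2)    # shading/BRDF agreement
    e_seg = np.sum(cues["segmentation"](pose) ** 2)  # silhouette overlap
    e_mot = np.sum(cues["motion"](pose) ** 2)        # temporal smoothness
    w_app, w_seg, w_mot = weights
    return w_app * e_app + w_seg * e_seg + w_mot * e_mot
```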
37. Reconstructing detailed dynamic face geometry from monocular video
- Author
-
Christian Theobalt, Pablo Garrido, Chenglei Wu, and Levi Valgaerts
- Subjects
Monocular ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Process (computing) ,Optical flow ,Initialization ,Computer Graphics and Computer-Aided Design ,Motion capture ,Expression (mathematics) ,Photometric stereo ,Computer graphics (images) ,Face (geometry) ,Computer vision ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Detailed facial performance geometry can be reconstructed using dense camera and light setups in controlled studios. However, a wide range of important applications cannot employ these approaches, including all movie productions shot from a single principal camera. For post-production, these require dynamic monocular face capture for appearance modification. We present a new method for capturing face geometry from monocular video. Our approach captures detailed, dynamic, spatio-temporally coherent 3D face geometry without the need for markers. It works under uncontrolled lighting, and it successfully reconstructs expressive motion including high-frequency face detail such as folds and laugh lines. After simple manual initialization, the capturing process is fully automatic, which makes it versatile, lightweight and easy-to-deploy. Our approach tracks accurate sparse 2D features between automatically selected key frames to animate a parametric blend shape model, which is further refined in pose, expression and shape by temporally coherent optical flow and photometric stereo. We demonstrate performance capture results for long and complex face sequences captured indoors and outdoors, and we exemplify the relevance of our approach as an enabling technology for model-based face editing in movies and video, such as adding new facial textures, as well as a step towards enabling everyone to do facial performance capture with a single affordable camera.
- Published
- 2013
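A minimal sketch of the blend-shape fitting step, assuming hypothetical pre-aligned 3D landmark targets; the paper instead fits sparse 2D features through a camera model and then refines pose, expression, and shape with optical flow and photometric stereo.

```python
# Minimal sketch: regularized least-squares blend-shape weight fit
# (pre-aligned 3D landmarks assumed for simplicity).
import numpy as np

def fit_blendshape_weights(neutral, blendshapes, target, reg=1e-3):
    """Solve min_w || neutral + B w - target ||^2 + reg ||w||^2, then clip w.
    neutral: (L, 3) landmarks; blendshapes: (K, L, 3) deltas; target: (L, 3)."""
    K = blendshapes.shape[0]
    B = blendshapes.reshape(K, -1).T            # (3L, K) basis matrix
    d = (target - neutral).ravel()              # residual to explain
    w = np.linalg.solve(B.T @ B + reg * np.eye(K), B.T @ d)
    return np.clip(w, 0.0, 1.0)                 # crude box constraint on weights
```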
38. Capturing Relightable Human Performances under General Uncontrolled Illumination
- Author
-
Kiran Varanasi, Christian Theobalt, Chenglei Wu, Yebin Liu, Guannan Li, Qionghai Dai, and Carsten Stoll
- Subjects
Surface (mathematics) ,Basis (linear algebra) ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Spherical harmonics ,Tracking (particle physics) ,Computer Graphics and Computer-Aided Design ,Reflectivity ,Motion capture ,Wavelet ,Path (graph theory) ,Surface geometry ,Computer vision ,Artificial intelligence ,Shading ,Diffuse reflection ,Bidirectional reflectance distribution function ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
We present a novel approach to create relightable free-viewpoint human performances from multi-view video recorded under general uncontrolled and uncalibrated illumination. We first capture a multi-view sequence of an actor wearing arbitrary apparel and reconstruct a spatio-temporally coherent coarse 3D model of the performance using a marker-less tracking approach. Using these coarse reconstructions, we estimate the low-frequency component of the illumination in a spherical harmonics (SH) basis as well as the diffuse reflectance, and then utilize them to estimate the dynamic geometry detail of human actors based on shading cues. Given the high-quality time-varying geometry, the estimated illumination is extended to the all-frequency domain by re-estimating it in the wavelet basis. Finally, the high-quality all-frequency illumination is utilized to reconstruct the spatially-varying BRDF of the surface. The recovered time-varying surface geometry and spatially-varying non-Lambertian reflectance allow us to generate high-quality model-based free-viewpoint videos of the actor under novel illumination conditions. Our method enables plausible reconstruction of relightable dynamic scene models without a complex controlled lighting apparatus, and opens up a path towards relightable performance capture in less constrained environments and using less complex acquisition setups.
- Published
- 2013
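A minimal sketch of the low-frequency lighting step: a least-squares fit of 9 spherical-harmonics coefficients to per-vertex normals and observed intensities under a known (here constant) diffuse albedo. The paper subsequently re-estimates illumination in a wavelet basis for all-frequency effects.

```python
# Minimal sketch: recover second-order SH lighting from normals/intensities.
import numpy as np

def estimate_sh_lighting(normals, intensities, albedo=1.0):
    """Least-squares fit of I ~ albedo * SH(n) . c over visible vertices.
    normals: (N, 3) unit normals; intensities: (N,) observed values."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    H = np.stack([np.ones_like(x), y, z, x, x*y, y*z,
                  3*z**2 - 1, x*z, x**2 - y**2], axis=-1)
    coeffs, *_ = np.linalg.lstsq(albedo * H, intensities, rcond=None)
    return coeffs                                # 9 SH lighting coefficients
```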
39. Lightweight binocular facial performance capture under uncontrolled lighting
- Author
-
Andrés Bruhn, Levi Valgaerts, Hans-Peter Seidel, Christian Theobalt, and Chenglei Wu
- Subjects
Stereo cameras ,Facial motion capture ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Graphics and Computer-Aided Design ,Motion capture ,Pipeline (software) ,Motion (physics) ,Computer graphics (images) ,Computer vision ,Artificial intelligence ,Set (psychology) ,business ,Stereo camera ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Recent progress in passive facial performance capture has shown impressively detailed results on highly articulated motion. However, most methods rely on complex multi-camera set-ups, controlled lighting or fiducial markers. This prevents them from being used in general environments, outdoor scenes, during live action on a film set, or by freelance animators and everyday users who want to capture their digital selves. In this paper, we therefore propose a lightweight passive facial performance capture approach that is able to reconstruct high-quality dynamic facial geometry from only a single pair of stereo cameras. Our method succeeds under uncontrolled and time-varying lighting, and also in outdoor scenes. Our approach builds upon and extends recent image-based scene flow computation, lighting estimation and shading-based refinement algorithms. It integrates them into a pipeline that is specifically tailored towards facial performance reconstruction from challenging binocular footage under uncontrolled lighting. In an experimental evaluation, the strong capabilities of our method become explicit: We achieve detailed and spatio-temporally coherent results for expressive facial motion in both indoor and outdoor scenes -- even from low quality input images recorded with a hand-held consumer stereo camera. We believe that our approach is the first to capture facial performances of such high quality from a single stereo rig and we demonstrate that it brings facial performance capture out of the studio, into the wild, and within the reach of everybody.
- Published
- 2012
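A minimal sketch of the binocular geometry such a two-camera rig rests on, converting a rectified disparity map to depth; the focal length and baseline are illustrative values, not the paper's calibration.

```python
# Minimal sketch: depth from disparity for a rectified stereo pair.
import numpy as np

def depth_from_disparity(disparity, focal_px=1200.0, baseline_m=0.12):
    """Z = f * B / d for a rectified pair; NaN where disparity <= 0."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.nan)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```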
40. Full Body Performance Capture under Uncontrolled and Varying Illumination: A Shading-Based Approach
- Author
-
Kiran Varanasi, Christian Theobalt, and Chenglei Wu
- Subjects
Computer science ,business.industry ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Albedo ,Motion capture ,Photometric stereo ,Computer vision ,Segmentation ,Artificial intelligence ,Shading ,Set (psychology) ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
This paper presents a marker-less method for full body human performance capture by analyzing shading information from a sequence of multi-view images, which are recorded under uncontrolled and changing lighting conditions. Both the articulated motion of the limbs and, subsequently, the fine-scale surface detail are estimated in a temporally coherent manner. In a temporal framework, differential 3D human pose-changes from the previous time-step are expressed in terms of constraints on the visible image displacements derived from shading cues, estimated albedo and estimated scene illumination. The incident illumination at each frame is estimated jointly with pose, by assuming the Lambertian model of reflectance. The proposed method is independent of image silhouettes and training data, and is thus applicable in cases where background segmentation cannot be performed or a set of training poses is unavailable. We show results on challenging cases for pose-tracking such as changing backgrounds, occlusions and changing lighting conditions.
- Published
- 2012
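A minimal sketch of the Lambertian reflectance assumption underlying the shading constraints: with illumination and normals fixed, per-pixel albedo follows by division. The single distant light here is a deliberate simplification of the paper's per-frame illumination estimate.

```python
# Minimal sketch: albedo under the Lambertian model with known lighting.
import numpy as np

def lambertian_albedo(intensity, normals, light_dir):
    """albedo = I / max(n . l, eps) under a single distant light.
    intensity: (N,); normals: (N, 3) unit normals; light_dir: (3,)."""
    l = np.asarray(light_dir, dtype=float)
    l /= np.linalg.norm(l)
    shading = np.clip(normals @ l, 1e-3, None)  # clamp to avoid division by ~0
    return intensity / shading
```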
41. Shading-based dynamic shape refinement from multi-view video under general illumination
- Author
-
Christian Theobalt, Chenglei Wu, Hans-Peter Seidel, Kiran Varanasi, and Yebin Liu
- Subjects
business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Iterative reconstruction ,Albedo ,Computational geometry ,Computer graphics ,Prior probability ,Maximum a posteriori estimation ,Polygon mesh ,Computer vision ,Artificial intelligence ,business ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
We present an approach to add true fine-scale spatio-temporal shape detail to dynamic scene geometry captured from multi-view video footage. Our approach exploits shading information to recover the millimeter-scale surface structure, but in contrast to related approaches succeeds under general unconstrained lighting conditions. Our method starts off from a set of multi-view video frames and an initial series of reconstructed coarse 3D meshes that lack any surface detail. In a spatio-temporal maximum a posteriori probability (MAP) inference framework, our approach first estimates the incident illumination and the spatially-varying albedo map on the mesh surface for every time instant. Thereafter, albedo and illumination are used to estimate the true geometric detail visible in the images and add it to the coarse reconstructions. The MAP framework uses weak temporal priors on lighting, albedo and geometry which improve reconstruction quality yet allow for temporal variations in the data.
- Published
- 2011
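A minimal sketch of a weak temporal prior of the kind such a MAP framework uses, smoothing a per-frame estimate (say, albedo or lighting coefficients) by penalizing frame-to-frame change; the quantity smoothed and the weight are illustrative, not the paper's exact model.

```python
# Minimal sketch: temporally regularized smoothing of per-frame estimates.
import numpy as np

def smooth_with_temporal_prior(estimates, lam=0.5, iters=50):
    """Minimize sum_t ||x_t - y_t||^2 + lam * sum_t ||x_t - x_{t-1}||^2
    by coordinate descent; estimates: (T, D) noisy per-frame values y_t."""
    x = estimates.copy()
    for _ in range(iters):
        for t in range(len(x)):
            num, den = estimates[t].copy(), 1.0   # data term
            if t > 0:
                num += lam * x[t - 1]; den += lam  # pull toward previous frame
            if t < len(x) - 1:
                num += lam * x[t + 1]; den += lam  # pull toward next frame
            x[t] = num / den
    return x
```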
42. Fusing multiview and photometric stereo for 3D reconstruction under uncalibrated illumination
- Author
-
Qionghai Dai, Bennett Wilburn, Yebin Liu, and Chenglei Wu
- Subjects
Computer science ,business.industry ,3D reconstruction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Spherical harmonics ,Iterative reconstruction ,Computer Graphics and Computer-Aided Design ,Lambertian reflectance ,Photometric stereo ,Signal Processing ,Metric (mathematics) ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
We propose a method to obtain a complete and accurate 3D model from multiview images captured under a variety of unknown illuminations. Based on recent results showing that for Lambertian objects, general illumination can be approximated well using low-order spherical harmonics, we develop a robust alternating approach to recover surface normals. Surface normals are initialized using a multi-illumination multiview stereo algorithm, then refined using a robust alternating optimization method based on the ℓ1 metric. Erroneous normal estimates are detected using a shape prior. Finally, the computed normals are used to improve the preliminary 3D model. The reconstruction system achieves watertight and robust 3D reconstruction while neither requiring manual interactions nor imposing any constraints on the illumination. Experimental results on both real world and synthetic data show that the technique can acquire accurate 3D models for Lambertian surfaces, and even tolerates small violations of the Lambertian assumption.
- Published
- 2011
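A minimal sketch of the alternating scheme described above: solve for spherical-harmonics lighting with normals fixed, then reweight residuals in an iteratively-reweighted least-squares (IRLS) loop that approximates the ℓ1 metric; the per-pixel normal re-solve is omitted for brevity, and the solver is a simplified stand-in.

```python
# Minimal sketch: IRLS (l1-approximating) alternating lighting estimation.
import numpy as np

def sh9(n):
    """First 9 real SH basis functions at unit normals n (N, 3)."""
    x, y, z = n[:, 0], n[:, 1], n[:, 2]
    return np.stack([np.ones_like(x), y, z, x, x*y, y*z,
                     3*z**2 - 1, x*z, x**2 - y**2], axis=-1)

def alternate_l1(normals, intensities, iters=10, eps=1e-3):
    w = np.ones(len(intensities))                # start as ordinary least squares
    for _ in range(iters):
        H = sh9(normals) * w[:, None]            # row-weighted lighting solve
        light, *_ = np.linalg.lstsq(H, w * intensities, rcond=None)
        r = sh9(normals) @ light - intensities   # residuals under new lighting
        w = 1.0 / np.sqrt(np.abs(r) + eps)       # sum (w*r)^2 ~ sum |r|  (l1)
        # a full implementation would also re-solve per-pixel normals here
    return light, w
```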
43. High-quality shape from multi-view stereo and shading under general illumination
- Author
-
Chenglei Wu, Christian Theobalt, Bennett Wilburn, and Yasuyuki Matsushita
- Subjects
Surface (mathematics) ,Quality (physics) ,Computer science ,business.industry ,Computer graphics (images) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Range (statistics) ,Computer vision ,Shading ,Artificial intelligence ,Iterative reconstruction ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Multi-view stereo methods reconstruct 3D geometry from images well for sufficiently textured scenes, but often fail to recover high-frequency surface detail, particularly for smoothly shaded surfaces. On the other hand, shape-from-shading methods can recover fine detail from shading variations. Unfortunately, it is non-trivial to apply shape-from-shading alone to multi-view data, and most shading-based estimation methods only succeed under very restricted or controlled illumination. We present a new algorithm that combines multi-view stereo and shading-based refinement for high-quality reconstruction of 3D geometry models from images taken under constant but otherwise arbitrary illumination. We have tested our algorithm on several scenes that were captured under several general and unknown lighting conditions, and we show that our final reconstructions rival laser range scans.
- Published
- 2011
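A minimal sketch of the combination the paper describes, displacing each vertex along its normal to better explain shading while a quadratic tether keeps it near the multi-view stereo surface; the shading-residual callback is a hypothetical stand-in for the full formulation.

```python
# Minimal sketch: shading-driven displacement along vertex normals with a
# tether (lam) to the multi-view stereo surface.
import numpy as np

def refine_along_normals(verts, normals, residual, lam=0.1, step=0.05, iters=20):
    """Per-vertex displacement d along the normal, found by coordinate search,
    minimizing residual(v + d*n)^2 + lam*d^2; residual maps (N,3) -> (N,)."""
    d = np.zeros(len(verts))
    cost = residual(verts) ** 2                  # baseline at d = 0
    for _ in range(iters):
        for s in (-step, step):
            cand = d + s
            c = residual(verts + cand[:, None] * normals) ** 2 + lam * cand ** 2
            take = c < cost                      # keep moves that lower the cost
            d = np.where(take, cand, d)
            cost = np.where(take, c, cost)
    return verts + d[:, None] * normals
```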
44. Multi-view reconstruction under varying illumination conditions
- Author
-
Xiangyang Ji, Qionghai Dai, Yebin Liu, and Chenglei Wu
- Subjects
Pixel ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Point cloud ,Iterative reconstruction ,Photometry (optics) ,Stereopsis ,Photometric stereo ,Normal mapping ,Computer vision ,Artificial intelligence ,business ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
This paper addresses the problem of complete and detailed 3D model reconstruction of objects filmed by multiple cameras under varying illumination. Firstly, initial normal maps are obtained to enhance the correspondence mapping. Then, the depth for every pixel is estimated by combining a photometric constraint with occlusion-robust photo-consistency. Finally, after filtering the point cloud, a Poisson surface reconstruction is applied to obtain a watertight mesh. In contrast with traditional photometric stereo techniques, the proposed algorithm does not directly calculate the photometric normal but integrates the photometric constraint into the depth estimation. Furthermore, different from classic multi-view stereo (MVS), we consider the counterpart under changing light conditions. The algorithm has been implemented based on our multi-camera and multi-light acquisition system. We validate the method by complete reconstruction of challenging real objects and show experimentally that this technique can greatly improve on correspondence-based MVS results.
- Published
- 2009
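A minimal sketch of folding a photometric (normal-agreement) term into a photo-consistency score during a depth sweep; both component scores and the blending weight are hypothetical stand-ins for the paper's occlusion-robust formulation.

```python
# Minimal sketch: blending photometric and photo-consistency cues per depth
# hypothesis (score volumes assumed precomputed elsewhere).
import numpy as np

def combined_score(photo_consistency, normal_agreement, alpha=0.6):
    """Blend the two cues; both inputs are 'higher is better' in [0, 1]."""
    return alpha * photo_consistency + (1 - alpha) * normal_agreement

def pick_depths(depth_candidates, pc_scores, na_scores):
    """Per pixel, keep the candidate depth with the best blended score.
    pc_scores / na_scores: (D, H, W) score volumes over D depth hypotheses."""
    scores = combined_score(np.asarray(pc_scores), np.asarray(na_scores))
    return np.asarray(depth_candidates)[np.argmax(scores, axis=0)]
```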
45. Accurate 3D reconstruction via surface-consistency
- Author
-
Chenglei Wu, Qionghai Dai, and Xun Cao
- Subjects
Pixel ,Computer science ,business.industry ,3D reconstruction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Point cloud ,Iterative reconstruction ,Stereopsis ,Photometric stereo ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Artificial intelligence ,business ,Normal ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
We present an algorithm that fuses multi-view stereo (MVS) and photometric stereo to reconstruct 3D models of objects filmed by multiple cameras under varying illuminations. Firstly, we obtain the surface normal scaled by albedo for each view through photometric stereo techniques. Then, based on the scaled normal, a new correspondence matching method, namely a surface-consistency metric, is proposed to acquire accurate 3D positions of pixels through triangulation. After filtering the point cloud, a Poisson surface reconstruction is applied to obtain a watertight mesh. The algorithm has been implemented based on our multi-camera and multi-light acquisition system. We validate the method by complete reconstruction of challenging real objects and show experimentally that this technique can greatly improve on previous MVS results.
- Published
- 2009
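A minimal sketch of one plausible reading of such a surface-consistency metric, scoring candidate matches by the disagreement of their albedo-scaled normals; this is an illustrative interpretation, not the paper's exact definition.

```python
# Minimal sketch: score candidate correspondences by albedo-scaled normal
# disagreement (lower = more surface-consistent).
import numpy as np

def surface_consistency(scaled_n1, scaled_n2):
    """Distance between per-view albedo-scaled normals; supports broadcasting."""
    return np.linalg.norm(scaled_n1 - scaled_n2, axis=-1)

# candidate matches along an epipolar line would be scored like this:
n_ref = np.array([0.10, 0.20, 0.90]) * 0.8                 # normal times albedo
candidates = np.array([[0.08, 0.22, 0.91],
                       [0.50, -0.10, 0.85]]) * 0.8
print(surface_consistency(n_ref, candidates))               # pick the argmin
```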
46. A Novel Method for Semi-automatic 2D to 3D Video Conversion
- Author
-
Qionghai Dai, Guihua Er, Xun Cao, Tao Li, Chenglei Wu, and Xudong Xie
- Subjects
Motion compensation ,Video post-processing ,Video capture ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Video processing ,Video compression picture types ,Computer graphics (images) ,Video tracking ,Computer vision ,Artificial intelligence ,Multiview Video Coding ,business ,Block-matching algorithm - Abstract
In this paper, we present a novel semi-automatic method for converting monoscopic video to stereoscopic video. An efficient interactive image cutout tool is first used to segment the object-of-interest in the key frames. Then, we assign the initial depth information to the segmented objects. These objects are tracked in the whole video sequence through a bi-directional KLT (Kanade-Lucas-Tomasi) algorithm. In addition, depth interpolation is employed to produce the depth information in the non-key frames. Finally, stereoscopic video is synthesized in consideration of different 3D display types. The experimental results show that our method can fulfill 2D to 3D video conversion both reliably and efficiently.
- Published
- 2008
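A minimal sketch of two steps of the pipeline above: linear depth interpolation between key frames, and a simple depth-image-based pixel shift to synthesize the second view; the disparity scaling is an illustrative choice, not the paper's display-dependent mapping.

```python
# Minimal sketch: key-frame depth interpolation and naive view synthesis.
import numpy as np

def interpolate_depth(depth_key_a, depth_key_b, t):
    """Depth for a non-key frame a fraction t in (0, 1) between key frames."""
    return (1 - t) * depth_key_a + t * depth_key_b

def synthesize_right_view(left, depth, max_disp=16):
    """Shift pixels by a disparity proportional to the depth-map value
    (assuming larger values encode nearer content); holes stay zero for
    later inpainting."""
    h, w = depth.shape
    right = np.zeros_like(left)
    disp = (depth / (depth.max() + 1e-8) * max_disp).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disp[y, x]
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
    return right
```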