28 results for "Ling, Nam"
Search Results
2. Human centered perceptual adaptation for video coding
- Author
-
Tong, Minglei, Gu, Zhouye, Ling, Nam, and Yang, Junjie
- Published
- 2016
- Full Text
- View/download PDF
3. An efficient VC-1 to H.264 IPB-picture transcoder with pixel domain processing
- Author
-
Pantoja, Maria, Ling, Nam, Kalva, Hari, and Lee, Jae-Beom
- Published
- 2015
- Full Text
- View/download PDF
4. Transformer-Based Data-Driven Video Coding Acceleration for Industrial Applications.
- Author
-
Li, Yixiao, Li, Lixiang, Zhuang, Zirui, Fang, Yuan, Peng, Haipeng, and Ling, Nam
- Subjects
VIDEO coding, INDUSTRIAL applications, COMPUTER vision, INDUSTRIAL capacity, STREAMING video & television, MANUFACTURING processes
- Abstract
With the explosive development of edge intelligence and smart industry, deep learning-based intelligent industrial solutions are rapidly being applied in manufacturing processes. Many intelligent industrial solutions, such as automatic manufacturing inspection, are computer vision based and require fast and efficient video encoding techniques so that video streams can be processed as quickly as possible, either at the edge cluster or over the cloud. As one of the most popular video coding standards, the high efficiency video coding (HEVC) standard has been applied to various industrial scenes. However, HEVC brings not only a higher compression rate but also a significant increase in encoding complexity, which hinders its practical application in industrial scenarios. Fortunately, the large amount of video coding data makes it possible to accelerate the encoding process in industry. To speed up the video coding process in industrial scenes, this paper proposes a data-driven fast approach for coding tree unit (CTU) partitioning in HEVC intra coding. First, we propose a method to represent the partition result of a CTU as a column vector of length 21. Then, we use large amounts of encoding data produced in typical industrial scenes to train transformer models that predict the partitioning vector of a CTU. Finally, the final partitioning structure of the CTU is generated from the partitioning vector after a postprocessing operation and used by an industrial encoder. Compared with the original HEVC encoder used by some industrial applications, experimental results show that our approach achieves a 58.77% encoding time reduction with a 3.9% bit rate loss, which indicates that our data-driven approach for video coding has great potential in industrial applications. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
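The length-21 CTU partition vector in the abstract above is consistent with a 64×64 HEVC CTU carrying one split flag at depth 0, four at depth 1, and sixteen at depth 2 (1 + 4 + 16 = 21). The sketch below encodes a quadtree under that assumption; the flag ordering and the `partition_vector` helper are illustrative, not taken from the paper:

```python
def partition_vector(split_64, split_32, split_16):
    """Flatten HEVC CTU split flags into a length-21 vector (assumed layout).

    split_64: bool            -- split the 64x64 CTU into four 32x32 CUs?
    split_32: list of 4 bool  -- split each 32x32 CU (raster order)?
    split_16: list of 16 bool -- split each 16x16 CU into 8x8 CUs?
    Flags below an unsplit parent are forced to 0.
    """
    vec = [int(split_64)]
    for i in range(4):
        s32 = split_32[i] if split_64 else False
        vec.append(int(s32))
    for i in range(4):
        for j in range(4):
            parent = split_64 and split_32[i]
            vec.append(int(split_16[4 * i + j]) if parent else 0)
    return vec
```

A model predicting such a vector can emit all 21 decisions in one shot, which is what lets the paper replace the recursive RDO search with a single prediction plus postprocessing.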
5. RDEN: Residual Distillation Enhanced Network-Guided Lightweight Synthesized View Quality Enhancement for 3D-HEVC.
- Author
-
Pan, Zhaoqing, Yuan, Feng, Yu, Weijie, Lei, Jianjun, Ling, Nam, and Kwong, Sam
- Subjects
DISTILLATION, FEATURE extraction, DEPTH perception, MULTIMEDIA systems, VIDEO coding, COMPUTATIONAL complexity, RENDERING (Computer graphics)
- Abstract
In a three-dimensional video system, depth image-based rendering is a key technique for generating synthesized views, which provides audiences with depth perception and interactivity. However, the inaccuracy of depth information leads to geometrical rendering position errors, and the compression distortion of texture and depth videos degrades the quality of the synthesized views. Although existing quality enhancement methods can eliminate the distortions in the synthesized views, their huge computational complexity hinders their application in real-time multimedia systems. To this end, a residual distillation enhanced network (RDEN)-guided lightweight synthesized view quality enhancement (SVQE) method is proposed to minimize holes and compression distortions in the synthesized views while reducing the model complexity. First, existing deep-learning-based SVQE methods are reexamined. Second, a feature distillation attention block is proposed to effectively reduce the distortions in the synthesized views and make the model better suited to real-time tasks; it is a lightweight and flexible feature extraction block using an information distillation mechanism and a lightweight multi-scale spatial attention mechanism. Third, a residual feature fusion block is proposed to improve the enhancement performance by using a feature fusion mechanism, which efficiently improves the feature extraction capability without introducing any additional parameters. Experimental results show that the proposed RDEN efficiently improves SVQE performance at far lower computational complexity than state-of-the-art SVQE methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Multiple Resolution Prediction With Deep Up-Sampling for Depth Video Coding.
- Author
-
Li, Ge, Lei, Jianjun, Pan, Zhaoqing, Peng, Bo, and Ling, Nam
- Subjects
VIDEO coding, IMAGE color analysis, COMPUTATIONAL complexity, FORECASTING
- Abstract
Depth video contains large smooth regions separated by sharp edges. Since deep learning-based intra prediction methods oriented to color video pay no attention to these characteristics of depth video, they are unsuitable for optimizing the coding efficiency of depth video. In this paper, a multiple resolution prediction method with deep up-sampling is proposed to improve the coding efficiency of depth video. To efficiently encode depth blocks of different complexity, each depth block is selectively encoded at $\times 1$, $\times 1/2$, or $\times 1/4$ resolution. If a block is encoded at a low resolution (LR), the resolution of the reconstructed LR depth block is recovered by an up-sampling network. To constrain the quality of both the reconstructed high-resolution depth block and its synthesized view, a view synthesis distortion guidance mechanism is proposed for the up-sampling network. In addition, a distillation-based lightweight up-sampling network is proposed to reduce the computational complexity. Experimental results demonstrate that the proposed multiple resolution prediction method obtains an average of 10.84% BD-rate saving in comparison with 3D-HEVC. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Transcoding with Resolution Conversion Using Super-Resolution and Irregular Sampling
- Author
-
Pantoja, Maria and Ling, Nam
- Published
- 2010
- Full Text
- View/download PDF
8. Disparity-Aware Reference Frame Generation Network for Multiview Video Coding.
- Author
-
Lei, Jianjun, Zhang, Zongqian, Pan, Zhaoqing, Liu, Dong, Liu, Xiangrui, Chen, Ying, and Ling, Nam
- Subjects
VIDEO coding, IMAGE reconstruction
- Abstract
Multiview video coding (MVC) aims to compress the multiview video through the elimination of video redundancies, where the quality of the reference frame directly affects the compression efficiency. In this paper, we propose a deep virtual reference frame generation method based on a disparity-aware reference frame generation network (DAG-Net) to transform the disparity relationship between different viewpoints and generate a more reliable reference frame. The proposed DAG-Net consists of a multi-level receptive field module, a disparity-aware alignment module, and a fusion reconstruction module. First, a multi-level receptive field module is designed to enlarge the receptive field, and extract the multi-scale deep features of the temporal and inter-view reference frames. Then, a disparity-aware alignment module is proposed to learn the disparity relationship, and perform disparity shift on the inter-view reference frame to align it with the temporal reference frame. Finally, a fusion reconstruction module is utilized to fuse the complementary information and generate a more reliable virtual reference frame. Experiments demonstrate that the proposed reference frame generation method achieves superior performance for multiview video coding. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
9. Deep Affine Motion Compensation Network for Inter Prediction in VVC.
- Author
-
Jin, Dengchao, Lei, Jianjun, Peng, Bo, Li, Wanqing, Ling, Nam, and Huang, Qingming
- Subjects
VIDEO coding, VIDEO compression, FORECASTING, FEATURE extraction
- Abstract
In video coding, it is a challenge to deal with scenes with complex motions, such as rotation and zooming. Although affine motion compensation (AMC) is employed in Versatile Video Coding (VVC), it is still difficult to handle non-translational motions due to the adopted hand-crafted block-based motion compensation. In this paper, we propose a deep affine motion compensation network (DAMC-Net) for inter prediction in video coding to effectively improve the prediction accuracy. To the best of our knowledge, our work is the first attempt to perform deformable motion compensation based on a CNN in VVC. Specifically, a deformable motion-compensated prediction (DMCP) module is proposed to compensate the current coding block in a learnable way by estimating accurate motion fields. Meanwhile, the spatial neighboring information and the temporal reference block, as well as the initial motion field, are fully exploited. By effectively fusing the multi-channel feature maps from DMCP, an attention-based fusion and reconstruction (AFR) module is designed to reconstruct the output block. The proposed DAMC-Net is integrated into VVC, and the experimental results demonstrate that the proposed method considerably enhances the coding performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. Bagged Tree and ResNet-Based Joint End-to-End Fast CTU Partition Decision Algorithm for Video Intra Coding.
- Author
-
Li, Yixiao, Li, Lixiang, Fang, Yuan, Peng, Haipeng, and Ling, Nam
- Subjects
VIDEO coding, PARALLEL algorithms, STATISTICAL decision making, BLOCK codes, DECISION trees, VIDEO codecs, DECISION making
- Abstract
Video coding standards, such as high-efficiency video coding (HEVC), versatile video coding (VVC), and AOMedia video 2 (AV2), achieve an optimal encoding performance by traversing all possible combinations of coding unit (CU) partition and selecting the combination with the minimum coding cost. It is still necessary to further reduce the encoding time of HEVC, because HEVC is one of the most widely used coding standards. In HEVC, the process of searching for the best performance is the source of most of the encoding complexity. To reduce the complexity of the coding block partition in HEVC, a new end-to-end fast algorithm is presented to aid the partition structure decisions of the coding tree unit (CTU) in intra coding. In the proposed method, the partition structure decision problem of a CTU is solved by a novel two-stage strategy. In the first stage, a bagged tree model is employed to predict the splitting of a CTU. In the second stage, the partition problem of a 32 × 32-sized CU is modeled as a 17-output classification task for the first time, so that it can be solved by a single prediction. To achieve a high prediction accuracy, a residual network (ResNet) with 34 layers is employed. Jointly using bagged tree and ResNet, the proposed fast CTU partition algorithm is able to generate the partition quad-tree structure of a CTU through an end-to-end prediction process, which abandons the traditional scheme of making multiple decisions at various depth levels. In addition, several datasets are used in this paper to lay the foundation for high prediction accuracy. Compared with the original HM16.7 encoder, the experimental results show that the proposed algorithm can reduce the encoding time by 60.29% on average, while the Bjøntegaard delta rate (BD-rate) loss is as low as 2.03%, which outperforms the results of most of the state-of-the-art approaches in the field of fast intra CU partition. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
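The abstract above models the partition of a 32 × 32 CU as a 17-output classification task. One plausible reading is one class for "do not split the 32 × 32 CU" plus 2⁴ = 16 classes for the possible 8 × 8 sub-split patterns of its four 16 × 16 children. The mapping below is a hypothetical illustration of such a labeling, not the paper's actual scheme:

```python
def class_to_partition(c):
    """Map a 17-class label to a 32x32 CU partition (hypothetical layout).

    Class 0      : the 32x32 CU is not split.
    Classes 1-16 : the CU splits into four 16x16 CUs; the low 4 bits of
                   (c - 1) say which 16x16 CUs further split into 8x8.
    """
    if c == 0:
        return {"split32": False, "split16": [False] * 4}
    bits = c - 1
    return {"split32": True,
            "split16": [bool(bits >> k & 1) for k in range(4)]}
```

Whatever the exact labeling, the point of the 17-way formulation is that a single classifier pass decides the whole sub-tree, instead of one binary split decision per depth level.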
11. TSAN: Synthesized View Quality Enhancement via Two-Stream Attention Network for 3D-HEVC.
- Author
-
Pan, Zhaoqing, Yu, Weijie, Lei, Jianjun, Ling, Nam, and Kwong, Sam
- Subjects
DATA mining, CONVOLUTIONAL neural networks, VIDEO coding, CONTEXTUAL learning, MACHINE learning
- Abstract
In a three-dimensional video system, the texture and depth videos are jointly encoded, and then Depth Image Based Rendering (DIBR) is utilized to realize view synthesis. However, the compression distortion of texture and depth videos, as well as the disocclusion problem in DIBR, degrades the visual quality of the synthesized view. To address this problem, a Two-Stream Attention Network (TSAN)-based synthesized view quality enhancement method is proposed for 3D-High Efficiency Video Coding (3D-HEVC) in this article. First, the shortcomings of the view synthesis technique and traditional convolutional neural networks are analyzed. Second, based on these analyses, a TSAN with two information extraction streams is proposed for enhancing the quality of the synthesized view, in which the global information extraction stream learns contextual information, and the local information extraction stream extracts texture information from the rendered image. Third, a Multi-Scale Residual Attention Block (MSRAB) is proposed, which can efficiently detect features at different scales and adaptively refine features by considering interdependencies among spatial dimensions. Extensive experimental results show that the proposed synthesized view quality enhancement method achieves significantly better performance than the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Deep Multi-Domain Prediction for 3D Video Coding.
- Author
-
Lei, Jianjun, Shi, Yanan, Pan, Zhaoqing, Liu, Dong, Jin, Dengchao, Chen, Ying, and Ling, Nam
- Subjects
VIDEO coding, CONVOLUTIONAL neural networks, FORECASTING
- Abstract
Three-dimensional (3D) video contains plentiful multi-domain correlations, including spatial, temporal, and inter-view correlations. In this paper, a deep multi-domain prediction method is proposed for 3D video coding. Different from previous methods, our proposed method utilizes not only spatial and temporal correlations but also inter-view correlation to obtain a more accurate prediction, and adopts deep convolutional neural networks to effectively fuse multi-domain references. More specifically, a hierarchical prediction mechanism, which includes a spatial-temporal prediction network and a multi-domain prediction network, is designed to overcome the difficulty of fusing multi-domain reference information. Furthermore, a progressive spatial-temporal prediction network and a multi-scale multi-domain prediction network are designed to obtain the spatial-temporal prediction result and the multi-domain prediction result, respectively. Experimental results show that the proposed method achieves considerable bitrate savings compared with 3D-HEVC. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
13. Low-Complexity Rate-Distortion Optimization for HEVC Encoders.
- Author
-
Huang, Bo, Chen, Zhifeng, Su, Kaixiong, Chen, Jian, and Ling, Nam
- Subjects
VIDEO coding, CHANNEL coding, COMPUTATIONAL complexity, VIDEO compression, COST estimates
- Abstract
Typical high efficiency video coding (HEVC) encoders use rate-distortion optimization (RDO) to select the best coding parameters among numerous candidates to achieve high compression efficiency. However, calculating the rate-distortion cost for each coding parameter entails high computational complexity, which limits the real-time application of these encoders. In this study, time-consuming processes, such as reconstruction and entropy coding, are removed from the calculation of the full RD cost to reduce the complexity of the encoder. Instead, RD costs are estimated from the transformed coefficients by using an RD cost model, which is made up of a distortion model based on the distribution features of the transformed coefficients, a residual rate model based on the characteristics of the entropy coding module, and a simple and convenient header rate model. In addition to the full RD cost, typical encoders also adopt a fast RD cost, which is composed of the sum of absolute transformed differences (SATD) and the header bits. To further reduce encoding complexity, three simplified SATD (SSATD) methods based on DCT and DST features are proposed as substitutes for SATD in the fast RD cost, thus reducing the computational complexity of the SATD calculation. Experimental results show that 35.53%, 14.57%, and 14.37% of coding time can be reduced under the AI, RA, and LDP configurations, respectively, with negligible RD performance loss when applying the proposed RD cost model and the SSATD methods to HM16.7. In addition, the proposed methods reduce encoding time by a further 4%–9% while exhibiting the same RD performance as state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
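The SATD cost mentioned in the abstract above is conventionally computed by applying a Hadamard transform to the prediction residual and summing the absolute transformed coefficients. A minimal 4×4 sketch of that standard computation (omitting HM's rounding/normalization of the final sum):

```python
# 4x4 Hadamard matrix (entries are +1/-1, so only adds/subtracts are needed)
H = [[1,  1,  1,  1],
     [1, -1,  1, -1],
     [1,  1, -1, -1],
     [1, -1, -1,  1]]

def satd4x4(orig, pred):
    """Sum of absolute transformed differences for a 4x4 block:
    transform the residual with H on both sides, then sum magnitudes."""
    d = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    # t = H * d, u = t * H  (row transform, then column transform)
    t = [[sum(H[i][k] * d[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
    u = [[sum(t[i][k] * H[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]
    return sum(abs(u[i][j]) for i in range(4) for j in range(4))
```

The SSATD methods in the paper simplify this transform step; the sketch shows only the baseline SATD they start from.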
14. Quantization Parameter Cascading for Surveillance Video Coding Considering All Inter Reference Frames.
- Author
-
Gong, Yanchao, Yang, Kaifang, Liu, Ying, Lim, Keng-Pang, Ling, Nam, and Wu, Hong Ren
- Subjects
VIDEO surveillance, VIDEO coding, STREAMING media
- Abstract
Video surveillance and its applications have become increasingly ubiquitous in modern daily life. In a video surveillance system, video coding, as a critical enabling technology, determines the effective transmission and storage of surveillance videos. In order to meet the real-time or time-critical transmission requirements of video surveillance systems, the low-delay (LD) configuration of the advanced high efficiency video coding (HEVC) standard is usually used to encode surveillance videos. The coding efficiency of the LD configuration is closely related to the quantization parameter (QP) cascading technique, which selects or determines the QPs used for encoding. However, the quantization parameter cascading (QPC) technique currently adopted for the LD configuration in the HEVC test model (i.e., HM) is not optimal, since it does not take full account of the reference dependency in coding. In this paper, an efficient QPC technique for surveillance video coding, referred to as QPC-SV, is proposed, considering all inter reference frames under the LD configuration. Experimental results demonstrate the efficacy of the proposed QPC-SV. Compared with the default QPC configuration in the HM, QPC-SV achieves significant rate-distortion performance gains, with average BD-rates of −9.35% and −9.76% for the LDP and LDB configurations, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
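QP cascading, as discussed in the abstract above, amounts to assigning each frame a QP offset from the base QP according to its position in the low-delay GOP, so that frames referenced more heavily are coded at higher quality. A toy sketch of that mechanism; the offset table here is illustrative, not the QPC-SV values derived in the paper:

```python
def cascade_qps(base_qp, gop_size, offsets, num_frames):
    """Assign a QP to each inter frame under a low-delay GOP structure:
    frame n gets base_qp + offsets[n % gop_size].

    offsets: per-position QP offset table (illustrative values only;
    QPC-SV derives these from the reference dependencies)."""
    return [base_qp + offsets[n % gop_size] for n in range(num_frames)]
```

With a GOP-final reference frame getting the smallest offset, this reproduces the familiar low-delay quality sawtooth, e.g. `cascade_qps(32, 4, [3, 2, 3, 1], 8)`.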
15. Recursive Residual Convolutional Neural Network- Based In-Loop Filtering for Intra Frames.
- Author
-
Zhang, Shufang, Fan, Zenghui, Ling, Nam, and Jiang, Minqiang
- Subjects
CONVOLUTIONAL neural networks, VIDEO coding, SIGNAL filtering, FILTERS & filtration
- Abstract
Although the in-loop filtering incorporated in the High Efficiency Video Coding (HEVC) standard improves the subjective quality of reconstructed pictures and increases compression efficiency, it still cannot satisfy the demand for higher quality in the rapid growth of video usage. In this paper, we propose recursive residual convolutional neural network (RRCNN)-based in-loop filtering to further improve the quality of reconstructed intra frames while reducing the bitrate. Specifically, RRCNN estimates the residual images between the compressed distorted images and the original noncompressed ones, and shortcut connections that skip a few stacked layers in the structure of RRCNN ease the training difficulty. By applying the same set of weights recursively, RRCNN achieves excellent performance while utilizing far fewer parameters. For concise in-loop filtering, we train a single model capable of handling various bitrate settings. Different networks are designed for the filtering of the luma and chroma components, respectively, to better learn the filtering characteristics of the different channels. Moreover, to fully adapt to various input videos and boost performance, a coding tree unit (CTU) control flag is signaled to indicate the filtering method in the sense of rate-distortion optimization (RDO). Extensive experimental results show that our scheme achieves significant bitrate savings compared to HEVC, leading to an 8.7% BD-rate reduction on average, with up to a 15.1% BD-rate reduction for luma and more than 20% BD-rate reductions for chroma on average. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
16. A Reconfigurable Architecture for Discrete Cosine Transform in Video Coding.
- Author
-
Zheng, Mingkui, Zheng, Jingyi, Chen, Zhifeng, Wu, Linhuang, Yang, Xiuzhi, and Ling, Nam
- Subjects
DISCRETE cosine transforms, VIDEO coding, VIDEO codecs, GATE array circuits
- Abstract
Discrete cosine transform (DCT) is an indispensable module in video codecs and a major part of many video coding standards, including the latest high efficiency video coding (HEVC). As video resolution increases, both the transform sizes and the number of transforms increase continuously, which poses challenges to reusable design, especially in hardware implementations. This paper presents a reconfigurable transform architecture that flexibly supports the reuse of hardware across different transform sizes. The proposed architecture maximally reuses hardware resources by rearranging the order of input data for different transform sizes while still exploiting the butterfly property. Furthermore, the architecture supports reconfigurable throughput according to different hardware resource requirements. By applying the proposed architecture to a field-programmable gate array (FPGA) design of the HEVC core transform matrices, synthesis results show much lower consumption of hardware resources compared to existing methods in the literature. The implementation in Altera’s Stratix III FPGA can operate at 139 MHz and supports real-time processing of $3840\times 2160$ ultra-high definition video at a minimum of 45 f/s and up to 359 f/s for different DCT sizes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
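The "butterfly property" exploited by the architecture above refers to the even-odd decomposition of the DCT-II: summing and differencing mirrored input samples reduces an N-point transform to an N/2-point transform on the even part plus a matrix multiply on the odd part, which is what makes hardware reusable across sizes. A sketch of that first butterfly stage (the reconfigurable datapath itself is of course not reproduced here):

```python
def butterfly_stage(x):
    """First stage of the partial butterfly used in DCT-II hardware:
    split the input into an even part e and an odd part o, so that an
    N-point DCT reduces to an N/2-point DCT on e plus a separate
    matrix multiply on o. Input length must be even."""
    n = len(x)
    e = [x[i] + x[n - 1 - i] for i in range(n // 2)]  # mirrored sums
    o = [x[i] - x[n - 1 - i] for i in range(n // 2)]  # mirrored differences
    return e, o
```

Applying the stage recursively to `e` is exactly how a 32-point core transform reuses the 16-, 8-, and 4-point datapaths.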
17. Saturation-aware human attention region of interest algorithm for efficient video compression.
- Author
-
N'guessan, Sylvia O. and Ling, Nam
- Subjects
VIDEO compression software, ALGORITHMS, LUMINANCE (Video), STREAMING video & television, ATTENTION control
- Abstract
We propose a saturation-aware human attention region-of-interest (SA-HAROI) video compression method that performs a perceptual adaptive quantization algorithm on video frames as a function of the distribution of their luminance, motion vectors, and color saturation. Our work is an application of a psycho-visual study which demonstrated that human attention automatically enhances perceived saturation. Consequently, the adaptive quantization phase of our compression algorithm is characterized by a luminance- and saturation-aware just noticeable distortion (JND) function. After running multiple experiments on 18 videos with resolutions ranging from QCIF to 4K, results showed that our method achieves higher compression than both the H.264/AVC JM and the HEVC HM while maintaining subjective quality. We observed that, in comparison to both implementations of the standards (JM and HM), for an IPPP coding structure, the performance of our algorithm culminated with HD and 4K videos, yielding a bit rate reduction averaging 15% and an encoding time reduction of about 20% in certain cases. Finally, after comparing our method to other similar techniques, we concluded that saturation is a significant parameter in the improvement of video compression. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
18. Region Adaptive R-$\lambda$ Model-Based Rate Control for Depth Maps Coding.
- Author
-
Lei, Jianjun, He, Xiaoxu, Yuan, Hui, Wu, Feng, Ling, Nam, and Hou, Chunping
- Subjects
DEPTH maps (Digital image processing), ALGORITHMS, IMAGE processing, ARTIFICIAL intelligence, VIDEO codecs
- Abstract
In this paper, a novel rate-control algorithm based on the region adaptive R-$\lambda$ model is proposed for depth map coding. First, in order to obtain accurate rate control for depth map coding, a modified frame-level bit allocation method based on the statistical distribution of coding bits for depth maps is proposed. Second, considering that different areas of a depth map have unequal effects on virtual view rendering, the blocks of the depth map are divided into two types, namely, interested blocks for virtual view rendering (IBV) and noninterested blocks for virtual view rendering (NIBV). Then, two different R-$\lambda$ models are derived for IBV and NIBV, respectively. The optimal bitrates for IBV and NIBV are determined by solving an optimization problem. After that, based on the regional R-$\lambda$ models, the optimal Lagrange multipliers are calculated for both IBV and NIBV. Finally, largest coding unit (LCU) level rate control is performed by adaptively adjusting the Lagrange multiplier to avoid blocking artifacts and smooth the coding quality. Experimental results demonstrate that the proposed method can achieve considerable BD-PSNR gains compared with the unified rate-quantization model and conventional R-$\lambda$ model-based algorithms in terms of rendered virtual view quality. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
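The R-$\lambda$ model underlying the abstract above relates the Lagrange multiplier to the target bits per pixel as $\lambda = \alpha \cdot \mathrm{bpp}^{\beta}$. The sketch below uses the commonly cited HM initial values ($\alpha = 3.2003$, $\beta = -1.367$) as defaults; the paper instead fits separate $(\alpha, \beta)$ pairs for the IBV and NIBV regions:

```python
def rlambda(bpp, alpha=3.2003, beta=-1.367):
    """R-lambda rate-control model: lambda = alpha * bpp ** beta.
    Smaller bit budgets (lower bpp) yield larger lambda, i.e. the RDO
    weighs rate more heavily when bits are scarce (beta < 0)."""
    return alpha * bpp ** beta
```

Per-region models then simply call this with region-specific parameters, and the resulting $\lambda$ drives the LCU-level QP selection.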
19. Fast Mode Decision Based on Grayscale Similarity and Inter-View Correlation for Depth Map Coding in 3D-HEVC.
- Author
-
Lei, Jianjun, Duan, Jinhui, Wu, Feng, Ling, Nam, and Hou, Chunping
- Subjects
HIGH definition video recording, VISUAL communication, DEPTH maps (Digital image processing), VIDEO codecs, COMPUTER engineering
- Abstract
The 3D extension of High Efficiency Video Coding significantly improves the coding efficiency of 3D video at the expense of computational complexity. This paper presents a novel fast mode decision algorithm for depth map coding based on the grayscale similarity and inter-view correlation. First, depth map grayscale similarity is adopted to judge whether the reference frame could assist the coding of the current frame. When the difference in the average grayscale between the co-located coding unit (CU) and the current CU is smaller than the similarity threshold, the depth level of the current CU will be restricted by that of the coded reference CU. Second, the grayscale similarity and inter-view correlation are jointly used for dependent views to achieve early decision on the best prediction unit (PU) mode. The mode decision procedure will be determined early when the co-located CU, which has a grayscale similarity with the current CU, selects Merge or Inter $2N \times 2N$ as the best prediction mode. Moreover, when the corresponding CU in the independent view selects Merge or Inter $2N \times 2N$ as the best prediction mode, the current CU will skip other PU modes checking based on the strong inter-view correlation. Finally, different strategies are proposed for the P-frames and B-frames of dependent views in view of the characteristics of different prediction structures. For B frames, the PU mode information of the coded independent view is utilized as reference to skip the unnecessary mode decision processes. For P frames, the spatial–temporal correlation is considered in the process of early mode decision to determine whether to choose the Merge mode or Inter $2N \times 2N$ as the best mode. Experimental results show that our proposed scheme achieves considerable time saving with negligible degradation of coding performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
20. Motion and Structure Information Based Adaptive Weighted Depth Video Estimation.
- Author
-
Lei, Jianjun, Liu, Jianying, Zhang, Hailong, Gu, Zhouye, Ling, Nam, and Hou, Chunping
- Subjects
VIDEO coding, IMAGE analysis, IMAGE quality analysis, DATA structures, TEMPORAL databases
- Abstract
This paper presents a novel depth video estimation method, which improves estimation performance and refines temporal consistency and spatial accuracy at the same time. The main idea is to incorporate more useful motion and structure information into depth estimation. First, an adaptive weight is calculated based on the motion information of adjacent frames and attached to a temporal term to update the energy function, thus reducing matching errors and improving temporal consistency. Second, depth continuity/discontinuity in the spatial domain is properly judged by combining the edges of the initial depth map with color segmentation, and used in the refinement strategy. Finally, we evaluate the performance of our algorithm on several public stereoscopic video sequences. Experimental results show that the quality of both the depth video and the synthesized virtual view is improved. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
21. Fast merge mode decision for diamond search in High Efficiency Video Coding.
- Author
-
Kim, Miok, Lee, Hyuk-Jae, and Ling, Nam
- Abstract
This paper proposes a merge mode decision algorithm that maintains the accuracy of a diamond search (DS) in motion estimation and compensation. In High Efficiency Video Coding (HEVC), the merge mode is used to reduce the bit rate required to carry motion information. The rate-distortion (RD) cost of the merge mode is compared with the RD cost of the inter-prediction mode in the course of motion estimation, which can be terminated early when the merge cost is smaller than the estimated cost of the inter-prediction mode. To this end, this paper proposes a fast merge mode prediction algorithm for when the DS is used for motion estimation. The main idea of this work is to estimate the RD cost of the merge mode in advance by utilizing the distortion information from the RD cost of the motion vector prediction (MVP), so as to terminate the motion search operation early. Experimental results show that the proposed fast merge mode estimation achieves comparable RD performance but reduces the amount of computation by 16.6% on average when compared to the fast motion estimation with diamond search implemented in the HM 8.0 reference software. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
22. Non-Delaunay hierarchical mesh-based motion estimation and compensation for Wavelet Video coding.
- Author
-
Kim, Miok, Ling, Nam, Ralston, John D., and Saunders, Steven E.
- Abstract
In this paper, we propose a non-Delaunay hierarchical mesh-based motion estimation and compensation technique over the spatial domain for wavelet video coding. In particular, we concentrate on reducing high-band signals and corresponding motion estimation errors in order to improve coding efficiency. We also analyze the trade-off between rate-distortion and computational complexity with a variety of video test sequences. Finally, experimental results show that a non-Delaunay hierarchical mesh-based method can give good performance at low computational complexity in a wavelet video codec. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
23. H.264/Advanced Video Coding Perceptual Optimization Coding Based on JND-Directed Coefficient Suppression.
- Author
-
Luo, Zhengyi, Song, Li, Zheng, Shibao, and Ling, Nam
- Subjects
DISCRETE cosine transforms, VIDEO coding, DATA visualization, SIGNAL quantization, ENCODING, LAGRANGE multiplier
- Abstract
The field of video coding has long sought compact representations of video data, in which perceptual redundancies, in addition to signal redundancies, are removed for higher compression. Many research efforts have been dedicated to modeling the human visual system's characteristics, and the resulting models have been integrated into video coding frameworks in different ways. Among them, coding enhancements with the just noticeable distortion (JND) model have drawn much attention in recent years due to their significant gains. A common application of the JND model is the adjustment of quantization by a multiplying factor corresponding to the JND threshold. In this paper, we propose an alternative perceptual video coding method that improves upon the current H.264/Advanced Video Coding (AVC) framework through an independent JND-directed suppression tool. This new tool is capable of finely tuning the quantization using a JND-normalized error model. To make full use of this new rate-distortion adjustment component, the Lagrange multiplier for rate-distortion optimization is derived in terms of the equivalent distortion. Because the H.264/AVC integer discrete cosine transform (DCT) differs from the classic DCT, on which state-of-the-art JND models are computed, we analytically derive a JND mapping formula between the integer DCT domain and the classic DCT domain, which permits us to reuse the JND models in a more natural way. In addition, the JND threshold can be refined by adopting a saliency algorithm in the coding framework, and we reduce the complexity of the JND computation by reusing the motion estimation of the encoder. Another benefit of the proposed scheme is that it remains fully compliant with the existing H.264/AVC standard. Subjective experimental results show that significant bit savings can be obtained using our method while maintaining a visual quality similar to that of traditional H.264/AVC coded video. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
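The JND-normalized error model described in the abstract above can be illustrated with a minimal sketch. The normalization formula below is an assumption for illustration, not the paper's actual model: per-sample error within one JND threshold is treated as perceptually invisible, and only the excess, scaled by the threshold, contributes distortion.

```python
import numpy as np

def jnd_normalized_distortion(orig, recon, jnd):
    """Distortion where error below the JND threshold counts as invisible.

    Hypothetical stand-in for the paper's JND-normalized error model:
    per-sample error is reduced by the JND threshold, then scaled by it.
    """
    orig = np.asarray(orig, dtype=float)
    recon = np.asarray(recon, dtype=float)
    jnd = np.asarray(jnd, dtype=float)
    err = np.abs(orig - recon)
    perceived = np.maximum(err - jnd, 0.0) / jnd  # zero when err <= JND
    return float(np.sum(perceived ** 2))
```

Under such a model, coefficients whose error stays within one JND contribute nothing to the distortion term used in rate distortion optimization, which is what allows the suppression tool to save bits without visible quality loss.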
24. Improved H.264 rate control by enhanced MAD-based frame complexity prediction
- Author
-
Yi, Xiaoquan and Ling, Nam
- Subjects
- *
DIGITAL video , *RATES , *FRAMES (Video) , *MULTIMEDIA systems , *DIGITAL media , *DIGITAL technology - Abstract
This paper presents a revised rate control scheme based on an improved frame complexity measure. The rate control schemes adopted by both MPEG-4 VM18 and H.264/AVC use a quadratic rate–distortion (R–D) model to determine quantization parameters (QPs). The classical quadratic R–D model is suitable for MPEG-4 but performs poorly for H.264/AVC because one of its important parameters, the mean absolute difference (MAD), is predicted through a linear model, whereas the MAD used in MPEG-4 VM18 is the actual MAD. An inaccurately predicted MAD results in a wrong QP and consequently degrades rate–distortion optimization (RDO) performance in H.264. To overcome the limitation of the existing rate control schemes, we introduce an enhanced linear model for predicting MAD that utilizes some knowledge of the current frame's complexity. Moreover, we propose a more accurate frame complexity measure, namely, normalized MAD, to replace the current use of the MAD parameter. Normalized MAD has a stronger correlation with optimally allocated bits than the predicted MAD does. To minimize video quality variations, we also propose a novel long-term QP limiter (LTQPL). Finally, a dynamic bit allocation scheme among basic units is implemented. Extensive simulation results show that our method, with little added computational complexity, improves the average peak signal-to-noise ratio (PSNR) and reduces video quality variations considerably. [Copyright © Elsevier]
- Published
- 2006
- Full Text
- View/download PDF
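The quadratic R–D model discussed in the abstract above can be sketched as follows. The model form R = c1·MAD/Q + c2·MAD/Q² and the linear MAD predictor are standard in VM18-style rate control, but the coefficient values used here are illustrative assumptions:

```python
import math

def predict_mad(prev_mad, a1=1.0, a2=0.0):
    # Linear MAD predictor of the kind used by H.264 rate control:
    # MAD_pred = a1 * MAD_prev + a2 (a1, a2 are illustrative defaults).
    return a1 * prev_mad + a2

def solve_qstep(target_bits, mad, c1=1.0, c2=1.0):
    # Quadratic R-D model: R = c1*MAD/Q + c2*MAD/Q^2.
    # Rearranged as R*Q^2 - c1*MAD*Q - c2*MAD = 0; take the positive root.
    a, b, c = target_bits, -c1 * mad, -c2 * mad
    disc = b * b - 4.0 * a * c
    return (-b + math.sqrt(disc)) / (2.0 * a)
```

For example, with c1 = c2 = 1 and MAD = 100, a target of 11 bits maps back to a quantization step of 10. When the predicted MAD is wrong, the solved step, and hence the QP, is wrong in the same direction, which is the failure mode the abstract's normalized MAD measure is meant to correct.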
25. On Enhancing H.264/AVC Video Rate Control by PSNR-Based Frame Complexity Estimation.
- Author
-
Jiang, Minqiang and Ling, Nam
- Subjects
- *
DIGITAL video , *PERFORMANCE , *COMPUTER engineering , *GEOMETRIC quantization , *DIFFERENTIAL geometry , *STANDARD deviations - Abstract
In this paper, we present a PSNR-based frame complexity estimation method to improve H.264/AVC rate control. Our scheme adds PSNR-based complexity estimation to the existing mean absolute difference based (MAD-based) complexity measure to form a combined frame complexity estimate. The aim is to allocate bits more accurately, especially for frames with scene changes and high motion. Bit allocation to each frame is computed not just from the encoder buffer status but also from the combined frame complexity measure. The quadratic rate-quantization (R-Q) model is used to determine the quantization parameter (QP) for each frame, and the computed QP is further adjusted in two special cases. We also propose a frame skipping decision scheme to improve performance at high frame rates. Simulation results show that the H.264 coder, using our proposed scheme, achieves better visual performance with similar or smaller PSNR deviations when compared to the recent JM8.6 rate control. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
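The combined complexity idea above can be sketched with a simple weighted blend and proportional bit allocation. The blending weight and both complexity inputs are assumptions for illustration, not the paper's formulas:

```python
def combined_complexity(mad_complexity, psnr_complexity, w=0.5):
    # Blend the MAD-based and PSNR-based measures; w is an illustrative weight.
    return w * mad_complexity + (1.0 - w) * psnr_complexity

def allocate_bits(bits_left, complexities):
    # Distribute the remaining bit budget in proportion to frame complexity,
    # so scene changes and high-motion frames receive a larger share.
    total = sum(complexities)
    return [bits_left * c / total for c in complexities]
```

The point of the combined measure is that a frame whose PSNR-based term spikes (e.g., at a scene change) is given more bits even when the MAD-based term alone would underestimate it.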
26. Introduction to the Issue on Video Coding: HEVC and Beyond.
- Author
-
He, Yun, Ostermann, Joern, Domanski, Marek, Au, Oscar C., and Ling, Nam
- Abstract
The 19 papers in this issue focus on High Efficiency Video Coding. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
27. Low complexity Bi-Partition mode selection for 3D video depth intra coding.
- Author
-
Gu, Zhouye, Zheng, Jianhua, Ling, Nam, and Zhang, Philipp
- Subjects
- *
VIDEO coding , *COMPUTATIONAL complexity , *THREE-dimensional display systems , *COMPUTER algorithms , *BIT rate - Abstract
This paper proposes a fast mode decision algorithm for 3D High Efficiency Video Coding (3D-HEVC) depth intra coding. In the current 3D-HEVC design, it is observed that, in most cases, the full rate-distortion (RD) cost search of Bi-Partition modes can be skipped, since most coding units (CUs) of a depth map are very flat or smooth, while Bi-Partition modes are designed for CUs with edges or sharp transitions. Using the rough RD cost value calculated by HEVC Rough Mode Decision as a selection threshold, we propose a fast Bi-Partition mode selection algorithm to speed up the encoding process. Test results for the proposed fast algorithm show a 34.4% encoding time saving with a 0.3% bitrate increase on synthesized views for the All-Intra test case and negligible impact under the random access test case. Moreover, by simply varying the selection threshold, we can trade off encoding time saving against bitrate loss based on the requirements of different applications. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
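A hypothetical sketch of the threshold-based skip decision described above: if the best rough RD cost among the conventional intra modes is already below the threshold, the CU is likely smooth and the expensive full RD search of the Bi-Partition modes is skipped. The comparison rule and threshold value are assumptions for illustration, not the paper's exact criterion:

```python
def skip_bipartition(rough_rd_costs, threshold):
    # Skip the full RD search of Bi-Partition modes when a conventional
    # intra mode already achieves a low rough RD cost (flat/smooth CU).
    return min(rough_rd_costs) < threshold
```

Raising the threshold skips more CUs (more speedup, more bitrate loss), which matches the tradeoff knob described at the end of the abstract.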
28. Long-term rate control for concurrent multipath real-time video transmission in heterogeneous wireless networks.
- Author
-
Chen, Feng, Zhang, Jie, Zheng, Mingkui, Wu, Jiyan, and Ling, Nam
- Subjects
- *
STREAMING video & television , *LYAPUNOV exponents , *DYNAMIC programming , *PROBLEM solving , *MULTIPATH channels , *STREAM Control Transmission Protocol (Computer network protocol) - Abstract
• A long-term quality model for rate control in concurrent MVT is proposed.
• A Lyapunov-based dynamic programming algorithm is developed to solve our problem.
• Long-term video quality is improved without reducing delay performance.
Concurrent multipath transmission provides an effective solution for streaming high-quality mobile videos in heterogeneous wireless networks. Rate control is commonly adopted in multimedia communication systems to fully utilize the available network bandwidth. This paper proposes a novel rate control scheme for concurrent multipath video transmission. Existing rate control algorithms mainly adapt the bit rate in a short-term pattern, i.e., without considering long-term video transmission quality. We propose a long-term rate control scheme that takes into account the status of both the transmission buffer and the video frames. First, a mathematical model is developed to formulate the non-convex problem of long-term quality maximization. Second, we develop a dynamic programming solution for online encoding bit rate control based on buffer status. The performance evaluation is conducted on a real test bed over LTE and Wi-Fi networks. Experimental results demonstrate that the proposed long-term rate control scheme achieves appreciable improvements over short-term rate control schemes in terms of video quality and delay performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
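A generic Lyapunov drift-plus-penalty step, sketched in its usual textbook form rather than the paper's actual algorithm: each decision slot picks the encoding rate that minimizes the queue backlog times the rate minus V times a utility of the rate, trading buffer delay against long-term quality. The candidate rate set, the utility function, and the control parameter V are all assumptions:

```python
def choose_rate(backlog, candidate_rates, utility, V=1.0):
    # Drift-plus-penalty: a larger backlog pushes toward lower rates (less
    # queueing delay); a larger V weights quality (utility) more heavily.
    return min(candidate_rates, key=lambda r: backlog * r - V * utility(r))
```

With a concave utility, the rule naturally backs off to low rates when the transmission buffer fills and ramps up when it drains, which is the buffer-aware behavior the abstract describes.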