10 results for "Learned video compression"
Search Results
2. Adaptive Prediction Structure for Learned Video Compression.
- Author
Yang, Jiayu, Zhai, Yongqi, Jiang, Wei, Yang, Chunhui, Gao, Feng, and Wang, Ronggang
- Subjects
SMART structures, VIDEO compression, VIDEO coding, ALGORITHMS, FORECASTING, ENCODING
- Abstract
Learned video compression has developed rapidly and shown competitive rate-distortion performance compared with the latest traditional video coding standard H.266 (VVC). However, existing works have been restricted to a fixed prediction direction and GoP size. This inflexibility in the prediction structure keeps learned video compression from reaching optimal compression efficiency across diverse motion scenarios. In this article, we propose to advance learned video compression with adaptive prediction structure decision. Specifically, we propose a unified compression framework that supports both forward prediction and bi-directional prediction and can flexibly switch between prediction directions to achieve better prediction performance. Meanwhile, we propose a low-complexity prediction structure decision algorithm, in which the prediction direction and GoP size are adaptively determined from motion complexity to achieve optimal compression efficiency. Experimental results demonstrate that the proposed unified framework with the adaptive decision algorithm improves the compression efficiency of purely forward-prediction-based or bi-directional-prediction-based frameworks with a negligible (0.9%) encoding time increase. Meanwhile, it achieves compression performance comparable to VVC and recent learned video coding methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
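The abstract above does not spell out how the low-complexity decision algorithm works, so here is a minimal illustrative sketch: motion complexity is approximated by the mean absolute difference between consecutive frames, and that value picks a prediction direction and GoP size. The complexity proxy and all thresholds are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def motion_complexity(frames: np.ndarray) -> float:
    """Crude motion-complexity proxy: mean absolute difference
    between consecutive grayscale frames (the paper's actual
    measure is not specified in the abstract)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return float(diffs.mean())

def decide_structure(frames: np.ndarray,
                     low: float = 2.0, high: float = 8.0):
    """Pick (prediction_direction, gop_size) from motion complexity.
    Thresholds are illustrative placeholders."""
    c = motion_complexity(frames)
    if c < low:      # near-static content: long bi-directional GoP
        return "bi-directional", 16
    elif c < high:   # moderate motion: shorter bi-directional GoP
        return "bi-directional", 8
    else:            # complex motion: fall back to forward prediction
        return "forward", 4

# Example: 8 random 64x64 grayscale frames
frames = np.random.randint(0, 256, size=(8, 64, 64), dtype=np.uint8)
print(decide_structure(frames))
```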
3. Bit-rate aware effective inter-layer motion prediction using multi-loop encoding structure.
- Author
Siddaramappa, Sandeep Gowdra and Mamatha, Gowdra Shivanandappa
- Subjects
VIDEO compression, INTERNET content, DEEP learning, STREAMING video & television, VIDEO coding, CODECS
- Abstract
Recently, there has been a notable increase in the use of video content on the internet, leading to the creation of improved codecs such as versatile video coding (VVC) and high-efficiency video coding (HEVC). However, these video coding techniques still exhibit quality degradation and noise throughout the decoded frames. A number of deep-learning (DL) network structures have been developed to tackle this problem; nevertheless, because many of these solutions use in-loop filtering, extra bits must be sent between the encoding and decoding layers. Moreover, because they use few reference frames, they cannot extract significant features by taking advantage of the temporal connection between frames. Hence, this paper introduces inter-layer motion prediction aware multi-loop video coding (ILMPA-MLVC). ILMPA-MLVC first designs a multi-loop adaptive encoder (MLAE) architecture to enhance the inter-layer motion prediction and optimization process. Second, it designs a multi-loop probabilistic-bitrate aware compression (MLPBAC) model to attain improved bitrate efficiency with minimal overhead. ILMPA-MLVC is trained with a novel distortion loss function on the UVG dataset. The results show that the proposed ILMPA-MLVC attains improved peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) performance in comparison with existing video coding techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
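The abstract mentions training with a "novel distortion loss function" but does not define it. The sketch below shows the generic lambda-weighted rate-distortion objective that learned codecs are commonly trained with, as a baseline for what such a loss optimizes; the constants and the MSE distortion term are assumptions, not the paper's loss.

```python
import torch

def rate_distortion_loss(x, x_hat, bits, lam: float = 0.01):
    """Generic R-D objective: L = R + lam * D. Not the paper's
    'novel distortion loss', just the common baseline form.
    bits: estimated bitstream length for the batch."""
    num_pixels = x.numel() / x.shape[1]   # N*H*W per channel
    bpp = bits / num_pixels               # rate in bits per pixel
    mse = torch.mean((x - x_hat) ** 2)    # distortion term
    return bpp + lam * 255.0 ** 2 * mse   # 255^2 scaling for [0,1] inputs

x = torch.rand(1, 3, 64, 64)
x_hat = x + 0.01 * torch.randn_like(x)
print(rate_distortion_loss(x, x_hat, bits=torch.tensor(5000.0)))
```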
4. Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement.
- Author
Yang, Jiayu, Yang, Chunhui, Xiong, Fei, Zhai, Yongqi, and Wang, Ronggang
- Subjects
OPTICAL flow, SIGNAL-to-noise ratio, ENTROPY
- Abstract
Learned video compression has drawn great attention and shown promising compression performance recently. In this article, we focus on two components of the learned video compression framework, the conditional entropy model and the quality enhancement module, to improve compression performance. Specifically, we propose an adaptive spatial-temporal entropy model for image, motion, and residual compression, which introduces a temporal prior to reduce the temporal redundancy of latents and an additional modulated mask to evaluate similarity and perform refinement. In addition, a quality enhancement module is proposed for the predicted frame and the reconstructed frame to improve frame quality and reduce the bitrate cost of residual coding. The module reuses the decoded optical flow as a motion prior and utilizes deformable convolution to mine high-quality information from the reference frame in a bit-free manner. The two proposed coding tools are integrated into a pixel-domain residual-coding-based compression framework to evaluate their effectiveness. Experimental results demonstrate that our framework achieves competitive compression performance in the low-delay scenario compared with recent learning-based methods and traditional H.265/HEVC in terms of Peak Signal-to-Noise Ratio (PSNR) and Multi-Scale Structural Similarity Index (MS-SSIM). The code is available at OpenLVC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
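The quality enhancement module described above reuses decoded optical flow to drive deformable convolution over the reference frame. A minimal sketch of that flow-guided deformable alignment pattern, using torchvision's DeformConv2d, is below; the layer widths, the offset-prediction head, and the way the flow seeds the offsets are all illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FlowGuidedEnhancer(nn.Module):
    """Sketch of flow-guided deformable alignment: the decoded
    optical flow seeds the offsets of a deformable convolution that
    samples the reference frame's features (bit-free reuse of
    motion). Sizes are illustrative, not the paper's."""
    def __init__(self, ch: int = 32, k: int = 3):
        super().__init__()
        # Predict per-tap offset refinements from features + flow.
        self.offset_pred = nn.Conv2d(ch + 2, 2 * k * k, 3, padding=1)
        self.dcn = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, ref_feat, flow):
        # Repeat flow as a coarse initial offset for every kernel tap,
        # then add a learned residual refinement.
        base = flow.repeat(1, self.dcn.kernel_size[0] ** 2, 1, 1)
        offsets = base + self.offset_pred(torch.cat([ref_feat, flow], 1))
        return self.dcn(ref_feat, offsets)

enh = FlowGuidedEnhancer()
ref = torch.rand(1, 32, 64, 64)
flow = torch.zeros(1, 2, 64, 64)   # decoded motion, reused bit-free
print(enh(ref, flow).shape)        # torch.Size([1, 32, 64, 64])
```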
5. HDVC: Deep Video Compression With Hyperprior-Based Entropy Coding
- Author
Yusong Hu, Cheolkon Jung, Qipu Qin, Jiang Han, Yang Liu, and Ming Li
- Subjects
Hyperprior, entropy coding, learned video compression, deep learning, end-to-end, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In this paper, we propose deep video compression with hyperprior-based entropy coding, named HDVC. The proposed method is based on the deep video compression (DVC) framework, which replaces traditional block-based video compression with end-to-end video compression based on deep learning, aiming to improve compression efficiency and reduce computational complexity while maintaining visual quality. Building on the DVC framework, we introduce hyperprior-based entropy coding into motion compression and optimize motion vector estimation (i.e., optical flow estimation) using window attention and fast residual channel attention. Moreover, we introduce a residual channel attention intermediate module into both encoding and decoding to enhance residuals and the quality of reconstructed frames. We also adopt hyperprior-based entropy coding in residual compression to model the feature distribution. Besides, we use learned image compression for intra-frame coding, based on a fast residual channel attention network, to generate reference frames. Experimental results show that the proposed method achieves better PSNR and MS-SSIM performance than both traditional block-based and recent deep learning-based video compression methods on the UVG dataset.
- Published
- 2024
- Full Text
- View/download PDF
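Hyperprior-based entropy coding, which HDVC applies to motion and residual latents, can be sketched in a few lines: a hyper-encoder summarizes the latent into side information, and the decoded side information predicts per-element Gaussian parameters used to estimate the bit cost of the quantized latent. The network shapes below are illustrative (Ballé-style), not HDVC's actual layers, and the side information's own rate is omitted for brevity.

```python
import torch
import torch.nn as nn

class HyperpriorRate(nn.Module):
    """Minimal hyperprior entropy-model sketch: z = hyper_enc(y)
    is quantized, decoded into per-element (mu, sigma), and the
    rate of the quantized y is estimated under N(mu, sigma)."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.hyper_enc = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1))
        self.hyper_dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 2 * ch, 4, stride=2, padding=1))

    def forward(self, y):
        z = self.hyper_enc(y)
        z_hat = torch.round(z)                    # quantization (inference)
        mu, sigma = self.hyper_dec(z_hat).chunk(2, dim=1)
        sigma = torch.exp(sigma).clamp(min=1e-6)  # positive scales
        y_hat = torch.round(y)
        # P(y_hat) = Phi(y_hat + 0.5) - Phi(y_hat - 0.5)
        gauss = torch.distributions.Normal(mu, sigma)
        p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
        bits = -torch.log2(p.clamp(min=1e-9)).sum()
        return y_hat, bits

model = HyperpriorRate()
y = torch.randn(1, 64, 16, 16)
y_hat, bits = model(y)
print(float(bits))   # estimated rate for y (z's rate omitted here)
```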
6. Learned Video Compression With Efficient Temporal Context Learning.
- Author
Jin, Dengchao, Lei, Jianjun, Peng, Bo, Pan, Zhaoqing, Li, Li, and Ling, Nam
- Subjects
IMAGE compression, VIDEO coding, CODECS, VIDEO compression, SIGNALS & signaling
- Abstract
In contrast to image compression, the key to video compression is efficiently exploiting the temporal context to reduce inter-frame redundancy. Existing learned video compression methods generally rely on short-term temporal correlations or image-oriented codecs, which prevents further improvement of coding performance. This paper proposes a novel temporal context-based video compression network (TCVC-Net) to improve the performance of learned video compression. Specifically, a global temporal reference aggregation (GTRA) module is proposed to obtain an accurate temporal reference for motion-compensated prediction by aggregating long-term temporal context. Furthermore, to efficiently compress the motion vector and residue, a temporal conditional codec (TCC) is proposed that preserves structural and detailed information by exploiting the multi-frequency components in the temporal context. Experimental results show that the proposed TCVC-Net outperforms published state-of-the-art methods in terms of both PSNR and MS-SSIM metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
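To make the long-term aggregation idea behind the GTRA module concrete, here is a toy sketch that blends a buffer of past reference features into a single temporal reference, weighting each by its global similarity to the current frame's features. The real module is far more elaborate; the cosine-similarity weighting here is purely an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def aggregate_references(query_feat, ref_feats):
    """Toy long-term temporal aggregation: weight each buffered
    reference feature map by its global similarity to the current
    frame's features, then blend into one temporal reference.
    query_feat: (C, H, W); ref_feats: list of (C, H, W)."""
    q = query_feat.flatten()
    sims = torch.stack([
        F.cosine_similarity(q, r.flatten(), dim=0) for r in ref_feats])
    w = torch.softmax(sims, dim=0)                  # per-reference weights
    refs = torch.stack(ref_feats)                   # (T, C, H, W)
    return (w.view(-1, 1, 1, 1) * refs).sum(dim=0)  # weighted blend

query = torch.rand(16, 32, 32)
buffer = [torch.rand(16, 32, 32) for _ in range(4)]  # long-term refs
print(aggregate_references(query, buffer).shape)     # (16, 32, 32)
```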
7. IBVC: Interpolation-driven B-frame video compression.
- Author
Xu, Chenming, Liu, Meiqin, Yao, Chao, Lin, Weisi, and Zhao, Yao
- Subjects
VIDEO compression, BIT rate, INTERPOLATION, CODECS
- Abstract
Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle-frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frames, relying on bi-directional optical-flow estimation or video frame interpolation, and suffer from inaccurately quantized motions and inefficient motion compensation. To address these issues, we propose a simple yet effective structure called Interpolation-driven B-frame Video Compression (IBVC). Our approach involves only two major operations: video frame interpolation and artifact-reduction compression. IBVC introduces a bit-rate-free MEMC based on interpolation, which avoids optical-flow quantization and additional compression distortions. Then, to reduce duplicate bit-rate consumption and focus on unaligned artifacts, a residual-guided masking encoder is deployed to adaptively select meaningful contexts with interpolated multi-scale dependencies. In addition, a conditional spatio-temporal decoder is proposed to eliminate location errors and artifacts, instead of the MEMC coding used in other methods. Experimental results on B-frame coding demonstrate that IBVC achieves significant improvements over the relevant state-of-the-art methods. Meanwhile, our approach saves bit rate compared with the random access (RA) configuration of H.266 (VTM). The code will be available at https://github.com/ruhig6/IBVC.
• We design a simple yet effective pipeline for a learned B-frame video codec, comprising an interpolation-driven MEMC and an artifact-reduction codec for B-frame compression.
• The residual-guided masking encoder excels at precisely identifying intricate variances and interpolation artifacts, reducing redundant bit rates within aligned areas.
• The conditional spatio-temporal contextual decoder leverages prior conditions from historical frames to enhance temporal consistency and reduce interpolation artifacts.
• Experimental results on B-frame coding demonstrate that IBVC achieves significant improvements over the relevant state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
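A toy rendering of the interpolation-driven pipeline: synthesize the middle frame from the two references with an interpolation net, then keep (and hence spend bits on) the residual only where a magnitude mask flags prediction failure. The single-convolution "interpolation net", the threshold, and the masking rule are illustrative stand-ins, not IBVC's modules.

```python
import torch
import torch.nn as nn

class IBFrameSketch(nn.Module):
    """Toy interpolation-driven B-frame pipeline: interpolate the
    middle frame from two references, then code the residual only
    where a residual-magnitude mask says the prediction failed.
    All modules are illustrative stand-ins."""
    def __init__(self):
        super().__init__()
        self.interp = nn.Conv2d(6, 3, 3, padding=1)  # stand-in VFI net

    def forward(self, x_prev, x_next, x_mid, tau: float = 0.05):
        pred = self.interp(torch.cat([x_prev, x_next], dim=1))
        residual = x_mid - pred
        # Mask off well-predicted regions so they cost (almost) no bits.
        mask = (residual.abs().mean(1, keepdim=True) > tau).float()
        coded = residual * mask          # would go to the residual codec
        recon = pred + coded             # decoder-side reconstruction
        return recon, mask.mean()        # fraction of area needing bits

m = IBFrameSketch()
x0, x1, x2 = (torch.rand(1, 3, 64, 64) for _ in range(3))
recon, frac = m(x0, x2, x1)
print(recon.shape, float(frac))
```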
8. Learned video compression with channel-wise autoregressive entropy model.
- Author
Yu, Yang, He, Xiaohai, Wu, Xiaohong, Zhang, Tingrong, and Ren, Chao
- Subjects
AUTOREGRESSIVE models, VIDEO compression, IMAGE compression, SIGNAL-to-noise ratio, PROBLEM solving
- Abstract
Although learned image compression methods have achieved competitive rate-distortion performance, learned video compression remains challenging. Current mainstream learned video compression frameworks usually improve the motion prediction module to reduce the redundancy in video sequences. Although these methods can achieve a high compression ratio, they often neglect improvements to the entropy model and thus do not fully exploit the temporal and spatial characteristics of the video. Meanwhile, these methods often suffer from the error propagation problem. To solve these problems, we propose a learned video compression framework with a channel-wise autoregressive entropy model. Our framework captures spatial-temporal dependencies through a powerful entropy model to reduce the redundancy in video sequences. In particular, we do not compress the frames directly in the pixel domain, which removes the need for a motion prediction module and avoids the error propagation problem. To better utilize the temporal contexts of the previous frame, we propose a window temporal prior module. Experiments show that our proposed video compression framework achieves promising compression performance in terms of peak signal-to-noise ratio and multiscale structural similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
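A channel-wise autoregressive entropy model of the kind this abstract refers to splits the latent's channels into slices and predicts each slice's Gaussian parameters from hyperprior features plus all previously decoded slices. The sketch below shows that slice-by-slice loop; the slice count, layer sizes, and hyperprior input are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ChannelARSketch(nn.Module):
    """Channel-wise autoregressive entropy model sketch (Minnen &
    Singh style): each channel slice's (mu, sigma) is predicted
    from the hyperprior features plus earlier decoded slices."""
    def __init__(self, ch: int = 64, slices: int = 4, hyper_ch: int = 32):
        super().__init__()
        self.slices = slices
        s = ch // slices
        self.param_nets = nn.ModuleList([
            nn.Conv2d(hyper_ch + i * s, 2 * s, 3, padding=1)
            for i in range(slices)])

    def forward(self, y, hyper_feat):
        y_slices = y.chunk(self.slices, dim=1)
        decoded, total_bits = [], y.new_zeros(())
        for i, ys in enumerate(y_slices):
            ctx = torch.cat([hyper_feat] + decoded, dim=1)
            mu, log_sigma = self.param_nets[i](ctx).chunk(2, dim=1)
            sigma = log_sigma.exp().clamp(min=1e-6)
            ys_hat = torch.round(ys)
            g = torch.distributions.Normal(mu, sigma)
            p = (g.cdf(ys_hat + 0.5) - g.cdf(ys_hat - 0.5)).clamp(min=1e-9)
            total_bits = total_bits - torch.log2(p).sum()
            decoded.append(ys_hat)       # context for later slices
        return torch.cat(decoded, dim=1), total_bits

m = ChannelARSketch()
y = torch.randn(1, 64, 16, 16)
hyper = torch.rand(1, 32, 16, 16)
y_hat, bits = m(y, hyper)
print(y_hat.shape, float(bits))
```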
9. Multiple hypotheses based motion compensation for learned video compression.
- Author
Lin, Rongqun, Wang, Meng, Zhang, Pingping, Wang, Shiqi, and Kwong, Sam
- Subjects
VIDEO compression, HYPOTHESIS
- Abstract
Recently, learned video compression has attracted considerable research attention. However, in existing methods, the motion used for alignment is limited to a single hypothesis, leading to inaccurate motion estimation, especially for complicated scenes with complex movements. Motivated by the multiple-hypotheses philosophy in traditional video compression, we develop multiple-hypotheses-based motion compensation for learned video compression, in an effort to enhance motion compensation efficiency by providing diverse hypotheses with efficient temporal information fusion. In particular, a multiple hypotheses module, which produces multiple motions and warped features to mine sufficient temporal information, is proposed to provide various hypothesis inferences from the reference frame. To utilize these hypotheses more fully, a hypotheses attention module is adopted, introducing a channel-wise squeeze-and-excitation layer and a multi-scale network. In addition, context combination is employed to fuse the weighted hypotheses and generate effective contexts with powerful temporal priors. Finally, the valid contexts are used to promote compression efficiency by merging the weighted warped features. Extensive experiments show that the proposed method significantly improves the rate-distortion performance of learned video compression. Compared with the state-of-the-art method for end-to-end video compression, average bit-rate reductions of over 13% are achieved in terms of both PSNR and MS-SSIM. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
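The mechanism described above, multiple warped hypotheses reweighted by a channel-wise squeeze-and-excitation gate, can be illustrated compactly: warp the reference features with K candidate flows, gate the stacked hypotheses with SE, and fuse with a convolution. The warp, SE, and fusion below are generic building blocks assumed for illustration, not the paper's exact modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp feat by flow (in pixels) via bilinear sampling."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().to(feat)  # (H, W, 2)
    grid = grid + flow.permute(0, 2, 3, 1)                 # add (dx, dy)
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1          # normalize x
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1          # normalize y
    return F.grid_sample(feat, grid, align_corners=True)

class MultiHypothesisMC(nn.Module):
    """Toy multiple-hypotheses motion compensation: K warped
    hypotheses, squeeze-and-excitation gating, convolutional fusion."""
    def __init__(self, ch: int = 16, k: int = 3):
        super().__init__()
        self.se = nn.Sequential(                 # squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(k * ch, k * ch // 4, 1), nn.ReLU(),
            nn.Conv2d(k * ch // 4, k * ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(k * ch, ch, 3, padding=1)

    def forward(self, ref_feat, flows):          # flows: list of (N,2,H,W)
        hyps = torch.cat([warp(ref_feat, f) for f in flows], dim=1)
        return self.fuse(hyps * self.se(hyps))   # gated fusion

mc = MultiHypothesisMC()
ref = torch.rand(1, 16, 32, 32)
flows = [torch.randn(1, 2, 32, 32) for _ in range(3)]
print(mc(ref, flows).shape)   # torch.Size([1, 16, 32, 32])
```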
10. End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression
- Author
M. Akin Yilmaz and A. Murat Tekalp
- Subjects
Bidirectional control, Image coding, Video compression, Motion compensation, Optimization, Entropy, Video codecs, Learned video compression, Learned bi-directional motion compensation, Flow field sub-sampling, Flow vector prediction, End-to-end optimization, Computer science, Artificial intelligence, Engineering, electrical and electronic, Computer Graphics and Computer-Aided Design, Software
- Abstract
Conventional video compression (VC) methods are based on motion-compensated transform coding, in which the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, the R-D performance of our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to the proposed novel tools, such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/. Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA).
- Published
- 2022
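For readers unfamiliar with the hierarchical bi-directional structure this codec builds on: frames in a GoP are coded key frames first, then midpoints recursively, so every B-frame sees a decoded past and future reference. The sketch below computes that generic hierarchical-B coding order; it illustrates the structure only and is not taken from the paper's repository.

```python
def hierarchical_order(gop_size: int):
    """Coding order for one GoP under hierarchical bi-directional
    prediction: key frames first, then midpoints recursively, so
    every B-frame has a decoded past and future reference."""
    order = [0, gop_size]
    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)        # coded after both of its references
        split(lo, mid)
        split(mid, hi)
    split(0, gop_size)
    return order

print(hierarchical_order(8))   # [0, 8, 4, 2, 1, 3, 6, 5, 7]
```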