201. Deep-PCAC: An End-to-End Deep Lossy Compression Framework for Point Cloud Attributes
- Author
- Zhu Li, Li Li, Dong Liu, Zhiwei Xiong, Feng Wu, and Xihua Sheng
- Subjects
- Computer science, Point cloud, Coding and information theory, Lossy compression, Autoencoder, Computer Science Applications, Convolution, Feature (computer vision), Signal Processing, Media Technology, Entropy (information theory), Point (geometry), Electrical and Electronic Engineering, Algorithm, Block (data storage)
- Abstract
- Point clouds, consisting of geometry and attributes, are an emerging media format essential to the deployment of augmented reality applications. However, the large data volume of point clouds poses severe challenges for efficient storage and transmission. In this paper, we borrow ideas from end-to-end deep-network-based image/video coding and propose what is, to the best of our knowledge, the first end-to-end deep framework for compressing point cloud attributes. Specifically, we propose a lossy point cloud attribute autoencoder that directly encodes and decodes point cloud attributes with the help of geometry, instead of voxelizing or projecting the points. In the autoencoder, we propose a second-order point convolution that improves on the general point convolution used in previous point-based learning methods by exploiting spatial correlations among more points and nonlinear relationships between attribute features. We also introduce a dense point-inception block, which combines an inception-style block with a dense block, to improve feature propagation. In addition, we devise a multiscale loss that guides the autoencoder to focus on coarse-grained points that better cover the entire point cloud, making it easier to optimize the reconstruction quality of all points. Experimental results show that our proposed framework still has a performance gap relative to the state-of-the-art algorithms in the MPEG G-PCC reference software TMC13. However, it outperforms the region-adaptive hierarchical transform with a run-length Golomb-Rice entropy coder (RAHT-RLGR), which is one of the core transforms of TMC13, stripped of the many well-designed techniques that make TMC13 what it is today. It outperforms RAHT-RLGR by 2.63 dB, 1.77 dB, and 3.40 dB on average in terms of the Bjøntegaard delta peak signal-to-noise ratio (BD-PSNR) for the Y, U, and V components on the test dataset. A subjective quality comparison demonstrates the advantages of our framework in preserving local textures and reducing blocking and color noise artifacts.
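As a rough illustration of two ideas the abstract describes, here is a minimal sketch assuming a PyTorch-style formulation: a point convolution over k-nearest neighbors augmented with a quadratic (second-order) term on neighbor features, and a multiscale reconstruction loss evaluated on nested point subsets. All names (`SecondOrderPointConv`, `multiscale_loss`, the k-NN helper) and the exact formulations are assumptions for illustration, not the paper's actual operators.

```python
# Hedged sketch: a point convolution with an added second-order (quadratic)
# feature term, plus a multiscale loss over nested point subsets.
# Illustrative only; this is NOT the paper's code.
import torch
import torch.nn as nn

def knn_indices(xyz, k):
    """Indices of the k nearest neighbors of every point. xyz: (N, 3)."""
    dist = torch.cdist(xyz, xyz)                # (N, N) pairwise distances
    return dist.topk(k, largest=False).indices  # (N, k)

class SecondOrderPointConv(nn.Module):
    """First-order part: position-conditioned linear filtering of neighbor
    features, as in common point convolutions. Second-order part: a linear map
    over elementwise products of center and neighbor features, standing in for
    the nonlinear feature interactions the abstract mentions."""

    def __init__(self, c_in, c_out, k=16):
        super().__init__()
        self.c_in, self.c_out, self.k = c_in, c_out, k
        self.w1 = nn.Linear(3, c_in * c_out)  # filter weights from relative position
        self.w2 = nn.Linear(c_in, c_out)      # weights for the quadratic term

    def forward(self, xyz, feat):
        """xyz: (N, 3) geometry, feat: (N, c_in) attributes -> (N, c_out)."""
        idx = knn_indices(xyz, self.k)                   # (N, k)
        rel = xyz[idx] - xyz.unsqueeze(1)                # (N, k, 3) relative positions
        nbr = feat[idx]                                  # (N, k, c_in) neighbor features
        w = self.w1(rel).view(-1, self.k, self.c_in, self.c_out)
        first = torch.einsum('nki,nkio->no', nbr, w) / self.k
        second = self.w2(nbr * feat.unsqueeze(1)).mean(dim=1)
        return first + second

def multiscale_loss(pred, target, scales=(1.0, 0.25, 0.0625)):
    """MSE averaged over nested random subsets, a placeholder for the paper's
    coarse-grained point sets with good coverage of the whole cloud."""
    n = pred.shape[0]
    loss = pred.new_zeros(())
    for s in scales:
        idx = torch.randperm(n)[:max(1, int(n * s))]
        loss = loss + torch.mean((pred[idx] - target[idx]) ** 2)
    return loss / len(scales)
```

In the actual framework, the coarse-grained points would presumably come from a coverage-aware sampler such as farthest point sampling; random subsampling is used here only to keep the sketch short.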
- Published
- 2022
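For context on how gains like the reported 2.63/1.77/3.40 dB are typically computed, below is a minimal sketch of the standard Bjøntegaard delta PSNR (BD-PSNR) calculation: fit PSNR as a cubic polynomial of log-rate for each codec, then compare the average fitted PSNR over the overlapping rate range. The function name and interface are assumptions; this is the conventional metric, not code from the paper.

```python
# Hedged sketch of the standard BD-PSNR metric (Bjøntegaard delta PSNR).
import numpy as np

def bd_psnr(rate_a, psnr_a, rate_b, psnr_b):
    """Average PSNR gain (dB) of codec B over codec A on the shared rate range.
    rate_*: bitrates (e.g. bits per point); psnr_*: matching PSNRs in dB."""
    la = np.log10(np.asarray(rate_a, dtype=float))
    lb = np.log10(np.asarray(rate_b, dtype=float))
    pa = np.polyfit(la, psnr_a, 3)   # cubic fit: PSNR as a function of log-rate
    pb = np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    ia, ib = np.polyint(pa), np.polyint(pb)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_b = (np.polyval(ib, hi) - np.polyval(ib, lo)) / (hi - lo)
    return avg_b - avg_a             # positive => codec B is better on average
```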