Back to Search Start Over

Rate–Distortion–Perception Optimized Neural Speech Transmission System for High-Fidelity Semantic Communications †.

Authors :
Yao, Shengshi
Xiao, Zixuan
Niu, Kai
Source :
Sensors (14248220); May2024, Vol. 24 Issue 10, p3169, 11p
Publication Year :
2024

Abstract

We consider the problem of learned speech transmission. Existing methods have exploited joint source–channel coding (JSCC) to encode speech directly to transmitted symbols to improve the robustness over noisy channels. However, the fundamental limit of these methods is the failure of identification of content diversity across speech frames, leading to inefficient transmission. In this paper, we propose a novel neural speech transmission framework named NST. It can be optimized for superior rate–distortion–perception (RDP) performance toward the goal of high-fidelity semantic communication. Particularly, a learned entropy model assesses latent speech features to quantify the semantic content complexity, which facilitates the adaptive transmission rate allocation. NST enables a seamless integration of the source content with channel state information through variable-length joint source–channel coding, which maximizes the coding gain. Furthermore, we present a streaming variant of NST, which adopts causal coding based on sliding windows. Experimental results verify that NST outperforms existing speech transmission methods including separation-based and JSCC solutions in terms of RDP performance. Streaming NST achieves low-latency transmission with a slight quality degradation, which is tailored for real-time speech communication. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
14248220
Volume :
24
Issue :
10
Database :
Complementary Index
Journal :
Sensors (14248220)
Publication Type :
Academic Journal
Accession number :
177490337
Full Text :
https://doi.org/10.3390/s24103169