AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

Authors :
Sadok, Samir
Leglaive, Simon
Girin, Laurent
Richard, Gaël
Alameda-Pineda, Xavier
Publication Year :
2025

Abstract

This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement.

Comment: 5 pages, https://samsad35.github.io/site-ancogen

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2501.05332
Document Type :
Working Paper