Back to Search Start Over

Phase recovery with Bregman divergences for audio source separation

Authors :
Pierre-Hugo Vial
Cédric Févotte
Paul Magron
Thomas Oberlin
Signal et Communications (IRIT-SC)
Institut de recherche en informatique de Toulouse (IRIT)
Université Toulouse Capitole (UT Capitole)
Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J)
Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3)
Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI)
Université Toulouse - Jean Jaurès (UT2J)
Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3)
Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole)
Université de Toulouse (UT)
Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO)
Centre National de la Recherche Scientifique (CNRS)
ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019)
European Project: CoG-6681839,ERC FACTORY
Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées
Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)
Centre National de la Recherche Scientifique - CNRS (FRANCE)
Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Institut Supérieur de l'Aéronautique et de l'Espace - ISAE-SUPAERO (FRANCE)
Université Toulouse III - Paul Sabatier - UT3 (FRANCE)
Université Toulouse - Jean Jaurès - UT2J (FRANCE)
Université Toulouse 1 Capitole - UT1 (FRANCE)
Source :
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2021, Toronto, Canada, ICASSP
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

International audience; Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevance for audio source separation applications.

Details

Language :
English
Database :
OpenAIRE
Journal :
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2021, Toronto, Canada, ICASSP
Accession number :
edsair.doi.dedup.....c205767866ab2d87d1884271095485b0