1. Unsupervised Blind Source Separation with Variational Auto-Encoders
- Author
Neri, Julian, Badeau, Roland, Depalle, Philippe, McGill University = Université McGill [Montréal, Canada], Département Images, Données, Signal (IDS), Télécom ParisTech, Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
- Subjects
Computer Science::Machine Learning, unmixing, Bayesian inference, latent variable model, 020206 networking & telecommunications, 02 engineering and technology, Statistics::Machine Learning, 030507 speech-language pathology & audiology, 03 medical and health sciences, ComputingMethodologies_PATTERNRECOGNITION, blind source separation, universal sound separation, 0202 electrical engineering, electronic engineering, information engineering, 0305 other medical science, [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
- Abstract
Supervised source separation requires expensive synthetic datasets containing clean, ground-truth source signals, while unsupervised separation requires only data mixtures. Existing unsupervised methods still use supervision to avoid over-separation and to compete with fully supervised methods. We present a new method for completely unsupervised single-channel blind source separation, based on variational auto-encoding, that automatically learns the correct number of sources in data mixtures and quantitatively outperforms the existing methods. A deep inference network disentangles (separates) data mixtures into low-dimensional latent source variables. A deep generative network individually decodes each latent source into its source signal, such that their sum represents the given mixture. Qualitative and quantitative results from separation experiments on pairs of randomly mixed MNIST handwritten digits and mixed audio spectrograms demonstrate that our method outperforms state-of-the-art unsupervised and semi-supervised methods, showing promise as a solution to this long-standing problem in computer vision and audition.
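The encode, decode, and sum structure described in the abstract might be sketched roughly as follows. This is only an illustrative forward pass under several assumptions that are not taken from the paper: single untrained affine layers stand in for the deep networks, one decoder is shared across sources, the number of sources is fixed in advance rather than learned, and the training objective is omitted entirely. All names (`init_params`, `separate`, `enc_mu`, `enc_logvar`, `dec`) are hypothetical.

```python
import math
import random


def init_layer(rng, n_out, n_in, scale=0.1):
    """Random affine layer (weights, bias). Untrained, for illustration only."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply an affine layer to the vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(rng, data_dim, num_sources, latent_dim):
    """Hypothetical parameter set: two encoder heads and one shared decoder."""
    total = num_sources * latent_dim
    return {
        "enc_mu": init_layer(rng, total, data_dim),      # latent means
        "enc_logvar": init_layer(rng, total, data_dim),  # latent log-variances
        "dec": init_layer(rng, data_dim, latent_dim),    # shared decoder (assumption)
    }


def separate(mixture, params, rng, num_sources, latent_dim):
    """Encode a mixture into per-source latents, decode each one, and sum the results."""
    # Inference network: one mean and log-variance vector covering all latent sources.
    mu = linear(params["enc_mu"], mixture)
    logvar = linear(params["enc_logvar"], mixture)

    sources = []
    for k in range(num_sources):
        lo, hi = k * latent_dim, (k + 1) * latent_dim
        # Reparameterised sample z = mu + sigma * eps for the k-th latent source.
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu[lo:hi], logvar[lo:hi])]
        # Generative network: decode this latent on its own; the sigmoid keeps
        # each source value in (0, 1), as for normalised pixel intensities.
        sources.append([1.0 / (1.0 + math.exp(-a)) for a in linear(params["dec"], z)])

    # The mixture estimate is the element-wise sum of the decoded sources.
    reconstruction = [sum(vals) for vals in zip(*sources)]
    return sources, reconstruction
```

Calling `separate` on a flattened mixture returns one decoded signal per latent source together with their sum; in a trained model that sum would be fitted to the observed mixture, whereas here the weights are random and the outputs carry no meaning.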
- Published
2021