Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise.
- Source :
- SIAM journal on mathematics of data science [SIAM J Math Data Sci] 2021; Vol. 3 (1), pp. 388-413. Date of Electronic Publication: 2021 Mar 23.
- Publication Year :
- 2021

Abstract
- A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow with a certain normalization (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self-loops) is robust to heteroskedastic noise. That is, the doubly stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly stochastic) noisy affinity matrix converges to its clean counterpart with rate m^{-1/2}, where m is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequencing data with intrinsic heteroskedasticity, where the advantage of the doubly stochastic normalization for exploratory analysis is evident.
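The normalization described in the abstract can be illustrated with a short sketch. A minimal, hedged example (not the authors' code): a symmetric Sinkhorn-type iteration that scales a zero-diagonal Gaussian kernel K so that diag(d) K diag(d) is doubly stochastic; the function name, iteration counts, and geometric-mean update are illustrative choices, not taken from the paper.

```python
import numpy as np

def doubly_stochastic(K, iters=2000, tol=1e-10):
    """Symmetric Sinkhorn scaling: find d > 0 so that W = diag(d) K diag(d)
    has unit row (and, by symmetry, column) sums."""
    d = np.ones(K.shape[0])
    for _ in range(iters):
        # Fixed-point condition: d_i * (K d)_i = 1. A geometric-mean update
        # damps the oscillation of the naive iteration d <- 1 / (K d).
        d = np.sqrt(d / (K @ d))
        if np.max(np.abs(d * (K @ d) - 1.0)) < tol:
            break
    return d[:, None] * K * d[None, :]

# Illustrative data: random points in Euclidean space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
K = np.exp(-D2 / np.median(D2))                        # Gaussian kernel
np.fill_diagonal(K, 0.0)                               # zero main diagonal (no self-loops)

W = doubly_stochastic(K)
```

The resulting W is symmetric with rows and columns each summing to one, which is the affinity matrix the abstract argues is robust to heteroskedastic noise.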
Details
- Language :
- English
- ISSN :
- 2577-0187
- Volume :
- 3
- Issue :
- 1
- Database :
- MEDLINE
- Journal :
- SIAM journal on mathematics of data science
- Publication Type :
- Academic Journal
- Accession number :
- 34124607
- Full Text :
- https://doi.org/10.1137/20M1342124