Towards a comprehensive visualisation of structure in large scale data sets

Authors :
Joan Garriga
Frederic Bartumeus
Source :
Machine Learning: Science and Technology, Vol 5, Iss 3, p 030503 (2024)
Publication Year :
2024
Publisher :
IOP Publishing, 2024.

Abstract

Dimensionality reduction methods are fundamental to the exploration and visualisation of large data sets. Basic requirements for unsupervised data exploration are flexibility and scalability. However, current methods have computational limitations that restrict our ability to explore data structures to the lower range of scales. We focus on t-SNE and propose a chunk-and-mix protocol that enables the parallel implementation of this algorithm, as well as a self-adaptive parametric scheme that facilitates its parametric configuration. As a proof of concept, we present the pt-SNE algorithm, a parallel version of Barnes-Hut-SNE (an $O(n \log n)$ implementation of t-SNE). In pt-SNE, a single free parameter for the size of the neighbourhood, namely the perplexity, modulates the visualisation of the data structure at different scales, from local to global. Thanks to parallelisation, the runtime of the algorithm remains almost independent of the perplexity, which extends the range of scales that can be analysed. pt-SNE converges to a good global embedding comparable to current solutions, although it adds a little noise at the local scale. This noise illustrates an unavoidable trade-off between computational speed and accuracy. We expect the same approach to be applicable to embedding algorithms faster than Barnes-Hut-SNE, such as Fast-Fourier Interpolation-based t-SNE or Uniform Manifold Approximation and Projection, thus extending the state of the art and allowing a more comprehensive visualisation and analysis of data structures.
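
The role of the perplexity as an effective neighbourhood size can be illustrated with a minimal sketch of the standard t-SNE definition, in which the perplexity of a point is $2^{H}$ with $H$ the Shannon entropy of its Gaussian conditional probabilities. This is not the pt-SNE code and does not reproduce the chunk-and-mix protocol; the function names `conditional_probabilities`, `perplexity` and `sigma_for_perplexity` are hypothetical and chosen only for illustration.

```python
import math

def conditional_probabilities(sq_dists, sigma):
    """Gaussian conditional probabilities over the neighbours of one point.

    sq_dists: squared distances from the point to every other point.
    sigma: Gaussian bandwidth assigned to the point.
    """
    # Shift by the smallest distance so the largest weight is exactly 1.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity 2**H, with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def sigma_for_perplexity(sq_dists, target, tol=1e-5, max_iter=200):
    """Bisection (in log space) for the bandwidth matching a target perplexity."""
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        perp = perplexity(conditional_probabilities(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        # Perplexity grows with sigma, so shrink the bracket accordingly.
        if perp > target:
            hi = sigma
        else:
            lo = sigma
    return sigma

# Assumed toy data: 50 neighbours at squared distances 1, 4, 9, ...
sq_dists = [float(k * k) for k in range(1, 51)]
sigma_local = sigma_for_perplexity(sq_dists, target=5.0)
sigma_global = sigma_for_perplexity(sq_dists, target=20.0)
```

In this sketch a larger target perplexity yields a larger bandwidth, so each point weighs more of its neighbours, which is the sense in which the perplexity moves the embedding from local towards global structure.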

Details

Language :
English
ISSN :
2632-2153
Volume :
5
Issue :
3
Database :
Directory of Open Access Journals
Journal :
Machine Learning: Science and Technology
Publication Type :
Academic Journal
Accession number :
edsdoj.8e75a2a3a454753ae11d7f1b20a9396
Document Type :
article
Full Text :
https://doi.org/10.1088/2632-2153/ad6fea