1. Using LSDB to enable large-scale catalog distribution, cross-matching, and analytics
- Author
Caplar, Neven, Beebe, Wilson, Branton, Doug, Campos, Sandro, Connolly, Andrew, DeLucchi, Melissa, Jones, Derek, Juric, Mario, Kubica, Jeremy, Malanchev, Konstantin, Mandelbaum, Rachel, and McGuire, Sean
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics
- Abstract
The Vera C. Rubin Observatory will generate an unprecedented volume of data, including approximately 60 petabytes of raw data and around 30 trillion observed sources, posing a significant challenge for large-scale and end-user scientific analysis. As part of the LINCC Frameworks Project, we are addressing these challenges with the development of the HATS (Hierarchical Adaptive Tiling Scheme) format and the analysis package LSDB (Large Scale Database). HATS partitions data adaptively using a hierarchical tiling system to balance file sizes, enabling efficient parallel analysis. Recent updates include improved metadata consistency, support for incremental updates, and enhanced compatibility with evolving datasets. LSDB complements HATS by providing a scalable, user-friendly interface for large catalog analysis, integrating spatial queries, crossmatching, and time-series tools while using Dask for parallelization. We have successfully demonstrated the use of these tools with datasets such as the ZTF and Pan-STARRS data releases in both cluster and cloud environments. We are deeply involved in several ongoing collaborations to ensure alignment with community needs, with future plans for IVOA standardization and support for upcoming Rubin, Euclid, and Roman data. We provide our code and materials at lsdb.io.
Comment: 4 pages, 2 figures, Proceedings of XXXIV Astronomical Data Analysis Software & Systems (ADASS) conference, November 10-14 2024, Valletta, Malta
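The adaptive partitioning described in the abstract can be illustrated with a toy sketch. This is not the HATS implementation: the function name `adaptive_tiles`, the row-count `threshold` (HATS balances file sizes, for which a row count is only a stand-in here), and the synthetic input are all assumptions made for illustration. The sketch assumes each source already carries a HEALPix NESTED pixel index at some maximum order, and relies only on the NESTED property that pixel `p` at order `k` has children `4p` to `4p + 3` at order `k + 1`.

```python
import random
from collections import Counter


def adaptive_tiles(pixels_at_max_order, max_order, threshold):
    """Toy adaptive tiling (hypothetical helper, not the HATS API).

    pixels_at_max_order: one HEALPix NESTED index per source, at `max_order`.
    Returns {(order, pixel): source_count}. A tile is split into its four
    children while it holds more than `threshold` sources and is still
    coarser than `max_order`.
    """
    # Histogram the sources at the finest order, then roll the counts up:
    # in NESTED indexing the parent of pixel p is p >> 2.
    counts = {max_order: Counter(pixels_at_max_order)}
    for order in range(max_order - 1, -1, -1):
        coarser = Counter()
        for pix, n in counts[order + 1].items():
            coarser[pix >> 2] += n
        counts[order] = coarser

    tiles = {}
    stack = [(0, pix) for pix in counts[0]]  # start from occupied base pixels
    while stack:
        order, pix = stack.pop()
        n = counts[order][pix]
        if n > threshold and order < max_order:
            # Too crowded: descend into the non-empty children.
            for child in range(4 * pix, 4 * pix + 4):
                if counts[order + 1].get(child, 0) > 0:
                    stack.append((order + 1, child))
        else:
            tiles[(order, pix)] = n
    return tiles


# Synthetic example: a sparse uniform background plus one dense clump.
random.seed(0)
MAX_ORDER = 6
n_pix = 12 * 4**MAX_ORDER
sources = [random.randrange(n_pix) for _ in range(2000)]
sources += [random.randrange(1000, 1064) for _ in range(5000)]

tiles = adaptive_tiles(sources, MAX_ORDER, threshold=500)
for (order, pix), n in sorted(tiles.items()):
    print(f"order={order} pixel={pix} sources={n}")
```

In this toy, sparse regions stay in coarse tiles while the dense clump is subdivided, so no tile coarser than the maximum order exceeds the threshold. That is the sense in which a hierarchical scheme can keep partitions comparable in size for parallel processing.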
- Published
2025