1. Sfaira accelerates data and model reuse in single cell genomics
- Author
David S. Fischer, Leander Dony, Martin König, Abdul Moeed, Luke Zappia, Lukas Heumos, Sophie Tritschler, Olle Holmberg, Hananeh Aliee, and Fabian J. Theis
- Subjects
Single-cell genomics, Data zoo, Model zoo, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Single-cell RNA-seq datasets are often first analyzed independently, without harnessing model fits from previous studies, and are then contextualized with public datasets, which requires time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of datasets using ontologies for metadata. We propose an adaptation of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
- Published
- 2021
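The abstract mentions a cross-entropy loss adapted to cell type labels of differing coarseness but does not spell it out. The sketch below shows one plausible reading, not the authors' implementation and not the sfaira API: the classifier predicts probabilities over fine-grained (leaf) cell types, and a cell annotated with a coarse label is scored on the total probability assigned to all leaf types compatible with that label. The function name `coarse_aware_cross_entropy`, the mask representation, and the toy cell types are assumptions made for illustration.

```python
import math

def coarse_aware_cross_entropy(probs, allowed, eps=1e-12):
    """Mean negative log of the probability mass on label-compatible leaf types.

    probs:   per-cell lists of predicted probabilities over leaf cell types.
    allowed: per-cell 0/1 masks; 1 marks a leaf type compatible with the
             cell's (possibly coarse) annotation. Hypothetical encoding.
    """
    total = 0.0
    for p_row, a_row in zip(probs, allowed):
        # Sum the probability over every leaf type the annotation permits.
        mass = sum(p for p, a in zip(p_row, a_row) if a)
        total -= math.log(mass + eps)
    return total / len(probs)

# Toy leaf types (assumed): [CD4 T cell, CD8 T cell, B cell]
probs = [[0.7, 0.2, 0.1],
         [0.3, 0.5, 0.2]]
allowed = [[1, 0, 0],   # cell 1 carries a fine label: CD4 T cell
           [1, 1, 0]]   # cell 2 carries a coarse label: T cell
loss = coarse_aware_cross_entropy(probs, allowed)
```

When every cell has a fine label, each mask selects a single leaf type and the expression reduces to ordinary cross-entropy. In practice the masks would be derived from a cell type ontology rather than written by hand.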