Author: "Morgan, Alexandra" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

1. $\textit{greylock}$: A Python Package for Measuring The Composition of Complex Datasets

Author: Nguyen, Phuc, Arora, Rohit, Hill, Elliot D., Braun, Jasper, Morgan, Alexandra, Quintana, Liza M., Mazzoni, Gabrielle, Lee, Ghee Rye, Arnaout, Rima, and Arnaout, Ramy
Subjects: Quantitative Biology - Quantitative Methods
Abstract: Machine-learning datasets are typically characterized by measuring their size and class balance. However, there exists a richer and potentially more useful set of measures, termed diversity measures, that incorporate elements' frequencies and between-element similarities. Although these have been available in the R and Julia programming languages for other applications, they have not been as readily available in Python, which is widely used for machine learning, and are not easily applied to machine-learning-sized datasets without special coding considerations. To address these issues, we developed $\textit{greylock}$, a Python package that calculates diversity measures and is tailored to large datasets. $\textit{greylock}$ can calculate any of the frequency-sensitive measures of Hill's D-number framework, and going beyond Hill, their similarity-sensitive counterparts (Greylock is a mountain). $\textit{greylock}$ also outputs measures that compare datasets (beta diversities). We first briefly review the D-number framework, illustrating how it incorporates elements' frequencies and between-element similarities. We then describe $\textit{greylock}$'s key features and usage. We end with several examples - immunomics, metagenomics, computational pathology, and medical imaging - illustrating $\textit{greylock}$'s applicability across a range of dataset types and fields.
Comment: 42 pages, many figures. Many thanks to Ralf Bundschuh for help with the submission process
Published: 2023
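
The abstract refers to frequency-sensitive Hill numbers and their similarity-sensitive counterparts. As a rough illustration of what such measures compute, here is a minimal NumPy sketch. It is not the greylock API: the function name `hill_diversity` and its parameters are hypothetical, chosen for this example only. With no similarity matrix it returns the ordinary Hill number of order q; with a matrix Z of pairwise similarities in [0, 1] it returns the similarity-sensitive form, in which each element's frequency is replaced by the total frequency of elements similar to it.

```python
import numpy as np


def hill_diversity(counts, q, similarity=None):
    """Effective number of classes of order q (hypothetical helper, not greylock's API).

    counts:     per-class counts or frequencies (non-negative, not all zero)
    q:          viewpoint parameter; larger q gives more weight to common classes
    similarity: optional square matrix Z with Z[i, j] in [0, 1] and Z[i, i] == 1
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()          # relative frequencies
    keep = p > 0                       # absent classes do not contribute
    p = p[keep]

    if similarity is None:
        zp = p                         # identity similarity: plain Hill numbers
    else:
        z = np.asarray(similarity, dtype=float)[np.ix_(keep, keep)]
        zp = z @ p                     # "ordinariness" of each class

    if q == 1:                         # limit q -> 1 (exponential of Shannon entropy)
        return float(np.exp(-np.sum(p * np.log(zp))))
    if np.isinf(q):                    # limit q -> infinity
        return float(1.0 / zp.max())
    return float(np.sum(p * zp ** (q - 1)) ** (1.0 / (1.0 - q)))


# Four equally common classes versus one dominant class.
balanced = hill_diversity([10, 10, 10, 10], q=1)
skewed = hill_diversity([97, 1, 1, 1], q=1)

# Same balanced counts, but every class is treated as identical to every other.
all_similar = hill_diversity([10, 10, 10, 10], q=1, similarity=np.ones((4, 4)))
```

In this sketch the balanced dataset has an effective number of classes of about 4, the skewed one is much closer to 1, and the balanced dataset with an all-ones similarity matrix collapses to 1, since classes that are fully similar to one another add no diversity.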
