Democratizing data science through data science training

Authors :
Lily Fierro
Crystal Stewart
Sumiko Abe
Gully A. P. C. Burns
Aakanchha Sinha
Kristina Lerman
José Luis Ambite
Jeana Kamdar
Caroline O'Driscoll
Avnish Bhattrai
Priyambada Jain
Xiaoxiao Lei
John D. Van Horn
Jonathan Gordon
Source :
PSB
Publication Year :
2017
Publisher :
WORLD SCIENTIFIC, 2017.

Abstract

The biomedical sciences have experienced an explosion of data which promises to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning, and open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required - in effect, using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.

Details

Database :
OpenAIRE
Journal :
Biocomputing 2018
Accession number :
edsair.doi.dedup.....e7fbc8fc1732d7570314b3d90c82b955
Full Text :
https://doi.org/10.1142/9789813235533_0027