A data science roadmap for open science organizations engaged in early-stage drug discovery

Authors :
Kristina Edfeldt
Aled M. Edwards
Ola Engkvist
Judith Günther
Matthew Hartley
David G. Hulcoop
Andrew R. Leach
Brian D. Marsden
Amelie Menge
Leonie Misquitta
Susanne Müller
Dafydd R. Owen
Kristof T. Schütt
Nicholas Skelton
Andreas Steffen
Alexander Tropsha
Erik Vernet
Yanli Wang
James Wellnitz
Timothy M. Willson
Djork-Arné Clevert
Benjamin Haibe-Kains
Lovisa Holmberg Schiavone
Matthieu Schapira
Source :
Nature Communications, Vol 15, Iss 1, Pp 1-10 (2024)
Publication Year :
2024
Publisher :
Nature Portfolio, 2024.

Abstract

The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles openly, and at scale and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.

Subjects

Subjects :
Science

Details

Language :
English
ISSN :
2041-1723
Volume :
15
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Nature Communications
Publication Type :
Academic Journal
Accession number :
edsdoj.7b4037ed6df47b49327448e7112d3fb
Document Type :
article
Full Text :
https://doi.org/10.1038/s41467-024-49777-x