Navigating the development challenges in creating complex data systems

Authors :
Dittmer, S
Roberts, M
Gilbey, J
Biguri, A
Selby, I
Breger, A
Thorpe, M
Weir-McCall
Gkrania-Klotsas, E
Korhonen, A
Jefferson, E
Langs, G
Yang, G
Prosch, H
Stanczuk, J
Tang, J
Babar, J
Escudero Sánchez, L
Teare, P
Patel, M
Wassin, M
Holzer, M
Walton, N
Lió, P
Shadbahr, T
Sala, E
Preller, J
Rudd, JHF
Aston, JAD
Schönlieb, CB
Dittmer, S [0000-0003-2919-4956]
Roberts, M [0000-0002-3484-5031]
Gilbey, J [0000-0002-5987-5261]
Biguri, A [0000-0002-2636-3032]
Preller, J [0000-0001-5706-816X]
Rudd, JHF [0000-0003-2243-3117]
Apollo - University of Cambridge Repository
Publication Year :
2023
Publisher :
Springer Science and Business Media LLC, 2023.

Abstract

Data science systems (DSSs) are a fundamental tool in many areas of research and are now developed by people with a myriad of backgrounds. This is coupled with a crisis in reproducibility of such DSSs despite the wide availability of powerful tools for data science and machine learning over the last decade. We believe that perverse incentives and a lack of widespread software engineering skills are among the many causes of this crisis and analyze why software engineering and building large complex systems are, in general, hard. Based on these insights, we identify how software engineering addresses those difficulties and how one might apply and generalize software engineering methods to make DSSs more fit for purpose. We advocate two key development philosophies: one should incrementally grow – not plan then build – DSSs, and one should use two types of feedback loops during development: one that tests the code’s correctness and another that evaluates the code’s efficacy.

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....e8111f47a08b42d43d3f832faefe6a57
Full Text :
https://doi.org/10.17863/cam.96892