1. Diversity, inclusivity and traceability of mammography datasets used in development of Artificial Intelligence technologies: a systematic review.
- Author
- Laws E, Palmer J, Alderman J, Sharma O, Ngai V, Salisbury T, Hussain G, Ahmed S, Sachdeva G, Vadera S, Mateen B, Matin R, Kuku S, Calvert M, Gath J, Treanor D, McCradden M, Mackintosh M, Gichoya J, Trivedi H, Denniston AK, and Liu X
- Abstract
Purpose: There are many radiological datasets for breast cancer, some of which have supported the development of AI medical devices for breast cancer screening and image classification. This review aims to identify mammography datasets (including digitised screen film mammography, 2D digital mammography and digital breast tomosynthesis) used in the development of AI technologies and present their characteristics, including their transparency of documentation, content, populations included and accessibility.
Materials and Methods: MEDLINE and Google Dataset searches identified studies describing AI technology development and referencing breast imaging datasets up to June 2024. The characteristics of each dataset are summarised. In particular, the accompanying documentation was reviewed with a focus on diversity and inclusion of populations represented within each dataset.
Results: 254 datasets were referenced in the literature search: 190 were privately held, 36 had barriers that prevented access, and 28 were accessible. Most datasets originated from Europe, East Asia and North America. There was poor reporting of individuals' attributes: 32 (12%) datasets reported race or ethnicity; 76 (30%) reported female/male categories, with only one dataset explicitly defining whether these categories represented sex or gender attributes.
Conclusion: Through this review, we demonstrate gaps in the data landscape for mammography, highlighting poor representation globally. To ensure datasets in breast imaging have maximum utility for researchers, their characteristics should be documented, and limitations of datasets, such as their representativeness of populations and settings, should inform scientific efforts to translate data-driven insights into technologies and discoveries.
Competing Interests: Declaration of competing interest. MJC is Director of the Birmingham Health Partners Centre for Regulatory Science and Innovation, Director of the Centre for Patient Reported Outcomes Research, and is a National Institute for Health and Care Research (NIHR) Senior Investigator. MJC receives funding from the NIHR, UK Research and Innovation (UKRI), NIHR Birmingham Biomedical Research Centre, NIHR Applied Research Collaboration (ARC) West Midlands, Research England, European Regional Development Fund, and the NIHR Blood and Transplant Research Unit in Precision Transplant and Cellular Therapeutics. MJC also receives funding from Innovate UK (part of UKRI), Macmillan Cancer Support, UCB Pharma, GSK, Anthony Nolan, Gilead Sciences, European Commission, European Federation of Pharmaceutical Industries and Associations, and The Brain Tumor Charity. MJC has received personal fees from Aparito, CIS Oncology, Gilead, Halfloop, Takeda Pharmaceuticals, Merck, Daiichi Sankyo, Glaukos, GSK, the Patient-Centered Outcomes Research Institute, Pfizer, Genentech, and Vertex Pharmaceuticals, outside of the submitted work. MMackintosh is an employee of Genomics England and Founder and Director of One HealthTech and Data Science for Health Equity (within One HealthTech Ltd). JG is a grantee of the 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program and declares support from RSNA Health Disparities Grant (EIHD2204), Lacuna Fund (67), Gordon and Betty Moore Foundation, and NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021. XL has received consulting fees from Hardian Health and was previously a Health Studies Scientist at Apple.
(Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2024