Massachusetts Institute of Technology. Institute for Medical Engineering & Science, Massachusetts Institute of Technology. Institute for Data, Systems, and Society, MIT Critical Data (Laboratory), Celi, Leo Anthony, Cellini, Jacqueline, Charpignon, Marie-Laure, Dee, Edward Christopher, Dernoncourt, Franck, Eber, Rene, Mitchell, William Greig, Moukheiber, Lama, Schirmer, Julian, Situ, Julia, Paguio, Joseph, Park, Joel, Wawira, Judy Gichoya, and Yao, Seth
Background While artificial intelligence (AI) offers possibilities of advanced clinical prediction and decision-making in healthcare, models trained on relatively homogeneous datasets, and on populations poorly representative of underlying diversity, have limited generalisability and risk producing biased AI-based decisions. Here, we describe the landscape of AI in clinical medicine to delineate population and data-source disparities.

Methods We performed a scoping review of clinical papers using AI techniques that were published in PubMed in 2019. We assessed differences in dataset country source, clinical specialty, and author nationality, sex, and expertise. A manually tagged subsample of PubMed articles was used to train a model, leveraging transfer-learning techniques (building upon an existing BioBERT model), to predict eligibility for inclusion (original, human, clinical AI literature). For all eligible articles, database country source and clinical specialty were manually labelled. A BioBERT-based model predicted first/last author expertise. Author nationality was determined from the corresponding affiliated institution information, retrieved using Entrez Direct. First/last author sex was evaluated using the Genderize.io API.

Results Our search yielded 30,576 articles, of which 7,314 (23.9%) were eligible for further analysis. The most common database sources were the US (40.8%) and China (13.7%). Radiology was the most represented clinical specialty (40.4%), followed by pathology (9.1%). Authors were most often from China (24.0%) or the US (18.4%). First and last authors were predominantly data experts (i.e., statisticians) (59.6% and 53.9%, respectively) rather than clinicians. The majority of first/last authors were male (74.1%).

Interpr