The shape of cancer relapse: Topological data analysis predicts recurrence in paediatric acute lymphoblastic leukaemia.
- Author
- Chulián, Salvador; Stolz, Bernadette J.; Martínez-Rubio, Álvaro; Blázquez Goñi, Cristina; Rodríguez Gutiérrez, Juan F.; Caballero Velázquez, Teresa; Molinos Quintana, Águeda; Ramírez Orellana, Manuel; Castillo Robleda, Ana; Fuster Soler, José Luis; Minguela Puras, Alfredo; Martínez Sánchez, María V.; Rosa, María; Pérez-García, Víctor M.; and Byrne, Helen M.
- Subjects
CANCER relapse, LYMPHOBLASTIC leukemia, ACUTE leukemia, DATA analysis, FLOW cytometry, MACHINE learning, SURVIVAL analysis (Biometry), NOMOGRAPHY (Mathematics)
- Abstract
Although children and adolescents with acute lymphoblastic leukaemia (ALL) have high survival rates, approximately 15-20% of patients relapse. Risk of relapse is routinely estimated at diagnosis by biological factors, including flow cytometry data. This high-dimensional data is typically manually assessed by projecting it onto a subset of biomarkers. Cell density and "empty spaces" in 2D projections of the data, i.e. regions devoid of cells, are then used for qualitative assessment. Here, we use topological data analysis (TDA), which quantifies shapes, including empty spaces, in data, to analyse pre-treatment ALL datasets with known patient outcomes. We combine these fully unsupervised analyses with Machine Learning (ML) to identify significant shape characteristics and demonstrate that they accurately predict risk of relapse, particularly for patients previously classified as 'low risk'. We independently confirm the predictive power of CD10, CD20, CD38, and CD45 as biomarkers for ALL diagnosis. Based on our analyses, we propose three increasingly detailed prognostic pipelines for analysing flow cytometry data from ALL patients depending on technical and technological availability: 1. Visual inspection of specific biological features in biparametric projections of the data; 2. Computation of quantitative topological descriptors of such projections; 3. A combined analysis, using TDA and ML, in the four-parameter space defined by CD10, CD20, CD38 and CD45. Our analyses readily extend to other haematological malignancies.
Author summary: Acute lymphoblastic leukaemia (ALL) is a blood cancer which affects predominantly children and adolescents. Therapy typically fails in approximately 20% of patients, who suffer from relapse. To determine disease status, clinicians assess cell types, their interactions, as well as deviations from normal behaviour. Flow cytometry (FC) is a method that quantifies the intensity of specific cell markers, giving rise to high-dimensional data. This routinely collected information is then reduced to obtain human-interpretable visualisation for prognosis. Topological Data Analysis (TDA) is a field of mathematics that studies shapes in data, considering isolated data islands and empty spaces between them. We showcase how to use TDA to extract shape characteristics in FC data of relapsing patients. We propose three pipelines, of increasing methodological complexity, to aid clinical decisions for risk stratification in ALL. In combination with Machine Learning, TDA enables high-accuracy predictions of relapse to be made at the time of diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2023