1. Analysis of Forced Vital Capacity (FVC) trajectories in Idiopathic Pulmonary Fibrosis (IPF) identifies four distinct clusters of disease behaviour
- Author
Fainberg, H.P., Oldham, J.M., Molyneaux, P.L., Allen, R.J., Kraven, L.M., Fahy, W.A., Porte, J., Braybrooke, R., Saini, G., Karsdal, M.A., Leeming, D.J., Triguero, I., Oballa, E., Wells, A., Renzoni, E., Wain, L.V., Noth, I., Maher, T.M., Stewart, I.D., and Jenkins, R.G.
- Abstract
Background: Idiopathic Pulmonary Fibrosis (IPF) is a progressive fibrotic lung disease with a variable clinical trajectory. Decline in Forced Vital Capacity (FVC) is the main indicator of progression; however, missing data prevent long-term analysis of lung function patterns. We used Machine Learning (ML) techniques to identify patterns of lung function trajectory.

Methods: Longitudinal FVC data were collected from 415 participants with IPF. The performance of conventional and ML techniques in imputing missing data was evaluated, and the fully imputed dataset was then analysed by unsupervised clustering using Self-Organizing Maps (SOM). Anthropometrics, genomic associations, blood biomarkers and clinical outcomes were compared between clusters. Replication was performed using an independent dataset.

Results: An unsupervised ML algorithm had the lowest imputation error amongst the tested methods, and SOM identified four distinct clusters (CL1 to CL4), confirmed by sensitivity analysis. CL1 (n=140): linear decline over three years; CL2 (n=100): initial improvement in FVC before declining; CL3 (n=113): initial FVC decline before stabilisation; CL4 (n=62): stable lung function. Median survival was shortest in CL1 (2.87; 95% CI: 2.29–3.40) and longest in CL4 (5.65; 95% CI: 5.18–6.62). Baseline FEV1/FVC ratio and biomarker SPD levels were significantly higher in clusters CL1 and CL3. Similar lung function clusters with some shared anthropometric characteristics were identified in the replication dataset.

Conclusions: Using a data-driven unsupervised approach, we identified four clusters of lung function trajectory with distinct clinical and biochemical features. Enriching or stratifying longitudinal spirometric data into clusters may optimise evaluation of intervention efficacy during clinical trials and patient management.
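
The two-step workflow described in the Methods (impute missing FVC values, then cluster the completed trajectories with a SOM) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic, the four trajectory shapes are hypothetical, per-patient linear interpolation stands in for the unspecified ML imputation method, and the 2x2 map size, learning rate and neighbourhood width are arbitrary choices.

```python
import numpy as np

def impute_linear(fvc):
    """Fill NaNs in each patient's row by linear interpolation over visits.

    Stand-in only: the abstract does not name the imputation algorithm.
    Gaps after the last observed visit are held at the last observed value.
    """
    filled = fvc.copy()
    visits = np.arange(fvc.shape[1])
    for row in filled:  # each row is a view, so edits happen in place
        observed = ~np.isnan(row)
        row[~observed] = np.interp(visits[~observed], visits[observed], row[observed])
    return filled

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small self-organizing map and return its node weight vectors."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the map nodes, used for the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen trajectories.
    weights = data[rng.choice(len(data), rows * cols, replace=False)].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                # learning rate decays to zero
            sigma = sigma0 * (1 - frac) + 1e-3   # neighbourhood shrinks over time
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_clusters(data, weights):
    """Label each trajectory with the index of its nearest SOM node."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Synthetic example: 40 hypothetical patients, 7 visits, FVC as % of baseline.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 3.0, 7)                        # years since baseline
shapes = np.stack([
    100 - 6 * t,                                    # steady decline
    100 + 4 * t - 2.5 * t ** 2,                     # improvement, then decline
    100 - 12 * (1 - np.exp(-1.5 * t)),              # decline, then plateau
    np.full_like(t, 100.0),                         # stable
])
fvc = np.repeat(shapes, 10, axis=0) + rng.normal(0, 1.0, (40, 7))

# Remove about 20% of follow-up values; the baseline visit is always kept.
mask = rng.random(fvc.shape) < 0.2
mask[:, 0] = False
fvc_missing = np.where(mask, np.nan, fvc)

fvc_filled = impute_linear(fvc_missing)
som_weights = train_som(fvc_filled)
labels = assign_clusters(fvc_filled, som_weights)
print(np.bincount(labels, minlength=4))             # trajectories per SOM node
```

A 2x2 map is used here only because it yields at most four nodes; the number of clusters reported in the abstract came from the study's own analysis and sensitivity checks, not from fixing the map size in advance.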