Investigating the stability of multimorbidity clusters within and between clustering algorithms, in simulated and health record data
- Publication Year :
- 2022
- Publisher :
- Elsevier, 2022.
Abstract
- Introduction: A number of analytic methods have been used in large datasets to group individuals into clusters by the presence of long-term health conditions. We investigate the reproducibility and validity of these methods in a large simulated dataset and in a dataset derived from electronic primary care records. Methods: We selected four clustering algorithms: latent class analysis (LCA), hierarchical cluster analysis (HCA), multiple correspondence analysis followed by k-means (MCA-kmeans), and k-means (kmeans). Clustering algorithms were first investigated in simulated datasets with 26 diseases of varying prevalence in predetermined clusters and varying prevalence of noise (patients not in a cluster). We compared the derived clusters to known clusters using the adjusted Rand Index (aRI). Methods were then investigated in the medical records of male patients, aged 65 to 84, from 50 UK general practices. Forty-nine long-term health conditions with at least 1% prevalence were considered. Within-cluster morbidity profiles were compared using the Pearson correlation coefficient (PCC). Cluster stability was assessed in 400 bootstrap samples. Results: In the simulated datasets, aRI declined as the prevalence of noise increased, approaching zero when the prevalence of noise approached that of long-term health conditions. LCA and MCA-kmeans algorithms gave the closest agreement (largest aRI) to known clusters. In the patient records dataset, all four algorithms identified one cluster accounting for 20-25% of the study population; LCA and MCA-kmeans found a second cluster of about 7% of the population. Other clusters were found by only one algorithm. LCA and MCA-kmeans clustering gave the most similar partitioning (aRI 0.54). About 82% of patients in the large cluster were the same across all four methods. Discussion: The amount of noise markedly affects clustering algorithms, suggesting individuals with a single condition should be excluded. LCA achieved higher aRI than other clustering algorithms.
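For readers unfamiliar with the adjusted Rand Index used above to compare derived clusters with known clusters, the following is a minimal sketch of the standard pair-counting formula in plain Python. It is not the authors' code; the function name `adjusted_rand_index` and the toy label lists are hypothetical illustrations, not study data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items.

    Hypothetical helper for illustration: 1.0 means identical partitions
    (up to relabelling); values near 0 mean chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Contingency table cells and its row/column totals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in each cell, row, and column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster labels for six hypothetical patients.
known = [0, 0, 0, 1, 1, 1]
derived = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
print(adjusted_rand_index(known, derived))
```

Because the index counts pairs of individuals rather than matching label names, two algorithms that number their clusters differently can still be compared directly.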
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....baa6040527aa4e2352d3253ff2d4623c
- Full Text :
- https://doi.org/10.17863/cam.89191