1. Opportunities offered by latent-based multiblock strategies to integrate biomarkers of chemical exposure and biomarkers of effect in environmental health studies.
- Author
- Babin É, Vigneau E, Antignac JP, Le Bizec B, and Cano-Sancho G
- Subjects
- Humans, Least-Squares Analysis, Environmental Health, Environmental Pollutants analysis, Female, Biomarkers analysis, Environmental Exposure statistics & numerical data
- Abstract
Modern environmental epidemiology benefits from a new generation of technologies that enable comprehensive profiling of biomarkers, including environmental chemical exposure and omic datasets. The integration and analysis of large and structured datasets to identify functional associations is constrained by computational challenges that cannot be overcome using conventional regression methods. Some extensions of Partial Least Squares (PLS) regression have been developed to efficiently integrate multiple datasets, including Multiblock PLS (MB-PLS) and Sequential and Orthogonalized PLS; however, these approaches remain seldom applied in environmental epidemiology. To address that research gap, this study aimed to assess and compare the applicability of PLS-based multiblock models in an observational case study, where biomarkers of exposure to environmental chemicals and endogenous biomarkers of effect were simultaneously integrated to highlight biological links related to a health outcome. The methods were compared with and without sparsity, each coupled with one of two metrics to support variable selection: Variable Importance in Projection (VIP) and Selectivity Ratio (SR). The framework was applied to a case-study dataset mimicking the structure of 36 environmental exposure biomarkers (E-block), 61 inflammation biomarkers (M-block), and their relationships with the gestational age at delivery of 161 mother-infant pairs. The results showed an overall consistency in the selected variables across models, although some specific selection patterns were identified. The block-scaled concatenation-based approaches (e.g. MB-PLS) tended to select more variables from the E-block, while these methods were unable to identify certain variables in the M-block. Overall, more variables were selected using the SR criterion than using the VIP criterion, with lower predictive performance. The multiblock models coupled with VIP appeared to be the methods of choice for identifying relevant variables with similar statistical performance. The use of multiblock PLS-based methods appears to be a good strategy to efficiently support the variable selection process in modern environmental epidemiology.
Competing Interests: Declaration of competing interest. The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: É. Babin reports financial support provided by Regional Council of Pays de la Loire and INRAE with a doctoral grant (REF 34001102).
(Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.)
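As an illustration of the block-scaled concatenation idea and the VIP criterion named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a single-response PLS fitted by NIPALS, block scaling by the square root of the number of variables in each block, and the common convention of retaining variables with VIP > 1; the abstract specifies none of these choices. The data are randomly generated and only reuse the block dimensions quoted above (161 samples, 36 and 61 variables). The function names `block_scale` and `pls1_vip` are hypothetical.

```python
import numpy as np


def block_scale(block):
    """Autoscale each column, then divide by sqrt(n_columns) so that
    every block contributes the same total variance once concatenated."""
    z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    return z / np.sqrt(block.shape[1])


def pls1_vip(X, y, n_components):
    """Fit a single-response PLS model by NIPALS and return one VIP
    score per column of X (X is assumed to be column-centred)."""
    X = X.copy()
    y = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))   # unit-norm weight vectors
    ss = np.zeros(n_components)            # y-variance explained per component
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                          # scores
        tt = t @ t
        p = X.T @ t / tt                   # X loadings
        q = (y @ t) / tt                   # y loading
        X -= np.outer(t, p)                # deflate X
        y = y - q * t                      # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    return np.sqrt(n_vars * (W ** 2 @ ss) / ss.sum())


# Synthetic data only: dimensions follow the abstract, values are random.
rng = np.random.default_rng(0)
E = rng.normal(size=(161, 36))             # stand-in for the exposure block
M = rng.normal(size=(161, 61))             # stand-in for the inflammation block
y = E[:, 0] + M[:, 0] + rng.normal(scale=0.5, size=161)

X = np.hstack([block_scale(E), block_scale(M)])
vip = pls1_vip(X, y, n_components=2)
selected = np.where(vip > 1.0)[0]          # assumed threshold, not from the study
```

Because every weight vector has unit norm, the squared VIP scores sum to the total number of variables, which is one way to sanity-check an implementation of this kind.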
- Published
- 2024