1. A multi-objective based clustering for inferring BCR clonal lineages from high-throughput B cell repertoire data.
- Author
Abdollahi, Nika, Jeusset, Lucile, De Septenville, Anne Langlois, Ripoche, Hugues, Davi, Frédéric, and Bernardes, Juliana Silva
- Subjects
B cells, B cell receptors, B cell lymphoma, NUCLEOTIDE sequencing, BOOSTING algorithms, CELL populations
- Abstract
The adaptive B cell response is driven by the expansion, somatic hypermutation, and selection of B cell clonal lineages. A high number of clonal lineages in a B cell population indicates a highly diverse repertoire, while clonal size distribution and sequence diversity reflect antigen selective pressure. Identifying clonal lineages is fundamental to many repertoire studies, including repertoire comparisons, clonal tracking, and statistical analysis. Several methods have been developed to group sequences from high-throughput B cell repertoire data. Current methods use clustering algorithms to group clonally-related sequences based on their similarities or distances. Such approaches create groups by optimizing a single objective that typically minimizes intra-clonal distances. However, optimizing several objective functions can be advantageous and boost the algorithm convergence rate. Here we propose a new method based on multi-objective clustering. Our approach requires V(D)J annotations to obtain the initial groups and iteratively applies two objective functions that simultaneously optimize cohesion within clonal lineages and separation between them. We show that our method greatly improves clonal lineage grouping on simulated benchmarks with varied mutation rates compared to other tools. When applied to experimental repertoires generated from high-throughput sequencing, its clustering results are comparable to those of the best-performing tools and reproduce the results of previous publications. The method, named MobiLLe, accurately identifies clonally-related antibody sequences and has the lowest running time among state-of-the-art tools. Together, these features make it an attractive option for repertoire analysis, particularly in the clinical context. MobiLLe can potentially help unravel the mechanisms involved in the development and evolution of B cell malignancies.
Author summary: High-throughput sequencing can produce a large set of sequences and has profoundly changed our ability to study immune repertoires, particularly B cell receptor sequences. An important application is the analysis of the clonal lineage composition of B cell populations; it is the starting point of many immune repertoire studies, for instance, to differentiate between healthy individuals and those with lymphoid malignancies or other diseases. Several computational methods have been developed to identify clonal lineages from a set of B cell receptor sequences. Most of them apply clustering algorithms and optimize a single objective function that typically minimizes intra-clonal distances. However, optimizing several objective functions in parallel can improve clustering performance and efficiency. We propose MobiLLe, the first multi-objective clonal lineage grouping method, which simultaneously optimizes two objective functions: one minimizing intra-clonal diversity and one maximizing inter-clonal differences. Our approach greatly improved clonal grouping on simulated benchmarks and performed comparably to the most powerful and recent methods on experimental samples. MobiLLe is computationally more efficient than existing tools and does not require any training process or hyper-parameter optimization. It can easily manage large-scale experimental repertoires and provides useful plots to help researchers detect clonally-related sequences in high-throughput B cell repertoire data. [ABSTRACT FROM AUTHOR]
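The two-objective idea described in the abstract (start from annotation-based groups, then iteratively improve cohesion and separation) can be illustrated with a minimal Python sketch. This is not MobiLLe's implementation: the record fields `v`, `j`, and `cdr3`, the normalized Hamming distance, the choice to seed groups by V and J gene only, and the silhouette-style reassignment rule are all assumptions made here for illustration.

```python
from collections import defaultdict


def distance(a, b):
    """Normalized Hamming distance; unequal lengths count as maximally distant (assumption)."""
    if len(a) != len(b):
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def initial_groups(seqs):
    """Seed clusters from V and J gene annotations (an assumed starting partition)."""
    groups = defaultdict(list)
    for i, s in enumerate(seqs):
        groups[(s["v"], s["j"])].append(i)
    return list(groups.values())


def mean_dist(seqs, label, i, c):
    """Mean CDR3 distance from sequence i to the other members of cluster c."""
    others = [j for j in label if label[j] == c and j != i]
    if not others:
        return None
    return sum(distance(seqs[i]["cdr3"], seqs[j]["cdr3"]) for j in others) / len(others)


def refine(seqs, clusters, max_iter=10):
    """Move a sequence when another cluster is closer than its own (hypothetical rule)."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    for _ in range(max_iter):
        moved = False
        for i in sorted(label):
            cohesion = mean_dist(seqs, label, i, label[i])
            if cohesion is None:  # singleton cluster: nothing to compare against
                continue
            rivals = []
            for c in range(len(clusters)):
                if c != label[i]:
                    d = mean_dist(seqs, label, i, c)
                    if d is not None:
                        rivals.append((d, c))
            if not rivals:
                continue
            separation, best = min(rivals)
            if separation < cohesion:  # a neighbouring cluster fits better
                label[i] = best
                moved = True
        if not moved:
            break
    out = defaultdict(list)
    for i, c in label.items():
        out[c].append(i)
    return sorted(sorted(members) for members in out.values())


# Toy data: sequence 2 is annotated like 0 and 1 but its CDR3 matches 3 and 4.
seqs = [
    {"v": "IGHV1", "j": "IGHJ4", "cdr3": "CARDYW"},
    {"v": "IGHV1", "j": "IGHJ4", "cdr3": "CARDFW"},
    {"v": "IGHV1", "j": "IGHJ4", "cdr3": "CTTGGW"},
    {"v": "IGHV3", "j": "IGHJ4", "cdr3": "CTTGGW"},
    {"v": "IGHV3", "j": "IGHJ4", "cdr3": "CTTGAW"},
]
clones = refine(seqs, initial_groups(seqs))
print(clones)  # [[0, 1], [2, 3, 4]]
```

In this toy run the annotation-based seed places sequence 2 with sequences 0 and 1; the refinement step moves it because its mean distance to the other cluster is smaller than its mean distance to its own. A real tool would use richer distances and stopping criteria than this sketch does.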
- Published
- 2022