
adabmDCA: adaptive Boltzmann machine learning for biological sequences

Authors :
Anna Paola Muntoni
Andrea Pagnani
Francesco Zamponi
Martin Weigt
Systèmes Désordonnés et Applications
Laboratoire de physique de l'ENS - ENS Paris (LPENS (UMR_8023))
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)
Biologie Computationnelle et Quantitative = Laboratory of Computational and Quantitative Biology (LCQB)
Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Institut de Biologie Paris Seine (IBPS)
Sorbonne Université (SU)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)
Laboratoire de physique de l'ENS - ENS Paris (LPENS)
Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)-Sorbonne Université (SU)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)-Sorbonne Université (SU)-École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
Source :
BMC Bioinformatics, BioMed Central, 2021, Vol 22, Iss 1, Pp 1-19, ⟨10.1186/s12859-021-04441-9⟩
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico.

Results: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.

Conclusions: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
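
The parametrization described in the Background (local biases plus pairwise terms) can be illustrated with a minimal Python sketch. It is not taken from the adabmDCA code base: the function names `potts_energy` and `contact_scores`, the array layouts `h[i, a]` and `J[i, j, a, b]`, and the choice of scoring contacts by the Frobenius norm of each coupling block with an average-product correction are assumptions, chosen because they are common practice in direct-coupling analysis.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    seq: integer-encoded sequence of length L (hypothetical encoding).
    h:   local biases, shape (L, q).
    J:   pairwise terms, shape (L, L, q, q), assumed symmetric in (i, a) <-> (j, b).
    """
    seq = np.asarray(seq)
    idx = np.arange(len(seq))
    field_term = h[idx, seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]] for every pair of positions
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()  # count each pair i < j once
    return -(field_term + coupling_term)

def contact_scores(J):
    """Score each position pair by the corrected Frobenius norm of its coupling block."""
    L = J.shape[0]
    # Shift every q x q block to the zero-sum gauge before taking norms.
    Jz = (J
          - J.mean(axis=2, keepdims=True)
          - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Jz ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    # Average-product correction, with means taken over off-diagonal entries.
    row_mean = F.sum(axis=1) / (L - 1)
    total_mean = F.sum() / (L * (L - 1))
    scores = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(scores, 0.0)
    return scores
```

Under these assumptions, position pairs with the highest scores are the candidate contacts; lower energy corresponds to higher model probability, since the Boltzmann distribution is proportional to exp(-E).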

Details

ISSN :
14712105
Volume :
22
Database :
OpenAIRE
Journal :
BMC Bioinformatics
Accession number :
edsair.doi.dedup.....ebcfd31e16b82f2e8feb8349ef34a1a4