adabmDCA: adaptive Boltzmann machine learning for biological sequences
- Source :
- BMC Bioinformatics, BioMed Central, Vol 22, Iss 1, Pp 1-19 (2021), DOI: 10.1186/s12859-021-04441-9
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of families of evolutionarily related protein and RNA domains. They are parametrized in terms of local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico.
- Results: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.
- Conclusions: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of both the quality of the inferred contact map and the quality of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning; the latter allows for accurate and lossless training when equilibrium learning is prohibitive in computational time. The code also allows for pruning irrelevant parameters using an information-based criterion.
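The abstract describes the model as local biases (fields) plus pairwise couplings, with the contact map read off the couplings. The sketch below illustrates that structure under common direct-coupling-analysis conventions, which are assumptions here: the Potts energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), and a contact score given by the Frobenius norm of each coupling block in the zero-sum gauge, followed by the average product correction (APC). All function names, the nested-list data layout, and the hypothetical `random_couplings` helper are illustrative only; they are not taken from the adabmDCA code, whose actual scoring options should be checked in its repository.

```python
import math
import random


def potts_energy(seq, h, J):
    """Potts energy E(s) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j].

    seq: list of L integer states in range(q); h: L x q fields (local biases);
    J: L x L nested list whose entry J[i][j] (used only for i < j) is a q x q block.
    """
    L = len(seq)
    e = 0.0
    for i in range(L):
        e -= h[i][seq[i]]                  # conservation (single-site) term
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]   # pairwise (coevolution) term
    return e


def zero_sum_gauge(block):
    """Shift a q x q coupling block so that every row and column sums to zero."""
    q = len(block)
    row = [sum(block[a]) / q for a in range(q)]
    col = [sum(block[a][b] for a in range(q)) / q for b in range(q)]
    tot = sum(row) / q
    return [[block[a][b] - row[a] - col[b] + tot for b in range(q)] for a in range(q)]


def frobenius_scores(J, L):
    """Symmetric L x L matrix of Frobenius norms of the gauged coupling blocks."""
    F = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            g = zero_sum_gauge(J[i][j])
            F[i][j] = F[j][i] = math.sqrt(sum(x * x for r in g for x in r))
    return F


def apc(F):
    """Average product correction: F_ij - mean_i * mean_j / mean_all (off-diagonal means)."""
    L = len(F)
    if L < 2:
        return F
    mean_i = [sum(F[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    mean_all = sum(mean_i) / L
    if mean_all == 0.0:
        return F
    return [[0.0 if i == j else F[i][j] - mean_i[i] * mean_i[j] / mean_all
             for j in range(L)] for i in range(L)]


def ranked_contacts(J, L, min_sep=4):
    """(score, i, j) tuples with j - i >= min_sep, highest APC score first."""
    S = apc(frobenius_scores(J, L))
    pairs = [(S[i][j], i, j) for i in range(L) for j in range(i + min_sep, L)]
    return sorted(pairs, reverse=True)


def random_couplings(L, q, scale=0.1, seed=0):
    """Hypothetical helper: small random couplings, stored only for i < j."""
    rng = random.Random(seed)
    return [[([[rng.uniform(-scale, scale) for _ in range(q)] for _ in range(q)]
              if j > i else None)
             for j in range(L)] for i in range(L)]


if __name__ == "__main__":
    L, q = 6, 3
    J = random_couplings(L, q, seed=1)
    # Plant one strong, non-uniform coupling between positions 0 and 5.
    J[0][5] = [[2.0 if a == b else 0.0 for b in range(q)] for a in range(q)]
    h = [[0.0] * q for _ in range(L)]
    print(potts_energy([0, 1, 2, 0, 1, 0], h, J))
    print(ranked_contacts(J, L)[0])
```

Running the file plants one strong coupling between positions 0 and 5 on top of small random background couplings, so that pair should come out first in the ranking. The zero-sum gauge is applied before taking norms because a constant shift of a coupling block does not change the model's probabilities and should not inflate the score. In practice h and J would come from Boltzmann machine learning on a multiple sequence alignment; how gap states and the minimum sequence separation are handled varies between tools and is not fixed by this sketch.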
- Subjects :
- QH301-705.5
Computer science
Computer applications to medicine. Medical informatics
R858-859.7
Boltzmann machine
FOS: Physical sciences
Quantitative Biology - Quantitative Methods
01 natural sciences
Biochemistry
Boltzmann machine learning
Domain (software engineering)
Machine Learning
03 medical and health sciences
RNA modelling
Structural Biology
0103 physical sciences
Statistical inference
Code (cryptography)
Humans
Pruning (decision trees)
Biology (General)
[PHYS.COND.CM-SM]Physics [physics]/Condensed Matter [cond-mat]/Statistical Mechanics [cond-mat.stat-mech]
010306 general physics
Molecular Biology
Quantitative Methods (q-bio.QM)
030304 developmental biology
Lossless compression
0303 health sciences
Applied Mathematics
Proteins
Protein modelling
Biomolecules (q-bio.BM)
Disordered Systems and Neural Networks (cond-mat.dis-nn)
Condensed Matter - Disordered Systems and Neural Networks
Computer Science Applications
Quantitative Biology - Biomolecules
FOS: Biological sciences
Boltzmann constant
RNA
Pairwise comparison
Algorithm
Software
Details
- ISSN :
- 1471-2105
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....ebcfd31e16b82f2e8feb8349ef34a1a4