Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets
- Source :
- Current Protocols. 1
- Publication Year :
- 2021
- Publisher :
- Wiley, 2021.
Abstract
- Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time required by previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be used as input features in machine learning libraries to develop methods that predict particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been used as proxies for traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remains for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
The following protocols are included in this manuscript:
- Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
- Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
- Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
- Basic Protocol 4: Train a machine learning classifier on protein embeddings
- Alternate Protocol 1: Generate 3D instead of 2D visualizations
- Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
- Support Protocol: Join embedding generation and sequence space visualization in a pipeline
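As a rough illustration of the idea behind Basic Protocol 4 (using fixed-length embeddings as input features for a classifier), the sketch below trains a nearest-centroid classifier in plain Python. It is not taken from the article and does not call the bio_embeddings API: the vectors are synthetic stand-ins for per-protein embeddings, and the two labels, the 16-dimensional size, and all function names are hypothetical choices made only for this example.

```python
import random
from statistics import fmean


def make_toy_embeddings(n_per_class, dim, seed=0):
    """Return shuffled (vector, label) pairs of synthetic data.

    Assumption: real inputs would be per-protein embeddings produced by a
    protein language model; here they are random Gaussian vectors.
    """
    rng = random.Random(seed)
    data = []
    # Hypothetical labels; each class is centered on a different point.
    for label, center in (("soluble", 1.0), ("membrane-bound", -1.0)):
        for _ in range(n_per_class):
            vec = [rng.gauss(center, 0.5) for _ in range(dim)]
            data.append((vec, label))
    rng.shuffle(data)
    return data


def fit_centroids(samples):
    """Compute the mean embedding (centroid) of each class."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {
        label: [fmean(column) for column in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(centroids, vec):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


data = make_toy_embeddings(n_per_class=50, dim=16)
train, test = data[:80], data[80:]  # simple hold-out split

centroids = fit_centroids(train)
accuracy = sum(predict(centroids, vec) == label for vec, label in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the held-out accuracy says nothing about real proteins. With real embeddings, a library classifier and a proper evaluation split would normally replace this toy model.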
- Subjects :
- Computer science
Health Informatics
Machine learning
ENCODE
General Biochemistry, Genetics and Molecular Biology
Plot (graphics)
Deep Learning
Artificial Intelligence
General Pharmacology, Toxicology and Pharmaceutics
Protocol (object-oriented programming)
Natural Language Processing
Learning classifier system
General Immunology and Microbiology
General Neuroscience
Proteins
Pipeline (software)
Visualization
Medical Laboratory Technology
ComputingMethodologies_PATTERNRECOGNITION
Language model
Details
- ISSN :
- 2691-1299
- Volume :
- 1
- Database :
- OpenAIRE
- Journal :
- Current Protocols
- Accession number :
- edsair.doi.dedup.....935acf08f90deb57e18b2a291f42ca39