LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
- Source :
- NAR Genomics and Bioinformatics
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Low-complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental and biologically relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
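To make the idea of composition-based identification concrete, the following is a minimal sketch of a sliding-window composition scan. It is not LCD-Composer's actual implementation: the function name `find_composition_windows`, the window size, and the composition threshold are all hypothetical choices made for illustration, and the linear dispersion criterion mentioned in the abstract is not modelled here.

```python
def find_composition_windows(seq, residues, window=20, min_fraction=0.4):
    """Return merged (start, end) spans, 0-based and half-open, in which
    every contributing window has at least `min_fraction` of its positions
    occupied by any of the amino acids in `residues`.

    Illustrative only: the name, defaults, and merging rule are assumptions,
    not the published LCD-Composer algorithm.
    """
    targets = set(residues)
    spans = []
    if window <= 0 or len(seq) < window:
        return spans

    # Count target residues in the first window, then update incrementally.
    count = sum(1 for aa in seq[:window] if aa in targets)
    for start in range(len(seq) - window + 1):
        if start > 0:
            count -= seq[start - 1] in targets           # residue leaving the window
            count += seq[start + window - 1] in targets  # residue entering the window
        if count / window >= min_fraction:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent passing windows form one domain.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Toy sequence: a Q-rich stretch flanked by alanines.
    toy = "A" * 10 + "Q" * 10 + "A" * 10
    print(find_composition_windows(toy, "Q", window=10, min_fraction=0.5))
```

Passing several residues at once (for example `residues="QN"`) treats them as a single group, which is one simple way to express a combined composition criterion under these assumptions.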
- Subjects :
- AcademicSubjects/SCI01140
0303 health sciences
Liquid-crystal display
AcademicSubjects/SCI01060
Computer science
AcademicSubjects/SCI00030
Sequence alignment
Computational biology
Composition (combinatorics)
AcademicSubjects/SCI01180
law.invention
Low complexity
03 medical and health sciences
Identification (information)
0302 clinical medicine
Protein sequencing
law
Methods Article
Feature (machine learning)
AcademicSubjects/SCI00980
UniProt
030217 neurology & neurosurgery
030304 developmental biology
Details
- ISSN :
- 2631-9268
- Volume :
- 3
- Database :
- OpenAIRE
- Journal :
- NAR Genomics and Bioinformatics
- Accession number :
- edsair.doi.dedup.....e36c77f2e21b2b695ff97005c8547120
- Full Text :
- https://doi.org/10.1093/nargab/lqab048