AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees
- Source :
- Bioinformatics
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Motivation: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
- Results: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.
- Availability and implementation: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust.
- Contact: lpipes@berkeley.edu
- Supplementary information: Supplementary figures and table are available online.
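The abstract names the greedy step but does not spell it out, so the following is only an illustrative sketch, not the AncestralClust implementation. It assumes the ancestral sequences of the initial clusters are already available as aligned strings, uses a simple mismatch fraction (p-distance) as a stand-in distance, and opens a new cluster when no representative is close enough. The names `p_distance`, `greedy_assign`, and `max_distance` are hypothetical.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_assign(
    representatives: Dict[str, str],
    sequences: Dict[str, str],
    max_distance: float,
) -> Dict[str, List[str]]:
    """Assign each sequence to its closest representative, in input order.

    Assumption: `representatives` maps a cluster name to a sequence that
    stands in for an inferred ancestral sequence. A sequence farther than
    `max_distance` from every representative starts its own cluster.
    """
    reps = dict(representatives)  # copy so the caller's dict is untouched
    clusters: Dict[str, List[str]] = {name: [] for name in reps}
    for seq_id, seq in sequences.items():
        # Pick the nearest representative seen so far (None if there are none).
        best = min(reps, key=lambda name: p_distance(reps[name], seq), default=None)
        if best is not None and p_distance(reps[best], seq) <= max_distance:
            clusters[best].append(seq_id)
        else:
            # Too divergent: the sequence becomes the representative of a new cluster.
            reps[seq_id] = seq
            clusters[seq_id] = [seq_id]
    return clusters


# Toy usage with made-up sequences.
ancestors = {"anc1": "AAAAAAAA", "anc2": "CCCCCCCC"}
reads = {"s1": "AAAAAAAT", "s2": "CCCCCCGG", "s3": "GGGGGGGG", "s4": "GGGGGGGA"}
result = greedy_assign(ancestors, reads, max_distance=0.3)
print(result)
```

Because the assignment is greedy, the result depends on the order in which sequences are visited; a sequence that opens a new cluster can attract later sequences, as `s3` does for `s4` in the toy data.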
- Subjects :
- Statistics and Probability
Sequence
Sequence reconstruction
Phylogenetic tree
Computer science
Computational biology
Biology
Original Papers
Biochemistry
Computer Science Applications
Set (abstract data type)
Computational Mathematics
Computational Theory and Mathematics
Cluster (physics)
Table (database)
Cluster analysis
Greedy algorithm
Molecular Biology
Homologous gene
Sequence (medicine)
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 38
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....832790c4809514e62c49b568098d92ce
- Full Text :
- https://doi.org/10.1093/bioinformatics/btab723