SICaRiO: short indel call filtering with boosting.
- Source :
- Briefings in Bioinformatics. Jul 2021, Vol. 22 Issue 4, p1-12. 12p.
- Publication Year :
- 2021
Abstract
- Despite impressive improvements in next-generation sequencing technology, reliable detection of indels is still a difficult endeavour. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from the 'Genome in a Bottle' (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources, i.e. it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds light on the prior genomic contexts responsible for the erroneous calling of indels by sequencing pipelines. We have compared prediction difficulty for three categories of indels over different sequencing pipelines. We have also ranked genomic features according to their predictivity in determining false positives. [ABSTRACT FROM AUTHOR]
- Subjects :
- *POPULATION genetics
*NUCLEOTIDE sequencing
*MACHINE learning
*GENOMICS
*GENOMES
Details
- Language :
- English
- ISSN :
- 14675463
- Volume :
- 22
- Issue :
- 4
- Database :
- Academic Search Index
- Journal :
- Briefings in Bioinformatics
- Publication Type :
- Academic Journal
- Accession number :
- 152575435
- Full Text :
- https://doi.org/10.1093/bib/bbaa238