Enhancing breakpoint resolution with deep segmentation model: A general refinement method for read-depth based structural variant callers

Authors :
Satoru Miyano
Rui Yamaguchi
Yao-zhong Zhang
Seiya Imoto
Source :
PLoS Computational Biology, Vol 17, Iss 10, p e1009186 (2021)
Publication Year :
2021
Publisher :
Public Library of Science (PLoS), 2021.

Abstract

Read depths (RDs) are frequently used to identify structural variants (SVs) from sequencing data. Existing RD-based SV callers have difficulty determining breakpoints at single-nucleotide resolution because RD data are noisy and the calculation is bin-based. In this paper, we propose using the deep segmentation model UNet to learn base-wise RD patterns surrounding the breakpoints of known SVs. We integrate the model predictions with an RD-based SV caller to refine breakpoints to single-nucleotide resolution. We show that UNet can be trained with a small amount of data and can be applied both in-sample and cross-sample. An enhancement pipeline named RDBKE significantly increases the number of SVs with more precise breakpoints on simulated and real data. The source code of RDBKE is freely available at https://github.com/yaozhong/deepIntraSV.

Author summary

In this paper, we used the deep segmentation model UNet to alleviate the bin-size limitation of RD-based SV callers. UNet was initially proposed for image data. Here, we demonstrated that the UNet model can also be applied to one-dimensional genomic data. We formalized breakpoint prediction as a segmentation task and inferred breakpoints at single-nucleotide resolution from the predicted label marks. Through a set of experiments on both simulated and real WGS data, we demonstrated that the UNet model can be trained with a small amount of data, and that an enhancement pipeline called RDBKE significantly increases the number of SVs with more precise breakpoints.
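The idea of reading a breakpoint off per-base segmentation labels can be illustrated with a short sketch. This is not the RDBKE implementation: the abstract does not say how label transitions are selected, so the nearest-transition rule and the names `label_transitions` and `refine_breakpoint` are assumptions made only for illustration. The sketch takes a binary per-base mask predicted for a window around a coarse, bin-level breakpoint and moves the breakpoint to the closest position where the label changes.

```python
from typing import List


def label_transitions(mask: List[int]) -> List[int]:
    """Return indices i where mask[i] differs from mask[i - 1].

    Each index is the first base carrying the new label, so it marks a
    candidate breakpoint inside the window.
    """
    return [i for i in range(1, len(mask)) if mask[i] != mask[i - 1]]


def refine_breakpoint(coarse_pos: int, window_start: int, mask: List[int]) -> int:
    """Move a bin-level breakpoint to the nearest per-base label transition.

    coarse_pos   : breakpoint reported by the RD-based caller (genomic coordinate)
    window_start : genomic coordinate of mask[0]
    mask         : per-base labels for the window (1 = inside SV, 0 = outside)

    Hypothetical rule: pick the transition closest to coarse_pos; if the
    mask has no transition, keep the caller's original breakpoint.
    """
    candidates = [window_start + i for i in label_transitions(mask)]
    if not candidates:
        return coarse_pos
    return min(candidates, key=lambda pos: abs(pos - coarse_pos))


if __name__ == "__main__":
    # Toy window of 10 bases starting at coordinate 1000; the label
    # switches from 0 to 1 at offset 6.
    toy_mask = [0] * 6 + [1] * 4
    print(refine_breakpoint(coarse_pos=1004, window_start=1000, mask=toy_mask))
```

Falling back to the caller's breakpoint when the mask has no transition keeps the refinement conservative: a window with no label change leaves the original call unchanged.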

Details

ISSN :
15537358
Volume :
17
Database :
OpenAIRE
Journal :
PLOS Computational Biology
Accession number :
edsair.doi.dedup.....9959176e1eab6d3272909284ec3125f5