Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

Authors :
Jiadong Lin
Xiaofei Yang
Walter Kosters
Tun Xu
Yanyan Jia
Songbo Wang
Qihui Zhu
Mallory Ryan
Li Guo
Chengsheng Zhang
Charles Lee
Scott E. Devine
Evan E. Eichler
Kai Ye
Mark B. Gerstein
Ashley D. Sanders
Michael C. Zody
Michael E. Talkowski
Ryan E. Mills
Jan O. Korbel
Tobias Marschall
Peter Ebert
Peter A. Audano
Bernardo Rodriguez-Martin
David Porubsky
Marc Jan Bonder
Arvis Sulovari
Jana Ebler
Weichen Zhou
Rebecca Serra Mari
Feyza Yilmaz
Xuefang Zhao
PingHsun Hsieh
Joyce Lee
Sushant Kumar
Tobias Rausch
Yu Chen
Zechen Chong
Katherine M. Munson
Mark J.P. Chaisson
Junjie Chen
Xinghua Shi
Aaron M. Wenger
William T. Harvey
Patrick Hasenfeld
Allison Regier
Ira M. Hall
Paul Flicek
Alex R. Hastie
Susan Fairley
Source :
Genomics, Proteomics and Bioinformatics, 20(1), 205-218
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspections, are around 70%, with median experimental and computational breakpoint shifts of 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
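
The idea of representing breakpoint connections as a graph and growing subgraphs outward from a seed can be illustrated with a toy sketch. This is not Mako's implementation: the function names, the input format (pairs of breakpoint positions), and the use of plain connected-component growth are all assumptions made for illustration. The only detail taken from the abstract is that a CSV has more than two breakpoints.

```python
from collections import defaultdict


def build_breakpoint_graph(links):
    """Build an undirected graph from (pos_a, pos_b) pairs.

    Each pair is a hypothetical connection between two candidate
    breakpoints (an assumed input format, not Mako's).
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def grow_subgraphs(graph):
    """Grow a subgraph from each unvisited seed until no neighbor is left."""
    seen = set()
    subgraphs = []
    for seed in sorted(graph):
        if seed in seen:
            continue
        component = set()
        frontier = [seed]
        while frontier:
            node = frontier.pop()
            if node in component:
                continue
            component.add(node)
            # Extend the subgraph with neighbors not yet included.
            frontier.extend(graph[node] - component)
        seen |= component
        subgraphs.append(sorted(component))
    return subgraphs


def candidate_csvs(links, min_breakpoints=3):
    """Keep subgraphs with more than two breakpoints as CSV candidates."""
    graph = build_breakpoint_graph(links)
    return [sg for sg in grow_subgraphs(graph) if len(sg) >= min_breakpoints]


# Toy data (invented): four linked breakpoints and one isolated pair.
toy_links = [(100, 250), (250, 400), (400, 620), (5000, 5300)]
candidates = candidate_csvs(toy_links)
```

In this toy input, the four linked positions form one candidate subgraph, while the isolated pair has only two breakpoints and is treated as a simple variant. No event model is specified in advance; the subgraph shape emerges from the connections alone.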

Details

ISSN :
16720229
Volume :
20
Database :
OpenAIRE
Journal :
Genomics, Proteomics & Bioinformatics
Accession number :
edsair.doi.dedup.....6e5942fbb13f962b958ac29c2ee509d0