383 results on '"String graph"'
Search Results
2. String Graphs with Precise Number of Intersections
- Author
-
Chmel, Petr, Jelínek, Vít, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bekos, Michael A., editor, and Chimani, Markus, editor
- Published
- 2023
- Full Text
- View/download PDF
3. Coloring Hasse Diagrams and Disjointness Graphs of Curves
- Author
-
Pach, János, Tomon, István, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Archambault, Daniel, editor, and Tóth, Csaba D., editor
- Published
- 2019
- Full Text
- View/download PDF
4. Approximating Minimum Dominating Set on String Graphs
- Author
-
Chakraborty, Dibyayan, Das, Sandip, Mukherjee, Joydeep, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sau, Ignasi, editor, and Thilikos, Dimitrios M., editor
- Published
- 2019
- Full Text
- View/download PDF
5. A Sharp Threshold Phenomenon in String Graphs.
- Author
-
Tomon, István
- Subjects
- *
PLANE curves , *INTERSECTION graph theory , *SPARSE graphs - Abstract
A string graph is the intersection graph of curves in the plane. We prove that for every ϵ > 0 , if G is a string graph with n vertices such that the edge density of G is below 1 / 4 - ϵ , then V(G) contains two linear sized subsets A and B with no edges between them. The constant 1/4 is a sharp threshold for this phenomenon as there are string graphs with edge density less than 1 / 4 + ϵ such that there is an edge connecting any two logarithmic sized subsets of the vertices. The existence of linear sized sets A and B with no edges between them in sufficiently sparse string graphs is a direct consequence of a recent result of Lee about separators. Our main theorem finds the largest possible density for which this still holds. In the special case when the curves are x-monotone, the same result was proved by Pach and the author of this paper, who also proposed the conjecture for the general case. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Induced subgraphs of graphs with large chromatic number. V. Chandeliers and strings.
- Author
-
Chudnovsky, Maria, Scott, Alex, and Seymour, Paul
- Subjects
- *
SUBDIVISION surfaces (Geometry) , *SUBGRAPHS , *CHANDELIERS , *INTERSECTION graph theory , *PLANE curves , *MULTIGRAPH - Abstract
It is known that every graph of sufficiently large chromatic number and bounded clique number contains, as an induced subgraph, a subdivision of any fixed forest, and a subdivision of any fixed cycle. Equivalently, every forest is pervasive, and K 3 is pervasive, in the class of all graphs, where we say a graph H is "pervasive" (in some class of graphs) if for all ℓ ≥ 1 , every graph in the class of bounded clique number and sufficiently large chromatic number has an induced subgraph that is a subdivision of H , in which every edge of H is replaced by a path of at least ℓ edges. Which other graphs are pervasive? It was proved by Chalopin, Esperet, Li and Ossona de Mendez that every such graph is a "forest of lanterns": roughly, every block is a "lantern", a graph obtained from a tree by adding one extra vertex, and there are rules about how the blocks fit together. It is not known whether every forest of lanterns is pervasive in the class of all graphs; but in another paper two of us prove that all "banana trees" are pervasive, that is, multigraphs obtained from a forest by adding parallel edges, thus generalizing the two results above. This paper contains the first half of the proof, which works for any forest of lanterns, not just for banana trees. Say a class of graphs is " ρ -controlled" if for every graph in the class, its chromatic number is at most some function (determined by the class) of the largest chromatic number of a ρ -ball in the graph. In this paper we prove that for every ρ ≥ 2 , and for every ρ -controlled class, every forest of lanterns is pervasive in this class. These results turn out particularly nicely when applied to string graphs. A "chandelier" is a special lantern, a graph obtained from a tree by adding a vertex adjacent to precisely the leaves of the tree. A "string graph" is the intersection graph of a set of curves in the plane. There are string graphs with clique number two and chromatic number arbitrarily large. We prove that the class of string graphs is 2-controlled, and consequently every forest of lanterns is pervasive in this class; but in fact something stronger is true, that every string graph of sufficiently large chromatic number and bounded clique number contains each fixed chandelier as an induced subgraph (not just as a subdivision); and the same for most forests of chandeliers (there is an extra condition on how the blocks are attached together). [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. RMI-DBG algorithm: A more agile iterative de Bruijn graph algorithm in short read genome assembly.
- Author
-
Hosseini, Zeinab Zare, Rahimi, Shekoufeh Kolahdouz, Forouzan, Esmaeil, and Baraani, Ahmad
- Subjects
- *
DE Bruijn graph , *GRAPH algorithms , *ALGORITHMS , *MAXIMA & minima , *GENOMES - Abstract
The de Bruijn Graph algorithm (DBG) as one of the cornerstones algorithms in short read assembly has extended with the rapid advancement of the Next Generation Sequencing (NGS) technologies and low-cost production of millions of high-quality short reads. Erroneous reads, non-uniform coverage, and genomic repeats are three major problems that influence the performance of short read assemblers. To encounter these problems, the iterative DBG algorithm applies multiple k -mers instead of a single k -mer, by iterating the DBG graph over a range of k -mer sizes from the minimum to the maximum. However, the iteration paradigm of iterative DBG deals with complex graphs from the beginning of the algorithm and therefore, causes more potential errors and computational time for resolving various unreal branches. In this research, we propose the Reverse Modified Iterative DBG graph (named RMI-DBG) for short read assembly. RMI-DBG utilizes the DBG algorithm and String graph to achieve the advantages of both algorithms. We present that RMI-DBG performs faster with comparable results in comparison to iterative DBG. Additionally, the quality of the proposed algorithm in terms of continuity and accuracy is evaluated with some commonly-used assemblers via several real datasets of the GAGE-B benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
8. FSG: Fast String Graph Construction for De Novo Assembly of Reads Data
- Author
-
Bonizzoni, Paola, Della Vedova, Gianluca, Pirola, Yuri, Previtali, Marco, Rizzi, Raffaella, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Bourgeois, Anu, editor, Skums, Pavel, editor, Wan, Xiang, editor, and Zelikovsky, Alex, editor
- Published
- 2016
- Full Text
- View/download PDF
9. From Indexing Data Structures to de Bruijn Graphs
- Author
-
Cazaux, Bastien, Lecroq, Thierry, Rivals, Eric, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Kulikov, Alexander S., editor, Kuznetsov, Sergei O., editor, and Pevzner, Pavel, editor
- Published
- 2014
- Full Text
- View/download PDF
10. An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs
- Author
-
Kundeti, Vamsi, Rajasekaran, Sanguthevar, Dinh, Heiu, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Wu, Weili, editor, and Daescu, Ovidiu, editor
- Published
- 2010
- Full Text
- View/download PDF
11. IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler
- Author
-
Peng, Yu, Leung, Henry C. M., Yiu, S. M., Chin, Francis Y. L., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, and Berger, Bonnie, editor
- Published
- 2010
- Full Text
- View/download PDF
12. Comparative analysis of de novo assemblers for variation discovery in personal genomes.
- Author
-
Tian, Shulan, Yan, Huihuang, Klee, Eric W, Kalmbach, Michael, and Slager, Susan L
- Subjects
- *
COMPARATIVE studies , *GENOMES , *NUCLEOTIDE sequence , *EXOMES , *GENE mapping - Abstract
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, regions of duplicates with a high-sequence similarity and regions of high-sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a Utah residents with Northern and Western European ancestry (CEU) trio, showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
13. Induced subgraphs of graphs with large chromatic number. V. Chandeliers and strings
- Author
-
Maria Chudnovsky, Alex Scott, and Paul Seymour
- Subjects
Vertex (graph theory) ,010102 general mathematics ,String (computer science) ,Induced subgraph ,0102 computer and information sciences ,Intersection graph ,01 natural sciences ,Tree (graph theory) ,Theoretical Computer Science ,Combinatorics ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,String graph ,Path (graph theory) ,FOS: Mathematics ,Mathematics - Combinatorics ,Discrete Mathematics and Combinatorics ,Combinatorics (math.CO) ,Multiple edges ,0101 mathematics ,Mathematics - Abstract
It is known that every graph of sufficiently large chromatic number and bounded clique number contains, as an induced subgraph , a subdivision of any fixed forest, and a subdivision of any fixed cycle. Equivalently, every forest is pervasive, and K 3 is pervasive, in the class of all graphs, where we say a graph H is “pervasive” (in some class of graphs) if for all l ≥ 1 , every graph in the class of bounded clique number and sufficiently large chromatic number has an induced subgraph that is a subdivision of H, in which every edge of H is replaced by a path of at least l edges. Which other graphs are pervasive? It was proved by Chalopin, Esperet, Li and Ossona de Mendez that every such graph is a “forest of lanterns”: roughly, every block is a “lantern”, a graph obtained from a tree by adding one extra vertex, and there are rules about how the blocks fit together. It is not known whether every forest of lanterns is pervasive in the class of all graphs; but in another paper two of us prove that all “banana trees” are pervasive, that is, multigraphs obtained from a forest by adding parallel edges, thus generalizing the two results above. This paper contains the first half of the proof, which works for any forest of lanterns, not just for banana trees. Say a class of graphs is “ρ-controlled” if for every graph in the class, its chromatic number is at most some function (determined by the class) of the largest chromatic number of a ρ-ball in the graph. In this paper we prove that for every ρ ≥ 2 , and for every ρ-controlled class, every forest of lanterns is pervasive in this class. These results turn out particularly nicely when applied to string graphs. A “chandelier” is a special lantern, a graph obtained from a tree by adding a vertex adjacent to precisely the leaves of the tree. A “string graph” is the intersection graph of a set of curves in the plane. There are string graphs with clique number two and chromatic number arbitrarily large. We prove that the class of string graphs is 2-controlled, and consequently every forest of lanterns is pervasive in this class; but in fact something stronger is true, that every string graph of sufficiently large chromatic number and bounded clique number contains each fixed chandelier as an induced subgraph (not just as a subdivision); and the same for most forests of chandeliers (there is an extra condition on how the blocks are attached together).
- Published
- 2021
- Full Text
- View/download PDF
14. A Sharp Threshold Phenomenon in String Graphs
- Author
-
István Tomon
- Subjects
Logarithm ,String graph ,Separator ,0102 computer and information sciences ,Edge (geometry) ,01 natural sciences ,Article ,Theoretical Computer Science ,Combinatorics ,FOS: Mathematics ,C++ string handling ,Mathematics - Combinatorics ,Discrete Mathematics and Combinatorics ,0101 mathematics ,Mathematics ,Conjecture ,Plane (geometry) ,010102 general mathematics ,Intersection graph ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Combinatorics (math.CO) ,Geometry and Topology ,Constant (mathematics) ,05C10 - Abstract
A string graph is the intersection graph of curves in the plane. We prove that for every e>0, if G is a string graph with n vertices such that the edge density of G is below 1/4−e, then V(G) contains two linear sized subsets A and B with no edges between them. The constant 1/4 is a sharp threshold for this phenomenon as there are string graphs with edge density less than 1/4+e such that there is an edge connecting any two logarithmic sized subsets of the vertices. The existence of linear sized sets A and B with no edges between them in sufficiently sparse string graphs is a direct consequence of a recent result of Lee about separators. Our main theorem finds the largest possible density for which this still holds. In the special case when the curves are x-monotone, the same result was proved by Pach and the author of this paper, who also proposed the conjecture for the general case., Discrete & Computational Geometry, 67 (1), ISSN:0179-5376, ISSN:1432-0444
- Published
- 2021
- Full Text
- View/download PDF
15. FSG: Fast String Graph Construction for De Novo Assembly.
- Author
-
Bonizzoni, Paola, Vedova, Gianluca Della, Pirola, Yuri, Previtali, Marco, and Rizzi, Raffaella
- Subjects
- *
BIOINFORMATICS , *BIOLOGICAL mathematical modeling , *ALGORITHMS , *COMPUTATIONAL biology , *BIOLOGICAL databases - Abstract
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and Burrows and Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
16. On String Graph Limits and the Structure of a Typical String Graph.
- Author
-
Janson, Svante and Uzzell, Andrew J.
- Subjects
- *
GRAPH theory , *MATHEMATICS , *GRAPH algorithms , *ALGEBRA , *GRAPHIC methods - Abstract
We study limits of convergent sequences of string graphs, that is graphs with an intersection representation consisting of curves in the plane. We use these results to study the limiting behavior of a sequence of random string graphs. We also prove similar results for several related graph classes. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
17. Almost All String Graphs are Intersection Graphs of Plane Convex Sets
- Author
-
Bruce Reed, Yelena Yuditsky, and János Pach
- Subjects
Plane convex ,050101 languages & linguistics ,05 social sciences ,02 engineering and technology ,Intersection graph ,Graph ,Theoretical Computer Science ,Vertex (geometry) ,Combinatorics ,Corollary ,Computational Theory and Mathematics ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,Discrete Mathematics and Combinatorics ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Geometry and Topology ,Structured program theorem ,Mathematics - Abstract
A string graph is the intersection graph of a family of continuous arcs in the plane. The intersection graph of a family of plane convex sets is a string graph, but not all string graphs can be obtained in this way. We prove the following structure theorem conjectured by Janson and Uzzell: The vertex set of almost all string graphs on n vertices can be partitioned into five cliques such that some pair of them is not connected by any edge ($$n\rightarrow \infty $$). We also show that every graph with the above property is an intersection graph of plane convex sets. As a corollary, we obtain that almost all string graphs on n vertices are intersection graphs of plane convex sets.
- Published
- 2020
- Full Text
- View/download PDF
18. A history of DNA sequence assembly.
- Author
-
Myers Jr, Eugene W.
- Subjects
NUCLEOTIDE sequencing ,SHOTGUN sequencing ,DE Bruijn graph ,HISTORY - Abstract
DNA sequence assembly is a rich combinatorial problem that arose with the first DNA sequencing projects in the early 80's. Here we give a short history of the progression of algorithmic ideas used to solve the de novo problem of inferring a genome given a large sampling of substrings covering it. This classic inverse problem is compounded by a variety of experimental features and artifacts that must be considered in any realistic solution. While current methods produce very good results, the perfect assembler has yet to be built. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
19. Using Apache Spark on genome assembly for scalable overlap-graph reduction
- Author
-
Myoungkyu Song, Chongle Pan, Seung-Hwan Lim, Dylan Lawrence, Tae-Hyuk Ahn, and Alexander J. Paul
- Subjects
0106 biological sciences ,Workstation ,lcsh:QH426-470 ,Computer science ,Sequence assembly ,lcsh:Medicine ,Parallel computing ,Graph reduction ,01 natural sciences ,Apache spark ,law.invention ,Reduction (complexity) ,03 medical and health sciences ,Overlap-layout-consensus ,law ,Drug Discovery ,Spark (mathematics) ,String graph ,Databases, Genetic ,Genetics ,Humans ,Cloud computing ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Genome ,Genome assembly ,Base Sequence ,Genome, Human ,Research ,lcsh:R ,Sequence Analysis, DNA ,lcsh:Genetics ,Scalability ,Molecular Medicine ,Conyza ,Algorithms ,Genome, Plant ,010606 plant biology & botany ,Reference genome - Abstract
Background De novo genome assembly is a technique that builds the genome of a specimen using overlaps of genomic fragments without additional work with reference sequence. Sequence fragments (called reads) are assembled as contigs and scaffolds by the overlaps. The quality of the de novo assembly depends on the length and continuity of the assembly. To enable faster and more accurate assembly of species, existing sequencing techniques have been proposed, for example, high-throughput next-generation sequencing and long-reads-producing third-generation sequencing. However, these techniques require a large amounts of computer memory when very huge-size overlap graphs are resolved. Also, it is challenging for parallel computation. Results To address the limitations, we propose an innovative algorithmic approach, called Scalable Overlap-graph Reduction Algorithms (SORA). SORA is an algorithm package that performs string graph reduction algorithms by Apache Spark. The SORA’s implementations are designed to execute de novo genome assembly on either a single machine or a distributed computing platform. SORA efficiently compacts the number of edges on enormous graphing paths by adapting scalable features of graph processing libraries provided by Apache Spark, GraphX and GraphFrames. Conclusions We shared the algorithms and the experimental results at our project website, https://github.com/BioHPC/SORA. We evaluated SORA with the human genome samples. First, it processed a nearly one billion edge graph on a distributed cloud cluster. Second, it processed mid-to-small size graphs on a single workstation within a short time frame. Overall, SORA achieved the linear-scaling simulations for the increased computing instances.
- Published
- 2019
- Full Text
- View/download PDF
20. On dominating set of some subclasses of string graphs.
- Author
-
Chakraborty, Dibyayan, Das, Sandip, and Mukherjee, Joydeep
- Subjects
- *
DOMINATING set , *PLANE curves , *INTERSECTION graph theory , *APPROXIMATION algorithms , *RECTANGLES , *ALGORITHMS - Abstract
We provide constant factor approximation algorithms for the Minimum Dominating Set (MDS) problem on several subclasses of string graphs i.e. intersection graphs of simple curves on the plane. For k ≥ 0 , unit B k -VPG graphs are intersection graphs of simple rectilinear curves having at most k cusps (bends) and each segment of the curve being unit length. We give an 18-approximation algorithm for the MDS problem on unit B 0 -VPG graphs. This partially addresses a question of Katz et al. (2005) [24]. We also give an O (k 4) -approximation algorithm for the MDS problem on unit B k -VPG graphs. We show that there is an 8-approximation algorithm for the MDS problem on vertically-stabbed L -graphs. We also give a 656-approximation algorithm for the MDS problem on stabbed rectangle overlap graphs. This is the first constant-factor approximation algorithm for the MDS problem on stabbed rectangle overlap graphs and extends a result of Bandyapadhyay et al. (2019) [31]. We prove some hardness results to complement the above results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data
- Author
-
Ben Liu, Sirisha Thippabhotla, Jun Zhang, and Cuncong Zhong
- Subjects
0301 basic medicine ,Computer science ,Sequence assembly ,Computational biology ,QH426-470 ,De Bruijn graph ,noncoding RNA ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,String graph ,Genetics ,homology search ,Genetics (clinical) ,Original Research ,metagenomics ,RNA ,Robustness (evolution) ,Non-coding RNA ,covariance model ,030104 developmental biology ,Metagenomics ,symbols ,genome assembly ,Molecular Medicine ,Graph (abstract data type) ,030217 neurology & neurosurgery - Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward the understanding of the function of the microbial community. However, the majority of the current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure that complicates ncRNA homology detection. Meanwhile, de novo assembly of the metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent readsinto differentncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoMcan improve the performance and robustness over traditional approaches on the classification and quantification of a wide class of ncRNA families.
- Published
- 2021
22. LJA: Assembling Long and Accurate Reads Using Multiplex de Bruijn Graphs
- Author
-
Pavel A. Pevzner, Anton Bankevich, Andrey Bzikadze, Dmitry Antipov, and Mikhail Kolmogorov
- Subjects
De Bruijn sequence ,symbols.namesake ,Theoretical computer science ,Fully automated ,Computer science ,String graph ,symbols ,Multiplex ,Human genome ,Construct (python library) ,Genome ,De Bruijn graph - Abstract
Although most existing genome assemblers are based on the de Bruijn graphs, it remains unclear how to construct these graphs for large genomes and largek-mer sizes. This algorithmic challenge has become particularly important with the emergence of long high-fidelity (HiFi) reads that were recently utilized to generate a semi-manual telomere-to-telomere assembly of the human genome and to get a glimpse into biomedically important regions that evaded all previous attempts to sequence them. To enable automated assemblies of long and accurate reads, we developed a fast LJA algorithm that reduces the error rate in these reads by three orders of magnitude (making them nearly error-free) and constructs the de Bruijn graph for large genomes and largek-mer sizes. Since the de Bruijn graph constructed for a fixedk-mer size is typically either too tangled or too fragmented, LJA uses a new concept of a multiplex de Bruijn graph with varyingk-mer sizes. We demonstrate that LJA improves on the state-of-the-art assemblers with respect to both accuracy and contiguity and enables automated telomere-to-telomere assemblies of entire human chromosomes.
- Published
- 2020
- Full Text
- View/download PDF
23. Clover: a clustering-oriented de novo assembler for Illumina sequences
- Author
-
Chuan Yi Tang, Chin Lung Lu, and Ming-Feng Hsieh
- Subjects
Time Factors ,Computer science ,Sequence assembly ,Genomics ,Bacterial genome size ,Computational biology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,DNA sequencing ,De Bruijn graph ,symbols.namesake ,Structural Biology ,String graph ,Cluster Analysis ,Humans ,lcsh:QH301-705.5 ,Molecular Biology ,Chromosomes, Human, Pair 14 ,De Bruijn sequence ,Base Sequence ,Applied Mathematics ,High-Throughput Nucleotide Sequencing ,Chromosome ,Computer Science Applications ,lcsh:Biology (General) ,De bruijn graph ,symbols ,lcsh:R858-859.7 ,DNA microarray ,Software ,Algorithms ,Genome, Bacterial ,De novo genome assembly - Abstract
Background Next-generation sequencing technologies revolutionized genomics by producing high-throughput reads at low cost, and this progress has prompted the recent development of de novo assemblers. Multiple assembly methods based on de Bruijn graph have been shown to be efficient for Illumina reads. However, the sequencing errors generated by the sequencer complicate analysis of de novo assembly and influence the quality of downstream genomic researches. Results In this paper, we develop a de Bruijn assembler, called Clover (clustering-oriented de novo assembler), that utilizes a novel k-mer clustering approach from the overlap-layout-consensus concept to deal with the sequencing errors generated by the Illumina platform. We further evaluate Clover’s performance against several de Bruijn graph assemblers (ABySS, SOAPdenovo, SPAdes and Velvet), overlap-layout-consensus assemblers (Bambus2, CABOG and MSR-CA) and string graph assembler (SGA) on three datasets (Staphylococcus aureus, Rhodobacter sphaeroides and human chromosome 14). The results show that Clover achieves a superior assembly quality in terms of corrected N50 and E-size while remaining a significantly competitive in run time except SOAPdenovo. In addition, Clover was involved in the sequencing projects of bacterial genomes Acinetobacter baumannii TYTH-1 and Morganella morganii KT. Conclusions The marvel clustering-based approach of Clover that integrates the flexibility of the overlap-layout-consensus approach and the efficiency of the de Bruijn graph method has high potential on de novo assembly. Now, Clover is freely available as open source software from https://oz.nthu.edu.tw/~d9562563/src.html.
- Published
- 2020
- Full Text
- View/download PDF
24. Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
- Author
-
Aydin Buluc, Katherine Yelick, Oguz Selvitopi, Leonid Oliker, Marquita Ellis, and Giulia Guidi
- Subjects
Genomics (q-bio.GN) ,FOS: Computer and information sciences ,Theoretical computer science ,cs.DC ,Computer science ,Pipeline (computing) ,Human Genome ,Parallel algorithm ,Sequence assembly ,Transitive reduction ,Genome ,Computer Science - Distributed, Parallel, and Cluster Computing ,FOS: Biological sciences ,String graph ,q-bio.GN ,Genetics ,Graph (abstract data type) ,Distributed memory ,Quantitative Biology - Genomics ,Distributed, Parallel, and Cluster Computing (cs.DC) - Abstract
One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates consensus. Despite many algorithms developed in the literature, the efficient assembly of large genomes is still an open problem. In this work, we introduce new distributed-memory parallel algorithms for overlap detection and layout simplification steps of de novo genome assembly, and implement them in the diBELLA 2D pipeline. Our distributed memory algorithms for both overlap detection and layout simplification are based on linear-algebra operations over semirings using 2D distributed sparse matrices. Our layout step consists of performing a transitive reduction from the overlap graph to a string graph. We provide a detailed communication analysis of the main stages of our new algorithms. diBELLA 2D achieves near linear scaling with over 80% parallel efficiency for the human genome, reducing the runtime for overlap detection by 1.2 – $1.3 \times$ for the human genome and 1.5 – $1.9 \times$ for C.elegans compared to the state-of-the-art. Our transitive reduction algorithm outperforms an existing distributed-memory implementation by 10.5 – $13.3 \times$ for the human genome and 18– $29 \times$ for the C. elegans. Our work paves the way for efficient de novo assembly of large genomes using long reads in distributed memory.
- Published
- 2020
25. Second- and High-Order Graph Matching for Correspondence Problems
- Author
-
Zhang Ruonan and Wenmin Wang
- Subjects
Factor-critical graph ,Hypergraph ,Optimization problem ,Matching (graph theory) ,Computer science ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Distance-regular graph ,Simplex graph ,law.invention ,Coxeter graph ,Graph power ,law ,String graph ,3-dimensional matching ,Line graph ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Folded cube graph ,Electrical and Electronic Engineering ,Graph property ,Complement graph ,0105 earth and related environmental sciences ,Voltage graph ,Quartic graph ,Butterfly graph ,Graph ,Petersen graph ,Bipartite graph ,Graph (abstract data type) ,Cubic graph ,020201 artificial intelligence & image processing ,Null graph ,Graph factorization ,Algorithm - Abstract
Correspondence problems are challenging due to the complexity of real-world scenes. One way to solve this problem is to improve the graph matching (GM) process, which is flexible for matching non-rigid objects. GM can be classified into three categories that correspond with the variety of object functions: first-order, second-order, and high-order matching. Graph and hypergraph matching have been proposed separately in previous works. The former is equivalent to the second-order GM, and the latter is equivalent to high-order GM, but we use the terms second- and high-order GM to unify the terminology in this paper. Second- and high-order GM fit well with different types of problems; the key goal for these processes is to find better-optimized algorithms. Because the optimal problems for second- and high-order GM are different, we propose two novel optimized algorithms for them in this paper. (1) For the second-order GM, we first introduce a $K$ -nearest-neighbor-pooling matching method that integrates feature pooling into GM and reduces the complexity. Meanwhile, we evaluate each matching candidate using discriminative weights on its $k$ -nearest neighbors by taking locality as well as sparsity into consideration. (2) High-order GM introduces numerous outliers, because precision is rarely considered in related methods. Therefore, we propose a sub-pattern structure to construct a robust high-order GM method that better integrates geometric information. To narrow the search space and solve the optimization problem, a new prior strategy and a cell-algorithm-based Markov Chain Monte Carlo framework are proposed. In addition, experiments demonstrate the robustness and improvements of these algorithms with respect to matching accuracy compared with other state-of-the-art algorithms.
- Published
- 2018
- Full Text
- View/download PDF
26. String graphs and incomparability graphs
- Author
-
Fox, Jacob and Pach, János
- Subjects
- *
GRAPH theory , *ALGEBRAIC curves , *PLANE geometry , *COMBINATORIAL set theory , *STOCHASTIC convergence , *PATHS & cycles in graph theory , *INTERSECTION theory - Abstract
Abstract: Given a collection of curves in the plane, its string graph is defined as the graph with vertex set , in which two curves in are adjacent if and only if they intersect. Given a partially ordered set , its incomparability graph is the graph with vertex set , in which two elements of are adjacent if and only if they are incomparable. It is known that every incomparability graph is a string graph. For “dense” string graphs, we establish a partial converse of this statement. We prove that for every there exists with the property that if is a collection of curves whose string graph has at least edges, then one can select a subcurve of each such that the string graph of the collection has at least edges and is an incomparability graph. We also discuss applications of this result to extremal problems for string graphs and edge intersection patterns in topological graphs. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
27. Edges and vertices in a unique signed circle in a signed graph
- Author
-
Richard Behr
- Subjects
positive circle ,0102 computer and information sciences ,01 natural sciences ,Edge cover ,law.invention ,Combinatorics ,law ,String graph ,FOS: Mathematics ,Discrete Mathematics and Combinatorics ,Edge contraction ,Mathematics - Combinatorics ,0101 mathematics ,bridge ,Signed graph ,05C22 ,Mathematics ,Discrete mathematics ,negative circle ,lcsh:Mathematics ,010102 general mathematics ,Neighbourhood (graph theory) ,Complete graph ,balance ,lcsh:QA1-939 ,signed graph ,010201 computation theory & mathematics ,Combinatorics (math.CO) ,Circle graph ,MathematicsofComputing_DISCRETEMATHEMATICS ,Circle packing theorem - Abstract
We examine the conditions under which a signed graph contains an edge or a vertex that is contained in a unique negative circle or a unique positive circle. For an edge in a unique signed circle, the positive and negative case require the same structure on the underlying graph, but the requirements on the signature are different. We characterize the structure of the underlying graph necessary to support such an edge in terms of bridges of a circle. We then use the results from the edge version of the problem to help solve the vertex version., Comment: 12 pages, 4 figures
- Published
- 2017
28. And/or-convexity: a graph convexity based on processes and deadlock models
- Author
-
Dieter Rautenbach, Uéverton S. Souza, Carlos F. R. A. C. Lima, Jayme Luiz Szwarcfiter, and Fábio Protti
- Subjects
Discrete mathematics ,Vertex (graph theory) ,General Decision Sciences ,0102 computer and information sciences ,02 engineering and technology ,Management Science and Operations Research ,Deadlock ,01 natural sciences ,Convexity ,Blocking (computing) ,Combinatorics ,010201 computation theory & mathematics ,020204 information systems ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,Deadlock prevention algorithms ,Complement graph ,MathematicsofComputing_DISCRETEMATHEMATICS ,Mathematics - Abstract
Deadlock prevention techniques are essential in the design of robust distributed systems. However, despite the large number of different algorithmic approaches to detect and solve deadlock situations, yet there remains quite a wide field to be explored in the study of deadlock-related combinatorial properties. In this work we consider a simplified AND-OR model, where the processes and their communication are given as a graph G. Each vertex of G is labelled AND or OR, in such a way that an AND-vertex (resp., OR-vertex) depends on the computation of all (resp., at least one) of its neighbors. We define a graph convexity based on this model, such that a set $$S \subseteq V(G)$$ is convex if and only if every AND-vertex (resp., OR-vertex) $$v \in V(G){\setminus }S$$ has at least one (resp., all) of its neighbors in $$V(G) {\setminus } S$$ . We relate some classical convexity parameters to blocking sets that cause deadlock. In particular, we show that those parameters in a graph represent the sizes of minimum or maximum blocking sets, and also the computation time until system stability is reached. Finally, a study on the complexity of combinatorial problems related to such graph convexity is provided.
- Published
- 2017
- Full Text
- View/download PDF
29. Shrink: Distance preserving graph compression
- Author
-
Yongli Ren, Masoomeh Zameni, Flora D. Salim, Timos Sellis, Jeffrey Chan, and Amin Sadri
- Subjects
Theoretical computer science ,Computer science ,Ordered graph ,02 engineering and technology ,Strength of a graph ,computer.software_genre ,Simplex graph ,Graph power ,020204 information systems ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,Lattice graph ,Random geometric graph ,Complement graph ,Distance-hereditary graph ,Block graph ,Graph database ,Voltage graph ,Butterfly graph ,Graph ,Graph bandwidth ,Hardware and Architecture ,Shortest path problem ,020201 artificial intelligence & image processing ,Null graph ,computer ,Software ,MathematicsofComputing_DISCRETEMATHEMATICS ,Information Systems - Abstract
The ever increasing size of graphs makes them difficult to query and store. In this paper, we present Shrink, a compression method that reduces the size of the graph while preserving the distances between the nodes. The compression is based on the iterative merging of the nodes. During each merging, a system of linear equations is solved to define new edge weights in a way that the new weights have the least effect on the distances. Merging nodes continues until the desired size for the compressed graph is reached. The compressed graph, also known as the coarse graph, can be queried without decompression. As the complexity of distance-based queries such as shortest path queries is highly dependent on the size of the graph, Shrink improves the performance in terms of time and storage. Shrink not only provides the length of the shortest path but also identifies the nodes on the path. The approach has been applied to both weighted and unweighted graphs including road network, friendship network, collaboration network, web graph and social network. In the experiment, a road network with more than 2.5 million nodes is reduced to fifth while the average relative error is less than 1%.
- Published
- 2017
- Full Text
- View/download PDF
30. Non-rigid 3D object retrieval using directional graph representation of wave kernel signature
- Author
-
Hossein Ebrahimnezhad and Mahsa Mirloo
- Subjects
Computer Networks and Communications ,Computer science ,02 engineering and technology ,Strength of a graph ,Graph power ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Graph property ,Complement graph ,Distance-hereditary graph ,business.industry ,Voltage graph ,Quartic graph ,020207 software engineering ,Pattern recognition ,Directed graph ,Butterfly graph ,Graph ,Geometric graph theory ,Vertex (geometry) ,Graph bandwidth ,Hardware and Architecture ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Null graph ,Assignment problem ,Algorithm ,Software - Abstract
Extensive use of three dimensional models in various areas indicates the importance of 3D data retrieval accuracy. In this paper, a directional graph, is introduced for 3D model retrieval. The proposed directional graph is isometric invariant and extracts the features of 3D model vertices in a way that both provides a sample of the features of each salient part and illustrates how various parts are linked and connected with each other. Directionality of the graph does not refer to a certain physical direction, but it shows the arrangement of different points in the graph. The proposed method commences with determining the salient points and continues with constructing a directional graph. Each branch of the graph starts from one salient point in the 3D model and ends in another salient point of that model. The points between the beginning and end of each branch are the vertices of 3D model from which the shortest geodesic path between these two salient points crosses. The whole graph of each model is constructed out of the accumulation of multiple branches, each of which is associated with a pair of salient points. After constructing of the directional graph, WKS is calculated in the path of graph points as the geometric feature of the model. In fact, the points of the directional graph do not provide any information on the object features and they only specify the location where the features should be calculated. The features are calculated and placed in the graph points so, the final graph is built. After completing the feature extraction process, the difference between various models is estimated using Munkres Assignment problem. The experimental results indicate the effectiveness of the directional graph in object description for non-rigid 3D model retrieval. Comparing the proposed method with other approaches by computing the evaluation parameters as well as investigating the computational complexity substantiates the superior performance of it.
- Published
- 2017
- Full Text
- View/download PDF
31. Some results on the intersection graph of submodules of a module
- Author
-
Somayeh Khalashi Ghezelahmad, Saieed Akbari, and Hamid Agha Tavallaee
- Subjects
Combinatorics ,010201 computation theory & mathematics ,General Mathematics ,010102 general mathematics ,String graph ,Voltage graph ,0102 computer and information sciences ,0101 mathematics ,Indecomposable module ,Intersection graph ,01 natural sciences ,Clique number ,Mathematics - Abstract
Let R be a ring with identity and M be a unitary left R-module. The intersection graph of submodules of M, denoted by G(M), is defined to be a graph whose vertices are in one to one correspondence with all non-trivial submodules of M and two distinct vertices are adjacent if and only if the corresponding submodules of M have non-zero intersection. In this paper, we consider the intersection graph of submodules of a module. We determine the structure of modules whose clique numbers are finite. We show that if 1 < ω(G(M)) < ∞, then M is a direct sum of a finite module and a cyclic module, where ω(G(M)) denotes the clique number of G(M). We prove that if ω(G(M)) is not finite, then M contains an infinite clique. Among other results, it is shown that a Noetherian R-module whose intersection of all non-trivial submodules is non-zero, is Artinian.
- Published
- 2017
- Full Text
- View/download PDF
32. Price Based Linear Quadratic Control Under Transportation Delay
- Author
-
Richard Pates, Martin Heyden, and Anders Rantzer
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Computer science ,020208 electrical & electronic engineering ,Control (management) ,02 engineering and technology ,Transportation theory ,Control Engineering ,Flow network ,Set (abstract data type) ,020901 industrial engineering & automation ,Perspective (geometry) ,Control and Systems Engineering ,Simple (abstract algebra) ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,Node (circuits) - Abstract
We study a simple transportation problem on a string graph. The objective is to regulate the node levels of some decaying quantity to optimize dynamical performance. This can be achieved by controlling the flows, which are subject to delay, between neighbouring nodes. The problem is considered from two perspectives. In the first (the social perspective), all nodes cooperate to find the flows that maximize the aggregated utility of the entire transportation network. In the second (the user perspective), the nodes instead try to maximimize their own utility. Our main contribution is to give an implementation of the feedback law that gives the social optimum, that only depends on the local states and a set of prices defined by a distributed update rule. These prices align the social and user optimum in a budget neutral way, and give all nodes no worse cost than if they were on their own.
- Published
- 2020
33. SOF: An Efficient String Graph Construction Algorithm
- Author
-
Shibu Yooseph and S. M. Iqbal Morshed
- Subjects
De Bruijn sequence ,Transitive relation ,Computer science ,business.industry ,String graph ,Memory footprint ,Preprocessor ,Sequence assembly ,Usability ,business ,Algorithm ,Time complexity - Abstract
In contrast to genome assemblers that use de Bruijn graphs, those based on string graphs are able to losslessly retain information from sequence data. However, despite the advantages provided by a string graph framework in repeat detection and in maintaining read coherence, the high computational cost for constructing a string graph hinders its usability for genome assembly. Even though different algorithms have been proposed over the last decade for string graph construction, efficiency is still a challenge due to the demand for processing a large amount of sequence data generated by Next-Generation Sequencing technologies. In this paper, we provide a novel, linear time and alphabet-size-independent algorithm SOF which uses the property of irreducible edges and transitive edges to efficiently construct a string graph from an overlap graph. Experimental results show that SOF is at least 2.3 times faster than the string graph construction algorithm provided in SGA (one of the most popular string graph-based assemblers), while maintaining almost the same memory footprint as SGA. Moreover, the implementation of SOF as a subprogram in the SGA assembly pipeline will allow a user easy access to the preprocessing and postprocessing steps for genome assembly provided in SGA. Implementation: https://github.com/iqbalmorshed/sof
- Published
- 2019
- Full Text
- View/download PDF
34. Accelerating Sequence Alignment to Graphs
- Author
-
Chirag Jain, Haowen Zhang, Alexander T. Dilthey, Srinivas Aluru, and Sanchit Misra
- Subjects
0303 health sciences ,Xeon ,Computer science ,0206 medical engineering ,Locality ,Parallel algorithm ,02 engineering and technology ,Graph ,Vertex (geometry) ,Dynamic programming ,03 medical and health sciences ,High memory ,0302 clinical medicine ,String graph ,SIMD ,Time complexity ,Algorithm ,020602 bioinformatics ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices.AvailabilityThe implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
- Published
- 2019
- Full Text
- View/download PDF
35. RMI-DBG algorithm: A more agile iterative de Bruijn graph algorithm in short read genome assembly
- Author
-
Zeinab Zare Hosseini, Ahmad Baraani, Esmaeil Forouzan, and Shekoufeh Kolahdouz Rahimi
- Subjects
0303 health sciences ,Genome ,business.industry ,Computer science ,High-Throughput Nucleotide Sequencing ,Sequence assembly ,Genomics ,Sequence Analysis, DNA ,Short read ,Biochemistry ,De Bruijn graph ,Computer Science Applications ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,String graph ,symbols ,business ,Molecular Biology ,Algorithm ,Algorithms ,030217 neurology & neurosurgery ,030304 developmental biology ,Agile software development - Abstract
The de Bruijn Graph algorithm (DBG) as one of the cornerstones algorithms in short read assembly has extended with the rapid advancement of the Next Generation Sequencing (NGS) technologies and low-cost production of millions of high-quality short reads. Erroneous reads, non-uniform coverage, and genomic repeats are three major problems that influence the performance of short read assemblers. To encounter these problems, the iterative DBG algorithm applies multiple [Formula: see text]-mers instead of a single [Formula: see text]-mer, by iterating the DBG graph over a range of [Formula: see text]-mer sizes from the minimum to the maximum. However, the iteration paradigm of iterative DBG deals with complex graphs from the beginning of the algorithm and therefore, causes more potential errors and computational time for resolving various unreal branches. In this research, we propose the Reverse Modified Iterative DBG graph (named RMI-DBG) for short read assembly. RMI-DBG utilizes the DBG algorithm and String graph to achieve the advantages of both algorithms. We present that RMI-DBG performs faster with comparable results in comparison to iterative DBG. Additionally, the quality of the proposed algorithm in terms of continuity and accuracy is evaluated with some commonly-used assemblers via several real datasets of the GAGE-B benchmark.
- Published
- 2021
- Full Text
- View/download PDF
36. Graph surfaces on five-dimensional sub-Lorentzian structures
- Author
-
M. B. Karmanova
- Subjects
0209 industrial biotechnology ,Pure mathematics ,General Mathematics ,Wagner graph ,010102 general mathematics ,Mathematical analysis ,Quartic graph ,02 engineering and technology ,01 natural sciences ,Distance-regular graph ,Geometric graph theory ,Coxeter graph ,020901 industrial engineering & automation ,String graph ,Integral graph ,Graph (abstract data type) ,0101 mathematics ,Mathematics - Abstract
Studying the space-like graph surfaces of codimension 2 on the five-dimensional sub-Lorentzian structures with two negative directions of distinct degrees, we determine the differential properties of graph mappings and prove the area formula for the corresponding image surfaces.
- Published
- 2017
- Full Text
- View/download PDF
37. On the Chromatic Number of Disjointness Graphs of Curves
- Author
-
János Pach and István Tomon, Pach, János, Tomon, István, János Pach and István Tomon, Pach, János, and Tomon, István
- Abstract
Let omega(G) and chi(G) denote the clique number and chromatic number of a graph G, respectively. The disjointness graph of a family of curves (continuous arcs in the plane) is the graph whose vertices correspond to the curves and in which two vertices are joined by an edge if and only if the corresponding curves are disjoint. A curve is called x-monotone if every vertical line intersects it in at most one point. An x-monotone curve is grounded if its left endpoint lies on the y-axis. We prove that if G is the disjointness graph of a family of grounded x-monotone curves such that omega(G)=k, then chi(G)<= binom{k+1}{2}. If we only require that every curve is x-monotone and intersects the y-axis, then we have chi(G)<= k+1/2 binom{k+2}{3}. Both of these bounds are best possible. The construction showing the tightness of the last result settles a 25 years old problem: it yields that there exist K_k-free disjointness graphs of x-monotone curves such that any proper coloring of them uses at least Omega(k^{4}) colors. This matches the upper bound up to a constant factor.
- Published
- 2019
- Full Text
- View/download PDF
38. Information-optimal genome assembly via sparse read-overlap graphs
- Author
-
Ilan Shomorony, Thomas A. Courtade, David Tse, and Samuel H. Kim
- Subjects
0301 basic medicine ,Statistics and Probability ,Theoretical computer science ,Computational complexity theory ,Computer science ,Sequence assembly ,Biochemistry ,03 medical and health sciences ,symbols.namesake ,String graph ,Molecular Biology ,Time complexity ,Genome ,Computational Biology ,Eulerian path ,Sequence Analysis, DNA ,Models, Theoretical ,Hamiltonian path ,Graph ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,symbols ,Metagenomics ,Heuristics ,Algorithms ,Genome, Bacterial - Abstract
Motivation In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Results Based on insights from this information feasibility question, we present an algorithm—the Not-So-Greedy algorithm—to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, and can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Availability Available at github.com/samhykim/nsg Contact courtade@eecs.berkeley.edu or dntse@stanford.edu Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2016
- Full Text
- View/download PDF
39. A history of DNA sequence assembly
- Author
-
Eugene W. Myers
- Subjects
0301 basic medicine ,General Computer Science ,Shotgun sequencing ,Computer science ,Nucleic acid sequence ,Computational biology ,De Bruijn graph ,DNA sequencing ,03 medical and health sciences ,symbols.namesake ,030104 developmental biology ,Shortest common superstring ,String graph ,symbols - Abstract
DNA sequence assembly is a rich combinatorial problem that arose with the first DNA sequencing projects in the early 80's. Here we give a short history of the progression of algorithmic ideas used to solve the de novo problem of inferring a genome given a large sampling of substrings covering it. This classic inverse problem is compounded by a variety of experimental features and artifacts that must be considered in any realistic solution. While current methods produce very good results, the perfect assembler has yet to be built.
- Published
- 2016
- Full Text
- View/download PDF
40. An External-Memory Algorithm for String Graph Construction
- Author
-
Marco Previtali, Raffaella Rizzi, Gianluca Della Vedova, Paola Bonizzoni, Yuri Pirola, Bonizzoni, P, Della Vedova, G, Pirola, Y, Previtali, M, and Rizzi, R
- Subjects
FOS: Computer and information sciences ,0301 basic medicine ,General Computer Science ,Burrows–Wheeler transform ,Computer science ,Open problem ,0102 computer and information sciences ,01 natural sciences ,Set (abstract data type) ,03 medical and health sciences ,Computer Science - Data Structures and Algorithms ,String graph ,Quantitative Biology - Genomics ,Data Structures and Algorithms (cs.DS) ,Auxiliary memory ,Genomics (q-bio.GN) ,Applied Mathematics ,INF/01 - INFORMATICA ,Data structure ,Quantitative Biology::Genomics ,Computer Science Applications ,030104 developmental biology ,010201 computation theory & mathematics ,FOS: Biological sciences ,Theory of computation ,Out-of-core algorithm ,External memory algorithms, Burrows–Wheeler transform, String graphs, Genome assembly ,Algorithm - Abstract
Some recent results (Bauer et al. in Algorithms in bioinformatics, Springer, Berlin, pp 326---337, 2012; Cox et al. in Algorithms in bioinformatics, Springer, Berlin, pp. 214---224, 2012; Rosone and Sciortino in The nature of computation. Logic, algorithms, applications, Springer, Berlin, pp 353---364, 2013) have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows---Wheeler transform of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches that are currently used to tackle this problem are memory-intensive. This fact does not bode well with the ongoing increase in the availability of genomic data. A data structure that is used in genome assembly is the string graph, where vertices correspond to samples and arcs represent two overlapping samples. In this paper we address an open problem of Simpson and Durbin (Bioinformatics 26(12):i367---i373, 2010): to design an external-memory algorithm to compute the string graph.
- Published
- 2016
- Full Text
- View/download PDF
41. Stability of depths of powers of edge ideals
- Author
-
Tran Nam Trung
- Subjects
Connected component ,Discrete mathematics ,Algebra and Number Theory ,010102 general mathematics ,Minimal ideal ,0102 computer and information sciences ,Mathematics - Commutative Algebra ,Commutative Algebra (math.AC) ,01 natural sciences ,Upper and lower bounds ,Graph ,Combinatorics ,010201 computation theory & mathematics ,Graph power ,String graph ,FOS: Mathematics ,Mathematics - Combinatorics ,Bound graph ,Combinatorics (math.CO) ,0101 mathematics ,Graph factorization ,Mathematics - Abstract
Let $G$ be a graph and let $I := I (G)$ be its edge ideal. In this paper, we provide an upper bound of $n$ from which $\depth R/ I(G)^n$ is stationary, and compute this limit explicitly. This bound is always achieved if $G$ has no cycles of length $4$ and every its connected component is either a tree or a unicyclic graph., Comment: To appear in Journal of Algebra
- Published
- 2016
- Full Text
- View/download PDF
42. (3, 1)*-Choosability of graphs of nonnegative characteristic without intersecting short cycles
- Author
-
Haihui Zhang
- Subjects
Vertex (graph theory) ,Discrete mathematics ,General Mathematics ,010102 general mathematics ,0102 computer and information sciences ,01 natural sciences ,Graph ,Combinatorics ,Colored ,010201 computation theory & mathematics ,Graph power ,String graph ,0101 mathematics ,Mathematics - Abstract
A graph G is called (k,d)∗-choosable if for every list assignment L satisfying ∣L(v)∣ ≥k for all v∈V(G), there is an L-coloring of G such that each vertex of G has at most d neighbors colored with the same color as itself. In this paper, it is proved that every graph of nonnegative characteristic without intersecting i-cycles for all i=3,4,5 is (3,1)∗-choosable.
- Published
- 2016
- Full Text
- View/download PDF
43. On String Graph Limits and the Structure of a Typical String Graph
- Author
-
Svante Janson and Andrew J. Uzzell
- Subjects
Random string ,010102 general mathematics ,0102 computer and information sciences ,Limiting ,01 natural sciences ,Graph ,Combinatorics ,010201 computation theory & mathematics ,String graph ,Discrete Mathematics and Combinatorics ,Geometry and Topology ,0101 mathematics ,Hereditary property ,MathematicsofComputing_DISCRETEMATHEMATICS ,Mathematics - Abstract
We study limits of convergent sequences of string graphs, that is graphs with an intersection representation consisting of curves in the plane. We use these results to study the limiting behavior of a sequence of random string graphs. We also prove similar results for several related graph classes.
- Published
- 2016
- Full Text
- View/download PDF
44. Latest Advances in Solving the All-Pairs Suffix Prefix Problem
- Author
-
Maan Haj Rachid
- Subjects
Data set ,Prefix ,Computer science ,String graph ,Suffix ,Space (mathematics) ,Data structure ,Algorithm - Abstract
Finding the overlaps between sequences that are generated by Next Generation Sequencing (NGS) technology is a time- and space-consuming step in building a string graph in genome assembly. The problem is known in computer science as all-pairs suffix prefix (APSP). The problem has been tackled since 1992 and several solutions were presented to solve it. While some of them achieve optimal theoretical time consumption, they have a very high space-consumption in addition to being practically slow due to a raised constant factor. Some other recent solutions practically consume much less space and time to solve APSP despite their adaptations to techniques and data structures which don’t have optimal worst-case asymptotic complexity. Other few researches tackled the approximate version of the overlap problem hoping to avoid error-detecting stages in genome assembly. These solutions used the same data structures which were employed to solve APSP in addition to some advanced techniques in order to address the complexity of approximate matching. In this work, we evaluate these recent algorithms, in terms of time and space, in both exact and approximate formats. Our results show that FastAPSP has the best time-consumption unless the size of the data set is large. The high space demand of such large data sets would favor the usage of SOF and Readjoiner. Our experiments also show that AOF is, in general, faster than FM unless the data set is small and repetitive. In addition, it can handle large data sets that cannot be processed by FM.
- Published
- 2019
- Full Text
- View/download PDF
45. Coloring Hasse Diagrams and Disjointness Graphs of Curves
- Author
-
István Tomon and János Pach
- Subjects
050101 languages & linguistics ,05 social sciences ,Hasse diagram ,02 engineering and technology ,Disjoint sets ,Binary logarithm ,Graph ,Combinatorics ,String graph ,Family of curves ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Mathematics - Abstract
Given a family of curves \(\mathcal {C}\) in the plane, its disjointness graph is the graph whose vertices correspond to the elements of \(\mathcal {C}\), and two vertices are joined by an edge if and only if the corresponding sets are disjoint. We prove that for every positive integer r and n, there exists a family of n curves whose disjointness graph has girth r and chromatic number \(\varOmega (\frac{1}{r}\log n)\). In the process we slightly improve Bollobas’s old result on Hasse diagrams and show that our improved bound is best possible for uniquely generated partial orders.
- Published
- 2019
- Full Text
- View/download PDF
46. Approximating Minimum Dominating Set on String Graphs
- Author
-
Sandip Das, Dibyayan Chakraborty, and Joydeep Mukherjee
- Subjects
Plane (geometry) ,High Energy Physics::Lattice ,0102 computer and information sciences ,02 engineering and technology ,Intersection graph ,01 natural sciences ,Combinatorics ,Intersection ,010201 computation theory & mathematics ,Simple (abstract algebra) ,Dominating set ,String graph ,0202 electrical engineering, electronic engineering, information engineering ,C++ string handling ,020201 artificial intelligence & image processing ,Unit (ring theory) ,Mathematics - Abstract
A string graph is an intersection graph of simple curves on the plane. For \(k\ge 0\), \(B_k\)-VPG graphs are intersection graphs of simple rectilinear curves having at most k cusps (bends). It is well-known that any string graph is a \(B_k\)-VPG graph for some value of k. For \(k\ge 0\), unit \(B_k\)-VPG graphs are intersection graphs of simple rectilinear curves having at most k cusps (bends) and each segment of the curve being unit length. Any string graph is a unit-\(B_k\)-VPG graph for some value of k.
- Published
- 2019
- Full Text
- View/download PDF
47. SORA: Scalable Overlap-graph Reduction Algorithms for Genome Assembly using Apache Spark in the Cloud
- Author
-
Myoungkyu Song, Alexander J. Paul, Seung-Hwan Lim, Dylan Lawrence, Chongle Pan, and Tae-Hyuk Ahn
- Subjects
0301 basic medicine ,business.industry ,Computer science ,Sequence assembly ,Cloud computing ,Turnaround time ,Graph ,03 medical and health sciences ,030104 developmental biology ,String graph ,Scalability ,Graph reduction ,business ,Algorithm ,Computer memory - Abstract
The advent of high-throughput DNA sequencing techniques has permitted very high quality de novo assemblies of genomes, but raise an issue of requiring large amounts of computer memory to resolve the large genome graphs generated by most overlap graph de novo assemblers. To address these limitations, we present a novel algorithmic approach; Scalable Overlap-graph Reduction Algorithms (SORA). SORA adapts string graph reduction algorithms for the genome assembly using a distributed computing platform. To efficiently compute coverage for enormous paths in the graphs, SORA uses Apache Spark which is a cluster-based engine designed on top of Hadoop to handle very large datasets in the cloud. The experimental results show that SORA can process a nearly one billion edge graph in a distributed cloud cluster as well as smaller graphs on a local cluster with a short turnaround time. Moreover, our algorithms scale almost linearly with increasing numbers of virtual instances in the cloud. SORA is freely available for download at https://github.com/BioHPC/SORA/.
- Published
- 2018
- Full Text
- View/download PDF
48. GraphSeq: Accelerating String Graph Construction for De Novo Assembly on Spark
- Author
-
Chung-Tsai Su, Yungui Wang, Ming-Yang Chang, Yingyin Cheng, and Yuehua Li
- Subjects
Lossless compression ,Theoretical computer science ,Computer science ,String graph ,Graph (abstract data type) ,Sequence assembly ,Graph - Abstract
Summary: De novo genome assembly is an important application on both uncharacterized genome assembly and variant identification in a reference-unbiased way. In comparison with de Brujin graph, string graph is a lossless data representation for de novo assembly. However, string graph construction is computational intensive. We propose GraphSeq to accelerate string graph construction by leveraging the distributed computing framework.Availability and Implementation: GraphSeq is implemented with Scala on Spark and freely available at https://www.atgenomix.com/blog/graphseq.Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2018
- Full Text
- View/download PDF
49. Integrating long-range connectivity information into de Bruijn graphs
- Author
-
Gil McVean, Kiran V. Garimella, Isaac Turner, and Zamin Iqbal
- Subjects
Theoretical computer science ,Computer science ,Sequence assembly ,De Bruijn graph ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,Clique-width ,String graph ,Humans ,Complement graph ,030304 developmental biology ,De Bruijn sequence ,0303 health sciences ,Data Visualization ,High-Throughput Nucleotide Sequencing ,Genomics ,Sequence Analysis, DNA ,Data structure ,Original Papers ,Graph ,Klebsiella pneumoniae ,symbols ,Graph (abstract data type) ,Sequence Analysis ,030217 neurology & neurosurgery ,Algorithms ,Software - Abstract
MotivationThe de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input.ResultsWe present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both the de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterise the genomic context of drug-resistance genes.AvailabilityLinked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, available under the MIT license at https://github.com/mcvean/mccortex.Contactturner.isaac@gmail.com.
- Published
- 2018
50. Boundary Domination of Line and Middle Graph of Wheel Graph Families
- Author
-
Puttaswamy Rangaiah, Mohammed Alatif, and S R Nayaka
- Subjects
Domatic number ,Vertex (graph theory) ,Domination analysis ,Computer science ,Symmetric graph ,Distance-regular graph ,Simplex graph ,law.invention ,Combinatorics ,Circulant graph ,Windmill graph ,Graph power ,Dominating set ,law ,String graph ,Line graph ,Wheel graph ,Connectivity ,Degree (graph theory) ,Neighbourhood (graph theory) ,Butterfly graph ,Graph ,Vertex (geometry) ,Cycle graph ,Cubic graph ,Bound graph ,Regular graph ,Graph factorization - Abstract
Let G = (V;E) be a connected graph. A subset S of V (G) is called a boundary dominating set if every vertex of V S is boundary dominated by some vertex of S. The minimum taken over all boundary dominating sets of a graphG is called the boundary domination number ofG and is denoted by b(G). We define the boundary domatic number in graphs. Exact values of of Wheel Graph Families are obtained and some other interesting results are established.
- Published
- 2016
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.