Author: "Sanchit Misra" / Publisher: cold spring harbor laboratory - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Sanchit Misra"' showing total 5 results

1. Accelerating Identification of Chromatin Accessibility from noisy ATAC-seq Data using Modern CPUs

Author: Menachem Adelman, Narendra Chaudhary, Dhiraj D. Kalamkar, Barukh Ziv, Bharat Kaul, Sanchit Misra, Alexander Heinecke, and Evangelos Georganas
Subjects: Identification (information), Speedup, Computer science, business.industry, Filter (video), Deep learning, ATAC-seq, Artificial intelligence, Parallel computing, business, Training performance, Chromatin, Convolution
Abstract: Identifying accessible chromatin regions is a fundamental problem in epigenomics, with ATAC-seq being a commonly used assay. The exponential rise in single-cell ATAC-seq experiments has made it critical to accelerate processing of ATAC-seq data. ATAC-seq data can have a low signal-to-noise ratio for various reasons, including low coverage or low cell count. To denoise and identify accessible chromatin regions from noisy ATAC-seq data, the use of deep learning on 1D data (using large filter sizes, long tensor widths, and/or dilation) has recently been proposed. Here, we present ways to accelerate the end-to-end training performance of these deep-learning-based methods using CPUs. We evaluate our approach on the recently released AtacWorks toolkit. Compared to an Nvidia DGX-1 box with 8 V100 GPUs, we get up to 2.27× speedup using just 16 CPU sockets. To achieve this, we build an efficient 1D dilated convolution layer and demonstrate reduced precision (BFloat16) training.
Published: 2021
Full Text: View/download PDF
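
The abstract above refers to a 1D dilated convolution layer. As a rough illustration of what that operation computes, here is a minimal pure-Python sketch. The function name `dilated_conv1d` and its parameters are hypothetical, and this is the textbook "valid" cross-correlation used in deep learning, not the optimized CPU layer or the BFloat16 training described by the authors.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation convention).

    Each output element combines len(kernel) input samples that are
    `dilation` positions apart, so the receptive field grows with the
    dilation factor while the number of weights stays the same.
    Illustrative only; names are assumptions, not from the paper.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    # Number of input positions covered by one application of the kernel.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        return []
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(out_len)
    ]


# With dilation=2 a two-tap kernel pairs samples that are two apart.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
```

With a dilation of 1 this reduces to an ordinary 1D convolution layer without padding or stride.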

2. Accelerating long-read analysis on modern CPUs

Author: Vasimuddin, Chirag Jain, Sanchit Misra, and Saurabh Kalikar
Subjects: Reduction (complexity), Software, Computer science, business.industry, Chaining, Sequence assembly, Cache, SIMD, Parallel computing, business, Data structure, Reference genome
Abstract: Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping of long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used software tool for mapping. We present multiple optimizations using SIMD parallelization, efficient cache utilization, and a learned index data structure to accelerate its three main computational modules, i.e., seeding, chaining, and pairwise sequence alignment. These reduce the end-to-end mapping time of minimap2 by up to 1.8× while maintaining identical output.
Published: 2021
Full Text: View/download PDF
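
Of the three modules named in the abstract above, chaining is the one least obvious from its name: it selects a set of seed matches ("anchors") that appear in the same order on both the reference and the query. The sketch below is a simplified quadratic-time dynamic program under stated assumptions: the function name `chain_anchors`, the `max_gap` parameter, and the scoring (one point per anchor, no gap penalty) are all hypothetical and are not minimap2's actual scoring or the optimized implementation from the paper.

```python
def chain_anchors(anchors, max_gap=100):
    """Return the longest colinear chain of (ref_pos, query_pos) anchors.

    Two anchors can be chained when both coordinates strictly increase
    and neither coordinate jumps by more than `max_gap`.
    Simplified illustration; real mappers also score gaps and overlaps.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    score = [1] * n   # best chain length ending at anchor i
    prev = [-1] * n   # predecessor of anchor i in that chain
    for i in range(n):
        ri, qi = anchors[i]
        for j in range(i):
            rj, qj = anchors[j]
            colinear = rj < ri and qj < qi
            close = ri - rj <= max_gap and qi - qj <= max_gap
            if colinear and close and score[j] + 1 > score[i]:
                score[i] = score[j] + 1
                prev[i] = j
    # Backtrack from the best-scoring anchor.
    best = max(range(n), key=score.__getitem__)
    chain = []
    while best != -1:
        chain.append(anchors[best])
        best = prev[best]
    return chain[::-1]


seeds = [(10, 5), (20, 15), (25, 3), (30, 25), (200, 40)]
print(chain_anchors(seeds, max_gap=50))  # [(10, 5), (20, 15), (30, 25)]
```

The anchor `(25, 3)` is dropped because it is out of order on the query, and `(200, 40)` is dropped because it lies too far from the rest of the chain.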

3. LISA: Learned Indexes for Sequence Analysis

Author: Vasimuddin, Darryl Ho, Nesime Tatbul, Jialin Ding, Saurabh Kalikar, Tim Kraska, Sanchit Misra, and Heng Li
Subjects: Sequence analysis, Computer science, business.industry, Genomics, Machine learning, computer.software_genre, Genome, DNA sequencing, chemistry.chemical_compound, chemistry, Artificial intelligence, business, Throughput (business), computer, DNA
Abstract: Background: Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. Results: In this paper, we introduce LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search. We focus on accelerating two of the most essential flavors of DNA sequence search: exact search and super-maximal exact match (SMEM) search. LISA builds on and extends FM-index, which is the state-of-the-art technique widely deployed in genomics tools. Experiments with human, animal, and plant genome datasets indicate that LISA achieves up to 2.2× and 10.8× speedups over the state-of-the-art FM-index based implementations for exact search and super-maximal exact match (SMEM) search, respectively. Code availability: https://github.com/IntelLabs/Trans-Omics-Acceleration-Library/tree/master/LISA.
Published: 2020
Full Text: View/download PDF
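
For readers unfamiliar with the FM-index that the abstract above builds on, the following is a minimal sketch of FM-index exact search using the classic backward-search procedure. It is the baseline technique, not LISA's learned index. The function names `build_fm_index` and `locate` are hypothetical, the suffix array is built by naive sorting, and the text is assumed not to contain the sentinel character `$`.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, first-column offsets, Occ table."""
    text += "$"  # sentinel, assumed absent from the input text
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[ch] = number of characters in the text smaller than ch.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[ch][i] = occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, first, occ


def locate(pattern, index):
    """Return sorted start positions of `pattern` via backward search."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGT")
print(locate("CGT", index))  # [1, 5]
```

Each pattern character costs one pair of table lookups, which is the step-by-step access pattern that faster index structures aim to shorten.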

4. Accelerating Sequence Alignment to Graphs

Author: Chirag Jain, Haowen Zhang, Alexander T. Dilthey, Srinivas Aluru, and Sanchit Misra
Subjects: 0303 health sciences, Xeon, Computer science, 0206 medical engineering, Locality, Parallel algorithm, 02 engineering and technology, Graph, Vertex (geometry), Dynamic programming, 03 medical and health sciences, High memory, 0302 clinical medicine, String graph, SIMD, Time complexity, Algorithm, 020602 bioinformatics, 030217 neurology & neurosurgery, 030304 developmental biology
Abstract: Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence-to-graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high-throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence-to-graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near-linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high-coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices. Availability: The implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
Published: 2019
Full Text: View/download PDF
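
The sequential dynamic programming baseline mentioned in the abstract above can be sketched as follows. This is a simplified illustration under several assumptions: every vertex carries a single character, vertices are numbered in topological order, the cost model is unit-cost edit distance rather than the scoring used by PaSGAL, and the function name `align_to_dag` is hypothetical. It does not show the parallel blocked method the paper proposes.

```python
def align_to_dag(labels, edges, query):
    """Minimum edit distance between `query` and any path in a DAG.

    labels[v] is the character on vertex v; edges are (u, v) pairs with
    u < v, i.e. vertices are already in topological order. The path may
    start and end at any vertex. Work is O((V + E) * len(query)).
    """
    n, m = len(labels), len(query)
    preds = [[] for _ in range(n)]
    for u, v in edges:
        if not 0 <= u < v < n:
            raise ValueError("edges must satisfy 0 <= u < v < len(labels)")
        preds[v].append(u)
    # Virtual source row: j query characters consumed before any vertex.
    start = list(range(m + 1))
    rows = []
    best = m  # cost of matching the query against an empty path
    for v in range(n):
        # A path may continue from a predecessor or begin at v itself.
        sources = [rows[u] for u in preds[v]] + [start]
        row = [0] * (m + 1)
        row[0] = min(s[0] for s in sources) + 1  # vertex char unmatched
        for j in range(1, m + 1):
            mismatch = 0 if labels[v] == query[j - 1] else 1
            row[j] = min(
                min(s[j - 1] for s in sources) + mismatch,  # (mis)match
                min(s[j] for s in sources) + 1,             # skip vertex
                row[j - 1] + 1,                             # skip query char
            )
        rows.append(row)
        best = min(best, row[m])
    return best


# A small variation graph with two alternative paths: A-C-T and A-G-T.
labels = ["A", "C", "G", "T"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(align_to_dag(labels, edges, "AGT"))  # 0
print(align_to_dag(labels, edges, "AAT"))  # 1
```

Each row depends only on the rows of its predecessor vertices, which is what makes the order of computation follow the graph's topological order.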

5. Identification of Significant Computational Building Blocks through Comprehensive Investigation of NGS Secondary Analysis Methods

Author: Vasimuddin, Sanchit Misra, and Srinivas Aluru
Subjects: Smith–Waterman algorithm, Software, Computer architecture, Computer science, business.industry, Sequence assembly, Genomics, business, Genome, DNA sequencing
Abstract: Rapid advancements in next-generation sequencing technologies have greatly improved the throughput of sequencing and reduced the cost to under $1000 per genome, propelling ambitious projects across the globe that aim to sequence a million or more genomes. In addition, sequencing throughput is increasing and cost is decreasing at a rate much faster than Moore's law. This necessitates an equivalent rate of acceleration of NGS secondary analysis, which assembles the reads into full genomes and identifies variants between genomes. Conventional improvements in hardware can at best accelerate this according to Moore's law, and only if the corresponding software is able to use the hardware efficiently. This is currently not the case for the majority of the dozens of software tools used for NGS secondary analysis. Thus, to keep pace with the rate of advancement of sequencers, we need 1) hardware that is designed taking into account the computational requirements of NGS secondary analysis and 2) software tools that use the hardware efficiently. In this work, we take the first step towards that goal by identifying the computational requirements of NGS secondary analysis. We surveyed dozens of software tools from all three major problems in secondary analysis (sequence mapping, de novo assembly, and variant calling) to select seven popular tools and a workflow for an in-depth analysis. We performed runtime profiling of the tools using multiple real datasets and found that the majority of the runtime is dominated by just four building blocks: Smith-Waterman alignment, FM-index based sequence search, de Bruijn graph construction and traversal, and the pairwise hidden Markov model algorithm. Together, these building blocks cover 80.5%-98.2% of the runtime for sequence mapping, 63.9%-99.4% of the runtime for de novo assembly, and 72%-93% of the runtime for variant calling. The beauty of this result is that by just tailoring our software and hardware for these building blocks, we can get a major performance improvement of NGS secondary analysis.
Published: 2018
Full Text: View/download PDF
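
Smith-Waterman alignment is the first of the four building blocks listed in the abstract above. A minimal sketch of the textbook algorithm follows; the function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions, and the code returns only the best local alignment score, not the alignment itself or anything specific to the tools profiled in the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Uses two rows of the score matrix at a time; every cell is clamped
    at zero so an alignment can start anywhere in either sequence.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("TTACGTTT", "GGACGGG"))  # 6 (the shared "ACG")
```

Each cell depends on its left, upper, and upper-left neighbors, so cells along an anti-diagonal are independent of one another.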
