21 results for "Kim, Jeremie S."
Search Results
2. RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
- Author
-
Ghiasi, Nika Mansouri, Sadrosadati, Mohammad, Oliveira, Geraldo F., Kanellopoulos, Konstantinos, Ausavarungnirun, Rachata, Luna, Juan Gómez, Manglik, Aditya, Ferreira, João, Kim, Jeremie S., Giannoula, Christina, Vijaykumar, Nandita, Park, Jisung, and Mutlu, Onur
- Subjects
FOS: Computer and information sciences, Computer Science - Distributed, Parallel, and Cluster Computing, Hardware Architecture (cs.AR), Distributed, Parallel, and Cluster Computing (cs.DC), Computer Science - Hardware Architecture
- Abstract
Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache hierarchy. Hence, there is a need to revisit current core and cache designs that have been conventionally tailored to tackle the memory bottleneck. Our goal is to redesign the core and cache hierarchy, given the fundamentally new trade-offs of M3D, to benefit a wide range of workloads. To this end, we take two steps. First, we perform a design space exploration of the cache and core's key components. We highlight that in M3D systems, (i) removing the shared last-level cache leads to similar or larger performance benefits than increasing its size or reducing its latency; (ii) improving L1 latency has a large impact on improving performance; (iii) wider pipelines are increasingly beneficial; (iv) the performance impact of branch speculation and pipeline frontend increases; (v) the current synchronization schemes limit parallel speedup. Second, we propose an optimized M3D system, RevaMp3D, where (i) using the tight connectivity between logic layers, we efficiently increase pipeline width, reduce L1 latency, and enable fine-grained synchronization; (ii) using the high-bandwidth and energy-efficient main memory, we alleviate the amplified energy and speculation bottlenecks by memoizing the repetitive fetched, decoded, and reordered instructions and turning off the relevant parts of the core pipeline when possible. RevaMp3D provides, on average, 81% speedup, 35% energy reduction, and 12.3% smaller area compared to the baseline M3D system.
- Published
- 2022
3. GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies
- Author
-
Kim, Jeremie S., Senol Cali, Damla, Xin, Hongyi, Lee, Donghyuk, Ghose, Saugata, Alser, Mohammed, Hassan, Hasan, Ergin, Oguz, Alkan, Can, and Mutlu, Onur
- Published
- 2018
- Full Text
- View/download PDF
4. ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
- Author
-
Firtina, Can, Pillai, Kamlesh, Kalsi, Gurpreet S., Suresh, Bharathwaj, Senol Cali, Damla, Kim, Jeremie S., Shahroodi, Taha, Cavlak, Meryem Banu, Lindegger, Joël, Alser, Mohammed, Gómez Luna, Juan, Subramoney, Sreenivas, and Mutlu, Onur
- Subjects
Genomics (q-bio.GN), FOS: Computer and information sciences, Artificial Intelligence (cs.AI), FOS: Biological sciences, Hardware Architecture (cs.AR), Machine Learning (cs.LG), Quantitative Methods (q-bio.QM)
- Abstract
Profile hidden Markov models (pHMMs) are widely used in many bioinformatics applications to accurately identify similarities between biological sequences (e.g., DNA or protein sequences). pHMMs use a commonly-adopted and highly-accurate method, called the Baum-Welch algorithm, to calculate these similarities. However, the Baum-Welch algorithm is computationally expensive, and existing works provide either software- or hardware-only solutions for a fixed pHMM design. When we analyze the state-of-the-art works, we find that there is a pressing need for a flexible, high-performance, and energy-efficient hardware-software co-design to efficiently and effectively solve all the major inefficiencies in the Baum-Welch algorithm for pHMMs. We propose ApHMM, the first flexible acceleration framework that can significantly reduce computational and energy overheads of the Baum-Welch algorithm for pHMMs. ApHMM leverages hardware-software co-design to solve the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to support different pHMM designs, 2) exploiting the predictable data dependency pattern in an on-chip memory with memoization techniques, 3) quickly eliminating negligible computations with a hardware-based filter, and 4) minimizing the redundant computations. We implement our 1) hardware-software optimizations on specialized hardware and 2) software optimizations for GPUs to provide the first flexible Baum-Welch accelerator for pHMMs. ApHMM provides significant speedups of 15.55x-260.03x, 1.83x-5.34x, and 27.97x compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms the state-of-the-art CPU implementations of three important bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x-59.94x, 1.03x-1.75x, and 1.03x-1.95x, respectively., arXiv
- Published
- 2022
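The Baum-Welch algorithm that ApHMM accelerates is built on the forward (and backward) dynamic-programming passes over an HMM. As a point of reference only, here is a plain-software sketch of the forward pass for a generic HMM; it illustrates the data-dependency pattern the paper memoizes in hardware, and is not ApHMM's design (all names here are illustrative):

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward pass of the Baum-Welch algorithm for a generic HMM.

    obs : sequence of observation indices
    pi  : (S,) initial state probabilities
    A   : (S, S) transition matrix, A[i, j] = P(next state j | state i)
    B   : (S, O) emission matrix,  B[i, k] = P(symbol k | state i)
    Returns the (T, S) forward matrix and P(obs | model).
    """
    S, T = len(pi), len(obs)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # alpha[t, j] = sum_i alpha[t-1, i] * A[i, j] * B[j, obs[t]]
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()
```

Each row of `alpha` depends only on the previous row, which is the predictable dependency pattern that makes on-chip memoization effective.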
5. COVIDHunter: COVID-19 pandemic wave prediction and mitigation via seasonality-aware modeling
- Author
-
Alser, Mohammed, Kim, Jeremie S., Alserr, Nour Almadhoun, Tell, Stefan W., and Mutlu, Onur
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Physics - Physics and Society, FOS: Biological sciences, FOS: Physical sciences, Applications (stat.AP), Physics and Society (physics.soc-ph), Quantitative Biology - Quantitative Methods, Statistics - Applications, Quantitative Methods (q-bio.QM), Machine Learning (cs.LG)
- Abstract
Early detection and isolation of COVID-19 patients are essential for successful implementation of mitigation strategies and eventually curbing the disease spread. With a limited number of daily COVID-19 tests performed in every country, simulating the COVID-19 spread along with the potential effect of each mitigation strategy currently remains one of the most effective ways in managing the healthcare system and guiding policy-makers. We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region, predicts COVID-19 statistics (the daily number of cases, hospitalizations, and deaths), and provides suggestions on what strength the upcoming mitigation measure should be. The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e.g., climate, temperature, humidity), different variants of concern, vaccination rate, and mitigation measures. Using Switzerland as a case study, COVIDHunter estimates that we are experiencing a deadly new wave that will peak on 26 January 2022, which is very similar in numbers to the wave we had in February 2020. The policy-makers have only one choice that is to increase the strength of the currently applied mitigation measures for 30 days. Unlike existing models, the COVIDHunter model accurately monitors and predicts the daily number of cases, hospitalizations, and deaths due to COVID-19. Our model is flexible to configure and simple to modify for modeling different scenarios under different environmental conditions and mitigation measures. We release the source code of the COVIDHunter implementation at https://github.com/CMU-SAFARI/COVIDHunter., arXiv admin note: substantial text overlap with arXiv:2102.03667
- Published
- 2022
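The key idea described above, simulating the average number of new infections per infected person while scaling for mitigation strength and environmental conditions, can be illustrated with a heavily simplified toy loop. This is not COVIDHunter's actual model; the functional form, parameter names, and coefficients are all illustrative:

```python
def simulate(days, seed_cases, r0, infectious_days, mitigation, env_coeff):
    """Toy outbreak loop: each day, currently infectious people cause new
    infections at a rate derived from the base reproduction number r0,
    scaled down by mitigation strength (0..1) and scaled by an
    environmental coefficient (e.g., for temperature/humidity effects).
    Returns the list of daily new-case counts."""
    active = [float(seed_cases)]  # cases still in their infectious window
    daily = []
    for _ in range(days):
        effective_r = r0 * (1.0 - mitigation) * env_coeff
        new = sum(active) * effective_r / infectious_days
        daily.append(new)
        active.append(new)
        if len(active) > infectious_days:
            active.pop(0)  # cases recover after the infectious window
    return daily
```

With `mitigation = 1.0` the epidemic stops immediately; with no mitigation and `effective_r > 1` the daily counts grow, which is the qualitative behavior the model's "wave" predictions rest on.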
6. RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
- Author
-
Mansouri Ghiasi, Nika, Sadrosadati, Mohammad, Oliveira, Geraldo F., Kanellopoulos, Constantinos, Ausavarungnirun, Rachata, Gómez Luna, Juan, Manglik, Aditya, Ferreira, João, Kim, Jeremie S., Giannoula, Christina, Vijaykumar, Nandita, Park, Jisung, and Mutlu, Onur
- Subjects
FOS: Computer and information sciences, Hardware Architecture (cs.AR), Distributed, Parallel, and Cluster Computing (cs.DC)
- Abstract
Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache hierarchy. Hence, there is a need to revisit current core and cache designs that have been conventionally tailored to tackle the memory bottleneck. Our goal is to redesign the core and cache hierarchy, given the fundamentally new trade-offs of M3D, to benefit a wide range of workloads. To this end, we take two steps. First, we perform a design space exploration of the cache and core's key components. We highlight that in M3D systems, (i) removing the shared last-level cache leads to similar or larger performance benefits than increasing its size or reducing its latency; (ii) improving L1 latency has a large impact on improving performance; (iii) wider pipelines are increasingly beneficial; (iv) the performance impact of branch speculation and pipeline frontend increases; (v) the current synchronization schemes limit parallel speedup. Second, we propose an optimized M3D system, RevaMp3D, where (i) using the tight connectivity between logic layers, we efficiently increase pipeline width, reduce L1 latency, and enable fine-grained synchronization; (ii) using the high-bandwidth and energy-efficient main memory, we alleviate the amplified energy and speculation bottlenecks by memoizing the repetitive fetched, decoded, and reordered instructions and turning off the relevant parts of the core pipeline when possible. RevaMp3D provides, on average, 81% speedup, 35% energy reduction, and 12.3% smaller area compared to the baseline M3D system., arXiv
- Published
- 2022
- Full Text
- View/download PDF
7. BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis.
- Author
-
Firtina, Can, Park, Jisung, Alser, Mohammed, Kim, Jeremie S, Cali, Damla Senol, Shahroodi, Taha, Ghiasi, Nika Mansouri, Singh, Gagandeep, Kanellopoulos, Konstantinos, Alkan, Can, and Mutlu, Onur
- Published
- 2023
- Full Text
- View/download PDF
8. Improving DRAM Performance, Security, and Reliability by Understanding and Exploiting DRAM Timing Parameter Margins
- Author
-
Kim, Jeremie S.
- Subjects
Performance (cs.PF), FOS: Computer and information sciences, Hardware_MEMORYSTRUCTURES, Computer Science - Cryptography and Security, Computer Science - Performance, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture, Cryptography and Security (cs.CR)
- Abstract
This dissertation rigorously characterizes many modern commodity DRAM devices and shows that by exploiting DRAM access timing margins within manufacturer-recommended DRAM timing specifications, we can significantly improve system performance, reduce power consumption, and improve device reliability and security. First, we characterize DRAM timing parameter margins and find that certain regions of DRAM can be accessed faster than other regions due to DRAM cell process manufacturing variation. We exploit this by enabling variable access times depending on the DRAM cells being accessed, which not only improves overall system performance, but also decreases power consumption. Second, we find that we can uniquely identify DRAM devices by the locations of failures that result when we access DRAM with timing parameters reduced below specification values. Because we induce these failures with DRAM accesses, we can generate these unique identifiers significantly more quickly than prior work. Third, we propose a random number generator that is based on our observation that timing failures in certain DRAM cells are randomly induced and can thus be repeatedly polled to very quickly generate true random values. Finally, we characterize the RowHammer security vulnerability on a wide range of modern DRAM chips while violating the DRAM refresh requirement in order to directly characterize the underlying DRAM technology without the interference of refresh commands. We demonstrate, with our characterization of real chips, that existing RowHammer mitigation mechanisms either are not scalable or suffer from prohibitively large performance overheads in projected future devices, and that it is critical to research more effective solutions to RowHammer. Overall, our studies build a new understanding of modern DRAM devices to improve computing system performance, reliability, and security all at the same time., Awarded the EDAA Outstanding Dissertation Award in 2021
- Published
- 2021
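The dissertation's third contribution, polling randomly failing DRAM cells to produce true random values, can be sketched in software by simulating the hardware bit source. The cell simulation and the von Neumann debiasing step below are our illustrative additions (the real generator reads actual DRAM cells with reduced timing parameters and has its own post-processing):

```python
import random

def poll_failure_cell(p_fail=0.5, rng=random):
    """Simulate one reduced-latency read of a DRAM cell whose timing
    failure is random: returns 1 on failure, 0 otherwise. A software
    stand-in for polling a real failure-prone cell."""
    return 1 if rng.random() < p_fail else 0

def random_bits(n, rng=random):
    """Produce n unbiased bits from repeated polls via von Neumann
    debiasing: emit the second bit of a (0,1) or (1,0) pair, discard
    (0,0) and (1,1) pairs. This removes any fixed bias in the cell's
    failure probability."""
    out = []
    while len(out) < n:
        a, b = poll_failure_cell(rng=rng), poll_failure_cell(rng=rng)
        if a != b:
            out.append(b)
    return out
```

Because failures are induced by ordinary (fast) DRAM accesses, such a source can be polled at high rates, which is what makes the proposed generator quick.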
9. Security Analysis of the Silver Bullet Technique for RowHammer Prevention
- Author
-
Giray Yağlıkçı, Abdullah, Kim, Jeremie S., Devaux, Fabrice, and Mutlu, Onur
- Abstract
arXiv
- Published
- 2021
10. Security Analysis of the Silver Bullet Technique for RowHammer Prevention
- Author
-
Yağlıkçı, Abdullah Giray, Kim, Jeremie S., Devaux, Fabrice, and Mutlu, Onur
- Subjects
FOS: Computer and information sciences, Computer Science - Cryptography and Security, GeneralLiterature_INTRODUCTORYANDSURVEY, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture, Cryptography and Security (cs.CR)
- Abstract
The purpose of this document is to study the security properties of the Silver Bullet algorithm against worst-case RowHammer attacks. We mathematically demonstrate that Silver Bullet, when properly configured and implemented in a DRAM chip, can securely prevent RowHammer attacks. The demonstration focuses on the most representative implementation of Silver Bullet, the patent claiming many implementation possibilities not covered in this demonstration. Our study concludes that Silver Bullet is a promising RowHammer prevention mechanism that can be configured to operate securely against RowHammer attacks at various efficiency-area tradeoff points, supporting relatively small hammer count values (e.g., 1000) and Silver Bullet table sizes (e.g., 1.06KB)., 40 pages
- Published
- 2021
11. AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes
- Author
-
Kim, Jeremie S., Firtina, Can, Cavlak, Meryem Banu, Senol Cali, Damla, Hajinazar, Nastaran, Alser, Mohammed, Alkan, Can, and Mutlu, Onur
- Subjects
Genome Read Mapping, Remapping, LiftOver, Genome Assembly, Crossmap
- Abstract
As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable sensitivity in read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping). However, if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genome versions by 6.7x, 6.6x, and 2.8x for large (human), medium (C. elegans), and small (yeast) reference genomes, respectively. We validate our remapping results with GATK and find that AirLift provides similar accuracy in identifying ground truth SNP and INDEL variants as the baseline of fully mapping a read set., arXiv
- Published
- 2021
12. COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model
- Author
-
Alser, Mohammed, Kim, Jeremie S., Almadhoun Alserr, Nour, Tell, Stefan W., and Mutlu, Onur
- Abstract
Motivation: Early detection and isolation of COVID-19 patients are essential for successful implementation of mitigation strategies and eventually curbing the disease spread. With a limited number of daily COVID-19 tests performed in every country, simulating the COVID-19 spread along with the potential effect of each mitigation strategy currently remains one of the most effective ways in managing the healthcare system and guiding policy-makers. We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region and provides suggestions on what strength the upcoming mitigation measure should be. The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e.g., climate, temperature, humidity) and mitigation measures. Results: Using Switzerland as a case study, COVIDHunter estimates that the policy-makers need to keep the current mitigation measures for at least 30 days to prevent demand from quickly exceeding existing hospital capacity. Relaxing the mitigation measures by 50% for 30 days increases both the daily capacity need for hospital beds and the daily number of deaths exponentially by an average of 23.8x, as these patients may occupy ICU beds and ventilators for an extended period of time. Unlike existing models, the COVIDHunter model accurately monitors and predicts the daily number of cases, hospitalizations, and deaths due to COVID-19. Our model is flexible to configure and simple to modify for modeling different scenarios under different environmental conditions and mitigation measures., arXiv
- Published
- 2021
13. FastRemap: a tool for quickly remapping reads between genome assemblies.
- Author
-
Kim, Jeremie S, Firtina, Can, Cavlak, Meryem Banu, Cali, Damla Senol, Alkan, Can, and Mutlu, Onur
- Subjects
SOURCE code, C++, STEVEDORES
- Abstract
Motivation A genome read dataset can be quickly and efficiently remapped from one reference to another similar reference (e.g. between two reference versions or two similar species) using a variety of tools, e.g. the commonly used CrossMap tool. With the explosion of available genomic datasets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. Results We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides up to a 7.82× speedup (6.47×, on average) and uses as low as 61.7% (80.7%, on average) of the peak memory consumption compared to the state-of-the-art remapping tool, CrossMap. Availability and implementation FastRemap is written in C++. Source code and user manual are freely available at: github.com/CMU-SAFARI/FastRemap. Docker image available at: https://hub.docker.com/r/alkanlab/fastremap. Also available in Bioconda at: https://anaconda.org/bioconda/fastremap-bio. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
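The remapping task FastRemap and CrossMap perform rests on coordinate liftover between assemblies via alignment blocks (UCSC-style chain files describe such blocks). A minimal sketch of the core coordinate translation, not FastRemap's C++ implementation, with an illustrative block representation:

```python
def remap_position(pos, chain_blocks):
    """Map a 0-based coordinate from an old assembly to a new one using
    alignment blocks of the form (old_start, new_start, length), i.e.,
    runs where the two assemblies align colinearly. Returns None when
    the position falls in a region absent from the new assembly, the
    case where a read cannot be remapped and must be re-mapped fully."""
    for old_start, new_start, length in chain_blocks:
        if old_start <= pos < old_start + length:
            return new_start + (pos - old_start)
    return None
```

For example, with blocks `[(0, 0, 100), (150, 100, 50)]`, old position 160 maps to new position 110, while positions in the gap 100-149 have no counterpart and return `None`.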
14. GRIM-filter: fast seed filtering in read mapping using emerging memory technologies
- Author
-
Kim, Jeremie S, Senol, Damla, Xin, Hongyi, Lee, Donghyuk, Ghose, Saugata, Alser, Mohammed, Hassan, Hasan, Ergin, Oguz, Alkan, Can, and Mutlu, Onur
- Subjects
Genomics (q-bio.GN), FOS: Biological sciences, Quantitative Biology - Genomics
- Abstract
Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly generate possible mapping locations (i.e., seeds) for each read, 2) extract reference sequences at each of the mapping locations, and then 3) check similarity between each read and its associated reference sequences with a computationally expensive dynamic programming algorithm (alignment) to determine the origin of the read. Location filters come into play before alignment, discarding seed locations that alignment would have deemed a poor match. The ideal location filter would discard all poor matching locations prior to alignment such that there is no wasted computation on poor alignments. Results: We propose a novel filtering algorithm, GRIM-Filter, optimized to exploit emerging 3D-stacked memory systems that integrate computation within a stacked logic layer, enabling processing-in-memory (PIM). GRIM-Filter quickly filters locations by 1) introducing a new representation of coarse-grained segments of the reference genome and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for 5% error acceptance rates, GRIM-Filter eliminates 5.59x-6.41x more false negatives and exhibits end-to-end speedups of 1.81x-3.65x compared to mappers employing the best previous filtering algorithm.
- Published
- 2017
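GRIM-Filter's two ideas, a coarse-grained representation of reference segments and a massively parallel presence check of a read's contents in each segment, have a simple software analogue. The sketch below uses Python sets per bin instead of GRIM-Filter's in-memory bitvectors, and all thresholds are illustrative:

```python
def build_bins(reference, bin_size, k):
    """Split the reference into coarse-grained bins and record the set
    of k-mers present in each bin (a software stand-in for per-bin
    existence bitvectors computed inside 3D-stacked memory)."""
    bins = []
    for start in range(0, len(reference), bin_size):
        # extend by k-1 so k-mers spanning a bin boundary are not lost
        segment = reference[start:start + bin_size + k - 1]
        bins.append({segment[i:i + k] for i in range(len(segment) - k + 1)})
    return bins

def passes_filter(read, bin_kmers, k, min_hits):
    """Keep a candidate seed location's bin only if enough of the
    read's k-mers occur in it; bins that fail are discarded before the
    expensive dynamic-programming alignment step."""
    hits = sum(read[i:i + k] in bin_kmers for i in range(len(read) - k + 1))
    return hits >= min_hits
```

Discarding a bin here corresponds to filtering out a seed location that alignment would have rejected anyway, which is where the end-to-end speedup comes from.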
15. RowHammer: A Retrospective.
- Author
-
Mutlu, Onur and Kim, Jeremie S.
- Subjects
DYNAMIC random access memory, PHASE change memory, FLASH memory, SCIENTIFIC literature
- Abstract
This retrospective paper describes the RowHammer problem in dynamic random access memory (DRAM), which was initially introduced by Kim et al. at the ISCA 2014 Conference. RowHammer is a prime (and perhaps the first) example of how a circuit-level failure mechanism can cause a practical and widespread system security vulnerability. It is the phenomenon that repeatedly accessing a row in a modern DRAM chip causes bit flips in physically adjacent rows at consistently predictable bit locations. RowHammer is caused by a hardware failure mechanism called DRAM disturbance errors, which is a manifestation of circuit-level cell-to-cell interference in a scaled memory technology. Researchers from Google Project Zero demonstrated in 2015 that this hardware failure mechanism can be effectively exploited by user-level programs to gain kernel privileges on real systems. Many other follow-up works demonstrated other practical attacks exploiting RowHammer. In this paper, we comprehensively survey the scientific literature on RowHammer-based attacks as well as mitigation techniques to prevent RowHammer. We also discuss what other related vulnerabilities may be lurking in DRAM and other types of memories, e.g., NAND flash memory or phase change memory, that can potentially threaten the foundations of secure systems, as the memory technologies scale to higher densities. We conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
16. Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm.
- Author
-
Firtina, Can, Kim, Jeremie S, Alser, Mohammed, Cali, Damla Senol, Cicek, A Ercument, Alkan, Can, and Mutlu, Onur
- Subjects
FORWARD-backward algorithm, VITERBI decoding, BASE pairs, MARKOV processes, ERROR rates
- Abstract
Motivation Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. 
Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
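Apollo's final step, decoding the trained pHMM with the Viterbi algorithm to emit the polished assembly, uses standard Viterbi decoding. A generic-HMM sketch for reference (Apollo's actual states encode assembly edits such as substitutions, insertions, and deletions; this is not its implementation):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the most likely hidden-state path for an observation
    sequence under an HMM (pi: initial, A: transition, B: emission)."""
    T, S = len(obs), len(pi)
    delta = np.zeros((T, S))            # best path probability per state
    back = np.zeros((T, S), dtype=int)  # backpointers to reconstruct path
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A  # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In the polishing setting, the observations come from read-to-assembly alignments and the decoded path spells out the corrected sequence.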
17. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.
- Author
-
Cali, Damla Senol, Kim, Jeremie S, Ghose, Saugata, Alkan, Can, and Mutlu, Onur
- Subjects
ERROR rates, SEQUENCE analysis, TECHNOLOGY
- Abstract
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. 
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
18. The Reach Profiler (REAPER).
- Author
-
Patel, Minesh, Kim, Jeremie S., and Mutlu, Onur
- Published
- 2017
- Full Text
- View/download PDF
19. AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes.
- Author
-
Kim JS, Firtina C, Cavlak MB, Cali DS, Hajinazar N, Alser M, Alkan C, and Mutlu O
- Abstract
AirLift is the first read remapping tool that enables users to quickly and comprehensively remap a read set that was previously mapped to one reference genome to another similar reference. Users can then quickly run downstream analyses of read sets for each new reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4×. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants.
- Published
- 2024
- Full Text
- View/download PDF
20. COVIDHunter: COVID-19 Pandemic Wave Prediction and Mitigation via Seasonality Aware Modeling.
- Author
-
Alser M, Kim JS, Almadhoun Alserr N, Tell SW, and Mutlu O
- Subjects
- Climate, Humans, Models, Theoretical, Pandemics, Temperature, COVID-19 epidemiology, COVID-19 prevention & control
- Abstract
Early detection and isolation of COVID-19 patients are essential for successful implementation of mitigation strategies and eventually curbing the disease spread. With a limited number of daily COVID-19 tests performed in every country, simulating the COVID-19 spread along with the potential effect of each mitigation strategy currently remains one of the most effective ways in managing the healthcare system and guiding policy-makers. We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region, predicts COVID-19 statistics (the daily number of cases, hospitalizations, and deaths), and provides suggestions on what strength the upcoming mitigation measure should be. The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e.g., climate, temperature, humidity), different variants of concern, vaccination rate, and mitigation measures. Using Switzerland as a case study, COVIDHunter estimates that we are experiencing a deadly new wave that will peak on 26 January 2022, which is very similar in numbers to the wave we had in February 2020. The policy-makers have only one choice that is to increase the strength of the currently applied mitigation measures for 30 days. Unlike existing models, the COVIDHunter model accurately monitors and predicts the daily number of cases, hospitalizations, and deaths due to COVID-19. Our model is flexible to configure and simple to modify for modeling different scenarios under different environmental conditions and mitigation measures. 
We release the source code of the COVIDHunter implementation at https://github.com/CMU-SAFARI/COVIDHunter and show how to flexibly configure our model for any scenario and easily extend it for measures and conditions different from those we account for., Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest., (Copyright © 2022 Alser, Kim, Almadhoun Alserr, Tell and Mutlu.)
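The key idea described in the abstract, scaling the average number of new infections per infected person by external factors such as seasonality and mitigation strength, can be sketched as a toy simulation. This is a minimal illustrative model, not the actual COVIDHunter implementation (which is available at the GitHub link above); all function names, parameter names, and values here are assumptions chosen for illustration:

```python
# Toy sketch of the COVIDHunter idea: the effective reproduction number
# is the intrinsic one scaled by a seasonality coefficient and damped by
# the current mitigation strength. Illustrative only.

def daily_new_infections(infected, base_r0, climate_coeff, mitigation_strength,
                         infectious_days=7.0):
    """Average new infections produced per day by the current infected pool.

    base_r0             -- intrinsic reproduction number of the variant
    climate_coeff       -- seasonality multiplier (>1 in winter, <1 in summer)
    mitigation_strength -- 0.0 (no measures) .. 1.0 (full lockdown)
    """
    effective_r = base_r0 * climate_coeff * (1.0 - mitigation_strength)
    return infected * effective_r / infectious_days

def simulate(days, initial_infected, base_r0, climate, mitigation):
    """Return the daily number of new infections over `days` days."""
    infected = initial_infected
    history = []
    for day in range(days):
        new = daily_new_infections(infected, base_r0,
                                   climate[day], mitigation[day])
        # recoveries drain the infected pool at the same average rate
        infected = max(0.0, infected + new - infected / 7.0)
        history.append(new)
    return history

# Example: 30 winter days, mitigation tightened halfway through the wave.
cases = simulate(
    days=30,
    initial_infected=1000.0,
    base_r0=3.0,
    climate=[1.2] * 30,
    mitigation=[0.4] * 15 + [0.8] * 15,
)
```

In this toy run the wave grows while mitigation is weak (effective R above 1) and declines once mitigation is tightened enough to push effective R below 1, which is the qualitative behavior the abstract describes when recommending a stronger upcoming measure.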
- Published
- 2022
- Full Text
- View/download PDF
21. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.
- Author
-
Senol Cali D, Kim JS, Ghose S, Alkan C, and Mutlu O
- Subjects
- Animals, Chromosome Mapping, Computational Biology, Escherichia coli genetics, Genome, Bacterial, Genomics statistics & numerical data, Genomics trends, Humans, Nanopore Sequencing statistics & numerical data, Nanopore Sequencing trends, Sequence Analysis, DNA, Software, Genomics methods, Nanopore Sequencing methods
- Abstract
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the technology's high error rates pose a challenge for generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks. It is important to understand where the current tools do not perform well in order to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy; however, Minimap uses less memory and is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage, and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, the bottlenecks we identify can help developers improve the current tools or build new ones that are both accurate and fast, overcoming the high error rates of nanopore sequencing technology., (© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
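The read-to-read overlap-finding step discussed in observation (2) can be illustrated with a toy sketch: two reads are flagged as overlapping when they share enough k-mers. Real tools such as GraphMap and Minimap use far more sophisticated sketching and indexing (e.g., minimizers) to make this scale to millions of error-prone long reads; the functions and thresholds below are illustrative assumptions, not any tool's actual algorithm:

```python
# Toy k-mer-based overlap detection between two reads. Conceptual sketch
# of the overlap-finding step in a long-read assembly pipeline; not the
# algorithm used by GraphMap or Minimap.

def kmers(seq, k=5):
    """Set of all k-length substrings of a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reads_overlap(read_a, read_b, k=5, min_shared=3):
    """True if the two reads share at least `min_shared` k-mers."""
    return len(kmers(read_a, k) & kmers(read_b, k)) >= min_shared

# Example: read_b begins with the last 10 bases of read_a (an overlap),
# so the two reads share several 5-mers from that region.
read_a = "ACGTACGGTTCAGGTCCAGT"
read_b = "CAGGTCCAGTAATTGGCCAA"
```

An assembler like Miniasm builds an overlap graph from pairs flagged this way and then derives contigs from it; the accuracy/performance trade-off in observation (3) comes from how aggressively such candidate overlaps are filtered and verified.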
- Published
- 2019
- Full Text
- View/download PDF