143 results for "Scott Levy"
Search Results
52. Understanding the Effects of Communication and Coordination on Checkpointing at Scale.
- Author
-
Kurt B. Ferreira, Patrick M. Widener, Scott Levy, Dorian C. Arnold, and Torsten Hoefler
- Published
- 2014
- Full Text
- View/download PDF
53. Table S1 from Clinical Utility of Plasma Cell-Free DNA in Adult Patients with Newly Diagnosed Glioblastoma: A Pilot Prospective Study
- Author
-
Erica L. Carpenter, Arati S. Desai, Steven Brem, Andrew J. Cucchiara, Donald M. O'Rourke, Zev A. Binder, Jennifer J.D. Morrissette, MacLean P. Nasrallah, Stephanie S. Yee, Theresa Christensen, Samantha Guiry, Timothy Prior, Jasmin Hussain, Whitney Sarchiapone, Scott Levy, Jeffrey B. Ware, Jacob E. Till, Jazmine J. Mays, S. Ali Nabavizadeh, and Stephen J. Bagley
- Abstract
Tissue next-generation sequencing gene coverage (152 genes)
- Published
- 2023
- Full Text
- View/download PDF
54. Figure S1 from Clinical Utility of Plasma Cell-Free DNA in Adult Patients with Newly Diagnosed Glioblastoma: A Pilot Prospective Study
- Author
-
Erica L. Carpenter, Arati S. Desai, Steven Brem, Andrew J. Cucchiara, Donald M. O'Rourke, Zev A. Binder, Jennifer J.D. Morrissette, MacLean P. Nasrallah, Stephanie S. Yee, Theresa Christensen, Samantha Guiry, Timothy Prior, Jasmin Hussain, Whitney Sarchiapone, Scott Levy, Jeffrey B. Ware, Jacob E. Till, Jazmine J. Mays, S. Ali Nabavizadeh, and Stephen J. Bagley
- Abstract
Plasma cell-free DNA (cfDNA) concentration (ng/mL) is correlated with (A) total radiographic tumor burden (contrast-enhancing tumor + T2/FLAIR non-enhancing tumor) and (B) contrast-enhancing tumor burden at the first post-radiation MRI scan in patients with newly diagnosed glioblastoma
- Published
- 2023
- Full Text
- View/download PDF
55. Data from Clinical Utility of Plasma Cell-Free DNA in Adult Patients with Newly Diagnosed Glioblastoma: A Pilot Prospective Study
- Author
-
Erica L. Carpenter, Arati S. Desai, Steven Brem, Andrew J. Cucchiara, Donald M. O'Rourke, Zev A. Binder, Jennifer J.D. Morrissette, MacLean P. Nasrallah, Stephanie S. Yee, Theresa Christensen, Samantha Guiry, Timothy Prior, Jasmin Hussain, Whitney Sarchiapone, Scott Levy, Jeffrey B. Ware, Jacob E. Till, Jazmine J. Mays, S. Ali Nabavizadeh, and Stephen J. Bagley
- Abstract
Purpose: The clinical utility of plasma cell-free DNA (cfDNA) has not been assessed prospectively in patients with glioblastoma (GBM). We aimed to determine the prognostic impact of plasma cfDNA in GBM, as well as its role as a surrogate of tumor burden and substrate for next-generation sequencing (NGS). Experimental Design: We conducted a prospective cohort study of 42 patients with newly diagnosed GBM. Plasma cfDNA was quantified at baseline prior to initial tumor resection and longitudinally during chemoradiotherapy. Plasma cfDNA was assessed for its association with progression-free survival (PFS) and overall survival (OS), correlated with radiographic tumor burden, and subjected to a targeted NGS panel. Results: Prior to initial surgery, GBM patients had higher plasma cfDNA concentration than age-matched healthy controls (mean 13.4 vs. 6.7 ng/mL, P < 0.001). Plasma cfDNA concentration was correlated with radiographic tumor burden on patients' first post-radiation magnetic resonance imaging scan (ρ = 0.77, P = 0.003) and tended to rise prior to or concurrently with radiographic tumor progression. Preoperative plasma cfDNA concentration above the mean (>13.4 ng/mL) was associated with inferior PFS (median 4.9 vs. 9.5 months, P = 0.038). Detection of ≥1 somatic mutation in plasma cfDNA occurred in 55% of patients and was associated with nonstatistically significant decreases in PFS (median 6.0 vs. 8.7 months, P = 0.093) and OS (median 5.5 vs. 9.2 months, P = 0.053). Conclusions: Plasma cfDNA may be an effective prognostic tool and surrogate of tumor burden in newly diagnosed GBM. Detection of somatic alterations in plasma is feasible when samples are obtained prior to initial surgical resection.
- Published
- 2023
- Full Text
- View/download PDF
56. A study of the viability of exploiting memory content similarity to improve resilience to memory errors.
- Author
-
Scott Levy, Kurt B. Ferreira, Patrick G. Bridges, Aidan P. Thompson, and Christian R. Trott
- Published
- 2015
- Full Text
- View/download PDF
57. Exploiting Content Similarity to Improve Memory Performance in Large-Scale High-Performance Computing Systems.
- Author
-
Scott Levy
- Published
- 2013
- Full Text
- View/download PDF
58. Asking the Right Questions: Benchmarking Fault-Tolerant Extreme-Scale Systems.
- Author
-
Patrick M. Widener, Kurt B. Ferreira, Scott Levy, Patrick G. Bridges, Dorian C. Arnold, and Ron Brightwell
- Published
- 2013
- Full Text
- View/download PDF
59. Using unreliable virtual hardware to inject errors in extreme-scale systems.
- Author
-
Scott Levy, Matthew G. F. Dosanjh, Patrick G. Bridges, and Kurt B. Ferreira
- Published
- 2013
- Full Text
- View/download PDF
60. Using Simulation to Evaluate the Performance of Resilience Strategies at Scale.
- Author
-
Scott Levy, Bryan Topp, Kurt B. Ferreira, Dorian C. Arnold, Torsten Hoefler, and Patrick M. Widener
- Published
- 2013
- Full Text
- View/download PDF
61. Evaluating the feasibility of using memory content similarity to improve system resilience.
- Author
-
Scott Levy, Patrick G. Bridges, Kurt B. Ferreira, Aidan P. Thompson, and Christian R. Trott
- Published
- 2013
- Full Text
- View/download PDF
62. Variations of Classical and Nonclassical Ultrasound Nonlinearity Parameters during Heat-Induced Microstructural Evolution in an Iron-Copper Alloy
- Author
-
Laurence J. Jacobs, Katherine Marie Scott Levy, and Jin-Yeon Kim
- Subjects
Applied physics, Heat induced, Microstructural evolution, Iron copper, Materials science, Mechanical Engineering, Ultrasound, Alloy, Nonlinear system, Mechanics of Materials, General Materials Science, Composite material, Acoustics
- Abstract
This research demonstrates and compares the potential of two nonlinear ultrasound techniques: second harmonic generation (SHG) and nonlinear resonant ultrasound spectroscopy (NRUS). It examines a set of thermally aged iron-copper (Fe-1.0% Cu) alloy specimens, which are used as surrogate specimens for radiation damage. Both SHG and NRUS are found to be sensitive to the growth of the copper precipitates, although the changes in the respective nonlinearity parameters are due to different mechanisms. (The standard form of the SHG nonlinearity parameter is sketched after this entry.)
- Published
- 2021
- Full Text
- View/download PDF
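For orientation, SHG measurements estimate the classical nonlinearity parameter β from the measured fundamental and second-harmonic amplitudes. A commonly used textbook form is given below; this is stated as background, not necessarily the authors' exact formulation.

```latex
% Classical (SHG) nonlinearity parameter: A_1 and A_2 are the measured
% fundamental and second-harmonic amplitudes, k the wavenumber, and x
% the propagation distance.
\beta = \frac{8 A_2}{k^2 x A_1^2}
```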
63. Enhanced Fiber Tractography Using Edema Correction: Application and Evaluation in High-Grade Gliomas
- Author
-
Drew Parker, Ragini Verma, Timothy H. Lucas, Michael L. McGarvey, Mark A. Elliott, Fraser Henderson, Steven Brem, Wesley B Hodges, Ronald L. Wolf, Lisa Desiderio, Jessica Harsch, Lauren Karpf, Anupa Ambili Vijayakumari, Eileen Maloney-Wilensky, and Scott Levy
- Subjects
Voxel, Edema, Fractional anisotropy, Humans, Medicine, Diffusion Tractography, Brain Neoplasms, Magnetic Resonance Imaging, Glioma, Diffusion Tensor Imaging, Research—Human—Clinical Studies, Surgery, Neurology (clinical), Functional magnetic resonance imaging, Nuclear medicine, Diffusion MRI, Tractography
- Abstract
Background: A limitation of diffusion tensor imaging (DTI)-based tractography is peritumoral edema that confounds traditional diffusion-based magnetic resonance metrics. Objective: To augment fiber-tracking through peritumoral regions by performing novel edema correction on clinically feasible DTI acquisitions and assess the accuracy of the fiber-tracks using intraoperative stimulation mapping (ISM), task-based functional magnetic resonance imaging (fMRI) activation maps, and postoperative follow-up as reference standards. Methods: Edema correction, using our bi-compartment free water modeling algorithm (FERNET), was performed on clinically acquired DTI data from a cohort of 10 patients presenting with suspected high-grade glioma and peritumoral edema in proximity to and/or infiltrating language or motor pathways. Deterministic fiber-tracking was then performed on the corrected and uncorrected DTI to identify tracts pertaining to the eloquent region involved (language or motor). Tracking results were compared visually and quantitatively using mean fiber count, voxel count, and mean fiber length. The tracts through the edematous region were verified based on overlay with the corresponding motor or language task-based fMRI activation maps and intraoperative ISM points, as well as at time points after surgery when peritumoral edema had subsided. Results: Volume and number of fibers increased with application of edema correction; concordantly, mean fractional anisotropy decreased. Overlay with functional activation maps and ISM verified the eloquence of the increased fibers. Comparison with postsurgical follow-up scans with lower edema further confirmed the accuracy of the tracts. Conclusion: This method of edema correction can be applied to standard clinical DTI to improve visualization of motor and language tracts in patients with glioma-associated peritumoral edema. (A standard form of the bi-compartment signal model is sketched after this entry.)
- Published
- 2021
- Full Text
- View/download PDF
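For orientation, bi-compartment free-water models of the kind referenced above are typically written as a mixture of a tissue tensor compartment and an isotropic free-water compartment. A standard form is given below; this is the generic model, not necessarily FERNET's exact parameterization.

```latex
% Bi-compartment (free-water) diffusion signal model: f is the tissue
% volume fraction, D the tissue diffusion tensor, g the gradient
% direction, b the diffusion weighting, and d_iso the diffusivity of
% free water (about 3 x 10^-3 mm^2/s at body temperature).
S(b, \mathbf{g}) = S_0 \left[ f \, e^{-b\, \mathbf{g}^{\mathsf{T}} \mathbf{D}\, \mathbf{g}}
                            + (1 - f) \, e^{-b\, d_{\mathrm{iso}}} \right]
```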
64. Factors Associated with an Outbreak of COVID-19 in Oilfield Workers, Kazakhstan, 2020
- Author
-
Dilyara Nabirova, Ryszhan Taubayeva, Ainur Maratova, Alden Henderson, Sayagul Nassyrova, Marhzan Kalkanbayeva, Sevak Alaverdyan, Manar Smagul, Scott Levy, Aizhan Yesmagambetova, and Daniel Singer
- Subjects
Adult, Male, SARS-CoV-2, Health, Toxicology and Mutagenesis, Public Health, Environmental and Occupational Health, COVID-19, Kazakhstan, Disease Outbreaks, respiratory tract diseases, oilfield, pandemic, occupational setting, individual factors, environmental factors, worker safety, FETP, Case-Control Studies, Humans, Female, Oil and Gas Fields
- Abstract
From March to May 2020, 1306 oilfield workers in Kazakhstan tested positive for SARS-CoV-2. We conducted a case-control study to assess factors associated with SARS-CoV-2 transmission. The cases were PCR-positive for SARS-CoV-2 during June–September 2020. Controls lived at the same camp and were randomly selected from the workers who were PCR-negative for SARS-CoV-2. Data were collected by telephone interviews with the oil workers. The study included 296 cases and 536 controls; 627 (75%) were men and 527 (63%) were below 40 years of age. Individual factors were the main drivers of transmission, with little contribution from environmental factors. Of the twenty individual factors, rare hand sanitizer use, travel before shift work, and social interactions outside of work increased SARS-CoV-2 transmission. Of the twenty-two environmental factors, only working in air-conditioned spaces was associated with SARS-CoV-2 transmission. Communication messages may enhance workers' individual responsibility and responsibility for the safety of others to reduce SARS-CoV-2 transmission. (Case-control associations of this kind are quantified with odds ratios; a minimal computation is sketched after this entry.)
- Published
- 2022
- Full Text
- View/download PDF
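Case-control studies like the one above typically report associations as odds ratios with a 95% confidence interval. A minimal sketch of the standard computation follows; the counts are hypothetical, not the study's data.

```c
#include <math.h>
#include <stdio.h>

/* Odds ratio and 95% CI for a 2x2 case-control table:
 *              exposed  unexposed
 *   cases         a         b
 *   controls      c         d
 */
static void odds_ratio(double a, double b, double c, double d)
{
    double or_ = (a * d) / (b * c);
    double se = sqrt(1.0/a + 1.0/b + 1.0/c + 1.0/d);  /* SE of ln(OR) */
    printf("OR = %.2f (95%% CI %.2f-%.2f)\n",
           or_, or_ * exp(-1.96 * se), or_ * exp(1.96 * se));
}

int main(void)
{
    odds_ratio(120.0, 176.0, 150.0, 386.0);  /* hypothetical counts */
    return 0;
}
```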
65. Characterizing Memory Failures Using Benford’s Law
- Author
-
Kurt B. Ferreira and Scott Levy
- Published
- 2022
- Full Text
- View/download PDF
66. Tweet Naked: A Bare-All Social Media Strategy for Boosting Your Brand and Your Business
- Author
-
Scott Levy
- Published
- 2013
67. Exploiting MISD Performance Opportunities in Multi-core Systems.
- Author
-
Patrick G. Bridges, Donour Sizemore, and Scott Levy
- Published
- 2011
68. MiniMod: A Modular Miniapplication Benchmarking Framework for HPC
- Author
-
William Marts, Matthew Dosanjh, William Schonbein, Scott Levy, Ryan Grant, and Patrick Bridges
- Published
- 2021
- Full Text
- View/download PDF
69. MiniMod: A Modular Miniapplication Benchmarking Framework for HPC
- Author
-
Scott Levy, Matthew G. F. Dosanjh, Patrick G. Bridges, Ryan E. Grant, Whit Schonbein, and W. Pepper Marts
- Subjects
Flexibility (engineering), Computer science, Benchmarking, Modular design, Instruction set, Kernel (linear algebra), Computer architecture, Models of communication, Middleware (distributed applications), Data transmission
- Abstract
The HPC application community has proposed many new application communication structures, middleware interfaces, and communication models to improve HPC application performance. Modifying proxy applications is the standard practice for the evaluation of these novel methodologies. Currently, this requires the creation of a new version of the proxy application for each combination of the approach being tested. In this article, we present a modular proxy-application framework, MiniMod, that enables evaluation of a combination of independently written computation kernels, data transfer logic, communication access, and threading libraries. MiniMod is designed to allow rapid development of individual modules which can be combined at runtime. Through MiniMod, developers only need a single implementation to evaluate application impact under a variety of scenarios.We demonstrate the flexibility of MiniMod’s design by using it to implement versions of a heat diffusion kernel and the miniFE finite element proxy application, along with a variety of communication, granularity, and threading modules. We examine how changing communication libraries, communication granularities, and threading approaches impact these applications on an HPC system. These experiments demonstrate that MiniMod can rapidly improve the ability to assess new middleware techniques for scientific computing applications and next-generation hardware platforms.
- Published
- 2021
- Full Text
- View/download PDF
70. Understanding the Effects of DRAM Correctable Error Logging at Scale
- Author
-
Victor Kuhns, Sean Blanchard, Nathan DeBardeleben, Kurt B. Ferreira, and Scott Levy
- Subjects
Random access memory, Computer science, Computer cluster, Scale (chemistry), Fault tolerance, State (computer science), DRAM, System characteristics, Reliability engineering
- Abstract
Fault tolerance poses a major challenge for future large-scale systems. Current research on fault tolerance has been principally focused on mitigating the impact of uncorrectable errors: errors that corrupt the state of the machine and require a restart from a known good state. However, correctable errors occur much more frequently than uncorrectable errors and may be even more common on future systems. Although an application can safely continue to execute when correctable errors occur, recovery from a correctable error requires the error to be corrected and, in most cases, information about its occurrence to be logged. The potential performance impact of these recovery activities has not been extensively studied in HPC. In this paper, we use simulation to examine the relationship between recovery from correctable errors and application performance for several important extreme-scale workloads. Our paper contains what is, to the best of our knowledge, the first detailed analysis of the impact of correctable errors on application performance. Our study shows that correctable errors can have a significant impact on application performance for future systems. We also find that although efforts to address correctable errors focus on reducing failure rates, reducing the time required to log individual errors may have a greater impact on overheads at scale. Finally, this study outlines the error frequency and duration targets needed to keep correctable-error overheads similar to those of today's systems. This paper provides critical analysis of, and insight into, the overheads of correctable errors and offers practical advice to system administrators and hardware designers seeking to tune performance to application and system characteristics. (A first-order model of this overhead is sketched after this entry.)
- Published
- 2021
- Full Text
- View/download PDF
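A first-order view of the effect studied above: if correctable errors arrive at rate λ and each one stalls the application for t_log seconds while it is corrected and logged, the fraction of time lost is roughly λ·t_log. A minimal sketch under that assumption; the parameter values are illustrative, not the paper's, and this is not the paper's simulator.

```c
#include <stdio.h>

/* First-order estimate of application efficiency when each correctable
 * error stalls the application for t_log_sec seconds. */
static double efficiency(double errors_per_hour, double t_log_sec)
{
    double lost = errors_per_hour * t_log_sec / 3600.0;  /* fraction lost */
    return lost < 1.0 ? 1.0 - lost : 0.0;
}

int main(void)
{
    double rates[] = {10.0, 100.0, 1000.0};    /* errors per hour */
    double tlogs[] = {0.001, 0.01, 0.1, 1.0};  /* seconds per error */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            printf("rate=%6.0f/h t_log=%.3fs -> efficiency=%.4f\n",
                   rates[i], tlogs[j], efficiency(rates[i], tlogs[j]));
    return 0;
}
```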
71. Using simulation to examine the effect of MPI message matching costs on application performance
- Author
-
Kurt B. Ferreira, Scott Levy, Whit Schonbein, Matthew G. F. Dosanjh, and Ryan E. Grant
- Subjects
Computer engineering, Artificial Intelligence, Computer Networks and Communications, Hardware and Architecture, Computer science, Computer Graphics and Computer-Aided Design, Queue, Software, Theoretical Computer Science
- Abstract
Attaining high performance with MPI applications requires efficient message matching to minimize message processing overheads and the latency these overheads introduce into application communication. In this paper, we use a validated simulation-based approach to examine the relationship between MPI message matching performance and application time-to-solution. Specifically, we examine how the performance of several important HPC workloads is affected by the time required for matching. Our analysis yields several important contributions: (i) the performance of current workloads is unlikely to be significantly affected by MPI matching unless match queue operations get much slower or match queues get much longer; (ii) match queue designs that provide sublinear performance as a function of queue length are unlikely to yield much benefit unless match queue lengths increase dramatically; and (iii) we provide guidance on how long the mean time per match attempt may be without significantly affecting application performance. The results and analysis in this paper provide valuable guidance on the design and development of MPI message match queues. (A simplified linear-search match queue is sketched after this entry.)
- Published
- 2019
- Full Text
- View/download PDF
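For readers unfamiliar with MPI matching: the subject of the paper above is the queue an MPI library walks to pair an incoming message with a posted receive. A minimal linear-search sketch of such a posted-receive queue follows; it is illustrative only, and real MPI implementations differ.

```c
#include <stddef.h>

#define ANY_SOURCE -1   /* stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG */
#define ANY_TAG    -1

struct posted_recv {
    int source, tag;            /* may hold ANY_SOURCE / ANY_TAG wildcards */
    void *buffer;
    struct posted_recv *next;
};

/* Walk the posted-receive queue in post order; unlink and return the
 * first entry matching the (source, tag) of an incoming message.
 * Time grows linearly with queue depth -- the cost the paper studies. */
struct posted_recv *match(struct posted_recv **head, int src, int tag)
{
    for (struct posted_recv **p = head; *p; p = &(*p)->next) {
        struct posted_recv *r = *p;
        if ((r->source == ANY_SOURCE || r->source == src) &&
            (r->tag == ANY_TAG || r->tag == tag)) {
            *p = r->next;       /* unlink the matched entry */
            return r;
        }
    }
    return NULL;                /* no match: message is "unexpected" */
}
```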
72. An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation
- Author
-
Scott Levy and Kurt Ferreira
- Published
- 2021
- Full Text
- View/download PDF
73. Exploring the effect of noise on the performance benefit of nonblocking allreduce.
- Author
-
Patrick M. Widener, Kurt B. Ferreira, Scott Levy, and Torsten Hoefler
- Published
- 2014
- Full Text
- View/download PDF
74. Low-cost MPI Multithreaded Message Matching Benchmarking
- Author
-
Matthew G. F. Dosanjh, Scott Levy, W. Pepper Marts, Ryan E. Grant, and Whit Schonbein
- Subjects
Distributed computing, Matching (statistics), Computer science, Message Passing Interface, Process (computing), Network interface, Benchmarking, Multithreading, Benchmark (computing), Code (cryptography)
- Abstract
The Message Passing Interface (MPI) standard allows user-level threads to concurrently call into an MPI library. While this feature is currently rarely used, there is considerable interest from developers in adopting it in the near future. There is reason to believe that multithreaded communication may incur additional message processing overheads in terms of number of items searched during demultiplexing and amount of time spent searching, because it has the potential to increase the number of messages exchanged and to introduce non-deterministic message ordering. Therefore, understanding the implications of adding multithreading to MPI applications is important for future application development. One strategy for advancing this understanding is through ‘low-cost’ benchmarks that emulate full communication patterns using fewer resources. For example, while a complete, ‘real-world’ multithreaded halo exchange requires 9 or 27 nodes, the low-cost alternative needs only two, making it deployable on systems where acquiring resources is difficult because of high utilization (e.g., busy capacity-computing systems), or impossible because the necessary resources do not exist (e.g., testbeds with too few nodes). While such benchmarks have been proposed, the reported results have been limited to a single architecture or derived indirectly through simulation, and no attempt has been made to confirm that a low-cost benchmark accurately captures features of full (non-emulated) exchanges. Moreover, benchmark code has not been made publicly available. The purpose of the study presented in this paper is to quantify how accurately the low-cost benchmark captures the matching behavior of the full, real-world benchmark. In the process, we also advocate for the feasibility and utility of the low-cost benchmark. We present a ‘real-world’ benchmark implementing a full multithreaded halo exchange on 9 and 27 nodes, as defined by 5-point and 9-point 2D stencils, and 7-point and 27-point 3D stencils. Likewise, we present a ‘low-cost’ benchmark that emulates these communication patterns using only two nodes. We then confirm, across multiple architectures, that the low-cost benchmark gives accurate estimates of both the number of items searched during message processing and the time spent processing those messages. Finally, we demonstrate the utility of the low-cost benchmark by using it to profile the performance impact of state-of-the-art Mellanox ConnectX-5 hardware support for offloaded MPI message demultiplexing. To facilitate further research on the effects of multithreaded MPI on message matching behavior, the source of our two benchmarks will be included in the next release of the Sandia MPI Micro-Benchmark Suite. (The full halo-exchange pattern that the low-cost benchmark emulates is sketched after this entry.)
- Published
- 2020
- Full Text
- View/download PDF
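The ‘real-world’ pattern emulated by the low-cost benchmark above is a halo exchange. A minimal single-threaded 5-point-stencil version is sketched below to show the communication structure; this is not the Sandia benchmark code, and the neighbor ranks are assumed to come from a Cartesian grid set up elsewhere.

```c
#include <mpi.h>

/* 5-point stencil halo exchange: swap one double with each of the four
 * grid neighbors, posting nonblocking receives before the sends. */
void halo_exchange_5pt(const int nbr[4],      /* N, S, E, W neighbor ranks */
                       double send[4], double recv[4], MPI_Comm comm)
{
    MPI_Request req[8];
    for (int i = 0; i < 4; i++)
        MPI_Irecv(&recv[i], 1, MPI_DOUBLE, nbr[i], 0, comm, &req[i]);
    for (int i = 0; i < 4; i++)
        MPI_Isend(&send[i], 1, MPI_DOUBLE, nbr[i], 0, comm, &req[4 + i]);
    MPI_Waitall(8, req, MPI_STATUSES_IGNORE);
}
```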
75. Low-cost MPI Multithreaded Message Matching Benchmarking
- Author
-
William Schonbein, Ryan Grant, Scott Levy, Matthew Dosanjh, and William Marts
- Published
- 2020
- Full Text
- View/download PDF
76. Stress in the Workplace - Implementing Solutions: Preparing the Individual and Organization for When the Worst-Case Scenario is Actualized
- Author
-
Janis Davis-Street, Scott Levy, Christina Stevens, and Brian Walker
- Subjects
Risk analysis (engineering), Computer science, Stress (linguistics), Worst-case scenario, Resilience (network), Environmental & occupational health
- Abstract
Objectives/Scope: Workplace stress can happen for many different reasons but is very prominent during times of change and uncertainty. Many businesses have access to mental health and emotional well-being resources, which they offer widely during normal operations and enhance during times of expected stress. In this presentation we will discuss the challenge of what happens when that expected short-term period of difficulty morphs into longer-term uncertainty. Method, Procedures, Process: We will describe risk factors for workplace stress as well as short- and long-term solutions during periods of prolonged uncertainty. The discussion will include key components, including baseline resources, gap assessments, and mitigating solutions. By working with business leaders to identify critical time periods and utilizing a validated survey tool, as well as a multitude of educational and informational materials, we were able to deliver our health and wellness resources more effectively and provide long-term support to the business. Results, Observations, Conclusions: Mitigating solutions to address stress can be utilized to limit health and safety risks and optimize human performance. By offering a dynamic and relevant program which can be tailored to the individual workforce, we can maintain support for a prolonged period with positive results. Stress levels in the workplace can be assessed via validated tools, the results of which can be managed across a spectrum of different work environments. Although our business environments differ considerably, by developing fit-for-purpose solutions we can implement high-quality services to meet the needs of the business. Novel/Additive Information: Fostering individual and organizational resilience early in the process and maintaining it throughout is an essential component, especially when a worst-case scenario is highly probable. We will discuss how to best prepare both the organization and the individual to adapt, change, and potentially even thrive during an extended period of uncertainty. We will explore simple methods of screening for mental health concerns as well as developing interventions to optimize worker health and safety over prolonged periods of change and uncertainty.
- Published
- 2020
- Full Text
- View/download PDF
77. The Case for Explicit Reuse Semantics for RDMA Communication
- Author
-
Todd Kordenbrock, Patrick Widener, Scott Levy, and Craig D. Ulmer
- Subjects
Remote direct memory access, Computer science, Registered memory, Reuse, Allocator, Network interface controller, Synchronization (computer science), Key (cryptography), Implicit memory, Computer network
- Abstract
Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce the cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory. (A minimal registered-memory pool is sketched after this entry.)
- Published
- 2020
- Full Text
- View/download PDF
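A minimal sketch of the explicit-reuse idea above: a pool that registers one large region once and hands out chunks from it, so user requests never pay per-allocation registration cost. The register_region()/deregister_region() hooks are hypothetical stand-ins for a NIC registration call such as ibv_reg_mr; this is not the paper's allocator.

```c
#include <stdlib.h>

/* Hypothetical NIC hooks; in a verbs-based stack these would wrap
 * ibv_reg_mr() / ibv_dereg_mr(). */
void *register_region(void *addr, size_t len);
void  deregister_region(void *handle);

struct reg_pool {
    char  *base;     /* one large registered region */
    void  *handle;   /* NIC registration handle */
    size_t size, used;
};

/* Register a large region once, amortizing the registration cost
 * over every allocation that follows. */
int pool_init(struct reg_pool *p, size_t size)
{
    p->base = malloc(size);
    if (!p->base)
        return -1;
    p->handle = register_region(p->base, size);
    p->size = size;
    p->used = 0;
    return 0;
}

/* Bump-allocate from the already-registered region: no NIC call here. */
void *pool_alloc(struct reg_pool *p, size_t len)
{
    if (p->used + len > p->size)
        return NULL;
    void *out = p->base + p->used;
    p->used += len;
    return out;
}
```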
78. ALAMO: Autonomous Lightweight Allocation, Management, and Optimization
- Author
-
Jay Lofstead, Ann C. Gentile, Scott Levy, Andrew J. Younge, Kurt B. Ferreira, Ron Brightwell, Jim Brandt, Stephen L. Olivier, Ryan E. Grant, and Kevin Pedretti
- Subjects
Research program, Computer science, Variety (cybernetics), Software, Paradigm shift, Scalability, Resource management, Software engineering
- Abstract
Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate which will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing this increasing complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.
- Published
- 2020
- Full Text
- View/download PDF
79. Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption
- Author
-
Kurt B. Ferreira and Scott Levy
- Subjects
Correctness, Memory errors, Computer engineering, Reed–Solomon error correction, Computer science, Pointer (computer programming), Encoding (memory), Silent data corruption, Computer hardware & architecture
- Abstract
Concern about memory errors has been widespread in high-performance computing (HPC) for decades. These concerns have led to significant research on detecting and correcting memory errors to improve performance and provide strong guarantees about the correctness of the memory contents of scientific simulations. However, power concerns and changes in memory architectures threaten the viability of current approaches to protecting memory (e.g., Chipkill). Returning to less protective error-correcting codes (ECC), e.g., single-error correction, double-error detection (SECDED), may increase the frequency of memory errors, including silent data corruption (SDC). SDC has the potential to silently cause applications to produce incorrect results and mislead domain scientists. We propose an approach for exploiting unnecessary bits in pointer values to support encoding the pointer with a Reed-Solomon code. Encoding the pointer allows us to provide strong capabilities for correcting and detecting corruption of pointer values. (A simplified, detection-only variant of this idea is sketched after this entry.)
- Published
- 2020
- Full Text
- View/download PDF
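A simplified stand-in for the idea above. The paper encodes pointers with a Reed-Solomon code, which permits correction; the sketch below instead stores a plain XOR checksum in the normally unused top bits of a 64-bit user-space pointer, which gives detection only. It assumes addresses fit in 48 bits, as in current x86-64 user space.

```c
#include <stdint.h>

/* XOR-fold the low 48 bits of an address into an 8-bit check value. */
static uint8_t check8(uint64_t p)
{
    uint8_t c = 0;
    for (int i = 0; i < 48; i += 8)
        c ^= (uint8_t)(p >> i);
    return c;
}

/* Store the check value in bits 56..63, which 48-bit user-space
 * pointers leave zero. */
static void *encode_ptr(void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p & 0x0000ffffffffffffULL;
    return (void *)(uintptr_t)(v | ((uint64_t)check8(v) << 56));
}

/* Strip and verify the check value before use; NULL signals that the
 * pointer was corrupted (no correction is possible in this variant). */
static void *decode_ptr(void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;
    uint64_t addr = v & 0x0000ffffffffffffULL;
    if ((uint8_t)(v >> 56) != check8(addr))
        return NULL;   /* detected corruption */
    return (void *)(uintptr_t)addr;
}
```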
80. Ethiopian paediatric oncology registry progress report: documentation practice improvements at tertiary care centre in Addis Ababa, Ethiopia
- Author
-
Sheila Weitzman, Julie Broas, Kaitlyn M Buhlinger, Daniel Hailu, Abdulkadir M Said Gidey, Wondwessen Bekele, Mohammed Mustefa, Vanessa Miller, Stephen M Clark, Thomas B. Alexander, David N. Korones, Tadele Hailu, Haileyesus Adam, Benyam Muluneh, Megan C. Roberts, Michael Chargualaf, Atalay Mulu Fentie, Mulugeta Ayalew Yimer, Scott Levy, Ali Mamude Dinkiye, Diriba Fufa, and Aziza T. Shad
- Subjects
Documentation, Medical Oncology, Pediatrics, Tertiary care, Article, Unmet needs, Tertiary Care Centers, Paediatric cancer, Neoplasms, Humans, Patient treatment, Registries, Child, Paediatric oncology, Medical record, Quality Improvement, Family medicine, Pediatrics, Perinatology and Child Health, Ethiopia, Delivery of Health Care, Qualitative research
- Abstract
Limited data are available regarding cancer in low and middle-income countries (LMICs), distorting the true burden of paediatric cancer [1]. A sobering statistic based on available data shows that more than 80% of children diagnosed with cancer in high-income countries survive, while fewer than 25% of children in LMICs survive [2]. While access to paediatric oncological care in Ethiopia is improving, the establishment of a national paediatric cancer registry remains an unmet need. Building on our previous work [3], we sought to standardise patient treatment documentation within the paediatric haematology and oncology department at Tikur Anbessa Specialized Hospital (TASH) in Addis Ababa, Ethiopia, to begin formal paediatric cancer registration at TASH. We interviewed medical record users and observed that there was a lack of consistency in treatment documentation as well as variability in the collection of data relating to cancer diagnoses. We attempted to address these gaps in documentation through the creation of two separate sets of data …
- Published
- 2021
- Full Text
- View/download PDF
81. Reform at Risk — Mandating Participation in Alternative Payment Plans
- Author
-
Rahul Rajkumar, Nicholas Bagley, and Scott Levy
- Subjects
Public administration, Medicare, Centers for Medicare and Medicaid Services, U.S., Reimbursement Mechanisms, Health care, Agency (sociology), Health insurance, Government, Medicaid, Patient Protection and Affordable Care Act, General Medicine, Payment, United States, Health Care Reform, Government Regulation, United States Dept. of Health and Human Services, Health Services Research
- Abstract
The Center for Medicare and Medicaid Innovation was meant to be the government's innovation laboratory for health care. But HHS has quietly hobbled the agency, imperiling its ability...
- Published
- 2018
- Full Text
- View/download PDF
82. The Upcoming Storm: The Implications of Increasing Core Count on Scalable System Software
- Author
-
Matthew G.F. Dosanjh, Ryan E. Grant, Nathan Hjelm, Scott Levy, and Whit Schonbein
- Abstract
As clock speeds have stagnated, the number of cores in a node has been drastically increased to improve processor throughput. Most scalable system software was designed and developed for single-threaded environments. Multithreaded environments are becoming increasingly prominent as application developers optimize their codes to leverage the full performance of the processor; however, these environments are incompatible with a number of assumptions that have driven scalable system software development. This paper will highlight a case study of this mismatch, focusing on MPI message matching. MPI message matching has been designed and optimized for traditional serial execution. The reduced determinism in the order of MPI calls can significantly reduce the performance of MPI message matching, potentially exceeding the time-per-iteration targets of many applications. Different proposed techniques attempt to address these issues and enable multithreaded MPI usage. These approaches highlight a number of tradeoffs that make adapting MPI message matching complex. This case study and its proposed solutions highlight a number of general concepts that need to be leveraged in the design of next-generation scalable system software. (A minimal sketch of the required thread-support initialization follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
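The multithreaded usage discussed above requires initializing MPI with full thread support and checking the level the library actually grants. A minimal illustrative sketch, not taken from the paper:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request full multithreaded access to the MPI library. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* Library cannot accept concurrent calls; threads would need
         * their own serialization around MPI. */
        fprintf(stderr, "MPI_THREAD_MULTIPLE unavailable (got %d)\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... threads may now call MPI concurrently; message order across
     * threads is nondeterministic -- the matching issue the paper
     * studies. */
    MPI_Finalize();
    return 0;
}
```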
83. Clinical Utility of Plasma Cell-Free DNA in Adult Patients with Newly Diagnosed Glioblastoma: A Pilot Prospective Study
- Author
-
Arati Desai, Timothy Prior, Erica L. Carpenter, MacLean Nasrallah, Samantha Guiry, Donald M. O'Rourke, Jeffrey B. Ware, Zev A. Binder, S. Ali Nabavizadeh, Theresa Christensen, Whitney Sarchiapone, Steven Brem, Jennifer J.D. Morrissette, Jazmine Mays, Scott Levy, Jasmin Hussain, Jacob Till, Andrew J. Cucchiara, Stephanie S. Yee, and Stephen J Bagley
- Subjects
Oncology, Adult, Male, Cancer Research, Pilot Projects, Newly diagnosed, Plasma cell, Free DNA, Article, Circulating Tumor DNA, Young Adult, Germline mutation, Internal medicine, Biomarkers, Tumor, Humans, Longitudinal Studies, Prospective Studies, Prospective cohort study, Aged, Aged, 80 and over, Adult patients, High-Throughput Nucleotide Sequencing, Middle Aged, Prognosis, Magnetic Resonance Imaging, Tumor Burden, Survival Rate, Mutation, Female, Glioblastoma, Chemoradiotherapy
- Abstract
Purpose: The clinical utility of plasma cell-free DNA (cfDNA) has not been assessed prospectively in patients with glioblastoma (GBM). We aimed to determine the prognostic impact of plasma cfDNA in GBM, as well as its role as a surrogate of tumor burden and substrate for next-generation sequencing (NGS). Experimental Design: We conducted a prospective cohort study of 42 patients with newly diagnosed GBM. Plasma cfDNA was quantified at baseline prior to initial tumor resection and longitudinally during chemoradiotherapy. Plasma cfDNA was assessed for its association with progression-free survival (PFS) and overall survival (OS), correlated with radiographic tumor burden, and subjected to a targeted NGS panel. Results: Prior to initial surgery, GBM patients had higher plasma cfDNA concentration than age-matched healthy controls (mean 13.4 vs. 6.7 ng/mL, P < 0.001). Plasma cfDNA concentration was correlated with radiographic tumor burden on patients' first post-radiation magnetic resonance imaging scan (ρ = 0.77, P = 0.003) and tended to rise prior to or concurrently with radiographic tumor progression. Preoperative plasma cfDNA concentration above the mean (>13.4 ng/mL) was associated with inferior PFS (median 4.9 vs. 9.5 months, P = 0.038). Detection of ≥1 somatic mutation in plasma cfDNA occurred in 55% of patients and was associated with nonstatistically significant decreases in PFS (median 6.0 vs. 8.7 months, P = 0.093) and OS (median 5.5 vs. 9.2 months, P = 0.053). Conclusions: Plasma cfDNA may be an effective prognostic tool and surrogate of tumor burden in newly diagnosed GBM. Detection of somatic alterations in plasma is feasible when samples are obtained prior to initial surgical resection.
- Published
- 2019
84. Arterial Spin Labeling and Dynamic Susceptibility Contrast-enhanced MR Imaging for evaluation of arteriovenous shunting and tumor hypoxia in glioblastoma
- Author
-
Samantha Guiry, Donald M. O'Rourke, Hamed Akbari, S. Ali Nabavizadeh, Steven Brem, Jeffrey B. Ware, Timothy Prior, Christos Davatzikos, Whitney Sarchiapone, Scott Levy, Ronald L. Wolf, Arati Desai, John A. Detre, MacLean Nasrallah, and Stephen J Bagley
- Subjects
Male, Contrast Media, Article, Text mining, Internal medicine, Humans, Arteriovenous shunting, Prospective Studies, Aged, Multidisciplinary, Tumor hypoxia, Brain Neoplasms, Hypoxia (medical), Middle Aged, Magnetic Resonance Imaging, Cell Hypoxia, Shunting, CNS cancer, Arterial spin labeling, Cardiology, Cancer imaging, Female, Spin Labels, Glioblastoma, Perfusion
- Abstract
Glioblastoma (GBM) is the most common primary malignant brain tumor in adults and carries a dismal prognosis. Significant challenges in the care of patients with GBM include marked vascular heterogeneity and arteriovenous (AV) shunting, which results in tumor hypoxia and inadequate delivery of systemic treatments to tumor cells. In this study, we investigated the utility of different MR perfusion techniques to detect and quantify arteriovenous (AV) shunting and tumor hypoxia in patients with GBM. Macrovascular shunting was present in 33% of subjects, with the degree of shunting ranging from 37% to 60% on arterial spin labeling perfusion. Among the dynamic susceptibility contrast-enhanced perfusion curve features, there was a strong negative correlation between hypoxia score and DSC perfusion curve recovery slope (r = −0.72, P = 0.018) and angle (r = −0.73, P = 0.015). The results of this study support the possibility of using arterial spin labeling and pattern analysis of dynamic susceptibility contrast-enhanced MR imaging for evaluation of arteriovenous shunting and tumor hypoxia in glioblastoma.
- Published
- 2019
85. Hardware MPI message matching: Insights into MPI matching behavior to inform design
- Author
-
Taylor Groves, Michael J. Levenhagen, Kurt B. Ferreira, Ryan E. Grant, and Scott Levy
- Subjects
Thesaurus (information retrieval), Search engine, Matching (statistics), Information retrieval, Computational Theory and Mathematics, Computer Networks and Communications, Computer science, Software, Computer Science Applications, Theoretical Computer Science
- Published
- 2019
- Full Text
- View/download PDF
86. Mediating Data Center Storage Diversity in HPC Applications with FAODEL
- Author
-
Gary J. Templet, Scott Levy, Craig D. Ulmer, Patrick Widener, and Todd Kordenbrock
- Subjects
Service (systems architecture), Computer science, Distributed computing, Data management, Supercomputer, Data type, Workflow, Computer data storage, Scalability, Data center
- Abstract
Composition of computational science applications into both ad hoc pipelines for analysis of collected or generated data and into well-defined and repeatable workflows is becoming increasingly popular. Meanwhile, dedicated high-performance computing storage environments are rapidly becoming more diverse, with both significant amounts of non-volatile memory storage and mature parallel file systems available. At the same time, computational science codes are being coupled to data analysis tools which are not filesystem-oriented. In this paper, we describe how the FAODEL data management service can expose different available data storage options and mediate among them in both application- and FAODEL-directed ways. These capabilities allow applications to exploit their knowledge of the different types of data they may exchange during a workflow execution, and also provide FAODEL with mechanisms to proactively tune data storage behavior when appropriate. We describe the implementation of these capabilities in FAODEL and how they are used by applications, and present preliminary performance results demonstrating the potential benefits of our approach.
- Published
- 2019
- Full Text
- View/download PDF
87. Comparison of changes in nonclassical (α) and classical (β) acoustic nonlinear parameters due to thermal aging of 9Cr–1Mo ferritic martensitic steel
- Author
-
Katherine Marie Scott Levy, Daniel Niklas Fahse, Jin-Yeon Kim, and Laurence J. Jacobs
- Subjects
Nonlinear system, Rockwell scale, Materials science, Mechanical Engineering, Martensite, Nonlinear resonance, Modulus, General Materials Science, Composite material, Condensed Matter Physics, Microstructure, Laser Doppler vibrometer, Carbide
- Abstract
The objective of this research is to demonstrate the sensitivity of the hysteretic, nonclassical acoustic nonlinear parameter α for tracking changes in the microstructure of 9Cr–1Mo ferritic martensitic steel due to thermal aging. The α parameter is measured with a non-contact nonlinear resonance ultrasound spectroscopy (NRUS) system with an air-coupled source and a laser Doppler vibrometer (LDV) receiver. This NRUS setup is used to track changes in multiple 9Cr–1Mo specimens subjected to different aging times at the same 650 °C temperature. These α results are shown to be highly sensitive to the associated changes in the microstructure of the 9Cr–1Mo specimens, and are then compared to three other parameters (Rockwell hardness, Young's modulus E, and the classical acoustic nonlinear parameter β), all measured in the same specimens. These results are then combined to infer microstructural changes, such as the removal of dislocations and the formation of carbide precipitates, occurring in the 9Cr–1Mo specimens during thermal aging. (The standard NRUS relation defining α is sketched after this entry.)
- Published
- 2020
- Full Text
- View/download PDF
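For orientation, NRUS extracts the hysteretic parameter α from the downward shift of a resonance frequency with increasing drive strain amplitude. A standard form of the relation is given below; this is background, not necessarily the authors' exact formulation.

```latex
% Nonclassical (hysteretic) nonlinearity parameter from NRUS: f_0 is the
% low-amplitude resonance frequency, f the resonance frequency at strain
% amplitude Delta-epsilon; alpha is the slope of the relative shift.
\frac{f_0 - f}{f_0} = \alpha \, \Delta\varepsilon
```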
88. Lessons Learned from Memory Errors Observed Over the Lifetime of Cielo
- Author
-
Kurt B. Ferreira, Elisabeth Baseman, Vilas Sridharan, Taniya Siddiqua, Scott Levy, and Nathan DeBardeleben
- Subjects
Random access memory, Memory errors, Computer science, Reliability (computer networking), Context (language use), Reliability engineering, Memory management, Cielo, Static random-access memory, DRAM
- Abstract
Maintaining the performance of high-performance computing (HPC) applications as failures increase is a major challenge for next-generation extreme-scale systems. Recent work demonstrates that hardware failures are expected to become more common. Few existing studies, however, have examined failures in the context of the entire lifetime of a single platform. In this paper, we analyze a corpus of empirical failure data collected over the entire five-year lifetime of Cielo, a leadership-class HPC system. Our analysis reveals several important findings about failures on Cielo: (i) its memory (DRAM and SRAM) exhibited no aging effects; detectable, uncorrectable errors (DUE) showed no discernible increase over its five-year lifetime; (ii) contrary to popular belief, correctable DRAM faults are not predictive of future uncorrectable DRAM faults; (iii) the majority of system down events have no identifiable hardware root cause, highlighting the need for more comprehensive logging facilities to improve failure analysis on future systems; and (iv) continued advances will be needed in order for current failure mitigation techniques to be viable on future systems. Our analysis of this corpus of empirical data provides critical analysis of, and guidance for, the deployment of extreme-scale systems.
- Published
- 2018
- Full Text
- View/download PDF
89. Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance
- Author
-
Kurt B. Ferreira and Scott Levy
- Subjects
Message processing, Distributed computing, Computer engineering, Computer science, Message Passing Interface, Latency (engineering), Queue
- Abstract
Attaining high performance with MPI applications requires efficient message matching to minimize message processing overheads and the latency these overheads introduce into application communication. In this paper, we use a validated simulation-based approach to examine the relationship between MPI message matching performance and application time-to-solution. Specifically, we examine how the performance of several important HPC workloads is affected by the time required for matching. Our analysis yields several important contributions: (i) the performance of current workloads is unlikely to be significantly affected by MPI matching unless match queue operations get much slower or match queues get much longer; (ii) match queue designs that provide sublinear performance as a function of queue length are unlikely to yield much benefit unless match queue lengths increase dramatically; and (iii) we provide guidance on how long the mean time per match attempt may be without significantly affecting application performance. The results and analysis in this paper provide valuable guidance on the design and development of MPI message match queues.
- Published
- 2018
- Full Text
- View/download PDF
90. The unexpected virtue of almost: Exploiting MPI collective operations to approximately coordinate checkpoints
- Author
-
Scott Levy, Patrick Widener, and Kurt B. Ferreira
- Subjects
Distributed computing, Virtue, Computer Networks and Communications, Computer science, Fault tolerance, Parallel computing, Computer Science Applications, Theoretical Computer Science, Computational Theory and Mathematics, Software
- Published
- 2018
- Full Text
- View/download PDF
91. ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE
- Author
-
Craig D. Ulmer, Edward G. Phillips, Christopher Siefert, Paul Lin, Eric C. Cyr, Gary J. Templet, Matthew Swan, Jonathan Joseph Hu, Christian A. Glusa, Scott Levy, Roger P. Pawlowski, Keith Cartwright, Irina Kalashnikova Tezaur, Curtis C. Ober, Sidafa Conde, Eric T. Phipps, Matthew Tyler Bettencourt, Richard Michael Jack Kramer, Micheal W. Glass, and Todd Kordenbrock
- Subjects
Aeronautics, Empire, Milestone
- Published
- 2018
- Full Text
- View/download PDF
92. Faodel
- Author
-
Jay Lofstead, Margaret Lawson, Shyamali Mukherjee, Todd Kordenbrock, Gary J. Templet, Patrick Widener, Craig D. Ulmer, and Scott Levy
- Subjects
Computer science, Distributed computing, Data management, Scale (chemistry), Bandwidth (signal processing), Set (abstract data type), Workflow, Scalability, Programming paradigm, Data as a service
- Abstract
Composition of computational science applications, whether into ad hoc pipelines for analysis of simulation data or into well-defined and repeatable workflows, is becoming commonplace. In order to scale well as projected system and data sizes increase, developers will have to address a number of looming challenges. Increased contention for parallel filesystem bandwidth, accommodating in situ and ex situ processing, and the advent of decentralized programming models will all complicate application composition for next-generation systems. In this paper, we introduce a set of data services, Faodel, which provide scalable data management for workflows and composed applications. Faodel allows workflow components to directly and efficiently exchange data in semantically appropriate forms, rather than those dictated by the storage hierarchy or programming model in use. We describe the architecture of Faodel and present preliminary performance results demonstrating its potential for scalability in workflow scenarios.
- Published
- 2018
- Full Text
- View/download PDF
93. Obstructive Sleep Apnea (OSA) as a Cause of Resistant Fatigue in the Safety-Sensitive Workforce
- Author
-
Leslie Emma, Scott Levy, and Neelum Sanderson
- Subjects
Obstructive sleep apnea, Workforce, Emergency medicine, Apnea, Sleep
- Abstract
Objectives/Scope: Fatigue is a known contributor to accidents. The potential for fatigue-related accidents also exists in the oil and energy industry. Fatigue risk management systems commonly involve review and adjustment of employee rosters and job functions to assist employees with getting rest based on their work demands. Although this approach is reasonable, it assumes that by giving the employee the ability to rest, he or she will return refreshed. Certain medical conditions may inhibit an employee's ability to rest. Obstructive sleep apnea (OSA) is a medical condition in which the upper airway becomes obstructed when its muscles relax during sleep. This obstruction forces the patient to awaken and, if untreated, may lead to adverse medical conditions. For these patients, hours of work may not correlate well with level of fatigue. Although there are many risk factors for OSA, the one most relevant here is Body Mass Index (BMI). In the adult population, the prevalence of OSA is estimated to be approximately 25% to 45% higher in obese subjects. The odds of having OSA increase as BMI rises, particularly for individuals with a BMI above 35. Method, Procedures, Process: We will describe risk factors for OSA, treatment of the condition, as well as methods to reduce fatigue-related risk. The discussion will include key components of a medical screening program as well as health and wellness programming that can be considered in parallel with any Fatigue Risk Management System. Results, Observations, Conclusions: Biometric data can be utilized to help predict the risk of fatigue-related accidents in the workplace. By addressing the risks and providing solutions, these incidents may decrease. Novel/Additive Information: We will explore current OSA screening criteria, work-hours limitations, and health and wellness programs as they relate to reducing risk. Most importantly, we will discuss a significant shortcoming in the identification of high-risk individuals and an easy approach to help mitigate this risk. (The BMI computation behind the screening threshold is sketched after this entry.)
- Published
- 2018
- Full Text
- View/download PDF
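Since the screening criterion above hinges on BMI, here is the standard BMI formula and threshold check as a minimal sketch; the threshold follows the abstract, the example values are hypothetical, and this is not a clinical tool.

```c
#include <stdio.h>

/* Body Mass Index: weight (kg) divided by height (m) squared. */
static double bmi(double weight_kg, double height_m)
{
    return weight_kg / (height_m * height_m);
}

int main(void)
{
    double b = bmi(120.0, 1.78);   /* hypothetical worker */
    printf("BMI = %.1f%s\n", b,
           b > 35.0 ? " -> elevated OSA risk per the screening criterion"
                    : "");
    return 0;
}
```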
94. It’s Not the Heat, It’s the Humidity: Scheduling Resilience Activity at Scale
- Author
-
Patrick Widener, Kurt B. Ferreira, and Scott Levy
- Subjects
Distributed computing, Risk analysis (engineering), Computer science, Psychological resilience, Scheduling (computing)
- Abstract
Maintaining the performance of high-performance computing (HPC) applications with the expected increase in failures is a major challenge for next-generation extreme-scale systems. With increasing scale, resilience activities (e.g. checkpointing) are expected to become more diverse, less tightly synchronized, and more computationally intensive. Few existing studies, however, have examined how decisions about scheduling resilience activities impact application performance. In this work, we examine the relationship between the duration and frequency of resilience activities and application performance. Our study reveals several key findings: (i) the aggregate amount of time consumed by resilience activities is not an effective metric for predicting application performance; (ii) the duration of the interruptions due to resilience activities has the greatest influence on application performance; shorter, but more frequent, interruptions are correlated with better application performance; and (iii) the differential impact of resilience activities across applications is related to the applications’ inter-collective frequencies; the performance of applications that perform infrequent collective operations scales better in the presence of resilience activities than the performance of applications that perform more frequent collective operations. This initial study demonstrates the importance of considering how resilience activities are scheduled. We provide critical analysis and direct guidance on how the resilience challenges of future systems can be met while minimizing the impact on application performance.
- Published
- 2018
- Full Text
- View/download PDF
95. Empress
- Author
-
Jay Lofstead, Todd Kordenbrock, Scott Levy, Shyamali Mukherjee, Margaret Lawson, Gary J. Templet, Patrick Widener, and Craig D. Ulmer
- Subjects
Metadata, World Wide Web, Information retrieval, Data element, Computer science, Metadata management, Geospatial metadata, Meta Data Services, Metadata modeling, Database catalog, Metadata repository
- Abstract
Significant challenges exist in the efficient retrieval of data from extreme-scale simulations. An important and evolving method of addressing these challenges is application-level metadata management. Historically, HDF5 and NetCDF have eased data retrieval by offering rudimentary attribute capabilities that provide basic metadata. ADIOS simplified data retrieval by utilizing metadata for each process' data. EMPRESS provides a simple example of the next step in this evolution by integrating per-process metadata with the storage system itself, making it more broadly useful than single file or application formats. Additionally, it allows for more robust and customizable metadata.
- Published
- 2017
- Full Text
- View/download PDF
96. Lifetime memory reliability data from the field
- Author
-
Scott Levy, Vilas Sridharan, Nathan DeBardeleben, Elisabeth Baseman, Taniya Siddiqua, Steven Raasch, Qiang Guan, and Kurt B. Ferreira
- Subjects
Distributed computing, Computer science, Reliability (computer networking), Mode (statistics), Percentage point, Fault (power engineering), Field (computer science), Reliability engineering, Information systems, Electrical engineering, electronic engineering, information engineering, Engineering and technology, Static random-access memory, Resilience (network), DRAM
In order to provide high system resilience, it is important to understand the nature of the faults that occur in the field. This study analyzes fault rates from a production system that has been monitored for five years, capturing data for the entire operational lifetime of the system. The data indicate that devices in this system showed no signs of aging during the monitoring period, suggesting that the useful lifetime of a system may exceed five years. In DRAM, the relative incidence of fault modes changed insignificantly over the system's lifetime: the relative rate of each fault mode at the end of the system's lifetime was within 1.4 percentage points of the rate observed during the first year. SRAM caches in the system exhibited several fault modes, including cache-way faults and single-bit faults. Overall, this study provides insight into how fault modes and types in a system evolve over the system's lifetime.
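As a concrete illustration of the "relative incidence" computation behind the 1.4-percentage-point observation, here is a small sketch (the event data and fault-mode names below are invented purely to show the arithmetic, not the study's measurements):

```python
from collections import Counter

def relative_rates(fault_events):
    """fault_events: iterable of (year, fault_mode) pairs. Returns
    {year: {mode: fraction_of_that_year's_faults}} -- the relative
    incidence of each fault mode, per year."""
    by_year = {}
    for year, mode in fault_events:
        by_year.setdefault(year, Counter())[mode] += 1
    return {y: {m: n / sum(c.values()) for m, n in c.items()}
            for y, c in by_year.items()}

# Synthetic example: mode mix in year 1 vs. year 5 of the lifetime.
events = ([(1, "single-bit")] * 80 + [(1, "single-row")] * 20 +
          [(5, "single-bit")] * 81 + [(5, "single-row")] * 19)
rates = relative_rates(events)
drift = abs(rates[5]["single-bit"] - rates[1]["single-bit"]) * 100
print(f"single-bit drift: {drift:.1f} percentage points")  # 1.0 here
```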
- Published
- 2017
- Full Text
- View/download PDF
97. Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data
- Author
-
Patrick G. Bridges, Scott Levy, and Kurt B. Ferreira
- Subjects
Distributed computing, Computer science, Language change, Software engineering, Engineering and technology, Construct (python library), Silent data corruption, Computer security, Exascale computing, Memory management, Embedded system, Electrical engineering, electronic engineering, information engineering, Resource management (computing), Resilience (network), Memory protection
Aggregating millions of hardware components to construct an exascale computing platform will pose significant resilience challenges. In addition to slowdowns associated with detected errors, silent errors are likely to further degrade application performance. Moreover, silent data corruption (SDC) has the potential to undermine the integrity of the results produced by important scientific applications. In this paper, we propose an application-independent mechanism to efficiently detect and correct SDC in read-mostly memory, where SDC may be most likely to occur. We use memory protection mechanisms to maintain compressed backups of application memory. We detect SDC by identifying changes in memory contents that occur without explicit write operations. We demonstrate that, for several applications, our approach can potentially protect a significant fraction of application memory pages from SDC with modest overheads. Moreover, our proposed technique can be straightforwardly combined with many other approaches to provide a significant bulwark against SDC.
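The core detect-and-restore mechanism lends itself to a compact sketch. The paper uses memory-protection hardware to trap explicit writes; the version below substitutes explicit calls and invented names (`ReadMostlyGuard`, `protect`, `scrub`) so the idea is runnable in a few lines:

```python
import hashlib
import zlib

class ReadMostlyGuard:
    """Sketch: keep a compressed copy of pages the application treats as
    read-mostly. A later change to a page that was not an explicit write
    must be silent corruption; the compressed backup restores it."""

    def __init__(self):
        self._backup = {}  # page_id -> (sha256 digest, compressed bytes)

    def protect(self, page_id, data: bytes):
        self._backup[page_id] = (hashlib.sha256(data).digest(),
                                 zlib.compress(data))

    def scrub(self, page_id, data: bytes) -> bytes:
        """Return the (possibly repaired) page contents."""
        digest, blob = self._backup[page_id]
        if hashlib.sha256(data).digest() != digest:  # changed without a write
            return zlib.decompress(blob)             # restore from backup
        return data

guard = ReadMostlyGuard()
page = b"lookup table that should never change" * 64
guard.protect(0, page)
corrupted = b"X" + page[1:]  # flip a byte, as a silent error might
assert guard.scrub(0, corrupted) == page
```

Read-mostly data compresses once and is rarely re-protected, which is why the backup overhead can stay modest.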
- Published
- 2017
- Full Text
- View/download PDF
98. Bringing Regenerative Medicine to Patients: The Coverage, Coding, and Reimbursement Processes
- Author
-
Sujata K. Bhatia, Scott Levy, and Khin-Kyemon Aung
- Subjects
Medicine, Coding (therapy), Medical physics, Regenerative medicine, Reimbursement, Biomedical engineering
- Published
- 2017
- Full Text
- View/download PDF
99. Horseshoes and Hand Grenades: The Case for Approximate Coordination in Local Checkpointing Protocols
- Author
-
Patrick Widener, Kurt B. Ferreira, and Scott Levy
- Subjects
Scheme (programming language), Computer science, Distributed computing, Message Passing Interface, Software engineering, Engineering and technology, Synchronization, Asynchrony (computer programming), Information systems, Synchronization (computer science), Scalability, Electrical engineering, electronic engineering, information engineering, Key (cryptography)
Fault-tolerance poses a major challenge for future large-scale systems. Active research into coordinated, uncoordinated, and hybrid checkpointing systems has explored how the introduction of asynchrony can address anticipated scalability issues. While fully uncoordinated approaches have been shown to incur significant delays, the degree of synchronization required to keep overheads low has not yet been significantly addressed. In this paper, we use a simulation-based approach to show the impact of synchronization on local checkpoint activity. Specifically, we show that the degree of synchronization needed to keep the impact of local checkpointing low is attainable with current technology for a number of key production HPC workloads. Our work provides a critical analysis and comparison of synchronization and local checkpointing. This enables users and system administrators to fine-tune the checkpointing scheme to the application and system characteristics available.
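To build intuition for how much synchronization "approximate" coordination needs, here is a toy model (my construction with invented parameters, not the paper's simulator): each rank starts its local checkpoint at a random offset inside a synchronization window, and tightly coupled work is assumed to stall whenever any rank is checkpointing:

```python
import random

def fraction_time_stalled(num_ranks, ckpt_interval, ckpt_duration,
                          sync_window, trials=200, seed=0):
    # Once per ckpt_interval, every rank starts a local checkpoint of
    # ckpt_duration seconds at a random offset within sync_window.
    # The app stalls from the first start to the last finish; a smaller
    # window means the checkpoints overlap more and waste less time.
    rng = random.Random(seed)
    stalled = 0.0
    for _ in range(trials):
        starts = [rng.uniform(0, sync_window) for _ in range(num_ranks)]
        covered = max(s + ckpt_duration for s in starts) - min(starts)
        stalled += min(covered, ckpt_interval)
    return stalled / (trials * ckpt_interval)

for window in (0.0, 1.0, 10.0, 60.0):
    print(window, round(fraction_time_stalled(1024, 600, 5.0, window), 3))
```

With these numbers, perfect coordination (window 0) costs under 1% of runtime while a 60-second window pushes the stall fraction above 10%, illustrating why modest synchronization pays off without requiring full coordination.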
- Published
- 2017
- Full Text
- View/download PDF
100. A study of the viability of exploiting memory content similarity to improve resilience to memory errors
- Author
-
Kurt B. Ferreira, Patrick G. Bridges, Aidan P. Thompson, Scott Levy, and Christian Robert Trott
- Subjects
Memory errors, Hardware and Architecture, Computer science, Node (networking), Distributed computing, Similarity (psychology), Fault tolerance, Resilience (network), Supercomputer, Software, Theoretical Computer Science
Building the next generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grows, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory. In this paper, we propose a novel runtime for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the viability of this approach by examining memory snapshots collected from eight high-performance computing (HPC) applications and two important HPC operating systems. Based on the characteristics of the similarity uncovered, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems.
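As a concrete illustration of measuring memory content similarity, the sketch below (the function name and the synthetic snapshot are mine, not the paper's tooling) splits a snapshot into pages and counts how many have an identical twin elsewhere; such pages could, in principle, be rebuilt from a duplicate after an uncorrectable memory error:

```python
import hashlib

PAGE = 4096

def similarity_report(snapshot: bytes):
    """Group pages of a memory snapshot by content hash and count how
    many are redundant (i.e., identical to some other page)."""
    groups = {}
    for off in range(0, len(snapshot) - len(snapshot) % PAGE, PAGE):
        digest = hashlib.sha256(snapshot[off:off + PAGE]).digest()
        groups.setdefault(digest, []).append(off)
    pages = sum(len(v) for v in groups.values())
    redundant = sum(len(v) - 1 for v in groups.values())
    return pages, redundant

# Synthetic snapshot: 8 zero-filled pages plus 2 identical patterned pages.
snap = bytes(PAGE) * 8 + bytes(range(256)) * 16 * 2
total, dup = similarity_report(snap)
print(f"{dup}/{total} pages have an identical twin")  # 8/10 here
```

Real snapshots show weaker, partial-page similarity too, but even exact-duplicate counting like this gives a quick lower bound on how much content a similarity-aware runtime could recover.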
- Published
- 2014
- Full Text
- View/download PDF