62 results for "Gary Grider"
Search Results
52. DOE Advanced Scientific Computing Advisory Subcommittee (ASCAC) Report: Top Ten Exascale Research Challenges
- Author
-
Peter M. Kogge, William Carlson, James H. Laros, Thomas Sterling, Clayton G. Webster, M. Harper Langston, Al Geist, Robert Ross, George Liang-Tai Chiu, James A. Ang, Keren Bergman, Dean Klein, Richard Murphy, Paul W. Coteus, Rick Stevens, Adolfy Hoisie, Laura Carrington, Jon Hiller, Vivek Sarkar, K. H. Kim, Robert Colwell, Robert Schreiber, Erik DeBenedictis, Sven Leyffer, Gary Grider, Jeffrey Hittinger, Richard Lethin, Ruud Haring, Jack Dongarra, Ron Brightwell, Stefan M. Wild, Robert F. Lucas, Jon Bashor, John Shalf, Shekhar Borkar, and William J. Dally
- Subjects
Engineering, Engineering management, Software, Concurrency, Big data, Energy consumption, Software engineering, Resilience (network), Memory performance, Exascale computing
- Abstract
Exascale computing systems are essential for the scientific fields that will transform the 21st century global economy, including energy, biotechnology, nanotechnology, and materials science. Progress in these fields is predicated on the ability to perform advanced scientific and engineering simulations, and analyze the deluge of data. On July 29, 2013, ASCAC was charged by Patricia Dehmer, the Acting Director of the Office of Science, to assemble a subcommittee to provide advice on exascale computing. This subcommittee was directed to return a list of no more than ten technical approaches (hardware and software) that will enable the development of a system that achieves the Department's goals for exascale computing. Numerous reports over the past few years have documented the technical challenges and the non-viability of simply scaling existing computer designs to reach exascale. The technical challenges revolve around energy consumption, memory performance, resilience, extreme concurrency, and big data. Drawing from these reports and more recent experience, this ASCAC subcommittee has identified the top ten computing technology advancements that are critical to making a capable, economically viable, exascale system.
- Published
- 2014
53. The Petascale Data Storage Institute
- Author
-
Peter Honeyman, Darrell D. E. Long, William Kramer, John Shalf, Philip C. Roth, Evan Felix, Gary Grider, Lee Ward, and Garth A. Gibson
- Subjects
Petascale computing, Information storage, Computer science, Computer data storage, Interoperability, Scientific discovery, Oak Ridge National Laboratory, National laboratory, Data science
- Abstract
Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The Petascale Data Storage Institute focuses on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools. The Petascale Data Storage Institute is a collaboration between researchers at Carnegie Mellon University, the National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, Los Alamos National Laboratory, the University of Michigan, and the University of California at Santa Cruz.
- Published
- 2013
54. I/O acceleration with pattern detection
- Author
-
Aaron Torres, Jun He, Carlos Maltzahn, Xian-He Sun, Gary Grider, John M. Bent, and Garth A. Gibson
- Subjects
File system, Distributed computing, Computer science, Networking & telecommunications, Parallel computing, Bottleneck, Metadata, Pattern detection, Latency (engineering), Predictability
- Abstract
The I/O bottleneck in high-performance computing is becoming worse as application data continues to grow. In this work, we explore how patterns of I/O within these applications can significantly affect the effectiveness of the underlying storage systems and how these same patterns can be utilized to improve many aspects of the I/O stack and mitigate the I/O bottleneck. We offer three main contributions in this paper. First, we develop and evaluate algorithms by which I/O patterns can be efficiently discovered and described. Second, we implement one such algorithm to reduce the metadata quantity in a virtual parallel file system by up to several orders of magnitude, thereby increasing the performance of writes and reads by up to 40 and 480 percent respectively. Third, we build a prototype file system with pattern-aware prefetching and evaluate it to show a 46 percent reduction in I/O latency. Finally, we believe that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many high-performance applications, offers significant potential to enable many additional I/O optimizations.
- Published
- 2013
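The pattern-discovery idea in result 54 lends itself to a small illustration. Below is a minimal sketch, assuming a simple arithmetic-run detector; the names and the detection strategy are illustrative, not the paper's actual algorithm. The point is that a strided sequence of write offsets collapses into a single descriptor, which is how per-write index metadata can shrink by orders of magnitude.

```python
# Illustrative sketch (not the paper's algorithm): compress a strided
# sequence of I/O offsets into one descriptor instead of per-write metadata.
from dataclasses import dataclass

@dataclass
class StridedPattern:
    start: int   # offset of the first write
    stride: int  # constant gap between consecutive writes
    count: int   # number of writes covered

def detect_strided(offsets):
    """Return a StridedPattern if offsets form one arithmetic run, else None."""
    if len(offsets) < 2:
        return None
    stride = offsets[1] - offsets[0]
    for prev, curr in zip(offsets, offsets[1:]):
        if curr - prev != stride:
            return None  # irregular: fall back to per-write index entries
    return StridedPattern(offsets[0], stride, len(offsets))

# 10,000 per-write index entries collapse to one descriptor:
offsets = list(range(0, 10_000 * 47_008, 47_008))
print(detect_strided(offsets))  # StridedPattern(start=0, stride=47008, count=10000)
```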
55. Jitter-free co-processing on a prototype exascale storage stack
- Author
-
John M. Bent, Sorin Faibish, James Ahrens, Percy Tzelnic, John Patchett, Gary Grider, and Jon Woodring
- Subjects
File system, Petascale computing, Stack (abstract data type), Parallel processing, Computer science, Pipeline (computing), Solid-state storage, Parallel computing, Supercomputer, Auxiliary memory
- Abstract
In the petascale era, the storage stack used by the extreme scale high performance computing community is fairly homogeneous across sites. On the compute edge of the stack, file system clients or IO forwarding services direct IO over an interconnect network to a relatively small set of IO nodes. These nodes forward the requests over a secondary storage network to a spindle-based parallel file system. Unfortunately, this architecture will become unviable in the exascale era. As the density growth of disks continues to outpace increases in their rotational speeds, disks are becoming increasingly cost-effective for capacity but decreasingly so for bandwidth. Fortunately, new storage media such as solid state devices are filling this gap; although not cost-effective for capacity, they are so for performance. This suggests that the storage stack at exascale will incorporate solid state storage between the compute nodes and the parallel file systems. There are three natural places into which to position this new storage layer: within the compute nodes, the IO nodes, or the parallel file system. In this paper, we argue that the IO nodes are the appropriate location for HPC workloads and show results from a prototype system that we have built accordingly. Running a pipeline of computational simulation and visualization, we show that our prototype system reduces total time to completion by up to 30%.
- Published
- 2012
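A back-of-envelope model of the co-processing argument in result 55, with all timings assumed rather than taken from the paper: when visualization is offloaded to IO-node burst buffers, compute nodes pay only the fast solid-state write, and the analysis overlaps with the next simulation step.

```python
# Assumed numbers, not the paper's measurements: compare in-line analysis on
# compute nodes against analysis offloaded to IO-node burst buffers that
# overlaps with the next simulation step.
SIM_STEP = 100.0   # seconds of simulation per timestep
PFS_WRITE = 30.0   # checkpoint write straight to the parallel file system
BB_WRITE = 5.0     # faster write to solid-state storage on the IO nodes
VIZ = 20.0         # visualization/analysis time per timestep
STEPS = 10

inline = STEPS * (SIM_STEP + PFS_WRITE + VIZ)   # everything serialized
# Co-processing: compute nodes pay only the burst-buffer write; viz runs on
# IO nodes concurrently with the next simulation step (VIZ < SIM_STEP here).
coproc = STEPS * (SIM_STEP + BB_WRITE) + VIZ    # last viz drains at the end

print(f"in-line:       {inline:.0f} s")
print(f"co-processing: {coproc:.0f} s ({1 - coproc/inline:.0%} faster)")
```

With these assumed timings the model lands near a 29% reduction, the same order as the up-to-30% result reported in the abstract.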
56. On the role of burst buffers in leadership-class storage systems
- Author
-
Robert Ross, Christopher D. Carothers, Philip Carns, Ning Liu, Carlos Maltzahn, Jason Cope, Gary Grider, and Adam Crume
- Subjects
I/O scheduling, Computer science, Suite, Fidelity, Kernel, External storage, Converged storage, Computer data storage, Scalability, Operating system
- Abstract
The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed design integrates a tier of solid-state burst buffers into the storage system to absorb application I/O requests. In this paper, we simulate and explore this storage system design for use by large-scale HPC systems. First, we examine application I/O patterns on an existing large-scale HPC system to identify common burst patterns. Next, we describe enhancements to the CODES storage system simulator to enable our burst buffer simulations. These enhancements include the integration of a burst buffer model into the I/O forwarding layer of the simulator, the development of an I/O kernel description language and interpreter, the development of a suite of I/O kernels derived from observed I/O patterns, and fidelity improvements to the CODES models. We evaluate the I/O performance for a set of multi-application I/O workloads and burst buffer configurations. We show that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived throughput goal.
- Published
- 2012
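The burst-buffer claim in result 56 can be made concrete with a small sketch, using numbers assumed purely for illustration: the buffer absorbs each checkpoint burst at high speed, so the application perceives the buffer's bandwidth, while external storage only needs enough bandwidth to drain the burst before the next one arrives.

```python
# Hedged sketch of the burst-buffer argument (all numbers assumed): an
# application dumps checkpoints in bursts; the buffer absorbs each burst at
# high speed and drains to external storage during the compute phase.
BURST_SIZE_TB = 2.0       # data written per checkpoint burst
BB_BW_TBPS = 0.5          # aggregate burst-buffer ingest bandwidth (TB/s)
COMPUTE_PHASE_S = 3600.0  # seconds of computation between bursts

absorb_time = BURST_SIZE_TB / BB_BW_TBPS     # what the application waits for
perceived_bw = BURST_SIZE_TB / absorb_time   # == BB_BW_TBPS

# External storage only has to drain the burst before the next one arrives:
required_external_bw = BURST_SIZE_TB / (COMPUTE_PHASE_S + absorb_time)

print(f"application-perceived bandwidth: {perceived_bw:.2f} TB/s")
print(f"external bandwidth needed:       {required_external_bw * 1000:.2f} GB/s")
```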
57. Storage challenges at Los Alamos National Lab
- Author
-
John M. Bent, Brett Michael Kettering, Adam Manzanares, Gary Grider, Meghan McClelland, Aaron Torres, and Alfred Torrez
- Subjects
Interactivity, Computer science, Path (computing), Scale, Operating system, High bandwidth, Usability, Latency (engineering)
- Abstract
No truly parallel file systems yet exist. Those that make the claim fall short when it comes to providing adequate concurrent write performance at large scale. This limitation causes major usability headaches in HPC. Users need two capabilities missing from current parallel file systems. One, they need low-latency interactivity. Two, they need high bandwidth for large parallel IO; this capability must be robust across IO patterns and should not require tuning. No existing parallel file system provides these features. Frighteningly, exascale renders these features even less attainable from currently available parallel file systems. Fortunately, there is a path forward.
- Published
- 2012
58. Power use of disk subsystems in supercomputers
- Author
-
Matthew L. Curry, Jay E. Harris, David Martinez, H. Lee Ward, Jill Gemmill, and Gary Grider
- Subjects
Petascale computing, Computer science, Power consumption, Computer data storage, Parallel computing, Supercomputer, Power budget, Electrical efficiency, Primary problem, Power (physics)
- Abstract
Exascale will present many challenges to the HPC community, but the primary problem will likely be power consumption. Current petascale systems already use a significant fraction of the power that an exascale system will be allotted. In this paper, we show measurements of real I/O power use in three large systems. We show that I/O power use per machine is proportionally fairly low, between 4.4% and 5.5% of total consumption. We use these measurements to motivate a burst-buffer checkpointing solution for power-efficient I/O at exascale. We estimate this solution would use approximately 6.6% of the exascale machine power budget, which is on par with today's systems.
- Published
- 2011
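A worked reading of the figures in result 58, combined with one outside assumption: DOE's frequently cited target of roughly 20 MW for an exascale machine, which is not stated in the abstract itself.

```python
# Worked numbers from the abstract above, plus one assumption:
EXASCALE_BUDGET_MW = 20.0            # assumed ~20 MW target, not from the paper
IO_FRACTION_TODAY = (0.044, 0.055)   # measured I/O share on three systems
IO_FRACTION_BB = 0.066               # estimated share for burst-buffer design

lo, hi = (EXASCALE_BUDGET_MW * f for f in IO_FRACTION_TODAY)
print(f"today's I/O share scaled to 20 MW: {lo:.1f}-{hi:.1f} MW")
print(f"burst-buffer estimate:             {EXASCALE_BUDGET_MW * IO_FRACTION_BB:.1f} MW")
```

Under that assumption, the proposed burst-buffer I/O tier draws about 1.3 MW, versus roughly 0.9 to 1.1 MW if today's measured share simply carried over, which is what "on par with today's systems" amounts to.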
59. First Record of Single-Event Upset on Ground, Cray-1 Computer at Los Alamos in 1976
- Author
-
Tom Fairbanks, E. Normand, Jerry L. Wert, Sarah E. Michalak, Heather Quinn, Gary Grider, P. Iwanchuk, John P. Morrison, S. Johnson, and Stephen A. Wender
- Subjects
Physics, Nuclear and High Energy Physics, Computer errors, Computer performance, Electrical engineering, Upset, Nuclear Energy and Engineering, Single event upset, Electrical and Electronic Engineering
- Abstract
Records of bit flips in the Cray-1 computer installed at Los Alamos, NM, in 1976 yield an upset rate in the Cray-1's bipolar SRAMs consistent with single-event upsets (SEUs) induced by atmospheric neutrons.
- Published
- 2010
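The correlation claimed in result 59 rests on the standard single-event upset estimate: upset rate = neutron flux x per-bit cross section x bit count. A sketch with purely illustrative values follows; none of these numbers are from the paper.

```python
# Standard back-of-envelope SEU estimate (all values are illustrative
# assumptions, not figures from the paper):
#   upsets/hour = neutron_flux * per_bit_cross_section * bit_count
flux = 40.0            # n/(cm^2 * hr) above 10 MeV; elevated for LANL altitude
sigma_bit = 1e-12      # cm^2 per bit, a plausible old bipolar-SRAM cross section
bits = 1_048_576 * 72  # assumed 1 M words x 72 bits (64 data + 8 check)

rate = flux * sigma_bit * bits
print(f"~{rate:.4f} upsets/hour, i.e. one every {1 / rate:.0f} hours")
```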
60. PLFS: A Checkpoint Filesystem for Parallel Applications
- Author
-
Gary Grider, Milo Polte, Ben McClelland, Meghan Wingate, James Nunez, John M. Bent, Paul Nowoczynski, and Garth A. Gibson
- Subjects
File system, Computer science, Computer file, Distributed computing, Device file, Everything is a file, Unix file types, Virtual file system, Torrent file, File Control Block, Self-certifying File System, Journaling file system, Data file, Operating system, Versioning file system, Fork (file system), File system fragmentation
- Abstract
Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a single shared file is most convenient. With such an approach, writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files. To address this fundamental mismatch, we have developed a virtual parallel log-structured file system, PLFS. PLFS remaps an application's preferred data layout into one which is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications without any application modification.
- Published
- 2009
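A minimal sketch of the log-structured remapping idea described in result 60: each process appends its writes to a private log and records where each logical extent landed, so small unaligned shared-file writes become large sequential ones. This is illustrative code with invented class and field names, not the real PLFS implementation.

```python
# Illustrative PLFS-like remapping (not the real PLFS code).
import os
from dataclasses import dataclass

@dataclass
class IndexEntry:
    logical_off: int   # offset in the shared logical file
    length: int
    log_path: str      # per-process log that holds the bytes
    physical_off: int  # offset within that log

class PlfsLikeFile:
    def __init__(self, container, rank):
        os.makedirs(container, exist_ok=True)
        self.log_path = os.path.join(container, f"data.{rank}")
        self.log = open(self.log_path, "wb")
        self.index: list[IndexEntry] = []

    def write(self, logical_off, data):
        # Append-only: small, unaligned shared-file writes become large
        # sequential writes to the private log; the index remembers the map.
        self.index.append(IndexEntry(logical_off, len(data),
                                     self.log_path, self.log.tell()))
        self.log.write(data)

    def read(self, logical_off, length):
        # Reads consult the index to find which log holds the bytes.
        for e in reversed(self.index):  # newest entry wins on overlap
            if e.logical_off <= logical_off and \
               logical_off + length <= e.logical_off + e.length:
                with open(e.log_path, "rb") as f:
                    f.seek(e.physical_off + (logical_off - e.logical_off))
                    return f.read(length)
        return b"\0" * length  # unwritten region reads as zeros

f = PlfsLikeFile("ckpt.plfs", rank=0)
f.write(4096, b"checkpoint-bytes")
f.log.flush()
print(f.read(4096, 16))  # b'checkpoint-bytes'
```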
61. A Cost-Effective, High Bandwidth Server I/O Network Architecture for Cluster Systems
- Author
-
Gary Grider, Hsing-bung Chen, and Parks Fields
- Subjects
Input/output, Network architecture, Computer science, Gigabit Ethernet, File server, Grid computing, Server farm, Software deployment, Computer cluster, Scalability, Operating system, Bandwidth (computing), Computer network
- Abstract
In this paper we present a cost-effective, high bandwidth server I/O network architecture, named PaScal (Parallel and Scalable). We use the PaScal server I/O network to support data-intensive scientific applications running on very large-scale Linux clusters. The PaScal architecture provides (1) a bi-level data transfer network that combines high-speed interconnects for inter-process communication (IPC) with low-cost Gigabit Ethernet for global IP-based storage/file access, (2) bandwidth on demand without re-wiring or reconfiguring the system, (3) a multi-path routing scheme, (4) improved reliability through a reduced number of network components in the server I/O network, and (5) support for global storage/file systems in heterogeneous multi-cluster and grid environments. We compare the PaScal server I/O network architecture with the federated server I/O network architecture (FESIO). Concurrent MPI-I/O performance testing and deployment cost comparisons demonstrate that PaScal can outperform FESIO in many categories: cost-effectiveness, scalability, manageability, and ease of large-scale I/O network deployment.
- Published
- 2007
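The bi-level, bandwidth-on-demand idea in result 61 reduces to simple sizing arithmetic. A hedged sketch with assumed numbers:

```python
# Illustrative sizing arithmetic for the bi-level design (assumed numbers):
# the fast interconnect carries IPC, while aggregated cheap GigE links carry
# global file I/O and are simply added as bandwidth demand grows.
import math

TARGET_IO_GBPS = 20.0   # desired aggregate storage bandwidth, Gbit/s
GIGE_LINK_GBPS = 1.0    # one Gigabit Ethernet link
LINK_EFFICIENCY = 0.8   # assumed achievable fraction per link

links = math.ceil(TARGET_IO_GBPS / (GIGE_LINK_GBPS * LINK_EFFICIENCY))
print(f"{links} GigE links to sustain {TARGET_IO_GBPS} Gbit/s")  # 25 links
```

Scaling bandwidth then means adding links and letting multi-path routing spread the flows, rather than re-wiring the cluster.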
62. PaScal: A New Parallel and Scalable Server IO Networking Infrastructure for Supporting Global Storage/File Systems in Large-size Linux Clusters
- Author
-
Steve Poole, Garth A. Gibson, S. Khalsa, James Nunez, R. Wacha, Hsing-bung Chen, A. Matthews, R. Martinez, P. Martinez, Gary Grider, and P. Fields
- Subjects
File system, Computer science, Load balancing (computing), Inter-process communication, File server, Computer cluster, Multipath routing, Operating system, Global file system, Computer network
- Abstract
This paper presents the design and implementation of a new I/O networking infrastructure, named PaScal (Parallel and Scalable I/O networking framework). PaScal supports high-bandwidth IP-based global storage systems for large-scale Linux clusters. PaScal has several unique properties. It employs (1) a multi-level switch-fabric interconnection network that combines high-speed interconnects for inter-process communication (IPC) with low-cost Gigabit Ethernet for global IP-based storage/file access, (2) a bandwidth-on-demand scaling I/O networking architecture, (3) open-standard IP networks (routing and switching), (4) multipath routing for load balancing and failover, (5) open shortest path first (OSPF) routing software, and (6) support for a global file system in multi-cluster and multi-platform environments. We describe both the hardware and software components of PaScal. We have implemented the PaScal I/O infrastructure on several large-size Linux clusters at LANL and have run a sequence of parallel MPI-IO benchmarks on LANL's 1024-node Pink Linux cluster against the Panasas global parallel file system. Performance results from these benchmarks demonstrate that the PaScal I/O infrastructure is robust and capable of scaling in bandwidth on large-size Linux clusters.
- Published
- 2006
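Result 62's multipath routing for load balancing and failover is commonly realized with equal-cost multi-path (ECMP) style flow hashing. A minimal illustrative sketch, not PaScal's actual implementation:

```python
# Illustrative ECMP-style flow placement: hash the flow tuple so every packet
# of a flow takes the same path, spreading distinct flows across equal-cost
# links (not PaScal's actual routing code).
import hashlib

def pick_path(src, dst, sport, dport, paths):
    """Deterministically map one flow onto one of the available paths."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return paths[int(hashlib.sha256(key).hexdigest(), 16) % len(paths)]

paths = ["ge-0/0/1", "ge-0/0/2", "ge-0/0/3", "ge-0/0/4"]
print(pick_path("10.0.1.5", "10.2.0.9", 51423, 2049, paths))
# On link failure, shrink the path list and flows re-hash onto the survivors:
print(pick_path("10.0.1.5", "10.2.0.9", 51423, 2049, paths[:-1]))
```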