236 results on '"Rupak Biswas"'
Search Results
102. FastMap: A Distributed Scheme for Mapping Large Scale Applications onto Computational Grids.
- Author
-
Amit Jain, Soumya Sanyal 0002, Sajal K. Das 0001, and Rupak Biswas
- Published
- 2004
- Full Text
- View/download PDF
103. An Analysis of Performance Enhancement Techniques for Overset Grid Applications.
- Author
-
M. Jahed Djomehri, Rupak Biswas, Mark Potsdam, and Roger C. Strawn
- Published
- 2003
- Full Text
- View/download PDF
104. Job Superscheduler Architecture and Performance in Computational Grid Environments.
- Author
-
Hongzhang Shan, Leonid Oliker, and Rupak Biswas
- Published
- 2003
- Full Text
- View/download PDF
105. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations.
- Author
-
Leonid Oliker, Andrew Canning, Jonathan Carter, John Shalf, David Skinner, Stéphane Ethier, Rupak Biswas, M. Jahed Djomehri, and Rob F. Van der Wijngaart
- Published
- 2003
- Full Text
- View/download PDF
106. A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid.
- Author
-
Sajal K. Das 0001, Daniel J. Harvey, and Rupak Biswas
- Published
- 2001
- Full Text
- View/download PDF
107. Message Passing Vs. Shared Address Space on a Cluster of SMPs.
- Author
-
Hongzhang Shan, Jaswinder Pal Singh, Leonid Oliker, and Rupak Biswas
- Published
- 2001
- Full Text
- View/download PDF
108. A Comparison of Three Programming Models for Adaptive Applications on the Origin2000.
- Author
-
Hongzhang Shan, Jaswinder Pal Singh, Leonid Oliker, and Rupak Biswas
- Published
- 2000
- Full Text
- View/download PDF
109. Parallel adaptive high-order CFD simulations characterising SOFIA cavity acoustics
- Author
-
Christoph Brehm, Rupak Biswas, Michael F. Barad, and Cetin C. Kiris
- Subjects
0209 industrial biotechnology ,Theoretical computer science ,Computer science ,Infrared telescope ,Computational Mechanics ,Energy Engineering and Power Technology ,Aerospace Engineering ,02 engineering and technology ,Computational fluid dynamics ,01 natural sciences ,010305 fluids & plasmas ,Computational science ,law.invention ,Telescope ,020901 industrial engineering & automation ,law ,0103 physical sciences ,business.industry ,Stratospheric Observatory for Infrared Astronomy ,Adaptive mesh refinement ,Mechanical Engineering ,Astrophysics::Instrumentation and Methods for Astrophysics ,Immersed boundary method ,Condensed Matter Physics ,Supercomputer ,Fuselage ,Mechanics of Materials ,business - Abstract
This paper presents large-scale parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-m infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge–Kutta scheme (the classical variant is sketched after this entry) and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual adaptive mesh refinement (AMR) blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimisations are discussed.
- Published
- 2016
- Full Text
- View/download PDF
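The abstract above names a temporally fourth-order accurate Runge–Kutta integrator but does not spell it out; as an illustration only, the classical fourth-order Runge–Kutta update for a semi-discrete system $\dot{y} = f(t, y)$ is written below (an assumption here, since the paper may use a different fourth-order variant):

$$
\begin{aligned}
k_1 &= f(t_n,\, y_n),\\
k_2 &= f\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_1\right),\\
k_3 &= f\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_2\right),\\
k_4 &= f\left(t_n + h,\; y_n + h k_3\right),\\
y_{n+1} &= y_n + \tfrac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right).
\end{aligned}
$$

Here $y$ collects the spatially discretized (WENO-5Z) flow variables and $h$ is the time step.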
110. Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms.
- Author
-
Leonid Oliker and Rupak Biswas
- Published
- 1999
- Full Text
- View/download PDF
111. Quantum supremacy using a programmable superconducting processor
- Author
-
Josh Mutus, Rami Barends, Pedram Roushan, Andre Petukhov, Erik Lucero, Roberto Collins, Hartmut Neven, Paul V. Klimov, Z. Jamie Yao, Austin G. Fowler, Julian Kelly, Xiao Mi, Ryan Babbush, Matthew P. Harrigan, Marissa Giustina, David Landhuis, Jarrod R. McClean, B. Burkett, Joseph C. Bardin, Michael J. Hartmann, Rupak Biswas, Amit Vainsencher, Steve Habegger, Daniel Sank, Eric Ostby, William Courtney, Alexander N. Korotkov, Alan Ho, Keith Guerin, Ofer Naaman, Ping Yeh, Frank Arute, Kevin Sung, Zhang Jiang, Mike Lindmark, Markus R. Hoffmann, Salvatore Mandrà, Matthew D. Trevithick, Fernando G. S. L. Brandão, Dave Bacon, Anthony Megrant, Trent Huang, Theodore White, Andrew Dunsworth, Ben Chiaro, Kristel Michielsen, Adam Zalcman, David A. Buell, Evan Jeffrey, Benjamin Villalonga, John M. Martinis, Kevin J. Satzinger, Eleanor Rieffel, John Platt, Masoud Mohseni, Sergei V. Isakov, R. Graff, Sergio Boixo, Nicholas C. Rubin, Fedor Kostritsa, Dmitry I. Lyakh, Murphy Yuezhen Niu, Sergey Knysh, Kunal Arya, Zijun Chen, Matthew Neeley, Travis S. Humble, Craig Gidney, Chris Quintana, Charles Neill, Yu Chen, Dvir Kafri, Matt McEwen, Brooks Foxen, Vadim Smelyanskiy, Kostyantyn Kechedzhi, and Edward Farhi
- Subjects
Superconductivity ,Multidisciplinary ,Programming language ,Computer science ,Section (typography) ,02 engineering and technology ,021001 nanoscience & nanotechnology ,computer.software_genre ,01 natural sciences ,law.invention ,Consistency (statistics) ,law ,0103 physical sciences ,CLARITY ,ddc:500 ,010306 general physics ,0210 nano-technology ,computer ,Quantum - Abstract
The promise of quantum computers is that certain computational tasks might be executed exponentially faster on a quantum processor than on a classical processor. A fundamental challenge is to build a high-fidelity processor capable of running quantum algorithms in an exponentially large computational space. Here we report the use of a processor with programmable superconducting qubits to create quantum states on 53 qubits, corresponding to a computational state-space of dimension 2^53 (about 10^16; the arithmetic is spelled out after this entry). Measurements from repeated experiments sample the resulting probability distribution, which we verify using classical simulations. Our Sycamore processor takes about 200 seconds to sample one instance of a quantum circuit a million times—our benchmarks currently indicate that the equivalent task for a state-of-the-art classical supercomputer would take approximately 10,000 years. This dramatic increase in speed compared to all known classical algorithms is an experimental realization of quantum supremacy for this specific computational task, heralding a much-anticipated computing paradigm.
- Published
- 2019
- Full Text
- View/download PDF
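The state-space dimension quoted in the abstract above follows from the fact that $n$ qubits span a Hilbert space of dimension $2^n$; for $n = 53$ the arithmetic is

$$
2^{53} = 9{,}007{,}199{,}254{,}740{,}992 \;\approx\; 9.0 \times 10^{15} \;\approx\; 10^{16}.
$$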
112. Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems.
- Author
-
Rupak Biswas, Leonid Oliker, and Andrew Sohn
- Published
- 1996
- Full Text
- View/download PDF
113. Quantum Annealing Applied to De-Conflicting Optimal Trajectories for Air Traffic Management
- Author
-
Rupak Biswas, Banavar Sridhar, Hokkwan Ng, Eleanor Rieffel, Davide Venturelli, Salvatore Mandrà, Olga Rodionova, Tobias Stollenwerk, and Bryan O'Gorman
- Subjects
FOS: Computer and information sciences ,050210 logistics & transportation ,Mathematical optimization ,Quantum Physics ,Discretization ,Computer science ,Mechanical Engineering ,05 social sciences ,Air traffic management ,Quantum annealing ,High Performance Computing ,FOS: Physical sciences ,Supercomputer ,Computer Science Applications ,Quadratic equation ,0502 economics and business ,Automotive Engineering ,Conflict resolution ,Computer Science - Data Structures and Algorithms ,Graph (abstract data type) ,Data Structures and Algorithms (cs.DS) ,Quantum Computing ,Quantum Physics (quant-ph) ,Quantum computer - Abstract
We present the mapping of a class of simplified air traffic management (ATM) problems (strategic conflict resolution) to quadratic unconstrained binary optimization (QUBO) problems. The mapping is performed through an original representation of the conflict-resolution problem in terms of a conflict graph, where nodes of the graph represent flights and edges represent a potential conflict between flights. The representation allows a natural decomposition of a real-world instance related to wind-optimal trajectories over the Atlantic Ocean into smaller subproblems that can be discretized and are amenable to programming on quantum annealers. In the study, we tested the new programming techniques and benchmarked the hardness of the instances using both classical solvers and the D-Wave 2X and D-Wave 2000Q quantum chips. The preliminary results show that, for reasonable modeling choices, the most challenging subproblems that are programmable on the current devices are solved to optimality with 99% probability within a second of annealing time. Accepted for publication in IEEE Transactions on Intelligent Transportation Systems. (A toy QUBO construction over a conflict graph is sketched after this entry.)
- Published
- 2018
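The entry above maps conflict resolution to QUBO via a conflict graph, but the paper's exact formulation (delay variables, trajectory discretization, penalty weights) is not reproduced in this listing. The following is a minimal sketch under simplified assumptions: each flight picks exactly one delay slot via a one-hot constraint, and a penalty is paid whenever two conflicting flights pick a conflicting pair of slots. The function and variable names are illustrative, not taken from the paper.

```python
import itertools
from collections import defaultdict

def build_conflict_qubo(flights, delay_slots, conflicts, penalty=10.0):
    """Build a QUBO dict {(var, var): coeff} for a toy conflict-resolution problem.

    flights     : list of flight ids
    delay_slots : list of admissible delay values (same for every flight, for simplicity)
    conflicts   : dict {(i, j): set of (d_i, d_j) delay pairs that still conflict}
    penalty     : weight of the one-hot ("choose exactly one delay") constraint
    """
    Q = defaultdict(float)
    var = {(f, d): k for k, (f, d) in enumerate(itertools.product(flights, delay_slots))}

    # Objective: prefer small delays (cost equals the delay value of the chosen slot).
    for (f, d), k in var.items():
        Q[(k, k)] += float(d)

    # One-hot constraint per flight: penalty * (sum_d x_{f,d} - 1)^2 expanded into QUBO form.
    for f in flights:
        ks = [var[(f, d)] for d in delay_slots]
        for k in ks:
            Q[(k, k)] += -penalty          # linear part: P*x^2 - 2*P*x = -P*x for binary x
        for a, b in itertools.combinations(ks, 2):
            Q[(a, b)] += 2.0 * penalty     # quadratic part of the squared constraint

    # Conflict penalties: pay `penalty` if two conflicting flights keep a conflicting slot pair.
    for (i, j), bad_pairs in conflicts.items():
        for (di, dj) in bad_pairs:
            a, b = var[(i, di)], var[(j, dj)]
            Q[(min(a, b), max(a, b))] += penalty

    return dict(Q), var

# Toy usage: two flights, delays {0, 1}, conflict only if both depart with zero delay.
Q, var = build_conflict_qubo(["F1", "F2"], [0, 1], {("F1", "F2"): {(0, 0)}})
print(len(var), "binary variables,", len(Q), "QUBO terms")
```

A QUBO in this dictionary form can be handed to classical heuristics or, after minor-embedding, to a quantum annealer such as the D-Wave machines mentioned above.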
114. A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware
- Author
-
Salvatore Mandrà, Eleanor Rieffel, Bron Nelson, Benjamin Villalonga, Rupak Biswas, Christopher E. Henze, and Sergio Boixo
- Subjects
Quantum Physics ,Computer Networks and Communications ,business.industry ,Computer science ,Computation ,Rejection sampling ,FOS: Physical sciences ,Statistical and Nonlinear Physics ,FLOPS ,Single-precision floating-point format ,lcsh:QC1-999 ,lcsh:QA75.5-76.95 ,Quantum circuit ,Computational Theory and Mathematics ,Computer Science (miscellaneous) ,Benchmark (computing) ,Overhead (computing) ,Tensor ,lcsh:Electronic computers. Computer science ,business ,Quantum Physics (quant-ph) ,Computer hardware ,Simulation ,lcsh:Physics - Abstract
Here we present qFlex, a flexible tensor-network-based quantum circuit simulator. qFlex can compute exact amplitudes, essential for the verification of the quantum hardware, as well as low-fidelity amplitudes, in order to mimic sampling from Noisy Intermediate-Scale Quantum (NISQ) devices. In this work, we focus on random quantum circuits (RQCs) in the range of sizes expected for supremacy experiments. Fidelity $f$ simulations are performed at a cost that is $1/f$ lower than perfect-fidelity ones. We also present a technique to eliminate the overhead introduced by rejection sampling in most tensor network approaches. We benchmark the simulation of square lattices and Google's Bristlecone QPU. Our analysis is supported by extensive simulations on NASA HPC clusters Pleiades and Electra. For our most computationally demanding simulation, the two clusters combined reached a peak of 20 PFLOPS (single precision), i.e., $64\%$ of their maximum achievable performance, which represents the largest numerical computation in terms of sustained FLOPs and number of nodes utilized ever run on NASA HPC clusters. Finally, we introduce a novel multithreaded, cache-efficient tensor index permutation algorithm of general application. Published in npj Quantum Information. (A toy tensor-contraction amplitude calculation is sketched after this entry.)
- Published
- 2018
- Full Text
- View/download PDF
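qFlex itself is a large HPC code; purely as an illustration of the tensor-network idea in the abstract above (computing a single output amplitude of a circuit by contracting gate tensors), here is a toy NumPy sketch for a two-qubit circuit. It is not qFlex's algorithm or API.

```python
import numpy as np

# Single-qubit Hadamard, and CNOT reshaped as a rank-4 tensor with indices (out0, out1, in0, in1).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float).reshape(2, 2, 2, 2)

zero = np.array([1.0, 0.0])   # |0> state vector
one = np.array([0.0, 1.0])    # |1> state vector

# Amplitude <11| CNOT (H x I) |00>, evaluated as a single tensor contraction.
# Indices: a, b = input qubits; c = qubit 0 after H; d, e = outputs of CNOT.
amp = np.einsum("a,b,ca,decb,d,e->", zero, zero, H, CNOT, one, one)

print(amp)                      # expected: 1/sqrt(2), about 0.7071
assert np.isclose(amp, 1 / np.sqrt(2))
```

Real circuit simulators differ mainly in how they choose the contraction order and how they slice the network across nodes; that is where the HPC effort described above goes.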
115. Pleiades: NASA’s First Petascale Supercomputer
- Author
-
William Thigpen, Bryan A. Biegel, Piyush Mehrotra, John Parks, Chris Henze, Rupak Biswas, Robert Ciotti, and Robert Hood
- Subjects
Engineering ,Petascale computing ,business.industry ,Parallel computing ,business ,Pleiades ,Supercomputer - Published
- 2017
- Full Text
- View/download PDF
116. Quantum Approximate Optimization with Hard and Soft Constraints
- Author
-
Rupak Biswas, Davide Venturelli, Bryan O'Gorman, Zhihui Wang, Eleanor Rieffel, and Stuart Hadfield
- Subjects
Mathematical optimization ,Optimization problem ,Computer science ,Quantum annealing ,Constrained optimization ,Approximation algorithm ,01 natural sciences ,010305 fluids & plasmas ,0103 physical sciences ,Derivative-free optimization ,010306 general physics ,Heuristics ,Metaheuristic ,Quantum computer - Abstract
Challenging computational problems arising in the practical world are frequently tackled by heuristic algorithms. Small universal quantum computers will emerge in the next year or two, enabling a substantial broadening of the types of quantum heuristics that can be investigated beyond quantum annealing. The immediate question is: what experiments should we prioritize that will give us insight into quantum heuristics? One leading candidate is the quantum approximate optimization algorithm (QAOA) metaheuristic. In this work, we provide a framework for designing QAOA circuits for a variety of combinatorial optimization problems with both hard constraints that must be met and soft constraints whose violation we wish to minimize. We work through a number of examples, and discuss design principles. (The standard form of the QAOA ansatz is sketched after this entry.)
- Published
- 2017
- Full Text
- View/download PDF
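For reference alongside the entry above, the standard QAOA ansatz is written below in generic notation (assumed here; the paper's contribution, a framework of generalized mixers and penalty terms, is not reproduced). For a cost Hamiltonian $H_C$, a mixing Hamiltonian $H_M$, and an easily prepared initial state $|s\rangle$,

$$
|\psi_p(\boldsymbol{\gamma}, \boldsymbol{\beta})\rangle
  = e^{-i\beta_p H_M}\, e^{-i\gamma_p H_C} \cdots e^{-i\beta_1 H_M}\, e^{-i\gamma_1 H_C}\, |s\rangle ,
$$

and the $2p$ angles are tuned by a classical outer loop to optimize $\langle \psi_p | H_C | \psi_p \rangle$. In the setting described above, soft constraints can enter $H_C$ as penalty terms, while hard constraints can be respected by choosing $H_M$ and $|s\rangle$ so that the evolution never leaves the feasible subspace.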
117. Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers
- Author
-
Rupak Biswas, Alejandro Perdomo-Ortiz, John Realpe-Gómez, and Marcello Benedetti
- Subjects
FOS: Computer and information sciences ,0301 basic medicine ,Physics and Astronomy (miscellaneous) ,Computer science ,Materials Science (miscellaneous) ,FOS: Physical sciences ,Computer Science - Emerging Technologies ,Machine learning ,computer.software_genre ,01 natural sciences ,03 medical and health sciences ,0103 physical sciences ,Electrical and Electronic Engineering ,010306 general physics ,Quantum ,Quantum computer ,Quantum Physics ,business.industry ,Deep learning ,Supervised learning ,Atomic and Molecular Physics, and Optics ,Helmholtz machine ,Generative model ,Emerging Technologies (cs.ET) ,030104 developmental biology ,Qubit ,ComputerSystemsOrganization_MISCELLANEOUS ,Key (cryptography) ,Artificial intelligence ,Quantum Physics (quant-ph) ,business ,computer - Abstract
With quantum computing technologies nearing the era of commercialization and quantum supremacy, machine learning (ML) appears as one of the promising "killer" applications. Despite significant effort, there has been a disconnect between most quantum ML proposals, the needs of ML practitioners, and the capabilities of near-term quantum devices to demonstrate quantum enhancement in the near future. In this contribution to the focus collection on "What would you do with 1000 qubits?", we provide concrete examples of intractable ML tasks that could be enhanced with near-term devices. We argue that to reach this target, the focus should be on areas where ML researchers are struggling, such as generative models in unsupervised and semi-supervised learning, instead of the popular and more tractable supervised learning techniques. We also highlight the case of classical datasets with potential quantum-like statistical correlations where quantum models could be more suitable. We focus on hybrid quantum-classical approaches and illustrate some of the key challenges we foresee for near-term implementations. Finally, we introduce the quantum-assisted Helmholtz machine (QAHM), an attempt to use near-term quantum devices to tackle high-dimensional datasets of continuous variables. Instead of using quantum computers to assist deep learning, as previous approaches do, the QAHM uses deep learning to extract a low-dimensional binary representation of data, suitable for relatively small quantum processors which can assist the training of an unsupervised generative model. Although we illustrate this concept on a quantum annealer, other quantum platforms could benefit as well from this hybrid quantum-classical framework.
- Published
- 2017
118. Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning
- Author
-
Alejandro Perdomo-Ortiz, Marcello Benedetti, John Realpe-Gómez, and Rupak Biswas
- Subjects
Physics ,Quantum Physics ,Restricted Boltzmann machine ,Speedup ,business.industry ,Deep learning ,FOS: Physical sciences ,Sampling (statistics) ,Sample (statistics) ,01 natural sciences ,010305 fluids & plasmas ,symbols.namesake ,Theoretical physics ,0103 physical sciences ,Boltzmann constant ,symbols ,Artificial intelligence ,Quantum Physics (quant-ph) ,010306 general physics ,business ,Algorithm ,Quantum ,Block (data storage) - Abstract
An increase in the efficiency of sampling from Boltzmann distributions would have a significant impact on deep learning and other machine-learning applications. Recently, quantum annealers have been proposed as a potential candidate to speed up this task, but several limitations still bar these state-of-the-art technologies from being used effectively. One of the main limitations is that, while the device may indeed sample from a Boltzmann-like distribution, quantum dynamical arguments suggest it will do so with an instance-dependent effective temperature, different from its physical temperature. Unless this unknown temperature can be unveiled, it might not be possible to effectively use a quantum annealer for Boltzmann sampling. In this work, we propose a strategy to overcome this challenge with a simple effective-temperature estimation algorithm. We provide a systematic study assessing the impact of the effective temperatures in the learning of a special class of a restricted Boltzmann machine embedded on quantum hardware, which can serve as a building block for deep-learning architectures. We also provide a comparison to $k$-step contrastive divergence (CD-$k$) with $k$ up to 100. Although assuming a suitable fixed effective temperature also allows us to outperform one-step contrastive divergence (CD-1), only when using an instance-dependent effective temperature do we find a performance close to that of CD-100 for the case studied here. (The Boltzmann form with an effective temperature is written out after this entry.)
- Published
- 2016
- Full Text
- View/download PDF
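The sampling target discussed in the entry above is the Boltzmann distribution of an Ising energy function; a minimal statement of the effective-temperature issue, in notation assumed here rather than taken from the paper, is

$$
p_{\text{device}}(\mathbf{s}) \;\approx\; \frac{e^{-E(\mathbf{s})/T_{\text{eff}}}}{\sum_{\mathbf{s}'} e^{-E(\mathbf{s}')/T_{\text{eff}}}},
\qquad
E(\mathbf{s}) = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_i h_i\, s_i ,
$$

where $T_{\text{eff}}$ is generally not the physical temperature and varies from instance to instance, so it must be estimated before the device's samples can be used in the learning updates of a restricted Boltzmann machine.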
119. The Impact of High-End Computing on NASA Missions
- Author
-
J. Hardman, Lorien Wheeler, S. Rogers, F. R. Bailey, J. Dunbar, and Rupak Biswas
- Subjects
ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,business.industry ,Computer science ,Science and engineering ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Numerical models ,Supercomputer ,NASA spin-off technologies ,Computer Science Applications ,High end computing ,Aeronautics ,Hardware and Architecture ,Aerospace ,business ,History of computing ,Software ,Research center ,Simulation - Abstract
The NASA Advanced Supercomputing (NAS) facility at Ames Research Center has enabled remarkable breakthroughs in the space agency's science and engineering missions. For 30 years, NAS experts have influenced the state of the art in high-performance computing and related technologies.
- Published
- 2012
- Full Text
- View/download PDF
120. Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models
- Author
-
Rupak Biswas, John Realpe-Gómez, Alejandro Perdomo-Ortiz, and Marcello Benedetti
- Subjects
FOS: Computer and information sciences ,Quantum Physics ,Speedup ,Computer science ,Physics ,QC1-999 ,General Physics and Astronomy ,Sampling (statistics) ,FOS: Physical sciences ,01 natural sciences ,010305 fluids & plasmas ,Machine Learning (cs.LG) ,Computer Science - Learning ,Computer engineering ,ComputerSystemsOrganization_MISCELLANEOUS ,0103 physical sciences ,Probability distribution ,Graphical model ,010306 general physics ,Quantum Physics (quant-ph) ,Quantum ,Quantum computer - Abstract
Mainstream machine-learning techniques such as deep learning and probabilistic programming rely heavily on sampling from generally intractable probability distributions. There is increasing interest in the potential advantages of using quantum computing technologies as sampling engines to speed up these tasks or to make them more effective. However, some pressing challenges in state-of-the-art quantum annealers have to be overcome before we can assess their actual performance. The sparse connectivity, resulting from the local interaction between quantum bits in physical hardware implementations, is considered the most severe limitation to the quality of constructing powerful generative unsupervised machine-learning models. Here we use embedding techniques to add redundancy to data sets, allowing us to increase the modeling capacity of quantum annealers. We illustrate our findings by training hardware-embedded graphical models on a binarized data set of handwritten digits and two synthetic data sets in experiments with up to 940 quantum bits. Our model can be trained in quantum hardware without full knowledge of the effective parameters specifying the corresponding quantum Gibbs-like distribution; therefore, this approach avoids the need to infer the effective temperature at each iteration, speeding up learning; it also mitigates the effect of noise in the control parameters, making it robust to deviations from the reference Gibbs distribution. Our approach demonstrates the feasibility of using quantum annealers for implementing generative models, and it provides a suitable framework for benchmarking these quantum technologies on machine-learning-related tasks. Published in Physical Review X.
- Published
- 2016
- Full Text
- View/download PDF
121. A divide-and-conquer/cellular-decomposition framework for million-to-billion atom simulations of chemical reactions
- Author
-
Fuyuki Shimojo, William A. Goddard, Priya Vashishta, Ken-ichi Nomura, Rajiv K. Kalia, Rupak Biswas, Ashish Sharma, Aiichiro Nakano, Deepak Srivastava, and Adri C. T. van Duin
- Subjects
Physics ,Divide and conquer algorithms ,General Computer Science ,Computation ,General Physics and Astronomy ,General Chemistry ,Load balancing (computing) ,Supercomputer ,Grid ,Computational science ,Computational Mathematics ,Multigrid method ,Mechanics of Materials ,Quantum mechanics ,Benchmark (computing) ,General Materials Science ,Cellular decomposition - Abstract
To enable large-scale atomistic simulations of material processes involving chemical reactions, we have designed linear-scaling molecular dynamics (MD) algorithms based on an embedded divide-and-conquer (EDC) framework: first-principles-based fast reactive force-field (F-ReaxFF) MD; and quantum-mechanical MD in the framework of the density functional theory (DFT) on adaptive multigrids. To map these O(N) algorithms onto parallel computers with deep memory hierarchies, we have developed a tunable hierarchical cellular-decomposition (THCD) framework, which achieves performance tunability through a hierarchy of parameterized cell data/computation structures and adaptive load balancing through wavelet-based computational-space decomposition. Benchmark tests on 1920 Itanium2 processors of the NASA Columbia supercomputer have achieved unprecedented scales of quantum-mechanically accurate and well-validated, chemically reactive atomistic simulations—0.56 billion-atom F-ReaxFF MD and 1.4 million-atom (0.12 trillion grid points) EDC–DFT MD—in addition to 18.9 billion-atom nonreactive space–time multiresolution MD. The EDC and THCD frameworks expose maximal data localities, and consequently the isogranular parallel efficiency on 1920 processors is as high as 0.953. Chemically reactive MD simulations have been applied to shock-initiated detonation of energetic materials and stress-induced bond breaking in ceramics in corrosive environments.
- Published
- 2007
- Full Text
- View/download PDF
122. Unstructured adaptive meshes: bad for your memory?
- Author
-
Rob F. Van der Wijngaart, Rupak Biswas, and Huiyu Feng
- Subjects
Numerical Analysis ,business.industry ,Applied Mathematics ,Uniform memory access ,Parallel computing ,Bottleneck ,CAS latency ,Domain (software engineering) ,Computational Mathematics ,Software ,Feature (computer vision) ,Benchmark (computing) ,Polygon mesh ,business ,Algorithm ,Mathematics - Abstract
The most important performance bottleneck in modern high-end computers is access to memory. Many forms of hardware and software support for reducing memory latency exist, but certain important applications defy these. Examples of such applications are unstructured adaptive (UA) mesh problems, which feature irregular, dynamically changing memory access. We describe a new benchmark program, called UA, for measuring the performance of computer systems when solving such problems. It complements the existing NAS Parallel Benchmarks suite that deals mainly with static, regular-stride memory references. The UA benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh. We describe the numerical and implementation issues, and also present some interesting performance results.
- Published
- 2005
- Full Text
- View/download PDF
123. Message passing and shared address space parallelism on an SMP cluster
- Author
-
Rupak Biswas, Hongzhang Shan, Leonid Oliker, and Jaswinder Pal Singh
- Subjects
Distributed shared memory ,Computer Networks and Communications ,Computer science ,Address space ,Distributed computing ,Message passing ,Locality ,Message Passing Interface ,Parallel computing ,Computer Graphics and Computer-Aided Design ,Theoretical Computer Science ,Artificial Intelligence ,Hardware and Architecture ,Scalability ,Programming paradigm ,Implementation ,Software - Abstract
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI + SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented. (A toy message-passing vs. shared-memory contrast is sketched after this entry.)
- Published
- 2003
- Full Text
- View/download PDF
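The entry above compares message passing with shared address space programming on an SMP cluster; its applications are full-scale codes, so purely as a toy contrast (assuming mpi4py and the Python standard library are available, neither of which is used in the paper), the same global sum can be expressed in both styles:

```python
# Message-passing style (run with: mpiexec -n 4 python this_file.py)
from mpi4py import MPI

comm = MPI.COMM_WORLD
local = sum(range(comm.rank * 1000, (comm.rank + 1) * 1000))  # each rank owns a slice of the data
total = comm.allreduce(local, op=MPI.SUM)                      # explicit communication
if comm.rank == 0:
    print("MPI total:", total)
```

```python
# Shared-address-space style on a single SMP node: no explicit messages, just shared data.
from multiprocessing import Process, Value, Lock

def worker(rank, acc, lock):
    local = sum(range(rank * 1000, (rank + 1) * 1000))
    with lock:                        # synchronization replaces message passing
        acc.value += local

if __name__ == "__main__":
    acc, lock = Value("d", 0.0), Lock()
    procs = [Process(target=worker, args=(r, acc, lock)) for r in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("Shared-memory total:", acc.value)
```

The trade-off the paper quantifies is visible even here: the MP version makes data movement explicit, while the SAS version is shorter but relies on the memory system and synchronization primitives for correctness and performance.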
124. Determination and correction of persistent biases in quantum annealers
- Author
-
Vadim Smelyanskiy, Joseph Fluegemann, Rupak Biswas, Alejandro Perdomo-Ortiz, and Bryan O'Gorman
- Subjects
Quantum Physics ,Multidisciplinary ,Computer science ,FOS: Physical sciences ,computer.software_genre ,01 natural sciences ,Article ,010305 fluids & plasmas ,0103 physical sciences ,Data mining ,Computational problem ,010306 general physics ,Quantum Physics (quant-ph) ,Quantum ,computer ,Algorithm ,Quantum computer - Abstract
Calibration of quantum computing technologies is essential to the effective utilization of their quantum resources. Specifically, the performance of quantum annealers is likely to be significantly impaired by noise in their programmable parameters, effectively a misspecification of the computational problem to be solved, often resulting in spurious suboptimal solutions. We developed a strategy to determine and correct persistent, systematic biases between the actual values of the programmable parameters and their user-specified values. We applied the recalibration strategy to two D-Wave Two quantum annealers, one at NASA Ames Research Center in Moffett Field, California, and another at D-Wave Systems in Burnaby, Canada. We show that the recalibration procedure not only reduces the magnitudes of the biases in the programmable parameters but also enhances the performance of the device on a set of random benchmark instances.
- Published
- 2015
- Full Text
- View/download PDF
125. Message from the CSE 2009 General Chairs
- Author
-
Beniamino Di Martino, Hai Xiang Lin, and Rupak Biswas
- Published
- 2009
126. A Quantum Approach to Diagnosis of Multiple Faults in Electrical Power Systems
- Author
-
Rupak Biswas, Joseph Fluegemann, Vadim Smelyanskiy, Sriram Narasimhan, and Alejandro Perdomo-Ortiz
- Subjects
Set (abstract data type) ,Theoretical computer science ,Computer science ,Qubit ,Quantum annealing ,Embedding ,Quadratic unconstrained binary optimization ,D-Wave Two ,Quantum algorithm ,Algorithm ,Quantum computer - Abstract
Diagnosing the minimal set of faults capable of explaining a set of given observations, e.g., from sensor readouts, is a hard combinatorial optimization problem usually tackled with artificial intelligence techniques. We present the mapping of this combinatorial problem to quadratic unconstrained binary optimization (QUBO), and some preliminary experimental results of instances embedded onto the 509-qubit NASA-Google-USRA quantum annealer. This is the first application with the route Problem → QUBO → direct embedding into quantum hardware, where we are able to implement and tackle problem instances with sizes that go beyond previously reported toy-model proof-of-principle implementations. We believe that these results represent a significant leap in the solution of problems via direct-embedding quantum optimization.
- Published
- 2014
- Full Text
- View/download PDF
127. Parallel dynamic load balancing strategies for adaptive irregular applications
- Author
-
Leonid Oliker, Rupak Biswas, Daniel J. Harvey, and Sajal K. Das
- Subjects
Computer science ,Modeling and Simulation ,Computation ,Distributed computing ,Modelling and Simulation ,Applied Mathematics ,Dynamic load balancing ,Unstructured mesh ,Overhead (computing) ,Parallel computing ,Computational problem ,Adaptation (computer science) ,Mesh adaptation - Abstract
Dynamic unstructured mesh adaptation is a powerful technique for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult because of the load imbalance that mesh adaptation creates. To address this problem, we have developed two dynamic load balancing strategies for parallel adaptive irregular applications. The first, called PLUM, is an architecture-independent framework particularly geared toward adaptive numerical computations and requires that all data be globally redistributed after each adaptation to achieve load balance. The second is a more general-purpose topology-independent load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Results indicate that both PLUM and the SBN-based approach have their relative merits, and that they achieve excellent load balance at the cost of minimal extra overhead.
- Published
- 2000
- Full Text
- View/download PDF
128. Tetrahedral and hexahedral mesh adaptation for CFD problems
- Author
-
Rupak Biswas and Roger C. Strawn
- Subjects
Numerical Analysis ,business.industry ,Applied Mathematics ,Numerical analysis ,Computational fluid dynamics ,Grid ,Data structure ,Computational science ,Computational Mathematics ,Tetrahedron ,Polygon mesh ,Hexahedron ,business ,Algorithm ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics ,Subdivision - Abstract
This paper presents two unstructured mesh adaptation schemes for problems in computational fluid dynamics. The procedures allow localized grid refinement and coarsening to efficiently capture aerodynamic flow features of interest. The first procedure is for purely tetrahedral grids; unfortunately, repeated anisotropic adaptation may significantly deteriorate the quality of the mesh. Hexahedral elements, on the other hand, can be subdivided anisotropically without mesh quality problems. Furthermore, hexahedral meshes yield more accurate solutions than their tetrahedral counterparts for the same number of edges. Both the tetrahedral and hexahedral mesh adaptation procedures use edge-based data structures that facilitate efficient subdivision by allowing individual edges to be marked for refinement or coarsening. However, for hexahedral adaptation, pyramids, prisms, and tetrahedra are used as buffer elements between refined and unrefined regions to eliminate hanging vertices. Computational results indicate that the hexahedral adaptation procedure is a viable alternative to adaptive tetrahedral schemes.
- Published
- 1998
- Full Text
- View/download PDF
129. A Quantum Annealing Approach for Fault Detection and Diagnosis of Graph-Based Systems
- Author
-
Alejandro Perdomo-Ortiz, Joseph Fluegemann, Sriram Narasimhan, Rupak Biswas, and Vadim Smelyanskiy
- Subjects
Quantum Physics ,Computer science ,Quantum annealing ,General Physics and Astronomy ,FOS: Physical sciences ,Fault detection and isolation ,Set (abstract data type) ,Embedding ,General Materials Science ,D-Wave Two ,Quadratic unconstrained binary optimization ,Physical and Theoretical Chemistry ,Quantum information ,Quantum Physics (quant-ph) ,Algorithm ,Quantum - Abstract
Diagnosing the minimal set of faults capable of explaining a set of given observations, e.g., from sensor readouts, is a hard combinatorial optimization problem usually tackled with artificial intelligence techniques. We present the mapping of this combinatorial problem to quadratic unconstrained binary optimization (QUBO), and the experimental results of instances embedded onto a quantum annealing device with 509 quantum bits. Besides being the first time a quantum approach has been proposed for problems in the advanced diagnostics community, to the best of our knowledge this work is also the first research utilizing the route Problem $\rightarrow$ QUBO $\rightarrow$ Direct embedding into quantum hardware, where we are able to implement and tackle problem instances with sizes that go beyond previously reported toy-model proof-of-principle quantum annealing implementations; this is a significant leap in the solution of problems via direct-embedding adiabatic quantum optimization. We discuss some of the programmability challenges in the current generation of the quantum device as well as a few possible ways to extend this work to more complex arbitrary network graphs.
- Published
- 2014
- Full Text
- View/download PDF
130. Vitrectorhexis versus forceps posterior capsulorhexis in pediatric cataract surgery
- Author
-
Ajoy Paul, Rupak Biswas, Lav Kochgaway, Puspen Maity, Sourav Sinha, Partha Biswas, and Sumita Banerjee
- Subjects
ologen ,endovascular treatment ,Male ,retina ,Carotid-cavernous fistula ,Choroidal neovascular membrane ,genetic structures ,medicine.medical_treatment ,intraocular lens ,Vitrectomy ,melatonin ,pituitary adenoma ,Anxiety ,Graves′ ophthalmopathy ,lcsh:Ophthalmology ,health professionals ,pediatric cataract ,pain ,Prospective Studies ,Child ,Capsulorhexis ,mitomycin C ,Air ,trabeculectomy ,Follow up studies ,intravitreal injection ,cataract surgery ,Equipment Design ,photodynamic therapy ,Autologous stem cell rescue ,Doxycycline ,Child, Preschool ,conjunctival erosion ,Female ,Pediatric cataract ,topical anesthesia ,medicine.medical_specialty ,bone marrow ,visual symptoms ,Forceps ,rabbit ,Lens Capsule, Crystalline ,Posterior capsulorhexis ,Brief Communication ,idiopathic macular telangiectasia ,methotrexate ,retinoblastoma ,Cataract ,Surgical time ,Ophthalmology ,medicine ,Electron microscopy ,metastasis ,Humans ,ranibizumab ,pseudophakos ,near point of convergence ,Macular edema ,optical coherence tomography ,business.industry ,healthcare workers ,Ahmed Glaucoma Valve ,pseudoexfoliation ,Infant ,Glaucoma ,Contact lens ,eye diseases ,Surgery ,Collagen implant ,Vitreous Body ,inflammation ,lcsh:RE1-994 ,vitrectorhexis ,Implant ,sense organs ,Binocular vision ,business ,intraocular pressure ,light microscopy ,Follow-Up Studies - Abstract
This study was done to compare the results of posterior continuous curvilinear capsulorhexis created using forceps with those created using a vitrector in eyes with congenital cataract. The term vitrectorhexis was first used by Wilson et al. in 1999.[1] Fifty eyes with congenital and developmental cataract were included in this study. The posterior capsulorhexis was created using Utrata forceps in 17 eyes or through a vitrector in 33 eyes. Forceps capsulorhexis was performed before IOL implantation, while vitrectorhexis was performed after IOL implantation in the bag. The results of the two techniques were compared using the following criteria: incidence of extension of the rhexis, ability to achieve a posterior rhexis of appropriate size, ability to implant the IOL in the bag, surgical time, and learning curve. Vitrectorhexis after IOL implantation was an easy-to-learn alternative to manual posterior continuous curvilinear capsulorhexis in pediatric cataract surgery. It was more predictable and reproducible, with a short learning curve and less surgical time.
- Published
- 2013
131. HELICOPTER NOISE PREDICTIONS USING KIRCHHOFF METHODS
- Author
-
Roger C. Strawn, Anastasios S. Lyrintzis, and Rupak Biswas
- Subjects
Surface (mathematics) ,Acoustics and Ultrasonics ,Rotor (electric) ,Computer science ,Applied Mathematics ,Acoustics ,Experimental data ,Near and far field ,Aerodynamics ,Solver ,law.invention ,Euler equations ,symbols.namesake ,Noise ,law ,symbols - Abstract
This paper presents two methods for predicting the noise from helicopter rotors in forward flight. Aerodynamic and acoustic solutions in the near field are computed with a finite-difference solver for the Euler equations. Two different Kirchhoff acoustics methods are then used to propagate the acoustic signals to the far field in a computationally efficient manner. One of the methods uses a Kirchhoff surface that rotates with the rotor blades. The other uses a nonrotating Kirchhoff surface. Results from both methods are compared to experimental data for both high-speed impulsive noise and blade-vortex interaction noise. Agreement between experimental data and computational results is excellent for both cases. The rotating and nonrotating Kirchhoff methods are also compared for accuracy and efficiency. Both offer high accuracy with reasonable computer resource requirements. The Kirchhoff integrations efficiently extend the near-field finite-difference results to predict the far-field helicopter noise.
- Published
- 1996
- Full Text
- View/download PDF
132. Mesh quality control for multiply-refined tetrahedral grids
- Author
-
Roger C. Strawn and Rupak Biswas
- Subjects
Numerical Analysis ,Theoretical computer science ,business.industry ,Applied Mathematics ,Aerodynamics ,Computational fluid dynamics ,T-vertices ,Grid ,Computational science ,Computational Mathematics ,Quality (physics) ,Flow (mathematics) ,Tetrahedron ,Fluid dynamics ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
A new algorithm for controlling the quality of multiply-refined tetrahedral meshes is presented in this paper. The basic dynamic mesh adaption procedure allows localized grid refinement and coarsening to efficiently capture aerodynamic flow features in computational fluid dynamics problems; however, repeated application of the procedure may significantly deteriorate the quality of the mesh. Results presented show the effectiveness of this mesh quality algorithm and its potential in the area of helicopter aerodynamics and acoustics.
- Published
- 1996
- Full Text
- View/download PDF
133. Numerical simulations of helicopter aerodynamics and acoustics
- Author
-
Roger C. Strawn and Rupak Biswas
- Subjects
Structured and unstructured grids ,Computer simulation ,business.industry ,Rotor (electric) ,Applied Mathematics ,Acoustics ,Numerical analysis ,Aerodynamics ,Computational fluid dynamics ,Dynamic mesh adaption ,Solver ,Rotor in hover and forward flight ,Euler equations ,law.invention ,Computational Mathematics ,symbols.namesake ,law ,Kirchhoff integration ,symbols ,Helicopter rotor ,Helicopter aerodynamics and acoustics ,business ,Mathematics - Abstract
This paper demonstrates several new methods for computing acoustic signals from helicopter rotors in hover and forward flight. Aerodynamic and acoustic solutions in the near field are computed with two different finite-volume flow solvers for the Euler equations. A solution-adaptive unstructured-grid Euler solver models a rotor in hover while a more conventional structured-grid solver is used for forward flight calculations. A nonrotating cylindrical surface is then placed around the entire rotor system. This surface moves subsonically with the rotor hub in forward flight. The finite-volume solution is interpolated onto this cylindrical surface at every time step and a Kirchhoff integration propagates the acoustic signal to the far field. Computed values for high-speed impulsive noise in hover and forward flight show excellent agreement with experimental data. Results from the combined finite-volume/Kirchhoff method offer high accuracy with reasonable computer resource requirements.
- Published
- 1996
- Full Text
- View/download PDF
134. Computation of Helicopter Rotor Acoustics in Forward Flight
- Author
-
Rupak Biswas and Roger C. Strawn
- Subjects
Computer science ,Rotor (electric) ,Computation ,Acoustics ,Near and far field ,Aerodynamics ,Solver ,law.invention ,Euler equations ,Noise ,symbols.namesake ,law ,symbols ,Helicopter rotor - Abstract
This paper presents a new method for computing acoustic signals from helicopter rotors in forward flight. The aerodynamic and acoustic solutions in the near field are computed with a finite-difference solver for the Euler equations. A nonrotating cylindrical Kirchhoff surface is then placed around the entire rotor system. This Kirchhoff surface moves subsonically with the rotor in forward flight. The finite-difference solution is interpolated onto this cylindrical surface at each time step and a Kirchhoff integration is used to carry the acoustic signal to the far field. Computed values for high-speed impulsive noise show excellent agreement with model-rotor and flight-test experimental data. Results from the new method offer high accuracy with reasonable computer resource requirements.
- Published
- 1995
- Full Text
- View/download PDF
135. Performance evaluation of Amazon EC2 for NASA HPC applications
- Author
-
Arthur Lazanoff, Rupak Biswas, Piyush Mehrotra, Subhash Saini, Steve Heistand, Robert Hood, Jahed Djomehri, and Haoqiang Jin
- Subjects
Computer science ,business.industry ,Computation ,Overhead (engineering) ,Usability ,Cloud computing ,Virtualization ,computer.software_genre ,Supercomputer ,HPC Challenge Benchmark ,Operating system ,Network performance ,business ,computer - Abstract
Cloud computing environments are now widely available and are being increasingly utilized for technical computing. They are also being touted for high-performance computing (HPC) applications in science and engineering. For example, Amazon EC2 Services offers a specialized Cluster Compute instance to run HPC applications. In this paper, we compare the performance characteristics of Amazon EC2 HPC instances to those of NASA's Pleiades supercomputer, an SGI ICE cluster. For this study, we utilized the HPCC kernels and the NAS Parallel Benchmarks along with four full-scale applications from the repertoire of codes that are being used by NASA scientists and engineers. We compare the total runtime of these codes for varying numbers of cores. We also break out the computation and communication times for a subset of these applications to explore the effect of interconnect differences on the two systems. In general, the single-node performance of the two platforms is equivalent. However, for most of the codes, when scaling to larger core counts, the performance of the EC2 HPC instances generally lags that of Pleiades due to the weaker network performance of the former. In addition to analyzing application performance, we also briefly touch upon the overhead due to virtualization and the usability of cloud environments such as Amazon EC2.
- Published
- 2012
- Full Text
- View/download PDF
136. An Application-based Performance Evaluation of NASA's Nebula Cloud Computing Platform
- Author
-
Johnny Chang, Rupak Biswas, Subhash Saini, Piyush Mehrotra, Steve Heistand, Haoqiang Jin, and Robert Hood
- Subjects
Nebula ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,business.industry ,Computer science ,Usability ,Cloud computing ,Virtualization ,computer.software_genre ,Supercomputer ,Scalability ,Operating system ,Bandwidth (computing) ,business ,computer - Abstract
The high-performance computing (HPC) community has shown tremendous interest in exploring cloud computing because of its high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work presents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
- Published
- 2012
- Full Text
- View/download PDF
137. Parallel, adaptive finite element methods for conservation laws
- Author
-
Rupak Biswas, Joseph E. Flaherty, and Karen D. Devine
- Subjects
Numerical Analysis ,Partial differential equation ,Discretization ,Applied Mathematics ,Spectral element method ,Mathematical analysis ,MathematicsofComputing_NUMERICALANALYSIS ,Mixed finite element method ,Superconvergence ,Finite element method ,Computational Mathematics ,Discontinuous Galerkin method ,Temporal discretization ,Mathematics - Abstract
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method. (The element-wise Legendre expansion is sketched after this entry.)
- Published
- 1994
- Full Text
- View/download PDF
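As a reminder of the discretization named in the entry above (standard discontinuous Galerkin notation, not copied from the paper), on each element $\Omega_e$ the solution is expanded in Legendre polynomials:

$$
u_h(x,t)\big|_{\Omega_e} = \sum_{j=0}^{p} c_{e,j}(t)\, P_j(\xi), \qquad \xi \in [-1, 1].
$$

Testing the conservation law $u_t + f(u)_x = 0$ against each $P_j$ on $\Omega_e$, with numerical fluxes coupling neighboring elements, yields a system of ordinary differential equations for the coefficients $c_{e,j}(t)$ that is advanced with the Runge-Kutta method mentioned above; h- and p-refinement correspond to adding elements or raising the polynomial degree $p$ locally.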
138. A new procedure for dynamic adaption of three-dimensional unstructured grids
- Author
-
Rupak Biswas and Roger C. Strawn
- Subjects
Numerical Analysis ,Mathematical optimization ,Series (mathematics) ,Computer science ,Applied Mathematics ,Aerodynamics ,Linked list ,Grid ,Data structure ,Euler equations ,Computational science ,Computational Mathematics ,symbols.namesake ,Flow (mathematics) ,symbols ,Transonic ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
A new procedure is presented for the simultaneous coarsening and refinement of three-dimensional unstructured tetrahedral meshes. This algorithm allows for localized grid adaption that is used to capture aerodynamic flow features such as vortices and shock waves in helicopter flowfield simulations. The mesh-adaption algorithm is implemented in the C programming language and uses a data structure consisting of a series of dynamically-allocated linked lists. These lists allow the mesh connectivity to be rapidly reconstructed when individual mesh points are added and/or deleted (a toy adjacency-update sketch follows this entry). The algorithm allows the mesh to change in an anisotropic manner in order to efficiently resolve directional flow features. The procedure has been successfully implemented on a single processor of a Cray Y-MP computer. Two sample cases are presented involving three-dimensional transonic flow. Computed results show good agreement with conventional structured-grid solutions for the Euler equations.
- Published
- 1994
- Full Text
- View/download PDF
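The entry above describes a C implementation built on dynamically allocated linked lists so that connectivity can be updated as points are added or deleted. The following Python sketch is illustrative only (it is not the paper's data structure) and shows the same idea of incremental adjacency updates:

```python
class AdaptiveMesh:
    """Toy vertex/edge connectivity with O(1) incremental updates."""

    def __init__(self):
        self.edges_of = {}          # vertex id -> set of incident edge ids
        self.verts_of = {}          # edge id   -> (v0, v1)
        self._next_edge = 0

    def add_vertex(self, v):
        self.edges_of.setdefault(v, set())

    def add_edge(self, v0, v1):
        """Insert an edge (e.g., during refinement) and register it with both endpoints."""
        e = self._next_edge
        self._next_edge += 1
        self.verts_of[e] = (v0, v1)
        for v in (v0, v1):
            self.add_vertex(v)
            self.edges_of[v].add(e)
        return e

    def remove_vertex(self, v):
        """Delete a vertex (e.g., during coarsening) along with its incident edges."""
        for e in list(self.edges_of.pop(v, set())):
            v0, v1 = self.verts_of.pop(e)
            other = v1 if v0 == v else v0
            self.edges_of[other].discard(e)

# Refine then coarsen a tiny patch.
mesh = AdaptiveMesh()
mesh.add_edge(0, 1); mesh.add_edge(1, 2); mesh.add_edge(0, 2)
mesh.remove_vertex(1)
print(sorted(mesh.verts_of.values()))   # only the edge (0, 2) remains
```

The point of such a structure, in the paper's C form as well as in this sketch, is that refinement and coarsening touch only the entities adjacent to the modified points rather than rebuilding the whole mesh.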
139. The impact of hyper-threading on processor resource utilization in production applications
- Author
-
David Barker, Haoqiang Jin, Rupak Biswas, Piyush Mehrotra, Robert Hood, and Subhash Saini
- Subjects
CPU cache ,Computer science ,business.industry ,Pentium ,Hyper-threading ,Thread (computing) ,computer.software_genre ,Instruction set ,Data access ,Multithreading ,Embedded system ,Operating system ,Code (cryptography) ,Resource allocation ,business ,computer - Abstract
Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architectures, such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and make comparisons of that utilization between single-threading (ST) and HT modes. We also study the performance gain using unhalted core cycles, the code's efficiency in using the processor's vector units, and the impact of HT mode on shared resources such as the L2 and L3 caches. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but does not necessarily translate into overall application performance gain.
- Published
- 2011
- Full Text
- View/download PDF
140. Performance Analysis of CFD Application Cart3D Using MPInside and Performance Monitor Unit Data on Nehalem and Westmere Based Supercomputers
- Author
-
Kenichi Taylor, Rupak Biswas, Piyush Mehrotra, Michael J. Aftosmis, and Subhash Saini
- Subjects
Profiling (computer programming) ,Measure (data warehouse) ,Computer science ,business.industry ,CPU cache ,Message passing ,Vectorization (mathematics) ,Hyper-threading ,Parallel computing ,Computational fluid dynamics ,business - Abstract
Cart3D is a computational fluid dynamics (CFD) application aimed at the conceptual and preliminary design of aerospace vehicles with complex geometries. It is widely used by design engineers at NASA, the Department of Defense, and aerospace companies in the USA. We present a detailed performance analysis of Cart3D using two tools, SGI MPInside and op_scope, which collect hardware counter data from the Intel Performance Monitoring Unit (PMU), on supercomputers based on the Nehalem micro-architecture. Using these tools, we have done dynamic profiling of Cart3D (compute time, communication time, and I/O time), along with dynamic profiling of MPI functions (MPI_Sendrecv, MPI_Bcast, MPI_Isend, MPI_Irecv, MPI_Allreduce, MPI_Barrier, etc.) with respect to the message size of each rank and the time consumed by each function. MPI communication is further analyzed by studying the performance of the MPI functions used in this application as a function of message size and number of cores. Using these tools, we have also studied the efficiency of the processor to measure its effective utilization, the efficiency of the floating-point units, the percentage of vectorization, and the percentage of data coming from the L2 cache, L3 cache, and main memory. This study was performed on two computing subsystems based on quad-core Nehalem-EP and hex-core Westmere-EP processors that are part of Pleiades, an SGI Altix ICE system at NASA Ames Research Center.
- Published
- 2011
- Full Text
- View/download PDF
141. An adaptive mesh-moving and refinement procedure for one-dimensional conservation laws
- Author
-
David C. Arney, Rupak Biswas, and Joseph E. Flaherty
- Subjects
Numerical Analysis ,Adaptive mesh refinement ,Applied Mathematics ,Extrapolation ,Finite difference ,Finite difference method ,T-vertices ,Data structure ,Mathematics::Numerical Analysis ,Computational Mathematics ,Computer Science::Graphics ,Mesh generation ,Polygon mesh ,Algorithm ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
We examine the performance of an adaptive mesh-moving and/or local mesh refinement procedure for the finite difference solution of one-dimensional hyperbolic systems of conservation laws. Adaptive motion of a base mesh is designed to isolate spatially distinct phenomena, and recursive local refinement of the time step and cells of the stationary or moving base mesh is performed in regions where a refinement indicator exceeds a prescribed tolerance. These adaptive procedures are incorporated into a computer code that includes a MacCormack finite difference scheme with Davis' artificial viscosity model and a discretization error estimate based on Richardson's extrapolation (sketched after this entry). Experiments are conducted on three problems in order to quantify the advantages of adaptive techniques relative to uniform mesh computations and the relative benefits of mesh moving and refinement. Key results indicate that local mesh refinement, with and without mesh moving, can provide reliable solutions at much lower computational cost than possible on uniform meshes; that mesh motion can be used to improve the results of uniform mesh solutions for a modest computational effort; that the cost of managing the tree data structure associated with refinement is small; and that a combination of mesh motion and refinement reliably produces solutions for the least cost per unit accuracy.
- Published
- 1993
- Full Text
- View/download PDF
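The error indicator mentioned in the entry above relies on Richardson extrapolation; in generic form (standard notation, not taken from the paper), for a scheme of order $p$ the solutions on meshes of spacing $h$ and $h/2$ satisfy

$$
u_{\text{exact}} - u_{h/2} \;\approx\; \frac{u_{h/2} - u_h}{2^{p} - 1},
$$

so the difference between coarse- and fine-mesh solutions, scaled by $1/(2^p - 1)$, serves as a computable estimate of the discretization error that can drive mesh motion and local refinement. For the second-order MacCormack scheme ($p = 2$) the scale factor is $1/3$.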
142. Performance Analysis of Scientific and Engineering Applications Using MPInside and TAU
- Author
-
Piyush Mehrotra, Rupak Biswas, Kenichi Taylor, Subhash Saini, and Sameer Shende
- Subjects
ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,business.industry ,Computer science ,Message passing ,Rank (computer programming) ,Parallel computing ,Function (mathematics) ,Computational fluid dynamics ,Solver ,Navier–Stokes equations ,business ,Computational science - Abstract
In this paper, we present a performance analysis of two NASA applications using the performance tools Tuning and Analysis Utilities (TAU) and SGI MPInside. MITgcmUV and OVERFLOW are two production-quality applications used extensively by scientists and engineers at NASA. MITgcmUV is a global ocean simulation model, developed by the Estimating the Circulation and Climate of the Ocean (ECCO) Consortium, for solving the fluid equations of motion using the hydrostatic approximation. OVERFLOW is a general-purpose Navier-Stokes solver for computational fluid dynamics (CFD) problems. Using these tools, we analyze the MPI functions (MPI_Sendrecv, MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Barrier, etc.) with respect to the message size of each rank, the time consumed by each function, and how the ranks communicate. MPI communication is further analyzed by studying the performance of the MPI functions used in these two applications as a function of message size and number of cores. Finally, we present the compute time, communication time, and I/O time as a function of the number of cores.
- Published
- 2010
- Full Text
- View/download PDF
143. Performance impact of resource contention in multicore systems
- Author
-
Rupak Biswas, Jahed Djomehri, Piyush Mehrotra, Sharad Gavali, Johnny Chang, Haoqiang Jin, Dennis C. Jespersen, Robert Hood, and Kenichi Taylor
- Subjects
Multi-core processor ,Computer architecture ,Memory hierarchy ,Computer science ,Message passing ,Operating system ,Resource contention ,Resource allocation (computer) ,Differential (mechanical device) ,computer.software_genre ,computer ,Shared resource - Abstract
Resource sharing in commodity multicore processors can have a significant impact on the performance of production applications. In this paper we use a differential performance analysis methodology to quantify the costs of contention for resources in the memory hierarchy of several multicore processors used in high-end computers. In particular, by comparing runs that bind MPI processes to cores in different patterns, we can isolate the effects of resource sharing. We use this methodology to measure how such sharing affects the performance of four applications of interest to NASA: OVERFLOW, MITgcm, Cart3D, and NCC. We also use a subset of the HPCC benchmarks and hardware counter data to help interpret and validate our findings. We conduct our study on high-end computing platforms that use four different quad-core microprocessors: Intel Clovertown, Intel Harpertown, AMD Barcelona, and Intel Nehalem-EP. The results help further our understanding of the requirements these codes place on their production environments and of each computer's ability to deliver performance. (A small sketch of the core-binding idea follows this entry.)
- Published
- 2010
- Full Text
- View/download PDF
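A minimal sketch of the placement idea behind the differential methodology: build a "packed" rank-to-core map, in which ranks share a socket and its caches, and a "spread" map, in which ranks are distributed across sockets, then compare application timings under the two. The two-socket, four-core-per-socket topology and the core numbering are illustrative assumptions; actual binding would go through the MPI launcher or the operating system.

```python
from typing import List

SOCKETS, CORES_PER_SOCKET = 2, 4          # assumed node topology (illustrative only)

def packed(nranks: int) -> List[int]:
    """Fill socket 0 first, then socket 1: ranks share caches and the memory bus."""
    return [r % (SOCKETS * CORES_PER_SOCKET) for r in range(nranks)]

def spread(nranks: int) -> List[int]:
    """Alternate sockets so consecutive ranks do not share a socket."""
    return [(r % SOCKETS) * CORES_PER_SOCKET + (r // SOCKETS) % CORES_PER_SOCKET
            for r in range(nranks)]

if __name__ == "__main__":
    n = 4
    print("packed:", packed(n))   # [0, 1, 2, 3] -> all four ranks on socket 0
    print("spread:", spread(n))   # [0, 4, 1, 5] -> two ranks per socket
    # Each rank could pin itself with os.sched_setaffinity(0, {core}) on Linux,
    # or the map could be handed to the MPI launcher's binding mechanism.
```

Timing the same application under both maps isolates the cost attributable to the shared levels of the memory hierarchy.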
144. Role of High-End Computing in Meeting NASA’s Science and Engineering Challenges
- Author
-
Rupak Biswas, William R. Van Dalsem, and Eugene L. Tu
- Subjects
Engineering ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,business.industry ,Science and engineering ,Scientific discovery ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Supercomputer ,Space exploration ,High end computing ,Aeronautics ,Agency (sociology) ,Aerospace engineering ,Aerospace ,business ,Research center - Abstract
High-end computing (HEC) has always played a major role in meeting the modeling and simulation needs of various NASA missions. Two years ago, NASA was on the verge of dramatically enhancing its HEC capability and capacity by significantly increasing its computational and storage resources. With the 10,240-processor Columbia supercomputer in production since October 2004, HEC is having an even greater impact within the Agency and beyond. Advanced science and engineering simulations in space exploration, Shuttle operations, Earth sciences, and fundamental aeronautics research are occurring on Columbia, demonstrating its ability to accelerate NASA’s exploration vision. This paper describes how the integrated production environment fostered at the NASA Advanced Supercomputing (NAS) facility at Ames Research Center is reducing design cycle times, accelerating scientific discovery, achieving rapid parametric analyses of multiple scenarios, and enhancing safety for several NASA missions. We focus on Columbia’s impact on two key engineering and science disciplines: aerospace and climate/weather. We also discuss future mission challenges and plans for NASA’s next-generation HEC environment.
- Published
- 2009
- Full Text
- View/download PDF
145. Advances in adaptive parallel processing for field applications
- Author
-
Messaoud Benantar, Joseph E. Flaherty, and Rupak Biswas
- Subjects
Partial differential equation ,Finite volume method ,Discretization ,Computer science ,MathematicsofComputing_NUMERICALANALYSIS ,Hyperbolic systems ,Finite element method ,Electronic, Optical and Magnetic Materials ,Polynomial basis ,Elliptic partial differential equation ,Conjugate gradient method ,Piecewise ,Quadtree ,Applied mathematics ,Electrical and Electronic Engineering ,Galerkin method - Abstract
Techniques for the adaptive solution of two-dimensional vector systems of hyperbolic and elliptic partial differential equations on shared-memory parallel computers are described. Hyperbolic systems are approximated by an explicit finite volume technique and solved by a recursive local mesh refinement procedure. Several computational procedures have been developed, and results comparing a variety of heuristic processor load-balancing techniques and refinement strategies are presented. For elliptic problems, the spatial domain is discretized using a finite quadtree mesh-generation procedure and the differential system is discretized by a finite-element Galerkin technique with a hierarchical piecewise polynomial basis. The resulting linear algebraic systems are solved in parallel on noncontiguous quadrants by a conjugate gradient technique with element-by-element and symmetric successive over-relaxation preconditioners. Noncontiguous regions are determined by using a linear-time coloring procedure that requires a maximum of six colors. (A generic coloring sketch follows this entry.)
- Published
- 1991
- Full Text
- View/download PDF
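The coloring idea can be sketched generically: quadrants that share an edge receive different colors, so all quadrants of one color are mutually noncontiguous and can be assembled or relaxed concurrently without synchronization. The sketch below is a plain greedy coloring over an assumed adjacency list, not the paper's linear-time six-color procedure, and it may use more colors than necessary.

```python
from collections import defaultdict

def greedy_coloring(adjacency):
    """adjacency: dict mapping quadrant id -> iterable of edge-neighbor ids."""
    color = {}
    for q in adjacency:                       # visit quadrants in the given order
        used = {color[n] for n in adjacency[q] if n in color}
        c = 0
        while c in used:                      # smallest color unused by any neighbor
            c += 1
        color[q] = c
    return color

if __name__ == "__main__":
    # four leaf quadrants of a 2x2 refinement, edge-adjacent as on a grid
    adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    groups = defaultdict(list)
    for q, c in greedy_coloring(adjacency).items():
        groups[c].append(q)
    for c, qs in sorted(groups.items()):
        print(f"color {c}: quadrants {qs} can be processed in parallel")
```

For this 2x2 example the result is a two-color checkerboard; on general quadtrees the paper's specialized procedure bounds the count at six colors.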
146. Parallel computation with adaptive methods for elliptic and hyperbolic systems
- Author
-
Rupak Biswas, Messaoud Benantar, Joseph E. Flaherty, and Mark S. Shephard
- Subjects
Mechanical Engineering ,Linear system ,Computational Mechanics ,General Physics and Astronomy ,Parallel computing ,Grid ,Finite element method ,Computer Science Applications ,Tree traversal ,Tree structure ,Mechanics of Materials ,Mesh generation ,Quadtree ,Stiffness matrix ,Mathematics - Abstract
We consider the solution of two-dimensional vector systems of elliptic and hyperbolic partial differential equations on a shared-memory parallel computer. For elliptic problems, the spatial domain is discretized using a finite quadtree mesh-generation procedure and the differential system is discretized by a finite-element Galerkin technique with a piecewise linear polynomial basis. The resulting linear algebraic systems are solved using the conjugate gradient technique with element-by-element and symmetric successive over-relaxation preconditioners. Stiffness matrix assembly and linear system solution are processed in parallel, with computations scheduled on noncontiguous quadrants of the tree to minimize process synchronization. Determining noncontiguous regions by coloring the regular finite quadtree structure is far simpler than coloring elements of the unstructured mesh that the finite quadtree procedure generates. We describe linear-time coloring procedures that use six and eight colors. For hyperbolic problems, the rectangular spatial domain is discretized into a grid of rectangular cells, and the differential system is discretized by an explicit finite difference technique. Recursive local refinement of the time steps and spatial cells of a coarse base mesh is performed in regions where a refinement indicator exceeds a prescribed tolerance. Data management involves a tree of grids, with finer grids regarded as offspring of coarser ones. Computational procedures have been developed that sequentially traverse the tree structure while processing the solution on each grid in parallel, and that process solutions at the same tree level in parallel. Computational results using the sequential tree-traversal scheme are presented and compared with results using a non-adaptive strategy. Heuristic processor load-balancing techniques are suggested for the parallel tree-traversal procedure. (A sketch of the tree-of-grids traversal follows this entry.)
- Published
- 1990
- Full Text
- View/download PDF
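A hedged sketch of the tree-of-grids data management described above: finer grids are children of coarser ones, grids on the same tree level are independent, and a level-by-level traversal can therefore process each level's grids concurrently. The Grid fields, the advance stub, and the use of a thread pool are illustrative assumptions, not the paper's shared-memory implementation.

```python
from dataclasses import dataclass, field
from collections import deque, defaultdict
from concurrent.futures import ThreadPoolExecutor

@dataclass
class Grid:
    level: int
    cells: int                      # number of spatial cells on this grid
    children: list = field(default_factory=list)

def grids_by_level(root):
    """Breadth-first traversal grouping grids by refinement level."""
    levels, queue = defaultdict(list), deque([root])
    while queue:
        g = queue.popleft()
        levels[g.level].append(g)
        queue.extend(g.children)
    return levels

def advance(grid):
    """Placeholder for one (refined) time step on a single grid."""
    return f"level {grid.level}: advanced {grid.cells} cells"

if __name__ == "__main__":
    base = Grid(0, 64, [Grid(1, 32, [Grid(2, 16)]), Grid(1, 24)])
    with ThreadPoolExecutor() as pool:
        for level, grids in sorted(grids_by_level(base).items()):
            # grids on one level are processed in parallel; levels go in order
            for msg in pool.map(advance, grids):
                print(msg)
```

The paper's parallel tree-traversal variant processes all grids at the same tree level concurrently, which is the pattern the level-by-level loop mimics here.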
147. Petascale Computing: Impact on Future NASA Missions
- Author
-
Michael J. Aftosmis, Rupak Biswas, Bo-Wen Shen, and Cetin C. Kiris
- Subjects
Petascale computing ,business.industry ,Computer science ,Aerospace engineering ,business - Published
- 2007
- Full Text
- View/download PDF
148. Petascale Computing
- Author
-
Michael Aftosmis, Cetin Kiris, Rupak Biswas, and Bo-Wen Shen
- Published
- 2007
- Full Text
- View/download PDF
149. Impact of the Columbia Supercomputer on NASA Space and Exploration Missions
- Author
-
Rupak Biswas, D. Kwak, S. Lawrence, and C. Kiris
- Subjects
Engineering ,Space technology ,Data visualization ,Aeronautics ,business.industry ,Space operations ,Space (commercial competition) ,business ,Supercomputer ,Space research ,Visualization - Abstract
NASA's 10,240-processor Columbia supercomputer gained worldwide recognition in 2004 for increasing the space agency's computing capability ten-fold, and enabling U.S. scientists and engineers to perform significant, breakthrough simulations. Columbia has amply demonstrated its capability to accelerate NASA's key missions in space operations, exploration systems, science, and aeronautics. Columbia is part of an integrated high-end computing (HEC) environment comprising massive storage and archive systems, high-speed networking, high-fidelity modeling and simulation tools, application performance optimization, and advanced data analysis and visualization. In this paper, we illustrate the impact Columbia is having on NASA's numerous space and exploration applications, such as the development of the Crew Exploration and Launch Vehicles (CEV/CLV), the effects of long-duration human presence in space, and damage assessment and repair recommendations for the remaining Shuttle flights. We conclude by discussing HEC challenges that must be overcome to solve space-related science problems in the future.
- Published
- 2006
- Full Text
- View/download PDF
150. A Workload Partitioner for Heterogeneous Grids
- Author
-
Sajal K. Das, Rupak Biswas, and Daniel J. Harvey
- Subjects
Grid computing ,Computer science ,Distributed computing ,Dynamic load balancing ,Graph partition ,Workload ,Parallel computing ,computer.software_genre ,computer - Published
- 2006
- Full Text
- View/download PDF