Author: "Vishwanath, Venkatram" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Vishwanath, Venkatram"' showing total 291 results

Start Over Author "Vishwanath, Venkatram"

291 results on '"Vishwanath, Venkatram"'

1. BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference

Author: Gulhan, Ahmed Burak, Chitty-Venkata, Krishna Teja, Emani, Murali, Kandemir, Mahmut, and Vishwanath, Venkatram
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In Large Language Model (LLM) inference, Key-Value (KV) caches (KV-caches) are essential for reducing time complexity. However, they result in a linear increase in GPU memory as the context length grows. While recent work explores KV-cache eviction and compression policies to reduce memory usage, they often consider uniform KV-caches across all attention heads, leading to suboptimal performance. We introduce BaKlaVa, a method to allocate optimal memory for individual KV-caches across the model by estimating the importance of each KV-cache. Our empirical analysis demonstrates that not all KV-caches are equally critical for LLM performance. Using a one-time profiling approach, BaKlaVa assigns optimal memory budgets to each KV-cache. We evaluated our method on LLaMA-3-8B, and Qwen2.5-7B models, achieving up to a 70\% compression ratio while keeping baseline performance and delivering up to an order-of-magnitude accuracy improvement at higher compression levels.
Published: 2025

2. An Incremental Multi-Level, Multi-Scale Approach to Assessment of Multifidelity HPC Systems

Author: Shilpika, Shilpika, Lusch, Bethany, Vishwanath, Venkatram, and Papka, Michael E.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: With the growing complexity in architecture and the size of large-scale computing systems, monitoring and analyzing system behavior and events has become daunting. Monitoring data amounting to terabytes per day are collected by sensors housed in these massive systems at multiple fidelity levels and varying temporal resolutions. In this work, we develop an incremental version of multiresolution dynamic mode decomposition (mrDMD), which converts high-dimensional data to spatial-temporal patterns at varied frequency ranges. Our incremental implementation of the mrDMD algorithm (I-mrDMD) promptly reveals valuable information in the massive environment log dataset, which is then visually aligned with the processed hardware and job log datasets through our generalizable rack visualization using D3 visualization integrated into the Jupyter Notebook interface. We demonstrate the efficacy of our approach with two use scenarios on a real-world dataset from a Cray XC40 supercomputer, Theta.
Published: 2025
Full Text: View/download PDF

3. A Deep Probabilistic Framework for Continuous Time Dynamic Graph Generation

Author: Hosseini, Ryien, Simini, Filippo, Vishwanath, Venkatram, and Hoffmann, Henry
Subjects: Computer Science - Machine Learning
Abstract: Recent advancements in graph representation learning have shifted attention towards dynamic graphs, which exhibit evolving topologies and features over time. The increased use of such graphs creates a paramount need for generative models suitable for applications such as data augmentation, obfuscation, and anomaly detection. However, there are few generative techniques that handle continuously changing temporal graph data; existing work largely relies on augmenting static graphs with additional temporal information to model dynamic interactions between nodes. In this work, we propose a fundamentally different approach: We instead directly model interactions as a joint probability of an edge forming between two nodes at a given time. This allows us to autoregressively generate new synthetic dynamic graphs in a largely assumption free, scalable, and inductive manner. We formalize this approach as DG-Gen, a generative framework for continuous time dynamic graphs, and demonstrate its effectiveness over five datasets. Our experiments demonstrate that DG-Gen not only generates higher fidelity graphs compared to traditional methods but also significantly advances link prediction tasks., Comment: To appear at AAAI-25
Published: 2024

4. LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Author: Chitty-Venkata, Krishna Teja, Raskar, Siddhisanket, Kale, Bharat, Ferdaus, Farah, Tanikanti, Aditya, Raffenetti, Ken, Taylor, Valerie, Emani, Murali, and Vishwanath, Venkatram
Subjects: Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have propelled groundbreaking advancements across several domains and are commonly used for text generation applications. However, the computational demands of these complex models pose significant challenges, requiring efficient hardware acceleration. Benchmarking the performance of LLMs across diverse hardware platforms is crucial to understanding their scalability and throughput characteristics. We introduce LLM-Inference-Bench, a comprehensive benchmarking suite to evaluate the hardware inference performance of LLMs. We thoroughly analyze diverse hardware platforms, including GPUs from Nvidia and AMD and specialized AI accelerators, Intel Habana and SambaNova. Our evaluation includes several LLM inference frameworks and models from LLaMA, Mistral, and Qwen families with 7B and 70B parameters. Our benchmarking results reveal the strengths and limitations of various models, hardware platforms, and inference frameworks. We provide an interactive dashboard to help identify configurations for optimal performance for a given hardware platform.
Published: 2024

5. Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling

Author: Barwey, Shivam, Balin, Riccardo, Lusch, Bethany, Patel, Saumil, Balakrishnan, Ramesh, Pal, Pinaki, Maulik, Romit, and Vishwanath, Venkatram
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Physics - Computational Physics
Abstract: This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is arithmetically equivalent to evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling in consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.
Published: 2024

6. Mesh-based Super-Resolution of Fluid Flows with Multiscale Graph Neural Networks

Author: Barwey, Shivam, Pal, Pinaki, Patel, Saumil, Balin, Riccardo, Lusch, Bethany, Vishwanath, Venkatram, Maulik, Romit, and Balakrishnan, Ramesh
Subjects: Physics - Fluid Dynamics, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Machine Learning, Physics - Computational Physics
Abstract: A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizations, a baseline GNN layer (termed a message passing layer, which updates local node properties) is modified to account for synchronization of coincident graph nodes, rendering compatibility with commonly used element-based mesh connectivities. The architecture is multiscale in nature, and is comprised of a combination of coarse-scale and fine-scale message passing layer sequences (termed processors) separated by a graph unpooling layer. The coarse-scale processor embeds a query element (alongside a set number of neighboring coarse elements) into a single latent graph representation using coarse-scale synchronized message passing over the element neighborhood, and the fine-scale processor leverages additional message passing operations on this latent graph to correct for interpolation errors. Demonstration studies are performed using hexahedral mesh-based data from Taylor-Green Vortex and backward-facing step flow simulations at Reynolds numbers of 1600 and 3200. Through analysis of both global and local errors, the results ultimately show how the GNN is able to produce accurate super-resolved fields compared to targets in both coarse-scale and multiscale model configurations. Reconstruction errors for fixed architectures were found to increase in proportion to the Reynolds number. Geometry extrapolation studies on a separate cavity flow configuration show promising cross-mesh capabilities of the super-resolution strategy.
Published: 2024

7. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Author: Song, Shuaiwen Leon, Kruft, Bonnie, Zhang, Minjia, Li, Conglong, Chen, Shiyang, Zhang, Chengming, Tanaka, Masahiro, Wu, Xiaoxia, Rasley, Jeff, Awan, Ammar Ahmad, Holmes, Connor, Cai, Martin, Ghanem, Adam, Zhou, Zhongzhu, He, Yuxiong, Luferenko, Pete, Kumar, Divya, Weyn, Jonathan, Zhang, Ruixiong, Klocek, Sylwester, Vragov, Volodymyr, AlQuraishi, Mohammed, Ahdritz, Gustaf, Floristean, Christina, Negri, Cristina, Kotamarthi, Rao, Vishwanath, Venkatram, Ramanathan, Arvind, Foreman, Sam, Hippe, Kyle, Arcomano, Troy, Maulik, Romit, Zvyagin, Maxim, Brace, Alexander, Zhang, Bin, Bohorquez, Cindy Orozco, Clyde, Austin, Kale, Bharat, Perez-Rivera, Danilo, Ma, Heng, Mann, Carla M., Irvin, Michael, Pauloski, J. Gregory, Ward, Logan, Hayot, Valerie, Emani, Murali, Xie, Zhen, Lin, Diangen, Shukla, Maulik, Foster, Ian, Davis, James J., Papka, Michael E., Brettin, Thomas, Balaprakash, Prasanna, Tourassi, Gina, Gounley, John, Hanson, Heidi, Potok, Thomas E, Pasini, Massimiliano Lupo, Evans, Kate, Lu, Dan, Lunga, Dalton, Yin, Junqi, Dash, Sajal, Wang, Feiyi, Shankar, Mallikarjun, Lyngaas, Isaac, Wang, Xiao, Cong, Guojing, Zhang, Pei, Fan, Ming, Liu, Siyan, Hoisie, Adolfy, Yoo, Shinjae, Ren, Yihui, Tang, William, Felker, Kyle, Svyatkovskiy, Alexey, Liu, Hang, Aji, Ashwin, Dalton, Angela, Schulte, Michael, Schulz, Karl, Deng, Yuntian, Nie, Weili, Romero, Josh, Dallago, Christian, Vahdat, Arash, Xiao, Chaowei, Gibbs, Thomas, Anandkumar, Anima, and Stevens, Rick
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
Published: 2023

8. A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators

Author: Emani, Murali, Foreman, Sam, Sastry, Varuni, Xie, Zhen, Raskar, Siddhisanket, Arnold, William, Thakur, Rajeev, Vishwanath, Venkatram, and Papka, Michael E.
Subjects: Computer Science - Performance, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications is contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT- 2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, sparsity, and sensitivity to gradient accumulation steps.
Published: 2023

9. Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives

Author: Egele, Romain, Chang, Tyler, Sun, Yixuan, Vishwanath, Venkatram, and Balaprakash, Prasanna
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Machine learning (ML) methods offer a wide range of configurable hyperparameters that have a significant influence on their performance. While accuracy is a commonly used performance objective, in many settings, it is not sufficient. Optimizing the ML models with respect to multiple objectives such as accuracy, confidence, fairness, calibration, privacy, latency, and memory consumption is becoming crucial. To that end, hyperparameter optimization, the approach to systematically optimize the hyperparameters, which is already challenging for a single objective, is even more challenging for multiple objectives. In addition, the differences in objective scales, the failures, and the presence of outlier values in objectives make the problem even harder. We propose a multi-objective Bayesian optimization (MoBO) algorithm that addresses these problems through uniform objective normalization and randomized weights in scalarization. We increase the efficiency of our approach by imposing constraints on the objective to avoid exploring unnecessary configurations (e.g., insufficient accuracy). Finally, we leverage an approach to parallelize the MoBO which results in a 5x speed-up when using 16x more workers., Comment: Preprint with appendices
Published: 2023

10. A Survey of Techniques for Optimizing Transformer Inference

Author: Chitty-Venkata, Krishna Teja, Mittal, Sparsh, Emani, Murali, Vishwanath, Venkatram, and Somani, Arun K.
Subjects: Computer Science - Machine Learning, Computer Science - Hardware Architecture, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent years have seen a phenomenal rise in performance and applications of transformer neural networks. The family of transformer networks, including Bidirectional Encoder Representations from Transformer (BERT), Generative Pretrained Transformer (GPT) and Vision Transformer (ViT), have shown their effectiveness across Natural Language Processing (NLP) and Computer Vision (CV) domains. Transformer-based networks such as ChatGPT have impacted the lives of common men. However, the quest for high predictive performance has led to an exponential increase in transformers' memory and compute footprint. Researchers have proposed techniques to optimize transformer inference at all levels of abstraction. This paper presents a comprehensive survey of techniques for optimizing the inference phase of transformer networks. We survey techniques such as knowledge distillation, pruning, quantization, neural architecture search and lightweight network design at the algorithmic level. We further review hardware-level optimization techniques and the design of novel hardware accelerators for transformers. We summarize the quantitative results on the number of parameters/FLOPs and accuracy of several models/techniques to showcase the tradeoff exercised by them. We also outline future directions in this rapidly evolving field of research. We believe that this survey will educate both novice and seasoned researchers and also spark a plethora of research efforts in this field.
Published: 2023

11. A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems

Author: Shilpika, Lusch, Bethany, Emani, Murali, Simini, Filippo, Vishwanath, Venkatram, Papka, Michael E., and Ma, Kwan-Liu
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Computer Vision and Pattern Recognition
Abstract: The ability to monitor and interpret of hardware system events and behaviors are crucial to improving the robustness and reliability of these systems, especially in a supercomputing facility. The growing complexity and scale of these systems demand an increase in monitoring data collected at multiple fidelity levels and varying temporal resolutions. In this work, we aim to build a holistic analytical system that helps make sense of such massive data, mainly the hardware logs, job logs, and environment logs collected from disparate subsystems and components of a supercomputer system. This end-to-end log analysis system, coupled with visual analytics support, allows users to glean and promptly extract supercomputer usage and error patterns at varying temporal and spatial resolutions. We use multiresolution dynamic mode decomposition (mrDMD), a technique that depicts high-dimensional data as correlated spatial-temporal variations patterns or modes, to extract variation patterns isolated at specified frequencies. Our improvements to the mrDMD algorithm help promptly reveal useful information in the massive environment log dataset, which is then associated with the processed hardware and job log datasets using our visual analytics system. Furthermore, our system can identify the usage and error patterns filtered at user, project, and subcomponent levels. We exemplify the effectiveness of our approach with two use scenarios with the Cray XC40 supercomputer.
Published: 2023

12. WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators

Author: Chitty-Venkata, Krishna Teja, Sastry, Varuni Katti, Emani, Murali, Vishwanath, Venkatram, Shanmugavelu, Sanjif, Howland, Sylvia, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Carretero, Jesus, editor, Shende, Sameer, editor, Garcia-Blas, Javier, editor, Brandic, Ivona, editor, Olcoz, Katzalin, editor, and Schreiber, Martin, editor
Published: 2024
Full Text: View/download PDF

13. Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications

Author: Hosseini, Ryien, Simini, Filippo, and Vishwanath, Venkatram
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, Computer Science - Performance
Abstract: As Graph Neural Networks (GNNs) increase in popularity for scientific machine learning, their training and inference efficiency is becoming increasingly critical. Additionally, the deep learning field as a whole is trending towards wider and deeper networks, and ever increasing data sizes, to the point where hard hardware bottlenecks are often encountered. Emerging specialty hardware platforms provide an exciting solution to this problem. In this paper, we systematically profile and select low-level operations pertinent to GNNs for scientific computing implemented in the Pytorch Geometric software framework. These are then rigorously benchmarked on NVIDIA A100 GPUs for several various combinations of input values, including tensor sparsity. We then analyze these results for each operation. At a high level, we conclude that on NVIDIA systems: (1) confounding bottlenecks such as memory inefficiency often dominate runtime costs moreso than data sparsity alone, (2) native Pytorch operations are often as or more competitive than their Pytorch Geometric equivalents, especially at low to moderate levels of input data sparsity, and (3) many operations central to state-of-the-art GNN architectures have little to no optimization for sparsity. We hope that these results serve as a baseline for those developing these operations on specialized hardware and that our subsequent analysis helps to facilitate future software and hardware based optimizations of these operations and thus scalable GNN performance as a whole., Comment: Published as workshop paper at MLSys 2022 (MLBench)
Published: 2022

14. Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization

Author: Egele, Romain, Guyon, Isabelle, Vishwanath, Venkatram, and Balaprakash, Prasanna
Subjects: Computer Science - Machine Learning
Abstract: Bayesian optimization (BO) is a promising approach for hyperparameter optimization of deep neural networks (DNNs), where each model training can take minutes to hours. In BO, a computationally cheap surrogate model is employed to learn the relationship between parameter configurations and their performance such as accuracy. Parallel BO methods often adopt single manager/multiple workers strategies to evaluate multiple hyperparameter configurations simultaneously. Despite significant hyperparameter evaluation time, the overhead in such centralized schemes prevents these methods to scale on a large number of workers. We present an asynchronous-decentralized BO, wherein each worker runs a sequential BO and asynchronously communicates its results through shared storage. We scale our method without loss of computational efficiency with above 95% of worker's utilization to 1,920 parallel workers (full production queue of the Polaris supercomputer) and demonstrate improvement in model accuracy as well as faster convergence on the CANDLE benchmark from the Exascale computing project.
Published: 2022

15. Adding topology and memory awareness in data aggregation algorithms

Author: Tessier, François, Vishwanath, Venkatram, and Jeannot, Emmanuel
Published: 2024
Full Text: View/download PDF

16. MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Author: Farrell, Steven, Emani, Murali, Balma, Jacob, Drescher, Lukas, Drozd, Aleksandr, Fink, Andreas, Fox, Geoffrey, Kanter, David, Kurth, Thorsten, Mattson, Peter, Mu, Dawei, Ruhela, Amit, Sato, Kento, Shirahata, Koichi, Tabaru, Tsuguchika, Tsaris, Aristeidis, Balewski, Jan, Cumming, Ben, Danjo, Takumi, Domke, Jens, Fukai, Takaaki, Fukumoto, Naoto, Fukushi, Tatsuya, Gerofi, Balazs, Honda, Takumi, Imamura, Toshiyuki, Kasagi, Akihiko, Kawakami, Kentaro, Kudo, Shuhei, Kuroda, Akiyoshi, Martinasso, Maxime, Matsuoka, Satoshi, Mendonça, Henrique, Minami, Kazuki, Ram, Prabhat, Sawada, Takashi, Shankar, Mallikarjun, John, Tom St., Tabuchi, Akihiro, Vishwanath, Venkatram, Wahib, Mohamed, Yamazaki, Masafumi, and Yin, Junqi
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons Association. We present the results from the first submission round, including a diverse set of some of the world's largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization, and communication scheduling, enabling overall $>10 \times$ (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system's memory hierarchy, and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior to parameterize extended roofline performance models in future rounds.
Published: 2021

17. PythonFOAM: In-situ data analyses with OpenFOAM and Python

Author: Maulik, Romit, Fytanidis, Dimitrios, Lusch, Bethany, Vishwanath, Venkatram, and Patel, Saumil
Subjects: Physics - Computational Physics, Computer Science - Distributed, Parallel, and Cluster Computing, Physics - Fluid Dynamics
Abstract: We outline the development of a general-purpose Python-based data analysis tool for OpenFOAM. Our implementation relies on the construction of OpenFOAM applications that have bindings to data analysis libraries in Python. Double precision data in OpenFOAM is cast to a NumPy array using the NumPy C-API and Python modules may then be used for arbitrary data analysis and manipulation on flow-field information. We highlight how the proposed wrapper may be used for an in-situ online singular value decomposition (SVD) implemented in Python and accessed from the OpenFOAM solver PimpleFOAM. Here, `in-situ' refers to a programming paradigm that allows for a concurrent computation of the data analysis on the same computational resources utilized for the partial differential equation solver. In addition, to demonstrate data-parallel analyses, we deploy a distributed SVD, which collects snapshot data across the ranks of a distributed simulation to compute the global left singular vectors. Crucially, both OpenFOAM and Python share the same message passing interface (MPI) communicator for this deployment which allows Python objects and functions to exchange NumPy arrays across ranks. Subsequently, we provide scaling assessments of this distributed SVD on multiple nodes of Intel Broadwell and KNL architectures for canonical test cases such as the large eddy simulations of a backward facing step and a channel flow at friction Reynolds number of 395. Finally, we demonstrate the deployment of a deep neural network for compressing the flow-field information using an autoencoder to demonstrate an ability to use state-of-the-art machine learning tools in the Python ecosystem.
Published: 2021

18. AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data

Author: Egele, Romain, Balaprakash, Prasanna, Vishwanath, Venkatram, Guyon, Isabelle, and Liu, Zhengying
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: Developing high-performing predictive models for large tabular data sets is a challenging task. The state-of-the-art methods are based on expert-developed model ensembles from different supervised learning methods. Recently, automated machine learning (AutoML) is emerging as a promising approach to automate predictive model development. Neural architecture search (NAS) is an AutoML approach that generates and evaluates multiple neural network architectures concurrently and improves the accuracy of the generated models iteratively. A key issue in NAS, particularly for large data sets, is the large computation time required to evaluate each generated architecture. While data-parallel training is a promising approach that can address this issue, its use within NAS is difficult. For different data sets, the data-parallel training settings such as the number of parallel processes, learning rate, and batch size need to be adapted to achieve high accuracy and reduction in training time. To that end, we have developed AgEBO-Tabular, an approach to combine aging evolution (AgE), a parallel NAS method that searches over neural architecture space, and an asynchronous Bayesian optimization method for tuning the hyperparameters of the data-parallel training simultaneously. We demonstrate the efficacy of the proposed method to generate high-performing neural network models for large tabular benchmark data sets. Furthermore, we demonstrate that the automatically discovered neural network models using our method outperform the state-of-the-art AutoML ensemble models in inference speed by two orders of magnitude while reaching similar accuracy values.
Published: 2020

19. Exploring the Use of Dataflow Architectures for Graph Neural Network Workloads

Author: Hosseini, Ryien, Simini, Filippo, Vishwanath, Venkatram, Sivakumar, Ramakrishnan, Shanmugavelu, Sanjif, Chen, Zhengyu, Zlotnik, Lev, Wang, Mingran, Colangelo, Philip, Deng, Andrew, Lassen, Philip, Pathan, Shukur, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bienz, Amanda, editor, Weiland, Michèle, editor, Baboulin, Marc, editor, and Kruse, Carola, editor
Published: 2023
Full Text: View/download PDF

20. TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators

Author: Xie, Zhen, Raskar, Siddhisanket, Emani, Murali, Vishwanath, Venkatram, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cano, José, editor, Dikaiakos, Marios D., editor, Papadopoulos, George A., editor, Pericàs, Miquel, editor, and Sakellariou, Rizos, editor
Published: 2023
Full Text: View/download PDF

21. In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report

Author: Bethel, EW, Bauer, Andrew, Abbasi, Hasan, Ahrens, James, Childs, Hank, Geveci, Berk, Klasky, Scott, Moreland, Kenneth, O'Leary, Patrick, Vishwanath, Venkatram, and Whitlock, Brad
Subjects: in situ processing, scientific visualization, high performance computing, parallel computing
Abstract: The considerable interest in the high performance computing (HPC) community regarding analyzing and visualization datawithout first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, wherefine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Thirdis the ability to use all available resources, CPU’s and accelerators, in the computation of analysis products. This STAR paperbrings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal to presentexisting methods, infrastructures, and a range of computational science and engineering applications using in situ analysis andvisualization.
Published: 2021

22. Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows

Author: Salim, Michael A., Uram, Thomas D., Childers, J. Taylor, Balaprakash, Prasanna, Vishwanath, Venkatram, and Papka, Michael E.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their scheduling lifecycle. The ensembles execute in a pilot "launcher" which (i) ensures concurrent, load-balanced execution of arbitrary serial and parallel programs with heterogeneous processor requirements, (ii) requires no modification of user applications, (iii) is tolerant of task-level faults and provides several options for error recovery, (iv) stores provenance data (e.g task history, error logs) in the database, (v) supports dynamic workflows, in which tasks are created or killed at runtime. Here, we present the design and Python implementation of the Balsam service and launcher. The efficacy of this system is illustrated using two case studies: hyperparameter optimization of deep neural networks, and high-throughput single-point quantum chemistry calculations. We find that the unique combination of flexible job-packing and automated scheduling with dynamic (pilot-managed) execution facilitates excellent resource utilization. The scripting overheads typically needed to manage resources and launch workflows on supercomputers are substantially reduced, accelerating workflow development and execution., Comment: SC '18: 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018)
Published: 2019

23. Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research

Author: Balaprakash, Prasanna, Egele, Romain, Salim, Misha, Wild, Stefan, Vishwanath, Venkatram, Xia, Fangfang, Brettin, Tom, and Stevens, Rick
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-driven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent successes, however, designing high-performing deep learning models for nonimage and nontext cancer data is a time-consuming, trial-and-error, manual task that requires both cancer domain and deep learning expertise. To that end, we develop a reinforcement-learning-based neural architecture search to automate deep-learning-based predictive model development for a class of representative cancer data. We develop custom building blocks that allow domain experts to incorporate the cancer-data-specific characteristics. We show that our approach discovers deep neural network architectures that have significantly fewer trainable parameters, shorter training time, and accuracy similar to or higher than those of manually designed architectures. We study and demonstrate the scalability of our approach on up to 1,024 Intel Knights Landing nodes of the Theta supercomputer at the Argonne Leadership Computing Facility., Comment: SC '19: IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis, November 17--22, 2019, Denver, CO
Published: 2019
Full Text: View/download PDF

24. Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping

Author: Dong, Wushi, Keceli, Murat, Vescovi, Rafael, Li, Hanyu, Adams, Corey, Jennings, Elise, Flender, Samuel, Uram, Tom, Vishwanath, Venkatram, Ferrier, Nicola, Kasthuri, Narayanan, and Littlewood, Peter
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Quantitative Biology - Neurons and Cognition
Abstract: Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture has demonstrated leading performance for segmenting structures from this data. However, the training of the network is computationally expensive. In order to reduce the training time, we implemented synchronous and data-parallel distributed training using the Horovod library, which is different from the asynchronous training scheme used in the published FFN code. We demonstrated that our distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer. Our trained models achieved similar level of inference performance, but took less training time compared to previous methods. Our study on the effects of different batch sizes on FFN training suggests ways to further improve training efficiency. Our findings on optimal learning rate and batch sizes agree with previous works., Comment: 9 pages, 10 figures
Published: 2019

25. A Benchmarking Study to Evaluate Apache Spark on Large-Scale Supercomputers

Author: Thiruvathukal, George K., Christensen, Cameron, Jin, Xiaoyong, Tessier, François, and Vishwanath, Venkatram
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Software Engineering
Abstract: As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly dependent on sophisticated dataflows and out-of-core methods for efficient system utilization. In addition, as HPC systems grow, memory access and data sharing are becoming performance bottlenecks. Cloud computing employs a data processing paradigm typically built on a loosely connected group of low-cost computing nodes without relying upon shared storage and/or memory. Apache Spark is a popular engine for large-scale data analysis in the cloud, which we have successfully deployed via job submission scripts on production clusters. In this paper, we describe common parallel analysis dataflows for both Message Passing Interface (MPI) and cloud based applications. We developed an effective benchmark to measure the performance characteristics of these tasks using both types of systems, specifically comparing MPI/C-based analyses with Spark. The benchmark is a data processing pipeline representative of a typical analytics framework implemented using map-reduce. In the case of Spark, we also consider whether language plays a role by writing tests using both Python and Scala, a language built on the Java Virtual Machine (JVM). We include performance results from two large systems at Argonne National Laboratory including Theta, a Cray XC40 supercomputer on which our experiments run with 65,536 cores (1024 nodes with 64 cores each). The results of our experiments are discussed in the context of their applicability to future HPC architectures. Beyond understanding performance, our work demonstrates that technologies such as Spark, while typically aimed at multi-tenant cloud-based environments, show promise for data analysis needs in a traditional clustering/supercomputing environment., Comment: Submitted to IEEE Cloud 2019
Published: 2019

26. A terminology for in situ visualization and analysis systems

Author: Childs, Hank, Ahern, Sean D, Ahrens, James, Bauer, Andrew C, Bennett, Janine, Bethel, E Wes, Bremer, Peer-Timo, Brugger, Eric, Cottam, Joseph, Dorier, Matthieu, Dutta, Soumya, Favre, Jean M, Fogal, Thomas, Frey, Steffen, Garth, Christoph, Geveci, Berk, Godoy, William F, Hansen, Charles D, Harrison, Cyrus, Hentschel, Bernd, Insley, Joseph, Johnson, Chris R, Klasky, Scott, Knoll, Aaron, Kress, James, Larsen, Matthew, Lofstead, Jay, Ma, Kwan-Liu, Malakar, Preeti, Meredith, Jeremy, Moreland, Kenneth, Navrátil, Paul, O’Leary, Patrick, Parashar, Manish, Pascucci, Valerio, Patchett, John, Peterka, Tom, Petruzza, Steve, Podhorszki, Norbert, Pugmire, David, Rasquin, Michel, Rizzi, Silvio, Rogers, David H, Sane, Sudhanshu, Sauer, Franz, Sisneros, Robert, Shen, Han-Wei, Usher, Will, Vickery, Rhonda, Vishwanath, Venkatram, Wald, Ingo, Wang, Ruonan, Weber, Gunther H, Whitlock, Brad, Wolf, Matthew, Yu, Hongfeng, and Ziegeler, Sean B
Subjects: Information and Computing Sciences, Graphics, Augmented Reality and Games, In situ processing, scientific visualization, Distributed Computing, Applied computing, Distributed computing and systems software
Abstract: The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over 50 experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.
Published: 2020

27. AI Benchmarking for Science: Efforts from the MLCommons Science Working Group

Author: Thiyagalingam, Jeyan, von Laszewski, Gregor, Yin, Junqi, Emani, Murali, Papay, Juri, Barrett, Gregg, Luszczek, Piotr, Tsaris, Aristeidis, Kirkpatrick, Christine, Wang, Feiyi, Gibbs, Tom, Vishwanath, Venkatram, Shankar, Mallikarjun, Fox, Geoffrey, Hey, Tony, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Anzt, Hartwig, editor, Bienz, Amanda, editor, Luszczek, Piotr, editor, and Baboulin, Marc, editor
Published: 2022
Full Text: View/download PDF

28. Resource-Aware Optimal Scheduling of In Situ Analysis

Author: Malakar, Preeti, Vishwanath, Venkatram, Knight, Christopher, Munson, Todd, Papka, Michael E., Hege, Hans-Christian, Series Editor, Hoffman, David, Series Editor, Johnson, Christopher R., Series Editor, Polthier, Konrad, Series Editor, Childs, Hank, editor, Bennett, Janine C., editor, and Garth, Christoph, editor
Published: 2022
Full Text: View/download PDF

29. TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators

Author: Xie, Zhen, primary, Raskar, Siddhisanket, additional, Emani, Murali, additional, and Vishwanath, Venkatram, additional
Published: 2023
Full Text: View/download PDF

30. ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems

Author: Byna, Surendra, Breitenfeld, Scot, Dong, Bin, Koziol, Quincey, Pourmal, Elena, Robinson, Dana, Soumagne, Jerome, Tang, Houjun, Vishwanath, Venkatram, and Warren, Richard
Subjects: HDF5, Parallel I/O, Data querying, Data Management and Data Science, Distributed Computing and Systems Software, Information and Computing Sciences, Affordable and Clean Energy, parallel I, O, Hierarchical Data Format version 5, I, O performance, virtual object layer, HDF5 optimizations, Software Engineering, Information and computing sciences
Abstract: Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.
Published: 2020

31. PythonFOAM: In-situ data analyses with OpenFOAM and Python

Author: Maulik, Romit, Fytanidis, Dimitrios K., Lusch, Bethany, Vishwanath, Venkatram, and Patel, Saumil
Published: 2022
Full Text: View/download PDF

32. In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report

Author: Bethel, EW, Bauer, Andrew, Abbasi, Hasan, Ahrens, James, Childs, Hank, Geveci, Berk, Klasky, Scott, Moreland, Kenneth, O'Leary, Patrick, Vishwanath, Venkatram, and Whitlock, Brad
Subjects: in situ processing, scientific visualization, high performance computing, parallel computing, Artificial Intelligence and Image Processing, Software Engineering, Graphics, augmented reality and games
Abstract: The considerable interest in the high performance computing (HPC) community regarding analyzing and visualization datawithout first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, wherefine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Thirdis the ability to use all available resources, CPU’s and accelerators, in the computation of analysis products. This STAR paperbrings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal to presentexisting methods, infrastructures, and a range of computational science and engineering applications using in situ analysis andvisualization.
Published: 2016

33. HACC: Simulating Sky Surveys on State-of-the-Art Supercomputing Architectures

Author: Habib, Salman, Pope, Adrian, Finkel, Hal, Frontiere, Nicholas, Heitmann, Katrin, Daniel, David, Fasel, Patricia, Morozov, Vitali, Zagaris, George, Peterka, Tom, Vishwanath, Venkatram, Lukic, Zarija, Sehrish, Saba, and Liao, Wei-keng
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles., Comment: 23 pages, 20 figures
Published: 2014
Full Text: View/download PDF

34. TECA: Petascale Pattern Recognition for Climate Science

Author: Prabhat, Byna, Surendra, Vishwanath, Venkatram, Dart, Eli, Wehner, Michael, and Collins, William D
Subjects: Information and Computing Sciences, Climate Action, Pattern detection, Climate science, High performance computing, Parallel I/O, Data mining, Petascale, Artificial Intelligence & Image Processing, Information and computing sciences
Abstract: Climate Change is one of the most pressing challenges facing humanity in the 21st century. Climate simulations provide us with a unique opportunity to examine effects of anthropogenic emissions. Highresolution climate simulations produce “Big Data”: contemporary climate archives are ≈ 5PB in size and we expect future archives to measure on the order of Exa-Bytes. In this work, we present the successful application of TECA (Toolkit for Extreme Climate Analysis) framework, for extracting extreme weather patterns such as Tropical Cyclones, Atmospheric Rivers and Extra-Tropical Cyclones from TB-sized simulation datasets. TECA has been run at full-scale on Cray XE6 and IBM BG/Q systems, and has reduced the runtime for pattern detection tasks from years to hours. TECA has been utilized to evaluate the performance of various computational models in reproducing the statistics of extreme weather events, and for characterizing the change in frequency of storm systems in the future.
Published: 2015

35. Thorough Characterization and Analysis of Large Transformer Model Training At-Scale

Author: Cheng, Scott, primary, Lin, Jun-Liang, additional, Emani, Murali, additional, Raskar, Siddhisanket, additional, Foreman, Sam, additional, Xie, Zhen, additional, Vishwanath, Venkatram, additional, and Kandemir, Mahmut Taylan, additional
Published: 2024
Full Text: View/download PDF

36. Harnessing synthetic data to address fraud in cross-border payments.

Author: Bryssinck, Johan, Jacobs, Tom, Simini, Filippo, Doddasomayajula, Ravi, Koder, Martin, Curbera, Francisco, Vishwanath, Venkatram, and Neti, Chalapathy
Subjects: FRAUD, ARTIFICIAL intelligence, ALGORITHMS, FRAUD investigation, INFORMATION sharing
Abstract: The sharing of data between financial institutions is widely recognised as a key component in the industry's efforts to combat fraud. Broader access to multiple sources of financial data is also critical to the development of high-quality fraud detection mechanisms based on artificial intelligence (AI). Given the challenges relating to sharing real financial data across countries and institutions, the use of synthetic data has recently become critical to enabling the exploration of broader data sharing and supporting open collaboration in AI model development. To generate synthetic data that can substitute for real data, computer algorithms closely mimic the key statistical properties of genuine data, while strictly preserving the privacy and sovereignty of the source data. This paper presents the results of an ongoing exploration into the generation of high-utility synthetic datasets of cross-border payment transactions using transformer models and discusses its application to the development of AI-based fraud prevention solutions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Characterizing the Performance of Triangle Counting on Graphcore's IPU Architecture

Author: Barik, Reet, primary, Raskar, Siddhisanket, additional, Emani, Murali, additional, and Vishwanath, Venkatram, additional
Published: 2023
Full Text: View/download PDF

38. Demonstration of Portable Performance of Scientific Machine Learning on High Performance Computing Systems

Author: Hossain, Khalid, primary, Balin, Riccardo, additional, Adams, Corey, additional, Uram, Thomas, additional, Kumaran, Kalyan, additional, Vishwanath, Venkatram, additional, Dey, Tanima, additional, Goswami, Subrata, additional, Lee, Janghaeng, additional, Ramer, Rebecca, additional, and Yamada, Koichi, additional
Published: 2023
Full Text: View/download PDF

39. Scalable Lead Prediction with Transformers using HPC resources

Author: Vasan, Archit, primary, Brettin, Thomas, additional, Stevens, Rick, additional, Ramanathan, Arvind, additional, and Vishwanath, Venkatram, additional
Published: 2023
Full Text: View/download PDF

40. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics

Author: Zvyagin, Maxim, primary, Brace, Alexander, additional, Hippe, Kyle, additional, Deng, Yuntian, additional, Zhang, Bin, additional, Bohorquez, Cindy Orozco, additional, Clyde, Austin, additional, Kale, Bharat, additional, Perez-Rivera, Danilo, additional, Ma, Heng, additional, Mann, Carla M., additional, Irvin, Michael, additional, Ozgulbas, Defne G., additional, Vassilieva, Natalia, additional, Pauloski, James Gregory, additional, Ward, Logan, additional, Hayot-Sasson, Valerie, additional, Emani, Murali, additional, Foreman, Sam, additional, Xie, Zhen, additional, Lin, Diangen, additional, Shukla, Maulik, additional, Nie, Weili, additional, Romero, Josh, additional, Dallago, Christian, additional, Vahdat, Arash, additional, Xiao, Chaowei, additional, Gibbs, Thomas, additional, Foster, Ian, additional, Davis, James J., additional, Papka, Michael E., additional, Brettin, Thomas, additional, Stevens, Rick, additional, Anandkumar, Anima, additional, Vishwanath, Venkatram, additional, and Ramanathan, Arvind, additional
Published: 2023
Full Text: View/download PDF

41. Asynchronous Decentralized Bayesian Optimization for Large Scale Hyperparameter Optimization

Author: Egelé, Romain, primary, Guyon, Isabelle, additional, Vishwanath, Venkatram, additional, and Balaprakash, Prasanna, additional
Published: 2023
Full Text: View/download PDF

42. Theta and Mira at Argonne National Laboratory

Author: Fahey, Mark R., primary, Alexeev, Yuri, additional, Allcock, Bill, additional, Allen, Benjamin S., additional, Balakrishnan, Ramesh, additional, Benali, Anouar, additional, Booker, Liza, additional, Boyle, Ashley, additional, Briggs, Laural, additional, Brooks, Edouard, additional, Carns, Phil, additional, Cerny, Beth, additional, Cherry, Andrew, additional, Childers, Lisa, additional, Chunduri, Sudheer, additional, Coffey, Richard, additional, Collins, James, additional, Coffman, Paul, additional, Coghlan, Susan, additional, DiBennardi, Kathy, additional, Doyle, Ginny, additional, Finkel, Hal, additional, Fletcher, Graham, additional, Garcia, Marta, additional, Goldberg, Ira, additional, Goletz, Cheetah, additional, Gregurich, Susan, additional, Harms, Kevin, additional, Holohan, Carissa, additional, Insley, Joseph A., additional, Jackson, Tommie, additional, Jaseckas, Janet, additional, Jennings, Elise, additional, Jensen, Derek, additional, Jiang, Wei, additional, Kaczmarski, Margaret, additional, Knight, Chris, additional, Knowles, Janet, additional, Kumaran, Kalyan, additional, Leggett, Ti, additional, Lenard, Ben, additional, Liu, Anping, additional, Loy, Ray, additional, Malakar, Preeti, additional, Mantrala, Avanthi, additional, Martin, David E., additional, Mayorga, Guillermo, additional, McPheeters, Gordon, additional, Messina, Paul, additional, Milner, Ryan, additional, Morozov, Vitali, additional, Nault, Zachary, additional, Nelson, Denise, additional, O’Connell, Jack, additional, Osborn, James, additional, Papka, Michael E., additional, Parker, Scott, additional, Patel, Pragnesh, additional, Patel, Saumil, additional, Pershey, Eric, additional, Plzak, Renee, additional, Pope, Adrian, additional, Punzel, Jared, additional, Ramprakash, Sreeranjani, additional, ‘Skip’ Reddy, John, additional, Rich, Paul, additional, Riley, Katherine, additional, Rizzi, Silvio, additional, Rojas, George, additional, Romero, Nichols A., additional, Scott, Robert, additional, Scovel, Adam, additional, Scullin, William, additional, Shemon, Emily, additional, Som, Haritha Siddabathuni, additional, Stover, Joan, additional, Suliba, Mirek, additional, Toonen, Brian, additional, Uram, Tom, additional, Vazquez-Mayagoitia, Alvaro, additional, Vishwanath, Venkatram, additional, Waldron, R. Douglas, additional, West, Gabe, additional, Williams, Timothy J., additional, Wills, Darin, additional, Wolf, Laura, additional, Woods, Wanda, additional, and Zhang, Michael, additional
Published: 2019
Full Text: View/download PDF

43. Data movement optimizations for independent MPI I/O on the Blue Gene/Q

Author: Malakar, Preeti and Vishwanath, Venkatram
Published: 2017
Full Text: View/download PDF

44. A survey of techniques for optimizing transformer inference

Author: Chitty-Venkata, Krishna Teja, primary, Mittal, Sparsh, additional, Emani, Murali, additional, Vishwanath, Venkatram, additional, and Somani, Arun K., additional
Published: 2023
Full Text: View/download PDF

45. Application power profiling on IBM Blue Gene/Q

Author: Wallace, Sean, Zhou, Zhou, Vishwanath, Venkatram, Coghlan, Susan, Tramm, John, Lan, Zhiling, and Papka, Michael E.
Published: 2016
Full Text: View/download PDF

46. Improving sparse data movement performance using multiple paths on the Blue Gene/Q supercomputer

Author: Bui, Huy, Jung, Eun-Sung, Vishwanath, Venkatram, Johnson, Andrew, Leigh, Jason, and Papka, Michael E.
Published: 2016
Full Text: View/download PDF

47. Workflow performance improvement using model-based scheduling over multiple clusters and clouds

Author: Maheshwari, Ketan, Jung, Eun-Sung, Meng, Jiayuan, Morozov, Vitali, Vishwanath, Venkatram, and Kettimuthu, Rajkumar
Published: 2016
Full Text: View/download PDF

48. Comparative dataset of experimental and computational attributes of UV/vis absorption spectra

Author: Beard, Edward J., Sivaraman, Ganesh, Vázquez-Mayagoitia, Álvaro, Vishwanath, Venkatram, and Cole, Jacqueline M.
Published: 2019
Full Text: View/download PDF

49. Fast Multiresolution Reads of Massive Simulation Datasets

Author: Kumar, Sidharth, Christensen, Cameron, Schmidt, John A., Bremer, Peer-Timo, Brugger, Eric, Vishwanath, Venkatram, Carns, Philip, Kolla, Hemanth, Grout, Ray, Chen, Jacqueline, Berzins, Martin, Scorzelli, Giorgio, Pascucci, Valerio, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Kunkel, Julian Martin, editor, Ludwig, Thomas, editor, and Meuer, Hans Werner, editor
Published: 2014
Full Text: View/download PDF

50. A Generic High-Performance Method for Deinterleaving Scientific Data

Author: Schendel, Eric R., Harenberg, Steve, Tang, Houjun, Vishwanath, Venkatram, Papka, Michael E., Samatova, Nagiza F., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Wolf, Felix, editor, Mohr, Bernd, editor, and an Mey, Dieter, editor
Published: 2013
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

291 results on '"Vishwanath, Venkatram"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources