Author: "Arcand, William" / Search Limiters: Academic (Peer-Reviewed) Journals - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Arcand, William"' showing total 190 results

1. GPU Sharing with Triples Mode

Author: Byun, Chansup, Reuther, Albert, Anderson, LaToya, Arcand, William, Bergeron, Bill, Bestor, David, Bonn, Alexander, Burrill, Daniel, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Luszczek, Piotr, Michaleas, Peter, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: There is a tremendous amount of interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand on GPUs, which are the workhorses for training AI models. Due to the high cost and limited supply of GPUs, it has become of interest to optimize GPU usage in HPC centers. MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed an easy-to-use GPU sharing feature supported by LLSC-developed tools including LLsub and LLMapReduce. This approach overcomes some of the limitations of the existing methods for GPU sharing. It allows users to apply GPU sharing whenever possible while they are developing their AI/ML models, running parametric studies on their AI models, or executing other GPU applications. Based on our initial experimental results, GPU sharing with triples mode is easy to use and achieved significant improvement in GPU usage and throughput performance for certain types of AI applications.
Published: 2024

2. LLload: An Easy-to-Use HPC Utilization Tool

Author: Byun, Chansup, Reuther, Albert, Mullen, Julie, Anderson, LaToya, Arcand, William, Bergeron, Bill, Bestor, David, Bonn, Alexander, Burrill, Daniel, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Luszczek, Piotr, Michaleas, Peter, Milechin, Lauren, Morales, Guillermo, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Performance
Abstract: The increasing use and cost of high performance computing (HPC) requires new easy-to-use tools to enable HPC users and HPC systems engineers to transparently understand the utilization of resources. The MIT Lincoln Laboratory Supercomputing Center (LLSC) has developed a simple command, LLload, to monitor and characterize HPC workloads. LLload plays an important role in identifying opportunities for better utilization of compute resources. LLload can be used to monitor jobs both programmatically and interactively. LLload can characterize users' jobs using various LLload options to achieve better efficiency. This information can be used to inform the user to optimize HPC workloads and improve both CPU and GPU utilization. This includes improvements using judicious oversubscription of the computing resources. Preliminary results suggest significant improvement in GPU utilization and overall throughput performance with GPU overloading in some cases. By enabling users to observe and fix incorrect job submission and/or inappropriate execution setups, LLload can increase the resource usage and improve the overall throughput performance. LLload is a light-weight, easy-to-use tool for both HPC users and HPC systems engineers to monitor HPC workloads to improve system utilization and efficiency.
Published: 2024

3. Supercomputer 3D Digital Twin for User Focused Real-Time Monitoring

Author: Bergeron, William, Hubbell, Matthew, Mojica, Daniel, Reuther, Albert, Arcand, William, Bestor, David, Burrill, Daniel, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Jananthan, Hayden, Jones, Michael, Luszczek, Piotr, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Real-time supercomputing performance analysis is a critical aspect of evaluating and optimizing computational systems in a dynamic user environment. The operation of supercomputers produces vast quantities of analytic data from multiple sources and of varying types, so compiling this data in an efficient manner is critical to the process. MIT Lincoln Laboratory Supercomputing Center has been utilizing the Unity 3D game engine to create a Digital Twin of our supercomputing systems for several years to perform system monitoring. Unity offers robust visualization capabilities, making it ideal for creating a sophisticated representation of the computational processes. As we scale the systems to include a diversity of resources such as accelerators and the addition of more users, we need to implement new analysis tools for the monitoring system. The workloads in research continuously change, as does the capability of Unity, and this allows us to adapt our monitoring tools to scale and incorporate features enabling efficient replay of system-wide events, user isolation, and machine-level granularity. Our system takes full advantage of the modern capabilities of the Unity Engine in a way that intuitively represents the real-time workload performed on a supercomputer. It allows HPC system engineers to quickly diagnose usage-related errors with its responsive user interface, which scales efficiently with large data sets.
Published: 2024

4. HPC with Enhanced User Separation

Author: Prout, Andrew, Reuther, Albert, Houle, Michael, Jones, Michael, Michaleas, Peter, Anderson, LaToya, Arcand, William, Bergeron, Bill, Bestor, David, Bonn, Alex, Burrill, Daniel, Byun, Chansup, Gadepally, Vijay, Hubbell, Matthew, Jananthan, Hayden, Luszczek, Piotr, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Rosa, Antonio, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: HPC systems used for research run a wide variety of software and workflows. This software is often written or modified by users to meet the needs of their research projects, and rarely is built with security in mind. In this paper we explore several of the key techniques that MIT Lincoln Laboratory Supercomputing Center has deployed on its systems to manage the security implications of these workflows by providing enforced separation for processes, filesystem access, network traffic, and accelerators to make every user feel like they are running on a personal HPC.
Published: 2024

5. Anonymized Network Sensing Graph Challenge

Author: Jananthan, Hayden, Jones, Michael, Arcand, William, Bestor, David, Bergeron, William, Burrill, Daniel, Buluc, Aydin, Byun, Chansup, Davis, Timothy, Gadepally, Vijay, Grant, Daniel, Houle, Michael, Hubbell, Matthew, Luszczek, Piotr, Michaleas, Peter, Milechin, Lauren, Milner, Chasen, Morales, Guillermo, Morris, Andrew, Mullen, Julie, Patel, Ritesh, Pentland, Alex, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Wachman, Gabriel, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Discrete Mathematics, Computer Science - Performance, Computer Science - Software Engineering, Mathematics - Combinatorics
Abstract: The MIT/IEEE/Amazon GraphChallenge encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to discover relationships between events as they unfold in the field. The anonymized network sensing Graph Challenge seeks to enable large, open, community-based approaches to protecting networks. Many large-scale networking problems can only be solved with community access to very broad data sets with the highest regard for privacy and strong community buy-in. Such approaches often require community-based data sharing. In the broader networking community (commercial, federal, and academia) anonymized source-to-destination traffic matrices with standard data sharing agreements have emerged as a data product that can meet many of these requirements. This challenge provides an opportunity to highlight novel approaches for optimizing the construction and analysis of anonymized traffic matrices using over 100 billion network packets derived from the largest Internet telescope in the world (CAIDA). This challenge specifies the anonymization, construction, and analysis of these traffic matrices. A GraphBLAS reference implementation is provided, but the use of GraphBLAS is not required in this Graph Challenge. As with prior Graph Challenges the goal is to provide a well-defined context for demonstrating innovation. Graph Challenge participants are free to select (with accompanying explanation) the Graph Challenge elements that are appropriate for highlighting their innovations. Comment: Accepted to IEEE HPEC 2024
Published: 2024

6. What is Normal? A Big Data Observational Science Model of Anonymized Internet Traffic

Author: Kepner, Jeremy, Jananthan, Hayden, Jones, Michael, Arcand, William, Bestor, David, Bergeron, William, Burrill, Daniel, Buluc, Aydin, Byun, Chansup, Davis, Timothy, Gadepally, Vijay, Grant, Daniel, Houle, Michael, Hubbell, Matthew, Luszczek, Piotr, Milechin, Lauren, Milner, Chasen, Morales, Guillermo, Morris, Andrew, Mullen, Julie, Patel, Ritesh, Pentland, Alex, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Wachman, Gabriel, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Cryptography and Security, Computer Science - Computers and Society, Computer Science - Social and Information Networks
Abstract: Understanding what is normal is a key aspect of protecting a domain. Other domains invest heavily in observational science to develop models of normal behavior to better detect anomalies. Recent advances in high performance graph libraries, such as the GraphBLAS, coupled with supercomputers enable processing of the trillions of observations required. We leverage this approach to synthesize low-parameter observational models of anonymized Internet traffic with a high regard for privacy. Comment: Accepted to IEEE HPEC, 7 pages, 6 figures, 1 table, 41 references
Published: 2024

7. LLload: Simplifying Real-Time Job Monitoring for HPC Users

Author: Byun, Chansup, Mullen, Julia, Reuther, Albert, Arcand, William, Bergeron, William, Bestor, David, Burrill, Daniel, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Michaleas, Peter, Morales, Guillermo, Prout, Andrew, Rosa, Antonio, Yee, Charles, Kepner, Jeremy, and Milechin, Lauren
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance
Abstract: One of the more complex tasks for researchers using HPC systems is performance monitoring and tuning of their applications. Developing a practice of continuous performance improvement, both for speed-up and efficient use of resources, is essential to the long-term success of both the HPC practitioner and the research project. Profiling tools provide a nice view of the performance of an application but often have a steep learning curve and rarely provide an easy-to-interpret view of resource utilization. Lower-level tools such as top and htop provide a view of resource utilization for those familiar and comfortable with Linux but present a barrier for newer HPC practitioners. To expand the existing profiling and job monitoring options, the MIT Lincoln Laboratory Supercomputing Center created LLload, a tool that captures a snapshot of the resources being used by a job on a per-user basis. LLload is a tool built from standard HPC tools that provides an easy way for a researcher to track resource usage of active jobs. We explain how the tool was designed and implemented and provide insight into how it is used to aid new researchers in developing their performance monitoring skills as well as guide researchers in their resource requests.
Published: 2024

8. Mapping of Internet 'Coastlines' via Large Scale Anonymized Network Source Correlations

Author: Jananthan, Hayden, Kepner, Jeremy, Jones, Michael, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Davis, Timothy, Gadepally, Vijay, Grant, Daniel, Houle, Michael, Hubbell, Matthew, Klein, Anna, Milechin, Lauren, Morales, Guillermo, Morris, Andrew, Mullen, Julie, Patel, Ritesh, Pentland, Alex, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Trigg, Tyler, Wachman, Gabriel, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Social and Information Networks
Abstract: Expanding the scientific tools available to protect computer networks can be aided by a deeper understanding of the underlying statistical distributions of network traffic and their potential geometric interpretations. Analyses of large scale network observations provide a unique window into studying those underlying statistics. Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient anonymized analysis of network traffic on the scale of trillions of events. This work analyzes over 100,000,000,000 anonymized packets from the largest Internet telescope (CAIDA) and over 10,000,000 anonymized sources from the largest commercial honeyfarm (GreyNoise). Neither CAIDA nor GreyNoise actively emits Internet traffic, and both provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Analysis of these observations confirms the previously observed Cauchy-like distributions describing temporal correlations between Internet sources. The Gull lighthouse problem is a well-known geometric characterization of the standard Cauchy distribution and motivates a potential geometric interpretation for Internet observations. This work generalizes the Gull lighthouse problem to accommodate larger classes of coastlines, deriving a closed-form solution for the resulting probability distributions, stating and examining the inverse problem of identifying an appropriate coastline given a continuous probability distribution, identifying a geometric heuristic for solving this problem computationally, and applying that heuristic to examine the temporal geometry of different subsets of network observations. Application of this method to the CAIDA and GreyNoise data reveals a difference of several orders of magnitude between known benign and other traffic, which can lead to potentially novel ways to protect networks. Comment: 9 pages, 7 figures, IEEE HPEC 2023 (accepted)
Published: 2023
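
Note on the Gull lighthouse problem named in the abstract above: in its classic straight-coast form, a lighthouse at distance h offshore, opposite shoreline position x0, emits flashes at uniformly random angles, and the positions where the flashes reach the shore follow a Cauchy distribution with location x0 and scale h. The following is a minimal sketch of that classic case only, not the paper's generalized coastline method; the function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def lighthouse_hits(x0, h, n, seed=0):
    """Sample n shoreline hit positions for a lighthouse at (x0, h).

    Assumes a straight coast along the x-axis (the classic problem).
    """
    rng = random.Random(seed)
    hits = []
    for _ in range(n):
        # Beam angle measured from the perpendicular to the coast,
        # drawn uniformly from (-pi/2, pi/2).
        theta = rng.uniform(-math.pi / 2, math.pi / 2)
        hits.append(x0 + h * math.tan(theta))
    return hits


def cauchy_pdf(x, x0, h):
    """Cauchy density with location x0 and scale h."""
    return h / (math.pi * (h * h + (x - x0) ** 2))


if __name__ == "__main__":
    samples = lighthouse_hits(x0=1.0, h=2.0, n=200_000)
    q1, median, q3 = statistics.quantiles(samples, n=4)
    # For a Cauchy distribution the median estimates x0 and half the
    # interquartile range estimates h (the mean does not converge).
    print("median:", median)
    print("half interquartile range:", (q3 - q1) / 2)
```

The median and interquartile range are used here because the Cauchy distribution has no finite mean or variance, so sample averages are not useful estimators of the lighthouse position.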

9. pPython Performance Study

Author: Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Computer Science - Programming Languages
Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single node (e.g., a laptop) running Windows, Linux, or MacOS operating systems or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface so that pPython can be executed in a massively parallel computing environment. It is interesting to see what performance pPython can achieve compared to the traditional socket-based MPI communication because of its unique file-based messaging implementation. In this paper, we present the point-to-point and collective communication performances of pPython and compare them with those obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates performance comparable to mpi4py. Comment: arXiv admin note: substantial text overlap with arXiv:2208.14908
Published: 2023

10. Deployment of Real-Time Network Traffic Analysis using GraphBLAS Hypersparse Matrices and D4M Associative Arrays

Author: Jones, Michael, Kepner, Jeremy, Prout, Andrew, Davis, Timothy, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Klein, Anna, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Patel, Ritesh, Pisharody, Sandeep, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Social and Information Networks
Abstract: Matrix/array analysis of networks can provide significant insight into their behavior and aid in their operation and protection. Prior work has demonstrated the analytic, performance, and compression capabilities of GraphBLAS (graphblas.org) hypersparse matrices and D4M (d4m.mit.edu) associative arrays (a mathematical superset of matrices). Obtaining the benefits of these capabilities requires integrating them into operational systems, which comes with its own unique challenges. This paper describes two examples of real-time operational implementations. The first is an operational GraphBLAS implementation that constructs anonymized hypersparse matrices on a high-bandwidth network tap. The second is an operational D4M implementation that analyzes daily cloud gateway logs. The architectures of these implementations are presented. Detailed measurements of the resources and the performance are collected and analyzed. The implementations are capable of meeting their operational requirements using modest computational resources (a couple of processing cores). GraphBLAS is well-suited for low-level analysis of high-bandwidth connections with relatively structured network data. D4M is well-suited for higher-level analysis of more unstructured data. This work demonstrates that these technologies can be implemented in operational settings. Comment: Accepted to IEEE HPEC, 8 pages, 8 figures, 1 table, 69 references. arXiv admin note: text overlap with arXiv:2203.13934, arXiv:2309.01806
Published: 2023

11. Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices

Author: Kepner, Jeremy, Jones, Michael, Dykstra, Phil, Byun, Chansup, Davis, Timothy, Jananthan, Hayden, Arcand, William, Bestor, David, Bergeron, William, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Klein, Anna, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Patel, Ritesh, Pentland, Alex, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Trigg, Tyler, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Social and Information Networks, Mathematics - Probability
Abstract: Defending community-owned cyberspace requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibration procedures on a multi-billion packet dataset using high-performance GraphBLAS anonymized hypersparse matrices. The run-time performance on a real-world data set confirms previously observed real-time processing rates for high-bandwidth links while achieving significant data compression. The output of the analysis demonstrates the effectiveness of these procedures at focusing the traffic matrix and revealing the underlying stable heavy-tail statistical distributions that are necessary for anomaly detection. A simple model of the corresponding probability of detection ($p_{\rm d}$) and probability of false alarm ($p_{\rm fa}$) for these distributions highlights the criticality of network sensor focusing and calibration. Once a sensor is properly focused and calibrated it is then in a position to carry out two of the central tenets of good cybersecurity: (1) continuous observation of the network and (2) minimizing unbrokered network connections. Comment: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2 appendices
Published: 2023

12. Hypersparse Network Flow Analysis of Packets with GraphBLAS

Author: Trigg, Tyler, Meiners, Chad, Pisharody, Sandeep, Jananthan, Hayden, Jones, Michael, Michaleas, Adam, Davis, Timothy, Welch, Erik, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Stetson, Doug, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link). Comment: arXiv admin note: text overlap with arXiv:2203.13934, arXiv:2108.06653, arXiv:2008.00307
Published: 2022

13. Python Implementation of the Dynamic Distributed Dimensional Data Model

Author: Jananthan, Hayden, Milechin, Lauren, Jones, Michael, Arcand, William, Bergeron, William, Bestor, David, Byun, Chansup, Houle, Michael, Hubbell, Matthew, Gadepally, Vijay, Klein, Anna, Michaleas, Peter, Morales, Guillermo, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Python has become a standard scientific computing language with fast-growing support of machine learning and data analysis modules, as well as an increasing usage of big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance built to handle big data fast and efficiently. In this work we present an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, an explanation of the approaches made for its fundamental functions and building blocks, and performance results which compare D4M.py's performance to D4M-MATLAB and D4M.jl. Comment: 8 pages, 7 figures, accepted to HPEC 2022
Published: 2022

14. pPython for Parallel Python Programming

Author: Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Keville, Kurt, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Morales, Guillermo, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. The core data structure in pPython is a distributed numerical array whose distribution onto multiple processors is specified with a map construct. Communication operations between distributed arrays are abstracted away from the user and pPython transparently supports redistribution between any block-cyclic-overlapped distributions in up to four dimensions. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on any combination of heterogeneous systems that support Python, including Windows, Linux, and MacOS operating systems. In addition to running transparently on a single node (e.g., a laptop), pPython provides a scheduler interface, so that pPython can be executed in a massively parallel computing environment. The initial implementation uses the Slurm scheduler. Performance of pPython on the HPC Challenge benchmark suite demonstrates both ease of programming and scalability. Comment: arXiv admin note: substantial text overlap with arXiv:astro-ph/0606464
Published: 2022

15. The MIT Supercloud Workload Classification Challenge

Author: Tang, Benny J., Chen, Qiqi, Weiss, Matthew L., Frey, Nathan, McDonald, Joseph, Bestor, David, Yee, Charles, Arcand, William, Byun, Chansup, Edelman, Daniel, Hubbell, Matthew, Jones, Michael, Kepner, Jeremy, Klein, Anna, Michaleas, Adam, Michaleas, Peter, Milechin, Lauren, Mullen, Julia, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Bowne, Andrew, McEvoy, Lindsey, Li, Baolin, Tiwari, Devesh, Gadepally, Vijay, and Samsi, Siddharth
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly large share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website: https://dcc.mit.edu. Comment: Accepted at IPDPS ADOPT'22
Published: 2022

16. GraphBLAS on the Edge: Anonymized High Performance Streaming of Network Traffic

Author: Jones, Michael, Kepner, Jeremy, Andersen, Daniel, Buluc, Aydin, Byun, Chansup, Claffy, K, Davis, Timothy, Arcand, William, Bernays, Jonathan, Bestor, David, Bergeron, William, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Klein, Anna, Meiners, Chad, Milechin, Lauren, Mullen, Julie, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Sreekanth, Jon, Stetson, Doug, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Operating Systems, Computer Science - Social and Information Networks
Abstract: Long range detection is a cornerstone of defense in many operating domains (land, sea, undersea, air, space, ...). In the cyber domain, long range detection requires the analysis of significant network traffic from a variety of observatories and outposts. Construction of anonymized hypersparse traffic matrices on edge network devices can be a key enabler by providing significant data compression in a rapidly analyzable format that protects privacy. GraphBLAS is ideally suited for both constructing and analyzing anonymized hypersparse traffic matrices. The performance of GraphBLAS on an Accolade Technologies edge network device is demonstrated on a near worst-case traffic scenario using a continuous stream of CAIDA Telescope darknet packets. The performance for varying numbers of traffic buffers, threads, and processor cores is explored. Anonymized hypersparse traffic matrices can be constructed at a rate of over 50,000,000 packets per second, exceeding a typical 400 Gigabit network link. This performance demonstrates that anonymized hypersparse traffic matrices are readily computable on edge network devices with minimal compute resources and can be a viable data product for such devices. Comment: Accepted to IEEE HPEC, Outstanding Paper Award, 8 pages, 8 figures, 1 table, 70 references. arXiv admin note: text overlap with arXiv:2108.06653, arXiv:2008.00307, arXiv:2203.10230
Published: 2022

17. Temporal Correlation of Internet Observatories and Outposts

Author: Kepner, Jeremy, Jones, Michael, Andersen, Daniel, Buluç, Aydın, Byun, Chansup, Claffy, K, Davis, Timothy, Arcand, William, Bernays, Jonathan, Bestor, David, Bergeron, William, Gadepally, Vijay, Grant, Daniel, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Klein, Anna, Meiners, Chad, Milechin, Lauren, Morris, Andrew, Mullen, Julie, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Stetson, Doug, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Social and Information Networks
Abstract: The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic is essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emits Internet traffic, and both provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period 70% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high frequency beam of sources that drifts on a time scale of a month. Comment: 8 pages, 8 figures, 2 tables, 59 references; accepted to GrAPL 2022. arXiv admin note: substantial text overlap with arXiv:2108.06653
Published: 2022

18. New Phenomena in Large-Scale Internet Traffic

Author: Kepner, Jeremy, Cho, Kenjiro, Claffy, KC, Gadepally, Vijay, McGuire, Sarah, Milechin, Lauren, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Jones, Michael, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Computers and Society, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Social and Information Networks
Abstract: The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data sets. An analysis of 50 billion packets using 10,000 processors in the MIT SuperCloud reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our analysis further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets over collections that span years and continents. The measured model parameters distinguish different network streams, and the model leaf parameter strongly correlates with the fraction of the traffic in different underlying network topologies., Comment: 53 pages, 27 figures, 8 tables, 121 references. Portions of this work originally appeared as arXiv:1904.04396v1 which has been split for publication in the book "Massive Graph Analytics" (edited by David Bader)
Published: 2022

19. 3D Real-Time Supercomputer Monitoring

Author: Bergeron, Bill, Hubbell, Matthew, Sequeira, Dylan, Williams, Winter, Arcand, William, Bestor, David, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Supercomputers are complex systems producing vast quantities of performance data from multiple sources and of varying types. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimization of application performance is critical for cost effective usage of a supercomputer and requires efficient methods for effectively viewing performance data. The combination of supercomputing analytics and 3D gaming visualization enables real-time processing and visual data display of massive amounts of information that humans can process quickly with little training. Our system fully utilizes the capabilities of modern 3D gaming environments to create novel representations of computing hardware which intuitively represent the physical attributes of the supercomputer while displaying real-time alerts and component utilization. This system allows operators to quickly assess how the supercomputer is being used, gives users visibility into the resources they are consuming, and provides instructors new ways to interactively teach the computing architecture concepts necessary for efficient computing.
Published: 2021

20. Supercomputing Enabled Deployable Analytics for Disaster Response

Author: Samuel, Kaira, Kepner, Jeremy, Jones, Michael, Milechin, Lauren, Gadepally, Vijay, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Klein, Anna, Lopez, Victor, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Sid, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Databases, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Graphics, Computer Science - Human-Computer Interaction, Computer Science - Multimedia
Abstract: First responders and other forward deployed essential workers can benefit from advanced analytics. Limited network access and software security requirements prevent the usage of standard cloud based microservice analytic platforms that are typically used in industry. One solution is to precompute a wide range of analytics as files that can be used with standard preinstalled software that does not require network access or additional software and can run on a wide range of legacy hardware. In response to the COVID-19 pandemic, this approach was tested for providing geo-spatial census data to allow quick analysis of demographic data for better responding to emergencies. These data were processed using the MIT SuperCloud to create several thousand Google Earth and Microsoft Excel files representative of many advanced analytics. The fast mapping of census data using Google Earth and Microsoft Excel has the potential to give emergency responders a powerful tool to improve emergency preparedness. Our approach displays relevant census data (total population, population under 15, population over 65, median age) per census block, sorted by county, through a Microsoft Excel spreadsheet (xlsx file) and Google Earth map (kml file). The spreadsheet interface includes features that allow users to convert between different longitude and latitude coordinate units. For the Google Earth files, a variety of absolute and relative color maps of population density have been explored to provide an intuitive and meaningful interface. Using several hundred cores on the MIT SuperCloud, new analytics can be generated in a few minutes., Comment: 5 pages, 11 figures, 17 references, accepted to IEEE HPEC 2021
Published: 2021

21. Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

Author: Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large scale simulations of short running jobs on MIT SuperCloud systems that allows the resources to be fully utilized for long running batch jobs while simultaneously providing fast launch and release of large-scale short running jobs. The node-based scheduling approach has demonstrated up to 100 times faster scheduler performance than other state-of-the-art systems., Comment: IEEE HPEC 2021
Published: 2021

22. Spatial Temporal Analysis of 40,000,000,000,000 Internet Darkspace Packets

Author: Kepner, Jeremy, Jones, Michael, Andersen, Daniel, Buluc, Aydin, Byun, Chansup, Claffy, K, Davis, Timothy, Arcand, William, Bernays, Jonathan, Bestor, David, Bergeron, William, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Klein, Anna, Meiners, Chad, Milechin, Lauren, Mullen, Julie, Pisharody, Sandeep, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Stetson, Doug, Tse, Adam, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Computer Science - Social and Information Networks
Abstract: The Internet has never been more important to our society, and understanding the behavior of the Internet is essential. The Center for Applied Internet Data Analysis (CAIDA) Telescope observes a continuous stream of packets from an unsolicited darkspace representing 1/256 of the Internet. During 2019 and 2020 over 40,000,000,000,000 unique packets were collected representing the largest ever assembled public corpus of Internet traffic. Using the combined resources of the Supercomputing Centers at UC San Diego, Lawrence Berkeley National Laboratory, and MIT, the spatial temporal structure of anonymized source-destination pairs from the CAIDA Telescope data has been analyzed with GraphBLAS hierarchical hypersparse matrices. These analyses provide unique insight on this unsolicited Internet darkspace traffic with the discovery of many previously unseen scaling relations. The data show a significant sustained increase in unsolicited traffic corresponding to the start of the COVID-19 pandemic, but relatively little change in the underlying scaling relations associated with unique sources, source fan-outs, unique links, destination fan-ins, and unique destinations. This work provides a demonstration of the practical feasibility and benefit of the safe collection and analysis of significant quantities of anonymized Internet traffic., Comment: 8 pages, 9 figures, 2 tables, 43 references, accepted to IEEE HPEC 2021. arXiv admin note: substantial text overlap with arXiv:2008.00307
Published: 2021

23. Vertical, Temporal, and Horizontal Scaling of Hierarchical Hypersparse GraphBLAS Matrices

Author: Kepner, Jeremy, Davis, Tim, Byun, Chansup, Arcand, William, Bestor, David, Bergeron, William, Gadepally, Vijay, Hubbell, Matthew, Houle, Michael, Jones, Michael, Klein, Anna, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Discrete Mathematics, Computer Science - Mathematical Software, Computer Science - Networking and Internet Architecture, Computer Science - Performance
Abstract: Hypersparse matrices are a powerful enabler for a variety of network, health, finance, and social applications. Hierarchical hypersparse GraphBLAS matrices enable rapid streaming updates while preserving algebraic analytic power and convenience. In many contexts, the rate of these updates sets the bounds on performance. This paper explores hierarchical hypersparse update performance on a variety of hardware with identical software configurations. The high-level language bindings of the GraphBLAS readily enable performance experiments on simultaneous diverse hardware. The best single process performance measured was 4,000,000 updates per second. The best single node performance measured was 170,000,000 updates per second. The hardware used spans nearly a decade and allows a direct comparison of hardware improvements for this computation over this time range, showing a 2x increase in single-core performance, a 3x increase in single process performance, and a 5x increase in single node performance. Running on nearly 2,000 MIT SuperCloud nodes simultaneously achieved a sustained update rate of over 200,000,000,000 updates per second. Hierarchical hypersparse GraphBLAS allows the MIT SuperCloud to analyze extremely large streaming network data sets., Comment: 6 pages, 5 figures, 32 references, accepted to IEEE HPEC 2021. arXiv admin note: text overlap with arXiv:2001.06935
Published: 2021

24. The MIT Supercloud Dataset

Author: Samsi, Siddharth, Weiss, Matthew L, Bestor, David, Li, Baolin, Jones, Michael, Reuther, Albert, Edelman, Daniel, Arcand, William, Byun, Chansup, Holodnack, John, Hubbell, Matthew, Kepner, Jeremy, Klein, Anna, McDonald, Joseph, Michaleas, Adam, Michaleas, Peter, Milechin, Lauren, Mullen, Julia, Yee, Charles, Price, Benjamin, Prout, Andrew, Rosa, Antonio, Vanterpool, Allan, McEvoy, Lindsey, Cheng, Anson, Tiwari, Devesh, and Gadepally, Vijay
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Artificial intelligence (AI) and Machine learning (ML) workloads are an increasingly large share of the compute workloads in traditional High-Performance Computing (HPC) centers and commercial cloud systems. This has led to changes in deployment approaches of HPC clusters and the commercial cloud, as well as a new focus on approaches to optimized resource usage, allocations and deployment of new AI frameworks, and capabilities such as Jupyter notebooks to enable rapid prototyping and deployment. With these changes, there is a need to better understand cluster/datacenter operations with the goal of developing improved scheduling policies, identifying inefficiencies in resource utilization, energy/power consumption, failure prediction, and identifying policy violations. In this paper we introduce the MIT Supercloud Dataset which aims to foster innovative AI/ML approaches to the analysis of large scale HPC and datacenter/cloud operations. We provide detailed monitoring logs from the MIT Supercloud system, which include CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data. This paper discusses the details of the dataset, collection methodology, data availability, and discusses potential challenge problems being developed using this data. Datasets and future challenge announcements will be available via https://dcc.mit.edu.
Published: 2021

25. Accuracy and Performance Comparison of Video Action Recognition Approaches

Author: Hutchinson, Matthew, Samsi, Siddharth, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Houle, Michael, Hubbell, Matthew, Jones, Michael, Kepner, Jeremy, Kirby, Andrew, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Reuther, Albert, Yee, Charles, and Gadepally, Vijay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Performance
Abstract: Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remains clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen off-the-shelf and state-of-the-art models by ensuring consistency in these training characteristics in order to provide readers with a meaningful comparison across different types of video action recognition algorithms. Accuracy of the models is evaluated using standard Top-1 and Top-5 accuracy metrics in addition to a proposed new accuracy metric. Additionally, we compare computational performance of distributed training from two to sixty-four GPUs on a state-of-the-art HPC system., Comment: Accepted for publication at IEEE HPEC 2020
Published: 2020

26. Best of Both Worlds: High Performance Interactive and Batch Launching

Author: Byun, Chansup, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Kirby, Andrew, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implement spot jobs on MIT SuperCloud systems, allowing the resources to be fully utilized for long running batch jobs while still providing fast launch for interactive jobs. The new approach separates the job preemption and scheduling operations and can achieve 100 times faster performance in the scheduling of a job with preemption when compared to using the standard scheduler-provided automatic preemption-based capability. The results demonstrate that the new approach can schedule interactive jobs preemptively at a performance comparable to when the required computing resources are idle and available. The spot job capability can be deployed without disrupting the interactive user experience while increasing the overall system utilization.
Published: 2020

27. Multi-Temporal Analysis and Scaling Relations of 100,000,000,000 Network Packets

Author: Kepner, Jeremy, Meiners, Chad, Byun, Chansup, McGuire, Sarah, Davis, Timothy, Arcand, William, Bernays, Jonathan, Bestor, David, Bergeron, William, Gadepally, Vijay, Harnasch, Raul, Hubbell, Matthew, Houle, Michael, Jones, Michael, Kirby, Andrew, Klein, Anna, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Stetson, Doug, Tse, Adam, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Social and Information Networks
Abstract: Our society has never been more dependent on computer networks. Effective utilization of networks requires a detailed understanding of the normal background behaviors of network traffic. Large-scale measurements of networks are computationally challenging. Building on prior work in interactive supercomputing and GraphBLAS hypersparse hierarchical traffic matrices, we have developed an efficient method for computing a wide variety of streaming network quantities on diverse time scales. Applying these methods to 100,000,000,000 anonymized source-destination pairs collected at a network gateway reveals many previously unobserved scaling relationships. These observations provide new insights into normal network background traffic that could be used for anomaly detection, AI feature engineering, and testing theoretical models of streaming networks., Comment: 6 pages, 6 figures, 3 tables, 49 references, accepted to IEEE HPEC 2020
Published: 2020

28. Fast Mapping onto Census Blocks

Author: Kepner, Jeremy, Kipf, Andreas, Engwirda, Darren, Vembar, Navin, Jones, Michael, Milechin, Lauren, Gadepally, Vijay, Hill, Chris, Kraska, Tim, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Kirby, Andrew, Klein, Anna, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Sid, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Databases, Computer Science - Performance
Abstract: Pandemic measures such as social distancing and contact tracing can be enhanced by rapidly integrating dynamic location data and demographic data. Projecting billions of longitude and latitude locations onto hundreds of thousands of highly irregular demographic census block polygons is computationally challenging in both research and deployment contexts. This paper describes two approaches labeled "simple" and "fast". The simple approach can be implemented in any scripting language (Matlab/Octave, Python, Julia, R) and is easily integrated and customized to a variety of research goals. This simple approach uses a novel combination of hierarchy, sparse bounding boxes, polygon crossing-number, vectorization, and parallel processing to achieve 100,000,000+ projections per second on 100 servers. The simple approach is compact, does not increase data storage requirements, and is applicable to any country or region. The fast approach exploits the thread, vector, and memory optimizations that are possible using a low-level language (C++) and achieves similar performance on a single server. This paper details these approaches with the goal of enabling the broader community to quickly integrate location and demographic data., Comment: 8 pages, 7 figures, 55 references; accepted to IEEE HPEC 2020
Published: 2020

29. 75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

Author: Kepner, Jeremy, Davis, Tim, Byun, Chansup, Arcand, William, Bestor, David, Bergeron, William, Gadepally, Vijay, Hubbell, Matthew, Houle, Michael, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Databases, Computer Science - Data Structures and Algorithms, Computer Science - Performance, Computer Science - Social and Information Networks
Abstract: The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets., Comment: 4 pages, 4 figures, 28 references, accepted to IPDPS GrAPL 2020. arXiv admin note: substantial text overlap with arXiv:1907.04217
Published: 2020

30. Large Scale Parallelization Using File-Based Communications

Author: Byun, Chansup, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: In this paper, we present a novel file-based communication architecture using the local filesystem for large scale parallelization. This new approach eliminates the issues with filesystem overload and resource contention when using the central filesystem for large parallel jobs. The new approach incurs additional overhead due to inter-node message file transfers when the sending and receiving processes are not on the same node. However, even with this additional overhead cost, its benefits are far greater for the overall cluster operation in addition to the performance enhancement in message communications for large scale parallel jobs. For example, when running a 2048-process parallel job, it achieved about 34 times better performance with MPI_Bcast() when using the local filesystem. Furthermore, since the security for transferring message files is handled entirely by using the secure copy protocol (scp) and the file system permissions, no additional security measures or ports are required other than those that are typically required on an HPC system.
Published: 2019

31. Securing HPC using Federated Authentication

Author: Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, Reuther, Albert, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Cryptography and Security
Abstract: Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security. Integrating with the user's more frequently used account at their primary organization both provides a better experience to the end user and makes account compromise or changes in affiliation more likely to be noticed and acted upon. Additionally, with many organizations transitioning to multi-factor authentication for all account access, the ability to leverage external federated identity management systems provides the benefit of their efforts without the additional overhead of separately implementing a distinct multi-factor authentication process. This paper describes our experiences and the lessons we learned by enabling federated authentication with the U.S. Government PKI and InCommon Federation, scaling it up to the user base of a production HPC system, and the motivations behind those choices. We have received only positive feedback from our users.
Published: 2019

32. Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

Author: Kepner, Jeremy, Gadepally, Vijay, Milechin, Lauren, Samsi, Siddharth, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Jones, Michael, Klein, Anne, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Databases, Computer Science - Data Structures and Algorithms, Computer Science - Information Retrieval, Computer Science - Performance
Abstract: The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays which combine properties of spreadsheets, databases, matrices, graphs, and networks, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical arrays achieve over 40,000 updates per second in a single instance. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets., Comment: 6 pages; 6 figures; accepted to IEEE High Performance Extreme Computing (HPEC) Conference 2019. arXiv admin note: text overlap with arXiv:1807.05308, arXiv:1902.00846
Published: 2019

33. Optimizing Xeon Phi for Interactive Data Analysis

Author: Byun, Chansup, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, William, Hubbell, Matthew, Gadepally, Vijay, Houle, Michael, Jones, Michael, Klein, Anne, Milechin, Lauren, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software
Abstract: The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal performance of matrix operations within data analysis environments requires tuning the Xeon Phi OpenMP settings, process pinning, and memory modes. This paper describes matrix multiplication performance results for Matlab and GNU Octave over a variety of combinations of process counts and OpenMP threads and Xeon Phi memory modes. These results indicate that using KMP_AFFINITY=granularity=fine, taskset pinning, and all2all cache memory mode allows both Matlab and GNU Octave to achieve 66% of the practical peak performance for process counts ranging from 1 to 64 and OpenMP threads ranging from 1 to 64. These settings have resulted in generally improved performance across a range of applications and have enabled our Xeon Phi system to deliver significant results in a number of real-world applications., Comment: 6 pages, 5 figures, accepted in IEEE High Performance Extreme Computing (HPEC) conference 2019
Published: 2019

34. Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers

Author: Mullen, Julia, Reuther, Albert, Arcand, William, Bergeron, Bill, Bestor, David, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, D.2.6
Abstract: For decades, the use of HPC systems was limited to those in the physical sciences who had mastered their domain in conjunction with a deep understanding of HPC architectures and algorithms. During these same decades, consumer computing device advances produced tablets and smartphones that allow millions of children to interactively develop and share code projects across the globe. As the HPC community faces the challenges associated with guiding researchers from disciplines using high productivity interactive tools to effective use of HPC systems, it seems appropriate to revisit the assumptions surrounding the necessary skills required for access to large computational systems. For over a decade, MIT Lincoln Laboratory has been supporting interactive, on-demand high performance computing by seamlessly integrating familiar high productivity tools to provide users with an increased number of design turns, rapid prototyping capability, and faster time to insight. In this paper, we discuss the lessons learned while supporting interactive, on-demand high performance computing from the perspectives of the users and the team supporting the users and the system. Building on these lessons, we present an overview of current needs and the technical solutions we are building to lower the barrier to entry for new users from the humanities, social, and biological sciences., Comment: 15 pages, 3 figures, First Workshop on Interactive High Performance Computing (WIHPC) 2018 held in conjunction with ISC High Performance 2018 in Frankfurt, Germany
Published: 2019

35. A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases

Author: Kepner, Jeremy, Gadepally, Vijay, Milechin, Lauren, Samsi, Siddharth, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Jones, Michael, Klein, Anne, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Databases, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Data Structures and Algorithms, Computer Science - Networking and Internet Architecture
Abstract: Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database. Associative arrays are designed for block updates. Streaming updates to a large associative array require a hierarchical implementation to optimize the performance of the memory hierarchy. Running 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
Comment: Northeast Database Data 2019 (MIT)
Published: 2019

36. Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud

Author: Gadepally, Vijay, Kepner, Jeremy, Milechin, Lauren, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Jones, Michael, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, Samsi, Siddharth, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Cryptography and Security
Abstract: Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
Comment: Accepted to IEEE HPEC 2018
Published: 2018

37. Interactive Launch of 16,000 Microsoft Windows Instances on a Supercomputer

Author: Jones, Michael, Kepner, Jeremy, Orchard, Bradley, Reuther, Albert, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Klein, Anna, Milechin, Lauren, Mullen, Julia, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Simulation, machine learning, and data analysis require a wide range of software which can be dependent upon specific operating systems, such as Microsoft Windows. Running this software interactively on massively parallel supercomputers can present many challenges. Traditional methods of scaling Microsoft Windows applications to run on thousands of processors have typically relied on heavyweight virtual machines that can be inefficient and slow to launch on modern manycore processors. This paper describes a unique approach using the Lincoln Laboratory LLMapReduce technology in combination with the Wine Windows compatibility layer to rapidly and simultaneously launch and run Microsoft Windows applications on thousands of cores on a supercomputer. Specifically, this work demonstrates launching 16,000 Microsoft Windows applications in 5 minutes running on 16,000 processor cores. This capability significantly broadens the range of applications that can be run at large scale on a supercomputer.
Published: 2018

38. Measuring the Impact of Spectre and Meltdown

Author: Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, Reuther, Albert, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Cryptography and Security
Abstract: The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we measure the performance impact on several workloads relevant to HPC systems. We show that the impact can be significant on both synthetic and realistic workloads. We also show that the performance penalties are difficult to avoid even in dedicated systems where security is a lesser concern.
Published: 2018

39. Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Author: Reuther, Albert, Kepner, Jeremy, Byun, Chansup, Samsi, Siddharth, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Milechin, Lauren, Mullen, Julia, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Michaleas, Peter
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, C.4, D.4.1
Abstract: Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges - in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architecture and data analysis algorithms.
Comment: 6 pages, 7 figures, IEEE High Performance Extreme Computing Conference 2018
Published: 2018

40. Design, Generation, and Validation of Extreme Scale Power-Law Graphs

Author: Kepner, Jeremy, Samsi, Siddharth, Arcand, William, Bestor, David, Bergeron, Bill, Davis, Tim, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jananthan, Hayden, Jones, Michael, Klein, Anna, Michaleas, Peter, Pearce, Roger, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Sanders, Geoff, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Discrete Mathematics, Computer Science - Data Structures and Algorithms, Computer Science - Performance, Mathematics - Combinatorics
Abstract: Massive power-law graphs drive many fields: metagenomics, brain mapping, Internet-of-things, cybersecurity, and sparse machine learning. The development of novel algorithms and systems to process these data requires the design, generation, and validation of enormous graphs with exactly known properties. Such graphs accelerate the proper testing of new algorithms and systems and are a prerequisite for success on real applications. Many random graph generators currently exist that require realizing a graph in order to know its exact properties: number of vertices, number of edges, degree distribution, and number of triangles. Designing graphs using these random graph generators is a time-consuming trial-and-error process. This paper presents a novel approach that uses Kronecker products to allow the exact computation of graph properties prior to graph generation. In addition, when a real graph is desired, it can be generated quickly in memory on a parallel computer with no interprocessor communication. To test this approach, graphs with $10^{12}$ edges are generated on a 40,000+ core supercomputer in 1 second and exactly agree with those predicted by the theory. In addition, to demonstrate the extensibility of this approach, decetta-scale graphs with up to $10^{30}$ edges are simulated in a few minutes on a laptop.
Comment: 8 pages, 6 figures, IEEE IPDPS 2018 Graph Algorithm Building Blocks (GABB) workshop
Published: 2018

41. Performance Measurements of Supercomputing and Cloud Storage Solutions

Author: Jones, Michael, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Michaleas, Peter, Prout, Andrew, Reuther, Albert, Samsi, Siddharth, and Monticiollo, Paul
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Networking and Internet Architecture, Computer Science - Operating Systems, Computer Science - Performance
Abstract: Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
Comment: 5 pages, 4 figures, to appear in IEEE HPEC 2017
Published: 2017

42. MIT SuperCloud Portal Workspace: Enabling HPC Web Application Deployment

Author: Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Hubbell, Matthew, Houle, Michael, Jones, Michael, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Rosa, Antonio, Samsi, Siddharth, Reuther, Albert, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Human-Computer Interaction, Computer Science - Software Engineering
Abstract: The MIT SuperCloud Portal Workspace enables the secure exposure of web services running on high performance computing (HPC) systems. The portal allows users to run any web application as an HPC job and access it from their workstation while providing authentication, encryption, and access control at the system level to prevent unintended access. This capability permits users to seamlessly utilize existing and emerging tools that present their user interface as a website on an HPC system creating a portal workspace. Performance measurements indicate that the MIT SuperCloud Portal Workspace incurs marginal overhead when compared to a direct connection of the same service.
Comment: 6 pages, 3 figures, to appear in IEEE HPEC 2017
Published: 2017

43. Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

Author: Byun, Chansup, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Performance, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Distributed, Parallel, and Cluster Computing, Physics - Computational Physics
Abstract: Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities. Our data analysis benchmarks of these applications on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by ~3.5x compared to prior Intel Xeon technologies. Our data analysis applications also achieved ~60% of the theoretical peak performance. In addition, a performance comparison of a machine learning application, Caffe, between the two different Intel CPUs, Xeon E5 v3 and Xeon Phi 7210, demonstrated a 2.7x improvement on a KNL node.
Comment: 6 pages; 9 figures; accepted to IEEE HPEC 2017
Published: 2017

44. Scalable System Scheduling for HPC and Big Data

Author: Reuther, Albert, Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Hubbell, Matthew, Jones, Michael, Michaleas, Peter, Prout, Andrew, Rosa, Antonio, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: In the rapidly expanding field of parallel processing, job schedulers are the "operating systems" of modern big data architectures and supercomputing systems. Job schedulers allocate computing resources and control the execution of processes on those resources. Historically, job schedulers were the domain of supercomputers, and job schedulers were designed to run massive, long-running computations over days and weeks. More recently, big data workloads have created a need for a new class of computations consisting of many short computations taking seconds or minutes that process enormous quantities of data. For both supercomputers and big data systems, the efficiency of the job scheduler represents a fundamental limit on the efficiency of the system. Detailed measurement and modeling of the performance of schedulers are critical for maximizing the performance of a large-scale computing system. This paper presents a detailed feature analysis of 15 supercomputing and big data schedulers. For big data workloads, the scheduler latency is the most important performance characteristic of the scheduler. A theoretical model of the latency of these schedulers is developed and used to design experiments targeted at measuring scheduler latency. Detailed benchmarking of four of the most popular schedulers (Slurm, Son of Grid Engine, Mesos, and Hadoop YARN) is conducted. The theoretical model is compared with data and demonstrates that scheduler performance can be characterized by two key parameters: the marginal latency of the scheduler $t_s$ and a nonlinear exponent $\alpha_s$. For all four schedulers, the utilization of the computing system decreases to < 10\% for computations lasting only a few seconds. Multilevel schedulers that transparently aggregate short computations can improve utilization for these short computations to > 90\% for all four of the schedulers that were tested.
Comment: 34 pages, 7 figures
Published: 2017

45. New Phenomena in Large-Scale Internet Traffic

Author: Kepner, Jeremy, Cho, Kenjiro, Claffy, KC, Gadepally, Vijay, McGuire, Sarah, Milechin, Lauren, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Hubbell, Matthew, Houle, Michael, Jones, Michael, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Yee, Charles, and Michaleas, Peter
Published: 2022

46. Benchmarking SciDB Data Import on HPC Systems

Author: Samsi, Siddharth, Brattain, Laura, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, Kepner, Jeremy, and Reuther, Albert
Subjects: Computer Science - Databases, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Quantitative Biology - Quantitative Methods
Abstract: SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in database, thus reducing the need for extracting data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approaches of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts, in-database merging of arrays, and supercomputing techniques such as distributed arrays and single-program-multiple-data programming.
Comment: 5 pages, 4 figures, IEEE High Performance Extreme Computing (HPEC) 2016, best paper finalist
Published: 2016

47. LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

Author: Byun, Chansup, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Gadepally, Vijay, Hubbell, Matthew, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code. LLMapReduce supports all programming languages and many schedulers. LLMapReduce can work with any application without the need to modify the application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model. These features allow users to reduce the computational overhead by more than 10x compared to standard map-reduce for certain applications. LLMapReduce is widely used by hundreds of users at MIT. Currently LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF.
Comment: 8 pages; 19 figures; IEEE HPEC 2016
Published: 2016

48. Scheduler Technologies in Support of High Performance Data Analysis

Author: Reuther, Albert, Byun, Chansup, Arcand, William, Bestor, David, Bergeron, Bill, Hubbell, Matthew, Jones, Michael, Michaleas, Peter, Prout, Andrew, Rosa, Antonio, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have diversified from long-running, synchronously-parallel simulations to include short-duration, independently parallel high performance data analysis (HPDA) jobs. Each of these job types requires different features and scheduler tuning to run efficiently. A number of schedulers have been developed to address both job workload and computing system heterogeneity. High performance computing (HPC) schedulers were designed to schedule large-scale scientific modeling and simulations on supercomputers. Big Data schedulers were designed to schedule data processing and analytic jobs on clusters. This paper compares and contrasts the features of HPC and Big Data schedulers with a focus on accommodating both scientific computing and high performance data analytic workloads. Job latency is critical for the efficient utilization of scalable computing infrastructures, and this paper presents the results of job launch benchmarking of several current schedulers: Slurm, Son of Grid Engine, Mesos, and Yarn. We find that all of these schedulers have low utilization for short-running jobs. Furthermore, employing multilevel scheduling significantly improves the utilization across all schedulers.
Comment: 6 pages, 5 figures, IEEE High Performance Extreme Computing Conference 2016
Published: 2016

49. Enhancing HPC Security with a User-Based Firewall

Author: Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Hubbell, Matthew, Houle, Michael, Jones, Michael, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Rosa, Antonio, Samsi, Siddharth, Reuther, Albert, and Kepner, Jeremy
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Cryptography and Security
Abstract: HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Many methods have been employed to perform this authentication. However, support for all of these methods requires the HPC application developer to include support and the user to configure and enable these services. The user-based firewall capability we have prototyped enables a set of rules governing connections across the HPC internal network to be put into place using Linux netfilter. By using an operating system-level capability, the system is not reliant on any developer or user actions to enable security. The rules we have chosen and implemented are crafted to not impact the vast majority of users and be completely invisible to them.
Published: 2016

50. D4M: Bringing Associative Arrays to Database Engines

Author: Gadepally, Vijay, Kepner, Jeremy, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Edwards, Lauren, Hubbell, Matthew, Michaleas, Peter, Mullen, Julie, Prout, Andrew, Rosa, Antonio, Yee, Charles, and Reuther, Albert
Subjects: Computer Science - Databases
Abstract: The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Numerous tools exist that allow users to store, query and index these massive quantities of data. Each storage or database engine comes with the promise of dealing with complex data. Scientists and engineers who wish to use these systems often quickly find that there is no single technology that offers a panacea to the complexity of information. When using multiple technologies, however, there is significant trouble in designing the movement of information between storage and database engines to support an end-to-end application along with a steep learning curve associated with learning the nuances of each underlying technology. In this article, we present the Dynamic Distributed Dimensional Data Model (D4M) as a potential tool to unify database and storage engine operations. Previous articles on D4M have showcased the ability of D4M to interact with the popular NoSQL Accumulo database. Recently however, D4M now operates on a variety of backend storage or database engines while providing a federated look to the end user through the use of associative arrays. In order to showcase how new databases may be supported by D4M, we describe the process of building the D4M-SciDB connector and present performance of this connection.
Published: 2015