21 results for "Cojean, Terry"
Search Results
2. Ginkgo - A math library designed to accelerate Exascale Computing Project science applications.
- Author
- Cojean, Terry, Nayak, Pratik, Ribizel, Tobias, Beams, Natalie, Tsai, Yu-Hsiang Mike, Koch, Marcel, Göbel, Fritz, Grützmacher, Thomas, and Anzt, Hartwig
- Subjects
- SOFTWARE libraries (Computer programming), MATHEMATICAL forms, LIBRARY design & construction, SCIENCE projects, ELECTRIC power distribution grids
- Abstract
Large-scale simulations require efficient computation across the entire computing hierarchy. A challenge of the Exascale Computing Project (ECP) was to reconcile highly heterogeneous hardware with the myriad of applications required to run on these supercomputers. Mathematical software forms the backbone of almost all scientific applications, providing efficient abstractions and operations that are crucial to harness the performance of computing systems. Ginkgo is one such mathematical software library, nurtured by ECP, providing high-performance, user-friendly, and performance-portable interfaces for applications in ECP and beyond. In this paper, we elaborate on Ginkgo's philosophy of high-performance software that is sustainable, reproducible, and easy to use. We showcase the wide feature set of solvers and preconditioners available in Ginkgo and the central concepts involved in their design. We elaborate on four different ECP software integrations, MFEM, PeleLM + SUNDIALS, XGC, and ExaSGD, which use Ginkgo to accelerate their science runs. Performance studies of different problems from these applications highlight the effectiveness of Ginkgo and the benefits gained by these ECP applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
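Editor's note: as a point of reference for this record, the usage pattern Ginkgo offers applications boils down to creating an executor, reading a matrix, and composing a solver. A minimal sketch modeled on Ginkgo's published simple-solver example; builder and criteria names follow the documented API, but exact signatures vary across library versions:

```cpp
#include <ginkgo/ginkgo.hpp>
#include <fstream>

int main() {
    // The executor encapsulates the hardware backend; swapping it for a
    // CudaExecutor/HipExecutor/DpcppExecutor retargets the whole solve.
    auto exec = gko::ReferenceExecutor::create();

    // Read the matrix and vectors in Matrix Market format.
    auto A = gko::share(gko::read<gko::matrix::Csr<double, int>>(
        std::ifstream("A.mtx"), exec));
    auto b = gko::read<gko::matrix::Dense<double>>(std::ifstream("b.mtx"), exec);
    auto x = gko::read<gko::matrix::Dense<double>>(std::ifstream("x0.mtx"), exec);

    // Compose a CG solver with iteration and residual stopping criteria.
    auto solver =
        gko::solver::Cg<double>::build()
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(1000u).on(exec),
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-8)
                    .on(exec))
            .on(exec)
            ->generate(A);
    solver->apply(b, x);  // solve A x = b, overwriting x
}
```

The executor argument is the only place where hardware is named, which is what lets the ECP applications listed above adopt the library without backend-specific code.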
3. Providing performance portable numerics for Intel GPUs.
- Author
- Tsai, Yu-Hsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- GINKGO, LINEAR algebra
- Abstract
Summary: With discrete Intel GPUs entering the high-performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this article, we report how we enable the Ginkgo math library to execute on Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between the CUDA and DPC++ programming models and describe workflows for simplified code conversion. We evaluate the performance of basic and advanced sparse linear algebra routines available in Ginkgo's DPC++ backend within the hardware-specific performance bounds and compare against routines providing the same functionality that ship with Intel's oneMKL vendor library. [ABSTRACT FROM AUTHOR]
- Published
- 2023
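Editor's note: to make the CUDA-to-DPC++ conversion discussed in this record concrete, here is a sketch of what a trivial kernel looks like on the SYCL/DPC++ side, using standard SYCL 2020 constructs (function and variable names are illustrative, not from the paper):

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>

// y = alpha * x + y: the DPC++ analogue of a one-line CUDA kernel.
void axpy(sycl::queue& q, std::size_t n, double alpha,
          const double* x, double* y) {
    // parallel_for replaces the <<<grid, block>>> launch; the id object
    // replaces the blockIdx/threadIdx index arithmetic.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        y[i] += alpha * x[i];
    }).wait();
}

int main() {
    sycl::queue q;  // selects a default device, e.g. an Intel GPU
    const std::size_t n = 1024;
    // Unified shared memory plays the role of cudaMallocManaged allocations.
    double* x = sycl::malloc_shared<double>(n, q);
    double* y = sycl::malloc_shared<double>(n, q);
    for (std::size_t i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }
    axpy(q, n, 3.0, x, y);
    sycl::free(x, q);
    sycl::free(y, q);
}
```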
4. Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
- Author
- Tsai, Yuhsiang M., Cojean, Terry, Ribizel, Tobias, and Anzt, Hartwig
- Subjects
- FOS: Computer and information sciences, HIP, GPU, Computer Science - Mathematical Software, Portability, CUDA, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Mathematical Software (cs.MS), Article
- Abstract
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base. (Preprint submitted to HeteroPar.)
- Published
- 2021
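Editor's note: as an illustration of the porting workflow this record describes, a sketch of what hipified CUDA code looks like. The kernel body is unchanged; host API calls are renamed one-for-one. This is an illustrative example under those assumptions, not code from the paper:

```cpp
#include <hip/hip_runtime.h>
#include <vector>

// Kernel source is shared between CUDA and HIP builds; only host API
// calls change names (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy).
__global__ void scale(double* x, double alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

int main() {
    const int n = 1 << 20;
    std::vector<double> h_x(n, 1.0);
    double* d_x = nullptr;
    hipMalloc(&d_x, n * sizeof(double));
    hipMemcpy(d_x, h_x.data(), n * sizeof(double), hipMemcpyHostToDevice);

    // hipLaunchKernelGGL is the portable spelling of the <<<...>>> launch.
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0,
                       d_x, 2.0, n);
    hipDeviceSynchronize();

    hipMemcpy(h_x.data(), d_x, n * sizeof(double), hipMemcpyDeviceToHost);
    hipFree(d_x);
}
```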
5. Sparse Linear Algebra on AMD and NVIDIA GPUs – The Race Is On
- Author
- Tsai, Yuhsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- GPUs, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Software_PROGRAMMINGTECHNIQUES, Sparse matrix vector product (SpMV), AMD, NVIDIA, Article
- Abstract
Efficiently processing sparse matrices is a central and performance-critical part of many scientific simulation codes. Recognizing the adoption of manycore accelerators in HPC, we evaluate in this paper the performance of the currently best sparse matrix-vector product (SpMV) implementations on high-end GPUs from AMD and NVIDIA. Specifically, we optimize SpMV kernels for the CSR, COO, ELL, and HYB formats, taking the hardware characteristics of the latest GPU technologies into account. We compare the performance of our kernels on 2,800 test matrices against AMD's hipSPARSE library and NVIDIA's cuSPARSE library, and ultimately assess how the GPU technologies from AMD and NVIDIA compare in terms of SpMV performance.
- Published
- 2020
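Editor's note: for readers unfamiliar with the formats named in the abstract, a reference (sequential) CSR SpMV makes the data layout concrete; the paper's GPU kernels parallelize and optimize this loop nest:

```cpp
#include <vector>

// Reference CSR sparse matrix-vector product y = A * x.
// row_ptr has n_rows + 1 entries; col_idx/values hold the nnz nonzeros.
void spmv_csr(int n_rows, const std::vector<int>& row_ptr,
              const std::vector<int>& col_idx,
              const std::vector<double>& values,
              const std::vector<double>& x, std::vector<double>& y) {
    for (int row = 0; row < n_rows; ++row) {
        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];  // gather from x by column index
        }
        y[row] = sum;
    }
}
```

COO stores explicit (row, column, value) triples, ELL pads every row to the same length for coalesced access, and HYB combines an ELL part with a COO part for irregular rows.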
6. Porting a sparse linear algebra math library to Intel GPUs
- Author
- Tsai, Yuhsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Distributed, Parallel, and Cluster Computing (cs.DC), ComputerSystemsOrganization_PROCESSORARCHITECTURES, Mathematical Software (cs.MS)
- Abstract
With the announcement that the Aurora Supercomputer will be composed of general purpose Intel CPUs complemented by discrete high performance Intel GPUs, and the deployment of the oneAPI ecosystem, Intel has committed to entering the arena of discrete high performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks and a glimpse of the performance they can expect to see on Intel high performance GPUs. In this paper, we present the first platform-portable open source math library supporting Intel GPUs via the DPC++ programming environment. We also benchmark some of the developed sparse linear algebra functionality on different Intel GPUs to assess the efficiency of the DPC++ programming ecosystem in translating raw performance into application performance. Aside from quantifying the efficiency within the hardware-specific roofline model, we also compare against routines providing the same functionality that ship with Intel's oneMKL vendor library. (Preprint, not submitted.)
- Published
- 2021
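Editor's note: the abstract mentions quantifying efficiency within the hardware-specific roofline model. For a bandwidth-bound kernel such as SpMV, that bound reduces to a one-line estimate; the byte counts below follow a common simple CSR traffic model and are an assumption of this note, not the paper's exact accounting:

```cpp
// Roofline upper bound for CSR SpMV in double precision, assuming streaming
// access with no cache reuse of x. Per nonzero: 8 B matrix value, 4 B column
// index, 8 B x read; per row: 4 B row pointer, 8 B y write.
double spmv_roofline_gflops(double bandwidth_gb_per_s, long long n_rows,
                            long long nnz) {
    const double bytes = 20.0 * nnz + 12.0 * n_rows;
    const double flops = 2.0 * nnz;  // one multiply + one add per nonzero
    return bandwidth_gb_per_s * flops / bytes;  // GFLOP/s upper bound
}
```

Measured kernel throughput divided by this bound gives the efficiency figure such studies report.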
7. Ginkgo – A Math Library designed for Platform Portability
- Author
- Cojean, Terry, Tsai, Yu-Hsiang 'Mike', and Anzt, Hartwig
- Subjects
- Performance (cs.PF), Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Distributed, Parallel, and Cluster Computing (cs.DC), Mathematical Software (cs.MS)
- Abstract
The first associations with software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and the existence of software documentation. However, when asking what is a common deathblow for a scientific software product, it is often the lack of platform and performance portability. Against this background, we designed the Ginkgo library with the primary focus on platform portability and the ability not only to port to new hardware architectures, but also to achieve good performance. In this paper, we present the Ginkgo library design, radically separating algorithms from hardware-specific kernels that form the distinct hardware executors, and report our experience when adding execution backends for NVIDIA, AMD, and Intel GPUs. We also comment on the different levels of performance portability and the performance we achieved on the distinct hardware backends. (Submitted to the Parallel Computing journal (PARCO).)
- Published
- 2020
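Editor's note: the executor concept named in this abstract is what carries the portability: algorithms are written once against an abstract executor, and the backend is chosen at runtime. A sketch using Ginkgo's documented executor types; constructor arguments may differ between library versions, and each device executor requires the corresponding build option:

```cpp
#include <ginkgo/ginkgo.hpp>
#include <memory>
#include <string>

// Select a hardware backend by name; everything else in the application
// is written once against the abstract gko::Executor interface.
std::shared_ptr<gko::Executor> make_executor(const std::string& name) {
    auto host = gko::OmpExecutor::create();
    if (name == "cuda") return gko::CudaExecutor::create(0, host);
    if (name == "hip") return gko::HipExecutor::create(0, host);
    if (name == "dpcpp") return gko::DpcppExecutor::create(0, host);
    return host;  // OpenMP execution on the host CPU
}
```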
8. Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations
- Author
- Tsai, Yuhsiang Mike, Cojean, Terry, and Anzt, Hartwig
- Subjects
- Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Mathematical Software, Mathematical Software (cs.MS)
- Abstract
GPU accelerators have become an important backbone for scientific high performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper, we take a first look at NVIDIA's newest server-line GPU, the A100 architecture, part of the Ampere generation. Specifically, we assess its performance for sparse linear algebra operations that form the backbone of many scientific applications, and we measure the performance improvements over its predecessor.
- Published
- 2020
9. A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
- Author
- Abdelfattah, Ahmad, Anzt, Hartwig, Boman, Erik G., Carson, Erin, Cojean, Terry, Dongarra, Jack, Gates, Mark, Grützmacher, Thomas, Higham, Nicholas J., Li, Sherry, Lindquist, Neil, Liu, Yang, Loe, Jennifer, Luszczek, Piotr, Nayak, Pratik, Pranesh, Sri, Rajamanickam, Siva, Ribizel, Tobias, Smith, Barry, Swirydowicz, Kasia, Thomas, Stephen, Tomov, Stanimire, Tsai, Yaohung M., and Yang, Ulrike Meier
- Subjects
- FOS: Computer and information sciences, G.4, G.1.3, FOS: Mathematics, Computer Science - Mathematical Software, Numerical Analysis (math.NA), Mathematics - Numerical Analysis, Mathematical Software (cs.MS)
- Abstract
In recent years, hardware vendors have started designing low precision special function units in response to the machine learning community's demand for high compute power in low precision formats. Server-line products, too, increasingly feature low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer, providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between compute power on the one hand and memory bandwidth on the other keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered "mature technology," but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves and instead focus on how mixed- and multiprecision technology can help improve the performance of these methods, presenting highlights of applications significantly outperforming traditional fixed precision methods. (Technical report produced as part of the Exascale Computing Project (ECP).)
- Published
- 2020
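Editor's note: a canonical pattern covered by such surveys is mixed-precision iterative refinement: residuals are computed in high precision, while the correction equation is solved cheaply in low precision. A generic sketch, where solve_low is a hypothetical stand-in for any low-precision inner solver (for example, an LU solve done entirely in float):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Mixed-precision iterative refinement: residuals in double, corrections
// from a cheaper single-precision solve, accumulation back in double.
std::vector<double> refine(
    const std::function<std::vector<double>(const std::vector<double>&)>& apply_A,
    const std::function<std::vector<float>(const std::vector<float>&)>& solve_low,
    const std::vector<double>& b, int max_iters) {
    std::vector<double> x(b.size(), 0.0);
    for (int it = 0; it < max_iters; ++it) {
        const auto Ax = apply_A(x);
        std::vector<float> r(b.size());
        for (std::size_t i = 0; i < b.size(); ++i)
            r[i] = static_cast<float>(b[i] - Ax[i]);  // high-precision residual
        const auto c = solve_low(r);                  // low-precision correction
        for (std::size_t i = 0; i < b.size(); ++i)
            x[i] += static_cast<double>(c[i]);        // accumulate in double
    }
    return x;
}
```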
10. Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
- Author
- Tsai, Yuhsiang M., Cojean, Terry, Ribizel, Tobias, and Anzt, Hartwig
- Subjects
- DATA processing & computer science, ddc:004, ComputerSystemsOrganization_PROCESSORARCHITECTURES
- Abstract
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base.
- Published
- 2020
11. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing.
- Author
- Anzt, Hartwig, Cojean, Terry, Flegar, Goran, Göbel, Fritz, Grützmacher, Thomas, Nayak, Pratik, Ribizel, Tobias, Tsai, Yuhsiang Mike, and Quintana-Ortí, Enrique S.
- Subjects
- LINEAR algebra, OPERATOR algebras, GINKGO, HIGH performance computing, GRAPHICS processing units
- Abstract
In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators," motivating the notion of a "linear operator algebra library." Ginkgo's current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends and provide details on extensibility and sustainability measures. We also demonstrate Ginkgo's usability by providing examples of how to use its functionality inside the MFEM and deal.ii finite element ecosystems. Finally, we offer a practical demonstration of Ginkgo's high performance on state-of-the-art GPU architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
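Editor's note: the "everything is a linear operator" design described in this abstract can be illustrated with a small interface sketch. This mirrors the spirit of the design only; it is not Ginkgo's actual gko::LinOp class hierarchy:

```cpp
#include <vector>

// Illustrative mini-interface: matrices, preconditioners, and solvers all
// share one apply() that maps an input vector to an output vector.
struct LinearOperator {
    virtual void apply(const std::vector<double>& in,
                       std::vector<double>& out) const = 0;
    virtual ~LinearOperator() = default;
};

// A solver is itself an operator: apply() maps a right-hand side to the
// solution, touching the system operator only through its apply().
struct CgSolver : LinearOperator {
    const LinearOperator& A;
    explicit CgSolver(const LinearOperator& op) : A(op) {}
    void apply(const std::vector<double>& b,
               std::vector<double>& x) const override {
        // ... CG iterations calling only A.apply(p, Ap), so A can be a
        // matrix, a composed operator, or another solver/preconditioner ...
    }
};
```

Because solvers and preconditioners satisfy the same interface as matrices, they compose freely, which is the point of calling it a linear operator algebra library.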
12. A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.
- Author
- Abdelfattah, Ahmad, Anzt, Hartwig, Boman, Erik G., Carson, Erin, Cojean, Terry, Dongarra, Jack, Fox, Alyson, Gates, Mark, Higham, Nicholas J., Li, Xiaoye S., Loe, Jennifer, Luszczek, Piotr, Pranesh, Srikara, Rajamanickam, Siva, Ribizel, Tobias, Smith, Barry F., Swirydowicz, Kasia, Thomas, Stephen, Tomov, Stanimire, and Tsai, Yaohung M.
- Subjects
- NUMERICAL solutions for linear algebra, SCIENTIFIC computing, MACHINE learning, MACHINE design, ALGORITHMS
- Abstract
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical algorithms community urgently needs to reconsider the floating point formats used in the distinct operations to efficiently leverage the available compute power. In this work, we provide a comprehensive survey of mixed-precision numerical linear algebra routines, including the underlying concepts, theoretical background, and experimental results for both dense and sparse linear algebra problems. [ABSTRACT FROM AUTHOR]
- Published
- 2021
13. Evaluating asynchronous Schwarz solvers on GPUs.
- Author
- Nayak, Pratik, Cojean, Terry, and Anzt, Hartwig
- Subjects
- NUMERICAL solutions for linear algebra, GRAPHICS processing units, COMMUNICATION patterns, COPROCESSORS, MULTICORE processors
- Abstract
With the commencement of the exascale computing era, we realize that the majority of leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors such as GPUs and multiple CPU cores. For example, each node of ORNL's Summit combines six NVIDIA Tesla V100 GPUs with 42 IBM Power9 cores. Synchronizing across compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not explicitly synchronize, but allow the communication between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart. We also study how the communication patterns governed by the partitioning and the overlap between the sub-domains affect the global solver. Finally, we show that this concept can render attractive performance benefits over the synchronous counterparts even for a well-balanced problem. [ABSTRACT FROM AUTHOR]
- Published
- 2021
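Editor's note: the one-sided communication the abstract refers to can be sketched with MPI RMA primitives: each rank exposes a receive buffer in an MPI window, and neighbors write boundary data into it with MPI_Put, with no matching receive and no global barrier between local iterations. A heavily simplified sketch (single neighbor, fixed buffer size, names illustrative):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 128;  // illustrative boundary size
    std::vector<double> recv_buf(n, 0.0), my_boundary(n, rank);

    // Expose recv_buf for one-sided access by other ranks.
    MPI_Win win;
    MPI_Win_create(recv_buf.data(), n * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // Push boundary values into the neighbor's window; the target rank
    // performs no matching receive and keeps iterating asynchronously.
    int neighbor = (rank + 1) % size;
    MPI_Win_lock(MPI_LOCK_SHARED, neighbor, 0, win);
    MPI_Put(my_boundary.data(), n, MPI_DOUBLE, neighbor, 0, n, MPI_DOUBLE, win);
    MPI_Win_unlock(neighbor, win);

    MPI_Win_free(&win);  // collective; a real solver frees after convergence
    MPI_Finalize();
}
```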
14. Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software.
- Author
- Flegar, Goran, Anzt, Hartwig, Cojean, Terry, and Quintana-Ortí, Enrique S.
- Subjects
- LINEAR algebra, GINKGO, ARITHMETIC, ALGORITHMS, GRAPHICS processing units
- Abstract
The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
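Editor's note: the core mechanism, choosing a storage precision per preconditioner block based on its numerical properties, can be sketched as follows. The condition-number threshold below is a placeholder for illustration, not the selection criterion from the paper:

```cpp
#include <utility>
#include <variant>
#include <vector>

// Schematic per-block storage: a well-conditioned block's inverse can be
// stored in float, while the rest stay in double working precision.
struct JacobiBlock {
    std::variant<std::vector<float>, std::vector<double>> inv_block;
};

JacobiBlock store_block(const std::vector<double>& inv, double cond_estimate) {
    if (cond_estimate < 1e4) {  // placeholder threshold, not the paper's rule
        std::vector<float> low(inv.begin(), inv.end());  // downcast for storage
        return {std::move(low)};
    }
    return {std::vector<double>(inv)};  // keep full working precision
}
```

Since block-Jacobi application is memory-bound, reading half-width blocks directly translates into runtime savings, which is the effect the paper measures.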
15. The StarPU Runtime System at Exascale?
- Author
- Cojean, Terry
- Subjects
- Runtime Systems, High Performance Computing, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], StarPU, [INFO] Computer Science [cs]
- Abstract
In this talk, we present the StarPU runtime system and its programming model, the Sequential Task Flow (STF), which aims at providing easy (performance) portability of code. We propose two extensions to StarPU and its programming model in order to allow better scalability of the runtime system, namely parallel tasks and hierarchical tasks. These two extensions tackle some limitations of the StarPU runtime system, and we highlight the performance benefits of these techniques.
- Published
- 2016
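Editor's note: in the STF model mentioned in this record, the program inserts tasks in sequential order and the runtime infers the dependency DAG from declared data access modes. A minimal sketch against StarPU's documented C API; field and call names follow the StarPU handbook, but details may differ between versions:

```cpp
#include <starpu.h>
#include <cstdint>

// Each task doubles the vector; the STARPU_RW access mode lets StarPU
// serialize successive tasks that touch the same data handle.
static void scale_cpu(void* buffers[], void* /*cl_arg*/) {
    double* v = (double*)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (unsigned i = 0; i < n; ++i) v[i] *= 2.0;
}

int main() {
    if (starpu_init(NULL) != 0) return 1;

    static struct starpu_codelet cl;  // static: zero-initialized
    cl.cpu_funcs[0] = scale_cpu;
    cl.nbuffers = 1;
    cl.modes[0] = STARPU_RW;

    static double data[1024];
    starpu_data_handle_t h;
    starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)data,
                                1024, sizeof(double));

    // Tasks are inserted in sequential (STF) order; the dependency between
    // them is implicit in their access modes, with no explicit wait needed.
    starpu_task_insert(&cl, STARPU_RW, h, 0);
    starpu_task_insert(&cl, STARPU_RW, h, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(h);
    starpu_shutdown();
    return 0;
}
```

The parallel and hierarchical task extensions proposed in the talk generalize what a single inserted task may contain, without changing this insertion-order programming style.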
16. Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines
- Author
- Cojean, Terry
- Subjects
- [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
- Abstract
Computing platforms are now extremely complex, providing an increasing number of CPUs and accelerators. This trend makes balancing computations between these heterogeneous resources performance critical. In this paper, we tackle the task granularity problem and propose aggregating several CPUs in order to execute larger parallel tasks and thus find a better equilibrium between the workload assigned to the CPUs and the one assigned to the GPUs. To this end, we rely on the notion of scheduling contexts in order to isolate the parallel tasks and thus delegate the management of the task parallelism to the inner scheduling strategy. We demonstrate the relevance of our approach through the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We allow parallel elementary tasks, using the Intel MKL parallel implementation optimized through the OpenMP runtime system. We show how our approach handles the interaction between the StarPU and OpenMP runtime systems and how it exploits the parallelism of modern accelerator-based machines. We present experimental results showing that our solution outperforms state-of-the-art implementations, reaching a peak performance of 4.5 TFlop/s on a platform equipped with 20 CPU cores and 4 GPU devices.
- Published
- 2016
17. A customized precision format based on mantissa segmentation for accelerating sparse linear algebra.
- Author
- Grützmacher, Thomas, Cojean, Terry, Flegar, Goran, Göbel, Fritz, and Anzt, Hartwig
- Subjects
- JACOBI method, LINEAR algebra, TEST methods, ALGORITHMS
- Abstract
Summary: In this work, we pursue the idea of radically decoupling the floating point format used for arithmetic operations from the format used to store the data in memory. We complement this idea with a customized precision memory format derived by splitting the mantissa (significand) of standard IEEE formats into segments, such that values can be accessed faster if lower accuracy is acceptable. Combined with precision‐aware algorithms that dynamically adapt the data access accuracy to the numerical requirements, the customized precision memory format can render attractive runtime savings without impacting the memory footprint of the data or the accuracy of the final result. In an experimental analysis using the adaptive precision Jacobi method on diagonalizable test problems, we assess the benefits of the mantissa‐segmenting customized precision format on recent multi‐ and manycore architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2020
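Editor's note: the mantissa-segmentation idea can be illustrated at the bit level: an IEEE double is split into a "head" segment (sign, exponent, and the top mantissa bits) and a "tail" segment, so a reduced-accuracy access reads only the head. The 32/32 split below is this note's illustration; the paper considers various segmentations:

```cpp
#include <cstdint>
#include <cstring>

// Split an IEEE double into a 32-bit head (1 sign bit, 11 exponent bits,
// top 20 mantissa bits) and a 32-bit tail (remaining mantissa bits).
void split(double v, std::uint32_t& head, std::uint32_t& tail) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    head = static_cast<std::uint32_t>(bits >> 32);
    tail = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// Low-accuracy access path: read only the head and treat the missing tail
// bits as zero, halving the memory traffic for this value.
double join_head_only(std::uint32_t head) {
    const std::uint64_t bits = static_cast<std::uint64_t>(head) << 32;
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}
```

Storing all heads and all tails in separate streams keeps the full-accuracy footprint unchanged while letting precision-aware iterations fetch only the heads.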
18. Resource aggregation in task-based applications over accelerator-based multicore machines
- Author
- Cojean, Terry, Guermouche, Abdou, Hugo, Andra-Ecaterina, Namyst, Raymond, and Wacrenier, Pierre-André
- Subjects
- [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], ComputingMilieux_MISCELLANEOUS
- Published
- 2016
19. Dynamic Allocations in a Hierarchical Parallel Context: A Study on Performance, Memory Footprint, and Portability Using SYCL
- Author
- Millan, Aymeric, Padioleau, Thomas, and Bigot, Julien; volume editors: Zeinalipour, Demetris, Blanco Heras, Dora, Pallis, George, Herodotou, Herodotos, Trihinas, Demetris, Balouek, Daniel, Diehl, Patrick, Cojean, Terry, Fürlinger, Karl, Kirkeby, Maja Hanne, Nardelli, Matteo, and Di Sanzo, Pierangelo
- Published
- 2024
20. Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility.
- Author
- Anzt, Hartwig, Cojean, Terry, and Kühn, Eileen
- Subjects
- COMPUTER software quality control, COMPUTER software development, SUSTAINABILITY, COMPUTER software, SCIENTIFIC community
- Abstract
In this position paper, we argue for implementing an alternative peer review process for scientific computing contributions that promotes high quality scientific software developments as fully-recognized conference submissions. The idea is based on leveraging the code reviewers' feedback on scientific software contributions to community software developments as a third-party review involvement. Providing open access to this technical review would complement the scientific review of the contribution, efficiently reduce the workload of the undisclosed reviewers, improve the algorithm implementation quality and software sustainability, and ensure full reproducibility of the reported results. Using this process creates incentives to publish scientific algorithms in open source software, instead of designing prototype algorithms with the unique purpose of publishing a paper. In addition, archiving the comments and suggestions of the community in the versioning control systems ensures that community reviewers also receive credit for their review contributions, unlike reviewers in the traditional peer review process. Finally, it reflects the particularity of the scientific computing community in using conferences rather than journals as the main publication venue. [ABSTRACT FROM AUTHOR]
- Published
- 2019
21. Resource aggregation for task-based Cholesky Factorization on top of modern architectures
- Author
- Hugo, Andra, Wacrenier, Pierre-André, Cojean, Terry, Namyst, Raymond, and Guermouche, Abdou
- Subjects
- Intel Xeon Phi KNL, accelerator, GPU, task parallelism, multicore, symmetric multiprocessor system, heterogeneous computing, runtime system, dense linear algebra, load balancing, task DAG, Cholesky factorization
- Abstract
Hybrid computing platforms are now commonplace, featuring a large number of CPU cores and accelerators. This trend makes balancing computations between these heterogeneous resources performance critical. In this paper, we propose aggregating several CPU cores in order to execute larger parallel tasks and improve load balancing between CPUs and accelerators. Additionally, we present our approach to exploiting internal parallelism within tasks by combining two runtime system schedulers: a global runtime system to schedule the main task graph and a local one to cope with internal task parallelism. We demonstrate the relevance of our approach in the context of the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We present experimental results showing that our solution outperforms state-of-the-art implementations on two architectures: a modern heterogeneous machine and the Intel Xeon Phi Knights Landing. (Submitted for review to the Parallel Computing special issue for the HCW and HeteroPar'16 workshops.)
- Published
- 2016
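Editor's note: the dense Cholesky factorization used in these experiments is typically expressed as a tile algorithm whose loop nest generates the task DAG the runtime schedules. A schematic sketch, with the BLAS/LAPACK tile kernels left as placeholder declarations (in a real code: dpotrf, dtrsm, dsyrk, dgemm):

```cpp
// Placeholder tile kernels; each call becomes one task at runtime.
void potrf(double* Akk, int nb);
void trsm(const double* Akk, double* Aik, int nb);
void syrk(const double* Aik, double* Aii, int nb);
void gemm(const double* Aik, const double* Ajk, double* Aij, int nb);

// Right-looking tiled Cholesky over an nt-by-nt grid of nb-by-nb tiles.
// The runtime derives the DAG from which tiles each task reads and writes.
void tiled_cholesky(double*** A, int nt, int nb) {
    for (int k = 0; k < nt; ++k) {
        potrf(A[k][k], nb);                           // factor diagonal tile
        for (int i = k + 1; i < nt; ++i)
            trsm(A[k][k], A[i][k], nb);               // panel solve
        for (int i = k + 1; i < nt; ++i) {
            syrk(A[i][k], A[i][i], nb);               // symmetric rank-nb update
            for (int j = k + 1; j < i; ++j)
                gemm(A[i][k], A[j][k], A[i][j], nb);  // trailing update
        }
    }
}
```

The gemm-dominated trailing updates suit accelerators, while aggregated CPU cores handle the less parallel potrf/trsm tasks, which is the balance the paper's resource aggregation targets.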