21 results for "Cojean, Terry"
Search Results
2. Ginkgo - A math library designed to accelerate Exascale Computing Project science applications.
- Author
- Cojean, Terry, Nayak, Pratik, Ribizel, Tobias, Beams, Natalie, Tsai, Yu-Hsiang Mike, Koch, Marcel, Göbel, Fritz, Grützmacher, Thomas, and Anzt, Hartwig
- Subjects
- SOFTWARE libraries (Computer programming), MATHEMATICAL forms, LIBRARY design & construction, SCIENCE projects, ELECTRIC power distribution grids
- Abstract
Large-scale simulations require efficient computation across the entire computing hierarchy. A challenge of the Exascale Computing Project (ECP) was to reconcile highly heterogeneous hardware with the myriad of applications required to run on these supercomputers. Mathematical software forms the backbone of almost all scientific applications, providing efficient abstractions and operations that are crucial to harness the performance of computing systems. Ginkgo is one such mathematical software library, nurtured by ECP, providing high-performance, user-friendly, and performance-portable interfaces for applications in ECP and beyond. In this paper, we elaborate on Ginkgo's philosophy of high-performance software that is sustainable, reproducible, and easy to use. We showcase the wide feature set of solvers and preconditioners available in Ginkgo and the central concepts involved in their design. We elaborate on four different ECP software integrations, MFEM, PeleLM + SUNDIALS, XGC, and ExaSGD, which use Ginkgo to accelerate their science runs. Performance studies of different problems from these applications highlight the effectiveness of Ginkgo and the benefits gained by these ECP applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
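Editor's note: as a point of reference for this record, the usage pattern Ginkgo offers applications boils down to creating an executor, reading a matrix, and composing a solver. A minimal sketch modeled on Ginkgo's published simple-solver example; builder and criteria names follow the documented API, but exact signatures vary across library versions:

```cpp
#include <ginkgo/ginkgo.hpp>
#include <fstream>

int main() {
    // The executor encapsulates the hardware backend; swapping it for a
    // CudaExecutor/HipExecutor/DpcppExecutor retargets the whole solve.
    auto exec = gko::ReferenceExecutor::create();

    // Read the matrix and vectors in Matrix Market format.
    auto A = gko::share(gko::read<gko::matrix::Csr<double, int>>(
        std::ifstream("A.mtx"), exec));
    auto b = gko::read<gko::matrix::Dense<double>>(std::ifstream("b.mtx"), exec);
    auto x = gko::read<gko::matrix::Dense<double>>(std::ifstream("x0.mtx"), exec);

    // Compose a CG solver with iteration and residual stopping criteria.
    auto solver =
        gko::solver::Cg<double>::build()
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(1000u).on(exec),
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-8)
                    .on(exec))
            .on(exec)
            ->generate(A);
    solver->apply(b, x);  // solve A x = b, overwriting x
}
```

The executor argument is the only place where hardware is named, which is what lets the ECP applications listed above adopt the library without backend-specific code.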
3. Providing performance portable numerics for Intel GPUs.
- Author
- Tsai, Yu-Hsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- GINKGO, LINEAR algebra
- Abstract
Summary: With discrete Intel GPUs entering the high-performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this article, we report how we enable the Ginkgo math library to execute on Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between the CUDA and DPC++ programming models and describe workflows for simplified code conversion. We evaluate the performance of basic and advanced sparse linear algebra routines available in Ginkgo's DPC++ backend within the hardware-specific performance bounds and compare against routines providing the same functionality that ship with Intel's oneMKL vendor library. [ABSTRACT FROM AUTHOR]
- Published
- 2023
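Editor's note: to make the CUDA-to-DPC++ conversion discussed in this record concrete, here is a sketch of what a trivial kernel looks like on the SYCL/DPC++ side, using standard SYCL 2020 constructs (function and variable names are illustrative, not from the paper):

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>

// y = alpha * x + y: the DPC++ analogue of a one-line CUDA kernel.
void axpy(sycl::queue& q, std::size_t n, double alpha,
          const double* x, double* y) {
    // parallel_for replaces the <<<grid, block>>> launch; the id object
    // replaces the blockIdx/threadIdx index arithmetic.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        y[i] += alpha * x[i];
    }).wait();
}

int main() {
    sycl::queue q;  // selects a default device, e.g. an Intel GPU
    const std::size_t n = 1024;
    // Unified shared memory plays the role of cudaMallocManaged allocations.
    double* x = sycl::malloc_shared<double>(n, q);
    double* y = sycl::malloc_shared<double>(n, q);
    for (std::size_t i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }
    axpy(q, n, 3.0, x, y);
    sycl::free(x, q);
    sycl::free(y, q);
}
```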
4. Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
- Author
- Tsai, Yuhsiang M., Cojean, Terry, Ribizel, Tobias, and Anzt, Hartwig
- Subjects
- FOS: Computer and information sciences, HIP, GPU, Computer Science - Mathematical Software, Portability, CUDA, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Mathematical Software (cs.MS), Article
- Abstract
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base. (Preprint submitted to HeteroPar.)
- Published
- 2021
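Editor's note: as an illustration of the porting workflow this record describes, a sketch of what hipified CUDA code looks like. The kernel body is unchanged; host API calls are renamed one-for-one. This is an illustrative example under those assumptions, not code from the paper:

```cpp
#include <hip/hip_runtime.h>
#include <vector>

// Kernel source is shared between CUDA and HIP builds; only host API
// calls change names (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy).
__global__ void scale(double* x, double alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

int main() {
    const int n = 1 << 20;
    std::vector<double> h_x(n, 1.0);
    double* d_x = nullptr;
    hipMalloc(&d_x, n * sizeof(double));
    hipMemcpy(d_x, h_x.data(), n * sizeof(double), hipMemcpyHostToDevice);

    // hipLaunchKernelGGL is the portable spelling of the <<<...>>> launch.
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0,
                       d_x, 2.0, n);
    hipDeviceSynchronize();

    hipMemcpy(h_x.data(), d_x, n * sizeof(double), hipMemcpyDeviceToHost);
    hipFree(d_x);
}
```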
5. Sparse Linear Algebra on AMD and NVIDIA GPUs – The Race Is On
- Author
- Tsai, Yuhsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- GPUs, ComputerSystemsOrganization_PROCESSORARCHITECTURES, Software_PROGRAMMINGTECHNIQUES, Sparse matrix vector product (SpMV), AMD, NVIDIA, Article
- Abstract
Efficiently processing sparse matrices is a central and performance-critical part of many scientific simulation codes. Recognizing the adoption of manycore accelerators in HPC, we evaluate in this paper the performance of the currently best sparse matrix-vector product (SpMV) implementations on high-end GPUs from AMD and NVIDIA. Specifically, we optimize SpMV kernels for the CSR, COO, ELL, and HYB formats, taking the hardware characteristics of the latest GPU technologies into account. We compare the performance of our kernels on 2,800 test matrices against AMD's hipSPARSE library and NVIDIA's cuSPARSE library, and ultimately assess how the GPU technologies from AMD and NVIDIA compare in terms of SpMV performance.
- Published
- 2020
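Editor's note: for readers unfamiliar with the formats named in the abstract, a reference (sequential) CSR SpMV makes the data layout concrete; the paper's GPU kernels parallelize and optimize this loop nest:

```cpp
#include <vector>

// Reference CSR sparse matrix-vector product y = A * x.
// row_ptr has n_rows + 1 entries; col_idx/values hold the nnz nonzeros.
void spmv_csr(int n_rows, const std::vector<int>& row_ptr,
              const std::vector<int>& col_idx,
              const std::vector<double>& values,
              const std::vector<double>& x, std::vector<double>& y) {
    for (int row = 0; row < n_rows; ++row) {
        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];  // gather from x by column index
        }
        y[row] = sum;
    }
}
```

COO stores explicit (row, column, value) triples, ELL pads every row to the same length for coalesced access, and HYB combines an ELL part with a COO part for irregular rows.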
6. Porting a sparse linear algebra math library to Intel GPUs
- Author
- Tsai, Yuhsiang M., Cojean, Terry, and Anzt, Hartwig
- Subjects
- Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Distributed, Parallel, and Cluster Computing (cs.DC), ComputerSystemsOrganization_PROCESSORARCHITECTURES, Mathematical Software (cs.MS)
- Abstract
With the announcement that the Aurora Supercomputer will be composed of general purpose Intel CPUs complemented by discrete high performance Intel GPUs, and the deployment of the oneAPI ecosystem, Intel has committed to entering the arena of discrete high performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks and a glimpse of the performance they can expect to see on Intel high performance GPUs. In this paper, we present the first platform-portable open source math library supporting Intel GPUs via the DPC++ programming environment. We also benchmark some of the developed sparse linear algebra functionality on different Intel GPUs to assess the efficiency of the DPC++ programming ecosystem in translating raw performance into application performance. Aside from quantifying the efficiency within the hardware-specific roofline model, we also compare against routines providing the same functionality that ship with Intel's oneMKL vendor library. (Preprint, not submitted.)
- Published
- 2021
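Editor's note: the abstract mentions quantifying efficiency within the hardware-specific roofline model. For a bandwidth-bound kernel such as SpMV, that bound reduces to a one-line estimate; the byte counts below follow a common simple CSR traffic model and are an assumption of this note, not the paper's exact accounting:

```cpp
// Roofline upper bound for CSR SpMV in double precision, assuming streaming
// access with no cache reuse of x. Per nonzero: 8 B matrix value, 4 B column
// index, 8 B x read; per row: 4 B row pointer, 8 B y write.
double spmv_roofline_gflops(double bandwidth_gb_per_s, long long n_rows,
                            long long nnz) {
    const double bytes = 20.0 * nnz + 12.0 * n_rows;
    const double flops = 2.0 * nnz;  // one multiply + one add per nonzero
    return bandwidth_gb_per_s * flops / bytes;  // GFLOP/s upper bound
}
```

Measured kernel throughput divided by this bound gives the efficiency figure such studies report.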
7. Ginkgo – A Math Library designed for Platform Portability
- Author
- Cojean, Terry, Tsai, Yu-Hsiang 'Mike', and Anzt, Hartwig
- Subjects
- Performance (cs.PF), Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Distributed, Parallel, and Cluster Computing (cs.DC), Mathematical Software (cs.MS)
- Abstract
The first associations with software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and the existence of software documentation. However, when asking what is a common deathblow for a scientific software product, it is often the lack of platform and performance portability. Against this background, we designed the Ginkgo library with the primary focus on platform portability and the ability not only to port to new hardware architectures, but also to achieve good performance. In this paper, we present the Ginkgo library design, radically separating algorithms from hardware-specific kernels that form the distinct hardware executors, and report our experience when adding execution backends for NVIDIA, AMD, and Intel GPUs. We also comment on the different levels of performance portability and the performance we achieved on the distinct hardware backends. (Submitted to the Parallel Computing journal (PARCO).)
- Published
- 2020
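Editor's note: the executor concept named in this abstract is what carries the portability: algorithms are written once against an abstract executor, and the backend is chosen at runtime. A sketch using Ginkgo's documented executor types; constructor arguments may differ between library versions, and each device executor requires the corresponding build option:

```cpp
#include <ginkgo/ginkgo.hpp>
#include <memory>
#include <string>

// Select a hardware backend by name; everything else in the application
// is written once against the abstract gko::Executor interface.
std::shared_ptr<gko::Executor> make_executor(const std::string& name) {
    auto host = gko::OmpExecutor::create();
    if (name == "cuda") return gko::CudaExecutor::create(0, host);
    if (name == "hip") return gko::HipExecutor::create(0, host);
    if (name == "dpcpp") return gko::DpcppExecutor::create(0, host);
    return host;  // OpenMP execution on the host CPU
}
```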
8. Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations
- Author
- Tsai, Yuhsiang Mike, Cojean, Terry, and Anzt, Hartwig
- Subjects
- Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Computer Science - Mathematical Software, Mathematical Software (cs.MS)
- Abstract
GPU accelerators have become an important backbone for scientific high performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper, we take a first look at NVIDIA's newest server-line GPU, the A100 architecture, part of the Ampere generation. Specifically, we assess its performance for sparse linear algebra operations that form the backbone of many scientific applications, and we measure the performance improvements over its predecessor.
- Published
- 2020
9. A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
- Author
- Abdelfattah, Ahmad, Anzt, Hartwig, Boman, Erik G., Carson, Erin, Cojean, Terry, Dongarra, Jack, Gates, Mark, Grützmacher, Thomas, Higham, Nicholas J., Li, Sherry, Lindquist, Neil, Liu, Yang, Loe, Jennifer, Luszczek, Piotr, Nayak, Pratik, Pranesh, Sri, Rajamanickam, Siva, Ribizel, Tobias, Smith, Barry, Swirydowicz, Kasia, Thomas, Stephen, Tomov, Stanimire, Tsai, Yaohung M., and Yang, Ulrike Meier
- Subjects
- FOS: Computer and information sciences, G.4, G.1.3, FOS: Mathematics, Computer Science - Mathematical Software, Numerical Analysis (math.NA), Mathematics - Numerical Analysis, Mathematical Software (cs.MS)
- Abstract
In recent years, hardware vendors have started designing low precision special function units in response to the machine learning community's demand for high compute power in low precision formats. Server-line products, too, increasingly feature low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer, providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between compute power on the one hand and memory bandwidth on the other keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered "mature technology," but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves and instead focus on how mixed- and multiprecision technology can help improve the performance of these methods, presenting highlights of applications significantly outperforming traditional fixed precision methods. (Technical report produced as part of the Exascale Computing Project (ECP).)
- Published
- 2020
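Editor's note: a canonical pattern covered by such surveys is mixed-precision iterative refinement: residuals are computed in high precision, while the correction equation is solved cheaply in low precision. A generic sketch, where solve_low is a hypothetical stand-in for any low-precision inner solver (for example, an LU solve done entirely in float):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Mixed-precision iterative refinement: residuals in double, corrections
// from a cheaper single-precision solve, accumulation back in double.
std::vector<double> refine(
    const std::function<std::vector<double>(const std::vector<double>&)>& apply_A,
    const std::function<std::vector<float>(const std::vector<float>&)>& solve_low,
    const std::vector<double>& b, int max_iters) {
    std::vector<double> x(b.size(), 0.0);
    for (int it = 0; it < max_iters; ++it) {
        const auto Ax = apply_A(x);
        std::vector<float> r(b.size());
        for (std::size_t i = 0; i < b.size(); ++i)
            r[i] = static_cast<float>(b[i] - Ax[i]);  // high-precision residual
        const auto c = solve_low(r);                  // low-precision correction
        for (std::size_t i = 0; i < b.size(); ++i)
            x[i] += static_cast<double>(c[i]);        // accumulate in double
    }
    return x;
}
```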
10. Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
- Author
- Tsai, Yuhsiang M., Cojean, Terry, Ribizel, Tobias, and Anzt, Hartwig
- Subjects
- DATA processing & computer science, ddc:004, ComputerSystemsOrganization_PROCESSORARCHITECTURES
- Abstract
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base.
- Published
- 2020
11. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing.
- Author
- Anzt, Hartwig, Cojean, Terry, Flegar, Goran, Göbel, Fritz, Grützmacher, Thomas, Nayak, Pratik, Ribizel, Tobias, Tsai, Yuhsiang Mike, and Quintana-Ortí, Enrique S.
- Subjects
- LINEAR algebra, OPERATOR algebras, GINKGO, HIGH performance computing, GRAPHICS processing units
- Abstract
In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators," motivating the notion of a "linear operator algebra library." Ginkgo's current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends and provide details on extensibility and sustainability measures. We also demonstrate Ginkgo's usability by providing examples of how to use its functionality inside the MFEM and deal.ii finite element ecosystems. Finally, we offer a practical demonstration of Ginkgo's high performance on state-of-the-art GPU architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
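Editor's note: the "everything is a linear operator" design described in this abstract can be illustrated with a small interface sketch. This mirrors the spirit of the design only; it is not Ginkgo's actual gko::LinOp class hierarchy:

```cpp
#include <vector>

// Illustrative mini-interface: matrices, preconditioners, and solvers all
// share one apply() that maps an input vector to an output vector.
struct LinearOperator {
    virtual void apply(const std::vector<double>& in,
                       std::vector<double>& out) const = 0;
    virtual ~LinearOperator() = default;
};

// A solver is itself an operator: apply() maps a right-hand side to the
// solution, touching the system operator only through its apply().
struct CgSolver : LinearOperator {
    const LinearOperator& A;
    explicit CgSolver(const LinearOperator& op) : A(op) {}
    void apply(const std::vector<double>& b,
               std::vector<double>& x) const override {
        // ... CG iterations calling only A.apply(p, Ap), so A can be a
        // matrix, a composed operator, or another solver/preconditioner ...
    }
};
```

Because solvers and preconditioners satisfy the same interface as matrices, they compose freely, which is the point of calling it a linear operator algebra library.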
12. A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.
- Author
- Abdelfattah, Ahmad, Anzt, Hartwig, Boman, Erik G., Carson, Erin, Cojean, Terry, Dongarra, Jack, Fox, Alyson, Gates, Mark, Higham, Nicholas J., Li, Xiaoye S., Loe, Jennifer, Luszczek, Piotr, Pranesh, Srikara, Rajamanickam, Siva, Ribizel, Tobias, Smith, Barry F., Swirydowicz, Kasia, Thomas, Stephen, Tomov, Stanimire, and Tsai, Yaohung M.
- Subjects
- NUMERICAL solutions for linear algebra, SCIENTIFIC computing, MACHINE learning, MACHINE design, ALGORITHMS
- Abstract
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical algorithms community urgently needs to reconsider the floating point formats used in the distinct operations to efficiently leverage the available compute power. In this work, we provide a comprehensive survey of mixed-precision numerical linear algebra routines, including the underlying concepts, theoretical background, and experimental results for both dense and sparse linear algebra problems. [ABSTRACT FROM AUTHOR]
- Published
- 2021
13. Evaluating asynchronous Schwarz solvers on GPUs.
- Author
- Nayak, Pratik, Cojean, Terry, and Anzt, Hartwig
- Subjects
- NUMERICAL solutions for linear algebra, GRAPHICS processing units, COMMUNICATION patterns, COPROCESSORS, MULTICORE processors
- Abstract
With the commencement of the exascale computing era, we realize that the majority of leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors such as GPUs and multiple CPU cores. For example, each node of ORNL's Summit combines six NVIDIA Tesla V100 GPUs with 42 IBM Power9 cores. Synchronizing across compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not explicitly synchronize, but allow the communication between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart. We also study how the communication patterns governed by the partitioning and the overlap between the sub-domains affect the global solver. Finally, we show that this concept can render attractive performance benefits over the synchronous counterparts even for a well-balanced problem. [ABSTRACT FROM AUTHOR]
- Published
- 2021
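Editor's note: the one-sided communication the abstract refers to can be sketched with MPI RMA primitives: each rank exposes a receive buffer in an MPI window, and neighbors write boundary data into it with MPI_Put, with no matching receive and no global barrier between local iterations. A heavily simplified sketch (single neighbor, fixed buffer size, names illustrative):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 128;  // illustrative boundary size
    std::vector<double> recv_buf(n, 0.0), my_boundary(n, rank);

    // Expose recv_buf for one-sided access by other ranks.
    MPI_Win win;
    MPI_Win_create(recv_buf.data(), n * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // Push boundary values into the neighbor's window; the target rank
    // performs no matching receive and keeps iterating asynchronously.
    int neighbor = (rank + 1) % size;
    MPI_Win_lock(MPI_LOCK_SHARED, neighbor, 0, win);
    MPI_Put(my_boundary.data(), n, MPI_DOUBLE, neighbor, 0, n, MPI_DOUBLE, win);
    MPI_Win_unlock(neighbor, win);

    MPI_Win_free(&win);  // collective; a real solver frees after convergence
    MPI_Finalize();
}
```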
14. Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software.
- Author
- Flegar, Goran, Anzt, Hartwig, Cojean, Terry, and Quintana-Ortí, Enrique S.
- Subjects
- LINEAR algebra, GINKGO, ARITHMETIC, ALGORITHMS, GRAPHICS processing units
- Abstract
The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
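Editor's note: the core mechanism, choosing a storage precision per preconditioner block based on its numerical properties, can be sketched as follows. The condition-number threshold below is a placeholder for illustration, not the selection criterion from the paper:

```cpp
#include <utility>
#include <variant>
#include <vector>

// Schematic per-block storage: a well-conditioned block's inverse can be
// stored in float, while the rest stay in double working precision.
struct JacobiBlock {
    std::variant<std::vector<float>, std::vector<double>> inv_block;
};

JacobiBlock store_block(const std::vector<double>& inv, double cond_estimate) {
    if (cond_estimate < 1e4) {  // placeholder threshold, not the paper's rule
        std::vector<float> low(inv.begin(), inv.end());  // downcast for storage
        return {std::move(low)};
    }
    return {std::vector<double>(inv)};  // keep full working precision
}
```

Since block-Jacobi application is memory-bound, reading half-width blocks directly translates into runtime savings, which is the effect the paper measures.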
15. The StarPU Runtime System at Exascale?
- Author
- Cojean, Terry
- Subjects
- Runtime Systems, High Performance Computing, [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], StarPU, [INFO] Computer Science [cs]
- Abstract
In this talk, we present the StarPU runtime system and its programming model, the Sequential Task Flow (STF), which aims at providing easy (performance) portability of code. We propose two extensions to StarPU and its programming model in order to allow better scalability of the runtime system, namely parallel tasks and hierarchical tasks. These two extensions tackle some limitations of the StarPU runtime system, and we highlight the performance benefits of these techniques.
- Published
- 2016
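Editor's note: in the STF model mentioned in this record, the program inserts tasks in sequential order and the runtime infers the dependency DAG from declared data access modes. A minimal sketch against StarPU's documented C API; field and call names follow the StarPU handbook, but details may differ between versions:

```cpp
#include <starpu.h>
#include <cstdint>

// Each task doubles the vector; the STARPU_RW access mode lets StarPU
// serialize successive tasks that touch the same data handle.
static void scale_cpu(void* buffers[], void* /*cl_arg*/) {
    double* v = (double*)STARPU_VECTOR_GET_PTR(buffers[0]);
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    for (unsigned i = 0; i < n; ++i) v[i] *= 2.0;
}

int main() {
    if (starpu_init(NULL) != 0) return 1;

    static struct starpu_codelet cl;  // static: zero-initialized
    cl.cpu_funcs[0] = scale_cpu;
    cl.nbuffers = 1;
    cl.modes[0] = STARPU_RW;

    static double data[1024];
    starpu_data_handle_t h;
    starpu_vector_data_register(&h, STARPU_MAIN_RAM, (uintptr_t)data,
                                1024, sizeof(double));

    // Tasks are inserted in sequential (STF) order; the dependency between
    // them is implicit in their access modes, with no explicit wait needed.
    starpu_task_insert(&cl, STARPU_RW, h, 0);
    starpu_task_insert(&cl, STARPU_RW, h, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(h);
    starpu_shutdown();
    return 0;
}
```

The parallel and hierarchical task extensions proposed in the talk generalize what a single inserted task may contain, without changing this insertion-order programming style.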
16. Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines
- Author
- Cojean, Terry
- Subjects
- [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
- Abstract
Computing platforms are now extremely complex, providing an increasing number of CPUs and accelerators. This trend makes balancing computations between these heterogeneous resources performance critical. In this paper, we tackle the task granularity problem and propose aggregating several CPUs in order to execute larger parallel tasks and thus find a better equilibrium between the workload assigned to the CPUs and the one assigned to the GPUs. To this end, we rely on the notion of scheduling contexts in order to isolate the parallel tasks and thus delegate the management of the task parallelism to the inner scheduling strategy. We demonstrate the relevance of our approach through the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We allow parallel elementary tasks, using the Intel MKL parallel implementation optimized through the OpenMP runtime system. We show how our approach handles the interaction between the StarPU and OpenMP runtime systems and how it exploits the parallelism of modern accelerator-based machines. We present experimental results showing that our solution outperforms state-of-the-art implementations, reaching a peak performance of 4.5 TFlop/s on a platform equipped with 20 CPU cores and 4 GPU devices.
- Published
- 2016
17. A customized precision format based on mantissa segmentation for accelerating sparse linear algebra.
- Author
- Grützmacher, Thomas, Cojean, Terry, Flegar, Goran, Göbel, Fritz, and Anzt, Hartwig
- Subjects
- JACOBI method, LINEAR algebra, TEST methods, ALGORITHMS
- Abstract
Summary: In this work, we pursue the idea of radically decoupling the floating point format used for arithmetic operations from the format used to store the data in memory. We complement this idea with a customized precision memory format derived by splitting the mantissa (significand) of standard IEEE formats into segments, such that values can be accessed faster if lower accuracy is acceptable. Combined with precision‐aware algorithms that dynamically adapt the data access accuracy to the numerical requirements, the customized precision memory format can render attractive runtime savings without impacting the memory footprint of the data or the accuracy of the final result. In an experimental analysis using the adaptive precision Jacobi method on diagonalizable test problems, we assess the benefits of the mantissa‐segmenting customized precision format on recent multi‐ and manycore architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2020
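Editor's note: the mantissa-segmentation idea can be illustrated at the bit level: an IEEE double is split into a "head" segment (sign, exponent, and the top mantissa bits) and a "tail" segment, so a reduced-accuracy access reads only the head. The 32/32 split below is this note's illustration; the paper considers various segmentations:

```cpp
#include <cstdint>
#include <cstring>

// Split an IEEE double into a 32-bit head (1 sign bit, 11 exponent bits,
// top 20 mantissa bits) and a 32-bit tail (remaining mantissa bits).
void split(double v, std::uint32_t& head, std::uint32_t& tail) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    head = static_cast<std::uint32_t>(bits >> 32);
    tail = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// Low-accuracy access path: read only the head and treat the missing tail
// bits as zero, halving the memory traffic for this value.
double join_head_only(std::uint32_t head) {
    const std::uint64_t bits = static_cast<std::uint64_t>(head) << 32;
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}
```

Storing all heads and all tails in separate streams keeps the full-accuracy footprint unchanged while letting precision-aware iterations fetch only the heads.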
18. Resource aggregation in task-based applications over accelerator-based multicore machines
- Author
- Cojean, Terry, Guermouche, Abdou, Hugo, Andra-Ecaterina, Namyst, Raymond, and Wacrenier, Pierre-André
- Subjects
- [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], ComputingMilieux_MISCELLANEOUS
- Published
- 2016
19. Dynamic Allocations in a Hierarchical Parallel Context: A Study on Performance, Memory Footprint, and Portability Using SYCL
- Author
- Millan, Aymeric, Padioleau, Thomas, and Bigot, Julien; volume editors: Zeinalipour, Demetris, Blanco Heras, Dora, Pallis, George, Herodotou, Herodotos, Trihinas, Demetris, Balouek, Daniel, Diehl, Patrick, Cojean, Terry, Fürlinger, Karl, Kirkeby, Maja Hanne, Nardelli, Matteo, and Di Sanzo, Pierangelo
- Published
- 2024
20. Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility.
- Author
- Anzt, Hartwig, Cojean, Terry, and Kühn, Eileen
- Subjects
- COMPUTER software quality control, COMPUTER software development, SUSTAINABILITY, COMPUTER software, SCIENTIFIC community
- Abstract
In this position paper, we argue for implementing an alternative peer review process for scientific computing contributions that promotes high quality scientific software developments as fully-recognized conference submissions. The idea is based on leveraging the code reviewers' feedback on scientific software contributions to community software developments as a third-party review involvement. Providing open access to this technical review would complement the scientific review of the contribution, efficiently reduce the workload of the undisclosed reviewers, improve the algorithm implementation quality and software sustainability, and ensure full reproducibility of the reported results. Using this process creates incentives to publish scientific algorithms in open source software, instead of designing prototype algorithms with the unique purpose of publishing a paper. In addition, archiving the comments and suggestions of the community in the versioning control systems ensures that community reviewers also receive credit for their review contributions, unlike reviewers in the traditional peer review process. Finally, it reflects the particularity of the scientific computing community in using conferences rather than journals as the main publication venue. [ABSTRACT FROM AUTHOR]
- Published
- 2019
21. Resource aggregation for task-based Cholesky Factorization on top of modern architectures
- Author
- Hugo, Andra, Wacrenier, Pierre-André, Cojean, Terry, Namyst, Raymond, and Guermouche, Abdou
- Subjects
- Intel Xeon Phi KNL, accelerator, GPU, task parallelism, multicore, symmetric multiprocessor system, heterogeneous computing, runtime system, dense linear algebra, load balancing, task DAG, Cholesky factorization
- Abstract
Hybrid computing platforms are now commonplace, featuring a large number of CPU cores and accelerators. This trend makes balancing computations between these heterogeneous resources performance critical. In this paper, we propose aggregating several CPU cores in order to execute larger parallel tasks and improve load balancing between CPUs and accelerators. Additionally, we present our approach to exploiting internal parallelism within tasks by combining two runtime system schedulers: a global runtime system to schedule the main task graph and a local one to cope with internal task parallelism. We demonstrate the relevance of our approach in the context of the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We present experimental results showing that our solution outperforms state-of-the-art implementations on two architectures: a modern heterogeneous machine and the Intel Xeon Phi Knights Landing. (Submitted for review to the Parallel Computing special issue for the HCW and HeteroPar'16 workshops.)
- Published
- 2016
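Editor's note: the dense Cholesky factorization used in these experiments is typically expressed as a tile algorithm whose loop nest generates the task DAG the runtime schedules. A schematic sketch, with the BLAS/LAPACK tile kernels left as placeholder declarations (in a real code: dpotrf, dtrsm, dsyrk, dgemm):

```cpp
// Placeholder tile kernels; each call becomes one task at runtime.
void potrf(double* Akk, int nb);
void trsm(const double* Akk, double* Aik, int nb);
void syrk(const double* Aik, double* Aii, int nb);
void gemm(const double* Aik, const double* Ajk, double* Aij, int nb);

// Right-looking tiled Cholesky over an nt-by-nt grid of nb-by-nb tiles.
// The runtime derives the DAG from which tiles each task reads and writes.
void tiled_cholesky(double*** A, int nt, int nb) {
    for (int k = 0; k < nt; ++k) {
        potrf(A[k][k], nb);                           // factor diagonal tile
        for (int i = k + 1; i < nt; ++i)
            trsm(A[k][k], A[i][k], nb);               // panel solve
        for (int i = k + 1; i < nt; ++i) {
            syrk(A[i][k], A[i][i], nb);               // symmetric rank-nb update
            for (int j = k + 1; j < i; ++j)
                gemm(A[i][k], A[j][k], A[i][j], nb);  // trailing update
        }
    }
}
```

The gemm-dominated trailing updates suit accelerators, while aggregated CPU cores handle the less parallel potrf/trsm tasks, which is the balance the paper's resource aggregation targets.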