Author: "Théo Mary" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Théo Mary"' showing total 44 results

Start Over Author "Théo Mary"

44 results on '"Théo Mary"'

1. Mixed Precision Randomized Low-Rank Approximation with GPU Tensor Cores.

Author: Marc Baboulin, Simplice Donfack, Oguz Kaya, Théo Mary, and Matthieu Robeyns
Published: 2024
Full Text: View/download PDF

2. Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix-Vector Product.

Author: Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, and Daichi Mukunoki
Published: 2024
Full Text: View/download PDF

3. Five-Precision GMRES-Based Iterative Refinement.

Author: Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L'Excellent, Théo Mary, and Bastien Vieuble
Published: 2024
Full Text: View/download PDF

4. Communication Avoiding Block Low-Rank Parallel Multifrontal Triangular Solve with Many Right-Hand Sides.

Author: Patrick Amestoy, Olivier Boiteau, Alfredo Buttari, Matthieu Gerest, Fabienne Jézéquel, Jean-Yves L'Excellent, and Théo Mary
Published: 2024
Full Text: View/download PDF

5. Mixed precision LU factorization on GPU tensor cores: reducing data movement and memory footprint.

Author: Florent Lopez and Théo Mary
Published: 2023
Full Text: View/download PDF

6. Modular Matrix Multiplication on GPU for Polynomial System Solving.

Author: Jérémy Berthomieu, Stef Graillat, Dimitri Lesnoff, and Théo Mary
Published: 2023
Full Text: View/download PDF

7. Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement.

Author: Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L'Excellent, Théo Mary, and Bastien Vieuble
Published: 2023
Full Text: View/download PDF

8. Adaptive Precision Sparse Matrix-Vector Product and Its Application to Krylov Solvers.

Author: Stef Graillat, Fabienne Jézéquel, Théo Mary, and Roméo Molina
Published: 2024
Full Text: View/download PDF

9. Mixed precision algorithms in numerical linear algebra.

Author: Nicholas J. Higham and Théo Mary
Published: 2022
Full Text: View/download PDF

10. Stochastic Rounding and Its Probabilistic Backward Error Analysis.

Author: Michael P. Connolly, Nicholas J. Higham, and Théo Mary
Published: 2021
Full Text: View/download PDF

11. Block Low-Rank Matrices with Shared Bases: Potential and Limitations of the BLR2 Format.

Author: Cleve Ashcraft, Alfredo Buttari, and Théo Mary
Published: 2021
Full Text: View/download PDF

12. Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data.

Author: Nicholas J. Higham and Théo Mary
Published: 2020
Full Text: View/download PDF

13. A Class of Fast and Accurate Summation Algorithms.

Author: Pierre Blanchard, Nicholas J. Higham, and Théo Mary
Published: 2020
Full Text: View/download PDF

14. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores.

Author: Pierre Blanchard, Nicholas J. Higham, Florent Lopez, Théo Mary, and Srikara Pranesh
Published: 2020
Full Text: View/download PDF

15. Block low-rank single precision coarse grid solvers for extreme scale multigrid methods.

Author: Alfredo Buttari, Markus Huber 0004, Philippe Leleux, Théo Mary, Ulrich Rüde, and Barbara I. Wohlmuth
Published: 2022
Full Text: View/download PDF

16. Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores.

Author: Massimiliano Fasi, Nicholas J. Higham, Florent Lopez, Théo Mary, and Mantas Mikaitis
Published: 2023
Full Text: View/download PDF

17. Bridging the Gap Between Flat and Hierarchical Low-Rank Matrix Formats: The Multilevel Block Low-Rank Format.

Author: Patrick R. Amestoy, Alfredo Buttari, Jean-Yves L'Excellent, and Théo Mary
Published: 2019
Full Text: View/download PDF

18. A New Approach to Probabilistic Rounding Error Analysis.

Author: Nicholas J. Higham and Théo Mary
Published: 2019
Full Text: View/download PDF

19. Robust and Accurate Stopping Criteria for Adaptive Randomized Sampling in Matrix-Free Hierarchically Semiseparable Construction.

Author: Christopher Gorman, Gustavo Chávez, Pieter Ghysels, Théo Mary, François-Henry Rouet, and Xiaoye Sherry Li
Published: 2019
Full Text: View/download PDF

20. A New Preconditioner that Exploits Low-Rank Approximations to Factorization Error.

Author: Nicholas J. Higham and Théo Mary
Published: 2019
Full Text: View/download PDF

21. Improving the Complexity of Block Low-Rank Factorizations with Fast Matrix Arithmetic.

Author: Claude-Pierre Jeannerod, Théo Mary, Clément Pernet, and Daniel S. Roche
Published: 2019
Full Text: View/download PDF

22. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures.

Author: Patrick R. Amestoy, Alfredo Buttari, Jean-Yves L'Excellent, and Théo Mary
Published: 2019
Full Text: View/download PDF

23. Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.

Author: Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Jack J. Dongarra
Published: 2015
Full Text: View/download PDF

24. Access-averse framework for computing low-rank matrix approximations.

Author: Ichitaro Yamazaki, Théo Mary, Jakub Kurzak, Stanimire Tomov, and Jack J. Dongarra
Published: 2014
Full Text: View/download PDF

25. Matrix-free construction of HSS representation using adaptive randomized sampling.

Author: Christopher Gorman, Gustavo Chávez, Pieter Ghysels, Théo Mary, François-Henry Rouet, and Xiaoye Sherry Li
Published: 2018

26. On the Complexity of the Block Low-Rank Multifrontal Factorization.

Author: Patrick Amestoy, Alfredo Buttari, Jean-Yves L'Excellent, and Théo Mary
Published: 2017
Full Text: View/download PDF

27. Block low‐rank single precision coarse grid solvers for extreme scale multigrid methods

Author: Philippe Leleux, Ulrich Ruede, Barbara Wohlmuth, Markus Huber, Théo Mary, and Alfredo Buttari
Subjects: Petascale computing, Algebra and Number Theory, Multigrid method, Iterative method, Applied Mathematics, Solver, Supercomputer, Residual, Grid, Massively parallel, Computational science, Mathematics
Abstract: Extreme scale simulation requires fast and scalable algorithms, such as multigrid methods. To achieve asymptotically optimal complexity it is essential to employ a hierarchy of grids. The cost to solve the coarsest grid system can often be neglected in sequential computings, but cannot be ignored in massively parallel executions. In this case, the coarsest grid can be large and its efficient solution becomes a challenging task. We propose solving the coarse grid system using modern, approximate sparse direct methods and investigate the expected gains compared with traditional iterative methods. Since the coarse grid system only requires an approximate solution, we show that we can leverage block low-rank techniques, combined with the use of single precision arithmetic, to significantly reduce the computational requirements of the direct solver. In the case of extreme scale computing, the coarse grid system is too large for a sequential solution, but too small to permit massively parallel efficiency. We show that the agglomeration of the coarse grid system to a subset of processors is necessary for the sparse direct solver to achieve performance. We demonstrate the efficiency of the proposed method on a Stokes-type saddle point system. We employ a monolithic Uzawa multigrid method. In particular, we show that the use of an approximate sparse direct solver for the coarse grid system can outperform that of a preconditioned minimal residual iterative method. This is demonstrated for the multigrid solution of systems of order up to 1+e11 degrees of freedom on a petascale supercomputer using 43 200 processes.
Published: 2021
Full Text: View/download PDF

28. Bridging the Gap Between Flat and Hierarchical Low-Rank Matrix Formats: The Multilevel Block Low-Rank Format

Author: Jean-Yves L'Excellent, Alfredo Buttari, Théo Mary, and Patrick R. Amestoy
Subjects: Theoretical computer science, Applied Mathematics, Hierarchical matrix, Low-rank approximation, 010103 numerical & computational mathematics, Asymptotic complexity, 01 natural sciences, Matrix decomposition, Bridging (programming), Computational Mathematics, Matrix (mathematics), Factorization, 0101 mathematics, Block size, Mathematics
Abstract: Matrices possessing a low-rank property arise in numerous scientific applications. This property can be exploited to provide a substantial reduction of the complexity of their LU or LDL T factorization. Among the possible low-rank formats, the flat Block Low-Rank (BLR) format is easy to use but achieves superlinear complexity. Alternatively, the hierarchical formats achieve linear complexity at the price of a much more complex, hierarchical matrix representation. In this paper, we propose a new format based on multilevel BLR approximations: the matrix is recursively defined as a BLR matrix whose full-rank blocks are themselves represented by BLR matrices. We call this format multilevel BLR (MBLR). Contrarily to hierarchical matrices, the number of levels in the block hierarchy is fixed to a given constant; while this format can still be represented within the H formalism, we show that applying the H theory to it leads to very pessimistic complexity bounds. We therefore extend the theory to prove better bounds, and show that the MBLR format provides a simple way to finely control the desired complexity of dense factorizations. By striking a balance between the simplicity of the BLR format and the low complexity of the hierarchical ones, the MBLR format bridges the gap between flat and hierarchical low-rank matrix formats. The MBLR format is of particular relevance in the context of sparse direct solvers, for which it is able to trade off the optimal dense complexity of the hierarchical formats to benefit from the simplicity and flexibility of the BLR format while still achieving O(n) sparse complexity. We finally compare our MBLR format with the related BLR-H (or Lattice-H) format; our theoretical analysis shows that both formats achieve the same asymptotic complexity for a given top level block size.
Published: 2019
Full Text: View/download PDF

29. Five-Precision GMRES-based iterative refinement

Author: Patrick Amestoy, Alfredo Buttari, Nicholas Higham, Excellent, Jean-Yves L., Théo Mary, Bastien Vieublé, Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Centre National de la Recherche Scientifique (CNRS), University of Manchester [Manchester], Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Department of Mathematics [Manchester] (School of Mathematics), and Vieublé, Bastien
Subjects: mixed precision, floating-point arithmetic, [INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic, MathematicsofComputing_NUMERICALANALYSIS, GMRES, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], rounding error analysis, [INFO.INFO-MS] Computer Science [cs]/Mathematical Software [cs.MS], preconditioning, [INFO.INFO-NA] Computer Science [cs]/Numerical Analysis [cs.NA], iterative refinement, [INFO.INFO-AO] Computer Science [cs]/Computer Arithmetic, linear system, multiple precision, backward error, forward error, [INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS]
Abstract: GMRES-based iterative refinement in three precisions (GMRES-IR3) uses a low precision LU factorization to accelerate the solution of a linear system without compromising numerical stability or robustness. GMRES-IR3 solves the update equation using GMRES preconditioned by the LU factors, where all operations within GMRES are carried out in the working precision u, except for the matrix-vector products and the application of the preconditioner, which require the use of extra precision u 2. The use of extra precision can be expensive, and is especially unattractive if it is not available in hardware; for this reason, existing implementations have not used extra precision, despite the absence of an error analysis for this approach. We relax the requirements on the precisions used within GMRES, allowing the use of arbitrary precisions up (for applying the preconditioner) and ug (for the rest of the operations). We obtain the five-precision GMRES-based iterative refinement (GMRES-IR5) algorithm. We carry out a rounding error analysis that generalizes that of GMRES-IR3, obtaining conditions under which the forward and backward errors converge to their limiting values. Our analysis makes use of a new result on the backward stability of MGS-GMRES in two precisions. On hardware where up to five arithmetics are available, the number of possible combinations of precisions in GMRES-IR5 is extremely large, but our analysis identifies a small subset of relevant combinations. By choosing from within this subset one can achieve different levels of tradeoff between cost and robustness, which allows for a finer choice of precisions depending on the problem difficulty and the available hardware. Our numerical experiments on both random dense matrices and real-life sparse matrices from a wide range of applications show that the practical behavior of GMRES-IR5 is in good agreement with our theoretical analysis. GMRES-IR5 therefore has the potential to solve relatively badly conditioned problems in less time and memory than GMRES-IR3, thanks to the use of lower precision arithmetic in the GMRES iterations.
Published: 2021

30. Up-to-date assessment of 3D frequency-domain full waveform inversion based on the sparse multifrontal solver MUMPS

Author: Laure Combe, Jean-Yves L'Excellent, C. Puglisi, Stéphane Operto, Théo Mary, M. Gerest, Alfredo Buttari, and Patrick R. Amestoy
Subjects: Rank (linear algebra), Computer science, Frequency band, Frequency domain, Linear system, Inversion (meteorology), Boundary value problem, Solver, Supercomputer, Algorithm
Abstract: Summary Efficient frequency-domain Full Waveform Inversion (FWI) can be applied on long-offset/wide-azimuth stationary-recording seabed acquisitions carried out with ocean-bottom cables (OBC) and ocean bottom nodes (OBN) since the wide angular illumination provided by these surveys allows for limiting the inversion to a few discrete frequencies. In the frequency domain, the forward problem is a boundary value problem requiring the solution of large and sparse linear systems with multiple right-hand sides. In this study, we revisit the potential of the massively-parallel sparse multifrontal solver MUMPS to perform efficiently the multi-source forward problem of 3D visco-acoustic FWI. The execution time and memory consumption of the solver are further improved by exploiting the low rank properties of the sub-blocks of the dense frontal matrices, the sparsity of the right-hand sides (seismic sources) and the work in progress on the use of mixed precision arithmetic. We revisit a 3D OBC case study from the North Sea in the 3.5~Hz-13~Hz frequency band using between 10 and 70 nodes of the Jean-Zay supercomputer of IDRIS and show that, even without exploiting low rank properties, problems involving 50 millions of unknowns and probably more can be tackled today with this technology.
Published: 2021
Full Text: View/download PDF

31. Block Low-Rank Matrices with Shared Bases: Potential and Limitations of the BLR^2 Format

Author: Alfredo Buttari, Théo Mary, Cleve Ashcraft, ANSYS, Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Performance et Qualité des Algorithmes Numériques (PEQUAN), Laboratoire d'Informatique de Paris 6 (LIP6), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Centre National de la Recherche Scientifique (CNRS), and LIP6
Subjects: Rank (linear algebra), Astrophysics::High Energy Astrophysical Phenomena, 010103 numerical & computational mathematics, Astrophysics::Cosmology and Extragalactic Astrophysics, computer.software_genre, 01 natural sciences, law.invention, Separable space, Matrix (mathematics), law, numerical linear algebra, [INFO]Computer Science [cs], 0101 mathematics, [MATH]Mathematics [math], Column (data store), Astrophysics::Galaxy Astrophysics, Mathematics, Block (data storage), Discrete mathematics, hierarchical matrices, Numerical linear algebra, block low-rank matrices, Special class, LU decomposition, Data sparse matrices, LU factorization, computer, Analysis, block separable matrices
Abstract: International audience; We investigate a special class of data sparse rank-structured matrices that combine a flat block low-rank (BLR) partitioning with the use of shared (called nested in the hierarchical case) bases. This format is to H 2 matrices what BLR is to H matrices: we therefore call it the BLR 2 matrix format. We present algorithms for the construction and LU factorization of BLR 2 matrices, and perform their cost analysis-both asymptotically and for a fixed problem size. With weak admissibility, BLR 2 matrices reduce to block separable matrices (the flat version of HBS/HSS). Our analysis and numerical experiments reveal some limitations of BLR 2 matrices with weak admissibility, which we propose to overcome with two approaches: strong admissibility, and the use of multiple shared bases per row and column.
Published: 2020

32. Solving Block Low-Rank Linear Systems by LU Factorization is Numerically Stable

Author: Nicholas J. Higham, Théo Mary, Department of Mathematics [Manchester] (School of Mathematics), University of Manchester [Manchester], Performance et Qualité des Algorithmes Numériques (PEQUAN), LIP6, Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), and Sorbonne Université (SU)
Subjects: Rank (linear algebra), floating-point arithmetic, General Mathematics, 010103 numerical & computational mathematics, computer.software_genre, 01 natural sciences, Stability (probability), law.invention, Factorization, law, Applied mathematics, [INFO]Computer Science [cs], 0101 mathematics, [MATH]Mathematics [math], Mathematics, Numerical linear algebra, Applied Mathematics, Linear system, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], LU decomposition, rounding error analysis, 010101 applied mathematics, Computational Mathematics, numerical stability, Block low-rank matrices, LU factorization, Round-off error, computer, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA], Numerical stability
Abstract: Block low-rank (BLR) matrices possess a blockwise low-rank property that can be exploited to reduce the complexity of numerical linear algebra algorithms. The impact of these low-rank approximations on the numerical stability of the algorithms in floating-point arithmetic has not previously been analysed. We present rounding error analysis for the solution of a linear system by LU factorization of BLR matrices. Assuming that a stable pivoting scheme is used, we prove backward stability: the relative backward error is bounded by a modest constant times $\varepsilon $, where the low-rank threshold $\varepsilon $ is the parameter controlling the accuracy of the blockwise low-rank approximations. In addition to this key result, our analysis offers three new insights into the numerical behaviour of BLR algorithms. First, we compare the use of a global or local low-rank threshold and find that a global one should be preferred. Second, we show that performing intermediate recompressions during the factorization can significantly reduce its cost without compromising numerical stability. Third, we consider different BLR factorization variants and determine the update–compress–factor variant to be the best. Tests on a wide range of matrices from various real-life applications show that the predictions from the analysis are realized in practice.
Published: 2020
Full Text: View/download PDF

33. Stochastic Rounding and its Probabilistic Backward Error Analysis

Author: Nicholas J. Higham, Théo Mary, Michael P. Connolly, Department of Mathematics [Manchester] (School of Mathematics), University of Manchester [Manchester], Performance et Qualité des Algorithmes Numériques (PEQUAN), LIP6, Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), and Sorbonne Université (SU)
Subjects: Floating point, floating-point arithmetic, 010103 numerical & computational mathematics, computer.software_genre, 01 natural sciences, Error analysis, numerical linear algebra, stochastic rounding, [INFO]Computer Science [cs], 0101 mathematics, [MATH]Mathematics [math], probabilistic backward error analysis, Mathematics, Real number, Numerical linear algebra, Applied Mathematics, Rounding, stagnation, Probabilistic logic, TheoryofComputation_GENERAL, round to nearest, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], rounding error analysis, Computational Mathematics, computer, Algorithm, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: International audience; Stochastic rounding rounds a real number to the next larger or smaller floating-point number with probabilities 1 minus the relative distances to those numbers. It is gaining attention in deep learning because it can increase the success of low precision computations. We compare basic properties of stochastic rounding with those for round to nearest, finding properties in common as well as significant differences. We prove that for stochastic rounding the rounding errors are mean independent random variables with zero mean. We derive a new version of our probabilistic error analysis theorem from [SIAM J. Sci. Comput., 41 (2019), pp. A2815-A2835], weakening the assumption of independence of the random variables to mean independence. These results imply that for a wide range of linear algebra computations the backward error for stochastic rounding is unconditionally bounded by a multiple of √ nu to first order, with a certain probability, where n is the problem size and u is the unit roundoff. This is the first scenario where the rule of thumb that one can replace nu by √ nu in a rounding error bound has been shown to hold without any additional assumptions on the rounding errors. We also explain how stochastic rounding avoids the phenomenon of stagnation in sums, whereby small addends are obliterated by round to nearest when they are too small relative to the sum.
Published: 2020
Full Text: View/download PDF

34. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores

Author: Nicholas J. Higham, Srikara Pranesh, Florent Lopez, Pierre Blanchard, Théo Mary, Department of Mathematics [Manchester] (School of Mathematics), University of Manchester [Manchester], Innovative Computing Laboratory [Knoxville] (ICL), The University of Tennessee [Knoxville], Centre National de la Recherche Scientifique (CNRS), Performance et Qualité des Algorithmes Numériques (PEQUAN), LIP6, and Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
Subjects: fused multiply-add, Floating point, floating-point arithmetic, Carry (arithmetic), matrix multiplication, 010103 numerical & computational mathematics, 01 natural sciences, law.invention, Computational science, Matrix (mathematics), law, Tensor (intrinsic definition), [INFO]Computer Science [cs], 0101 mathematics, [MATH]Mathematics [math], NVIDIA GPU, Mathematics, Block (data storage), Multiply–accumulate operation, Applied Mathematics, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], LU decomposition, Matrix multiplication, rounding error analysis, Computational Mathematics, tensor cores, LU factorization, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: International audience; Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision and existing rounding error analyses do not support them. We consider a mixed precision block FMA that generalizes both the usual scalar FMA and existing tensor units. We describe how to exploit such a block FMA in the numerical linear algebra kernels of matrix multiplication and LU factorization and give detailed rounding error analyses of both kernels. An important application is to GMRES-based iterative refinement with block FMAs, for which our analysis provides new insight. Our framework is applicable to the tensor core units in the NVIDIA Volta and Turing GPUs. For these we compare matrix multiplication and LU factorization with TC16 and TC32 forms of FMA, which differ in the precision used for the output of the tensor cores. Our experiments on an NVDIA V100 GPU confirm the predictions of the analysis that the TC32 variant is much more accurate than the TC16 one, and they show that the accuracy boost is obtained with almost no performance loss.
Published: 2020
Full Text: View/download PDF

35. Improving the Complexity of Block Low-Rank Factorizations with Fast Matrix Arithmetic

Author: Clément Pernet, Daniel S. Roche, Théo Mary, Claude-Pierre Jeannerod, Université de Lyon, École normale supérieure - Lyon (ENS Lyon), Arithmetic and Computing (ARIC), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), School of Mathematics [Manchester], University of Manchester [Manchester], Calcul Algébrique et Symbolique, Sécurité, Systèmes Complexes, Codes et Cryptologie (CASC), Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), United States Naval Academy, École normale supérieure de Lyon (ENS de Lyon), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
Subjects: [INFO.INFO-CC]Computer Science [cs]/Computational Complexity [cs.CC], Rank (linear algebra), [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Block (permutation group theory), 010103 numerical & computational mathematics, Fast matrix arithmetic, 01 natural sciences, Omega, law.invention, Matrix (mathematics), Factorization, AMS subject classifications 15A23, 65F05, law, Strassen algorithm, [INFO]Computer Science [cs], 0101 mathematics, Arithmetic, [MATH]Mathematics [math], Mathematics, 010102 general mathematics, block low-rank matrices, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], LU decomposition, linear algebra, Linear algebra, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Analysis, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: International audience; We consider the LU factorization of an $n\times n$ matrix $A$ represented as a block low-rank (BLR) matrix: most of its off-diagonal blocks are approximated by matrices of small rank $r$, which reduces the asymptotic complexity of computing the LU factorization of A down to $O(n^2r)$. In this article, our aim is to further reduce this complexity by exploiting fast matrix arithmetic, that is, the ability to multiply two $n\times n$ full-rank matrices together for $O(n^\omega)$ flops, where $\omega
Published: 2019
Full Text: View/download PDF

36. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures

Author: Théo Mary, Jean-Yves L'Excellent, Alfredo Buttari, Patrick R. Amestoy, Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT), Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-École normale supérieure - Lyon (ENS Lyon)
Subjects: Multi-core processor, Rank (linear algebra), Computer science, Applied Mathematics, sparse linear algebra, 010103 numerical & computational mathematics, Parallel computing, Solver, 010502 geochemistry & geophysics, 01 natural sciences, Reduction (complexity), Factorization, Elliptic partial differential equation, multifrontal factorization, Scalability, [INFO]Computer Science [cs], 0101 mathematics, [MATH]Mathematics [math], Block Low-Rank, multicore architectures, Software, 0105 earth and related environmental sciences, Block (data storage)
Abstract: International audience; Matrices coming from elliptic Partial Differential Equations have been shown to have a low-rank property which can be efficiently exploited in multifrontal solvers to provide a substantial reduction of their complexity. Among the possible low-rank formats, the Block Low-Rank format (BLR) is easy to use in a general purpose multifrontal solver and its potential compared to standard (full-rank) solvers has been demonstrated. Recently, new variants have been introduced and it was proved that they can further reduce the complexity but their performance has never been analyzed. In this paper, we present a multithreaded BLR factorization, and analyze its efficiency and scalability in shared-memory multicore environments. We identify the challenges posed by the use of BLR approximations in multifrontal solvers and put forward several algorithmic variants of the BLR factorization that overcome these challenges by improving its efficiency and scalability. We illustrate the performance analysis of the BLR multifrontal factorization with numerical experiments on a large set of problems coming from a variety of real-life applications.
Published: 2019
Full Text: View/download PDF

37. On the Complexity of the Block Low-Rank Multifrontal Factorization

Author: Jean-Yves L'Excellent, Théo Mary, Alfredo Buttari, Patrick R. Amestoy, Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT), Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Centre National de la Recherche Scientifique - CNRS (FRANCE), Ecole Normale Supérieure de Lyon - ENS de Lyon (FRANCE), Institut National Polytechnique de Toulouse - INPT (FRANCE), Institut National de la Recherche en Informatique et en Automatique - INRIA (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université Claude Bernard-Lyon I - UCBL (FRANCE), and Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Subjects: Rank (linear algebra), 65F05, Property (programming), 010103 numerical & computational mathematics, 010502 geochemistry & geophysics, 15A23, 01 natural sciences, 65Y20, Reduction (complexity), Factorization, multifrontal factorization, 65F50, [INFO]Computer Science [cs], Multifrontal factorization, 0101 mathematics, [MATH]Mathematics [math], 0105 earth and related environmental sciences, Mathematics, Block (data storage), 65N30, Applied Mathematics, Order (ring theory), sparse linear algebra, Sparse linear algebra, Solver, Algebra, Calcul parallèle, distribué et partagé, Block low-rank, Computational Mathematics, Elliptic partial differential equation, Block Low-Rank AMS subject classifications 15A06, Block Low-Rank
Abstract: International audience; Matrices coming from elliptic Partial Differential Equations have been shown to have a low-rank property: well defined off-diagonal blocks of their Schur complements can be approximated by low-rank products and this property can be efficiently exploited in multifrontal solvers to provide a substantial reduction of their complexity. Among the possible low-rank formats, the Block Low-Rank format (BLR) is easy to use in a general purpose multifrontal solver and has been shown to provide significant gains compared to full-rank on practical applications. However, unlike hierarchical formats, such as H and HSS, its theoretical complexity was unknown. In this paper, we extend the theoretical work done on hierarchical matrices in order to compute the theoretical complexity of the BLR multifrontal factorization. We then present several variants of the BLR multifrontal factorization, depending on the strategies used to perform the updates in the frontal matrices and on the constraints on how numerical pivoting is handled. We show how these variants can further reduce the complexity of the factorization. In the best case (3D, constant ranks), we obtain a complexity of the order of O(n^4/3). We provide an experimental study with numerical results to support our complexity bounds.
Published: 2017
Full Text: View/download PDF

38. Large-scale 3D EM modeling with a Block Low-Rank multifrontal direct solver

Author: Théo Mary, Piyoosh Jaysaval, Patrick R. Amestoy, Jean-Yves L'Excellent, Sébastien de la Kethulle de Ryhove, Alfredo Buttari, Daniil V. Shantsev, emgs [Oslo], Department of Physics [Oslo], Faculty of Mathematics and Natural Sciences [Oslo], University of Oslo (UiO)-University of Oslo (UiO), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), Laboratoire de l'Informatique du Parallélisme (LIP), Université Toulouse III - Paul Sabatier (UT3), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Université de Toulouse (UT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - INPT (FRANCE), Institut National de la Recherche en Informatique et en Automatique - INRIA (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), University of Texas at Austin (USA), University of Oslo (NORWAY), Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France), and Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE)
Subjects: Floating point, Discretization, Rank (linear algebra), Computer science, 010103 numerical & computational mathematics, 010502 geochemistry & geophysics, 01 natural sciences, Numerical solutions, Matrix decomposition, Computational science, Factorization, Geochemistry and Petrology, Numerical approximations and analysis, [INFO]Computer Science [cs], 0101 mathematics, 0105 earth and related environmental sciences, Block (data storage), Linear system, Solver, Electromagnetic theory, Calcul parallèle, distribué et partagé, Geophysics, Numerical modelling, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Marine electromagnetics, Controlled source electromagnetics (CSEM)
Abstract: International audience; We put forward the idea of using a Block Low-Rank (BLR) multifrontal direct solver to efficiently solve the linear systems of equations arising from a finite-difference discretization of the frequency-domain Maxwell equations for 3-D electromagnetic (EM) problems. The solver uses a low-rank representation for the off-diagonal blocks of the intermediate dense matrices arising in the multifrontal method to reduce the computational load. A numerical threshold, the so-called BLR threshold, controlling the accuracy of low-rank representations was optimized by balancing errors in the computed EM fields against savings in floating point operations (flops). Simulations were carried out over large-scale 3-D resistivity models representing typical scenarios for marine controlled-source EM surveys, and in particular the SEG SEAM model which contains an irregular salt body. The flop count, size of factor matrices and elapsed run time for matrix factorization are reduced dramatically by using BLR representations and can go down to, respectively, 10, 30 and 40 per cent of their full-rank values for our largest system with N = 20.6 million unknowns. The reductions are almost independent of the number of MPI tasks and threads at least up to 90 × 10 = 900 cores. The BLR savings increase for larger systems, which reduces the factorization flop complexity from O(N2) for the full-rank solver to O(Nm) with m = 1.4–1.6. The BLR savings are significantly larger for deep-water environments that exclude the highly resistive air layer from the computational domain. A study in a scenario where simulations are required at multiple source locations shows that the BLR solver can become competitive in comparison to iterative solvers as an engine for 3-D controlled-source electromagnetic Gauss–Newton inversion that requires forward modelling for a few thousand right-hand sides.
Published: 2017
Full Text: View/download PDF

39. Fast 3D frequency-domain full waveform inversion with a parallel Block Low-Rank multifrontal direct solver: application to OBC data from the North Sea

Author: Jean-Yves L'Excellent, Patrick R. Amestoy, Alain Miniussi, Stéphane Operto, Romain Brossier, Ludovic Métivier, Alfredo Buttari, Théo Mary, Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Institut des Sciences de la Terre (ISTerre), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-PRES Université de Grenoble-Institut de recherche pour le développement [IRD] : UR219-Institut national des sciences de l'Univers (INSU - CNRS)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Université Joseph Fourier - Grenoble 1 (UJF), Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Equations aux Dérivées Partielles (EDP), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA), Géoazur (GEOAZUR 7329), Institut national des sciences de l'Univers (INSU - CNRS)-Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud]), Algorithmes Parallèles et Optimisation (IRIT-APO), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de recherche pour le développement [IRD] : UR219-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Centre National de la Recherche Scientifique (CNRS), Equations aux Dérivées Partielles (EDP ), Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Université Côte d'Azur (UCA)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Centre National de la Recherche Scientifique (CNRS)-Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])
Subjects: Speedup, [SDU.STU.GP]Sciences of the Universe [physics]/Earth Sciences/Geophysics [physics.geo-ph], Computation, 010103 numerical & computational mathematics, frequency domain, block low-rank, 010502 geochemistry & geophysics, 01 natural sciences, law.invention, Factorization, Geochemistry and Petrology, law, 0101 mathematics, 0105 earth and related environmental sciences, Sparse matrix, Mathematics, multifrontal method, finite difference, Solver, LU decomposition, vertical transverse isotropy, Geophysics, Amplitude, Frequency domain, Wave equation, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], Algorithm
Abstract: Wide-azimuth long-offset ocean bottom cable (OBC)/ocean bottom node surveys provide a suitable framework to perform computationally efficient frequency-domain full-waveform inversion (FWI) with a few discrete frequencies. Frequency-domain seismic modeling is performed efficiently with moderate computational resources for a large number of sources with a sparse multifrontal direct solver (Gauss-elimination techniques for sparse matrices). Approximate solutions of the time-harmonic wave equation are computed using a block low-rank (BLR) approximation, leading to a significant reduction in the operation count and in the volume of communication during the lower upper (LU) factorization as well as offering great potential for reduction in the memory demand. Moreover, the sparsity of the seismic source vectors is exploited to speed up the forward elimination step during the computation of the monochromatic wavefields. The relevance and the computational efficiency of the frequency-domain FWI performed in the viscoacoustic vertical transverse isotropic (VTI) approximation was tested with a real 3D OBC case study from the North Sea. The FWI subsurface models indicate a dramatic resolution improvement relative to the initial model built by reflection traveltime tomography. The amplitude errors introduced in the modeled wavefields by the BLR approximation for different low-rank thresholds have a negligible footprint in the FWI results. With respect to a standard multifrontal sparse direct factorization, and without compromise of the accuracy of the imaging, the BLR approximation can bring a reduction of the LU factor size by a factor of up to three. This reduction is not yet exploited to reduce the effective memory usage (ongoing work). The flop reduction can be larger than a factor of 10 and can bring a factor of time reduction of around three. Moreover, this reduction factor tends to increase with frequency, namely with the matrix size. Frequency-domain viscoacoustic VTI FWI can be viewed as an efficient tool to build an initial model for elastic FWI of 4C OBC data.
Published: 2016
Full Text: View/download PDF

40. 3D frequency-domain seismic modeling with a Parallel BLR multifrontal direct solver

Author: Romain Brossier, Ludovic Métivier, Jean Virieux, Théo Mary, Jean-Yves L'Excellent, Alain Miniussi, Alfredo Buttari, Clement Weisbecker, Stéphane Operto, Patrick R. Amestoy, Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Institut National Polytechnique (Toulouse) (Toulouse INP), Institut des Sciences de la Terre (ISTerre), Université Joseph Fourier - Grenoble 1 (UJF)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de recherche pour le développement [IRD] : UR219-PRES Université de Grenoble-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Equations aux Dérivées Partielles (EDP), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Observatoire de la Côte d'Azur, COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA), Géoazur (GEOAZUR 7329), Institut national des sciences de l'Univers (INSU - CNRS)-Observatoire de la Côte d'Azur, COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud]), Livermore Software Technology Corporation [Livermore] (LSTC), Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux - IFSTTAR (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Institut National de la Recherche en Informatique et en Automatique - INRIA (FRANCE), Institut de Recherche pour le Développement - IRD (FRANCE), Université Nice Sophia Antipolis (FRANCE), Observatoire de la Côte d'Azur (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université Joseph Fourier Grenoble 1 - UJF (FRANCE), Université de Savoie Mont Blanc - USMB (FRANCE), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Centre National de la Recherche Scientifique (CNRS)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Université Joseph Fourier - Grenoble 1 (UJF)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Centre National de la Recherche Scientifique (CNRS)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Université Joseph Fourier - Grenoble 1 (UJF)-Université Pierre Mendès France - Grenoble 2 (UPMF), Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Côte d'Azur (UCA)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Centre National de la Recherche Scientifique (CNRS)-Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-PRES Université de Grenoble-Institut de recherche pour le développement [IRD] : UR219-Institut national des sciences de l'Univers (INSU - CNRS)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Université Joseph Fourier - Grenoble 1 (UJF), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA), Université Côte d'Azur (UCA), Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud]), Institut national des sciences de l'Univers (INSU - CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Centre National de la Recherche Scientifique (CNRS)-Observatoire de la Côte d'Azur, and Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)
Subjects: Finite difference, Wave propagation, Computer science, [SDU.STU]Sciences of the Universe [physics]/Earth Sciences, Inversion (meteorology), 010103 numerical & computational mathematics, Parallel computing, Solver, 010502 geochemistry & geophysics, Wave equation, 01 natural sciences, Calcul parallèle, distribué et partagé, Frequency-domain, Seismic modeling, Frequency domain, Programming, 0101 mathematics, Algebraic number, Approximate solution, Full waveform, ComputingMilieux_MISCELLANEOUS, 0105 earth and related environmental sciences, 3D
Abstract: Three-dimensional frequency-domain full waveform inversion (FWI) of fixed-spread data can be efficiently performed in the visco-acoustic approximation when seismic modeling is based on a sparse direct solver. We present a parallel algebraic Block Low-Rank (BLR) multifrontal solver which provides an approximate solution of the time-harmonic wave equation with a reduced operation count, memory demand, and volume of communication relative to the full-rank solver. We analyze the parallel efficiency and the accuracy of the solver with a realistic FWI case study from the Valhall oil field.
Published: 2015

41. Efficient 3D frequency-domain full-waveform inversion of ocean-bottom cable data with sparse block low-rank direct solver: a real data case study from the North Sea

Author: Jean-Yves L'Excellent, Jean Virieux, Alain Miniussi, Ludovic Métivier, Alfredo Buttari, Romain Brossier, Patrick R. Amestoy, Clement Weisbecker, Stéphane Operto, Alessandra Ribodetti, Théo Mary, Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux - IFSTTAR (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Institut National de la Recherche en Informatique et en Automatique - INRIA (FRANCE), Institut de Recherche pour le Développement - IRD (FRANCE), Université Nice Sophia Antipolis (FRANCE), Observatoire de la Côte d'Azur (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Université Joseph Fourier Grenoble 1 - UJF (FRANCE), Université de Savoie Mont Blanc - USMB (FRANCE), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Institut National Polytechnique (Toulouse) (Toulouse INP), Institut des Sciences de la Terre (ISTerre), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-PRES Université de Grenoble-Institut de recherche pour le développement [IRD] : UR219-Institut national des sciences de l'Univers (INSU - CNRS)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Université Joseph Fourier - Grenoble 1 (UJF), Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Equations aux Dérivées Partielles (EDP), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut Polytechnique de Grenoble - Grenoble Institute of Technology-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA), Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA), Géoazur (GEOAZUR 7329), Institut national des sciences de l'Univers (INSU - CNRS)-Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud]), Livermore Software Technology Corporation [Livermore] (LSTC), Algorithmes Parallèles et Optimisation (IRIT-APO), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Joseph Fourier - Grenoble 1 (UJF)-Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de recherche pour le développement [IRD] : UR219-PRES Université de Grenoble-Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Université Joseph Fourier - Grenoble 1 (UJF)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Centre National de la Recherche Scientifique (CNRS)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Université Joseph Fourier - Grenoble 1 (UJF)-Université Pierre Mendès France - Grenoble 2 (UPMF), Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Côte d'Azur (UCA)-Institut national des sciences de l'Univers (INSU - CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Centre National de la Recherche Scientifique (CNRS)-Observatoire de la Côte d'Azur, Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud]), Institut national des sciences de l'Univers (INSU - CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Centre National de la Recherche Scientifique (CNRS)-Observatoire de la Côte d'Azur, and Université Côte d'Azur (UCA)-Université Côte d'Azur (UCA)
Subjects: Wave propagation, OBC, Computer science, Frequency band, Modeling, [SDU.STU]Sciences of the Universe [physics]/Earth Sciences, Inversion (meteorology), [PHYS.PHYS.PHYS-GEO-PH]Physics [physics]/Physics [physics]/Geophysics [physics.geo-ph], Full-waveform inversion, Solver, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, LU decomposition, law.invention, Calcul parallèle, distribué et partagé, Overburden, Frequency-domain, Wavelet, law, Frequency domain, Algorithm
Abstract: International audience; We present an application of 3D frequency-domain full waveform inversion (FWI) on ocean-bottom cable data from the North Sea. Frequency-domain seismic modeling is performed in the visco-acoustic VTI approximation with a sparse direct solver based on the multifrontal method. The computational cost of the multifrontal LU factorization is efficiently reduced with a block-low rank (BLR) approximation of the dense frontal matrices. A multiscale frequency-domain FWI is applied by successive inversions of 11 discrete frequencies in the 3.5Hz-10Hz frequency band. The velocity model built by FWI reveals short-scale features such as channels, scrapes left by drifting icebergs on the paleo-seafloor, fractures and deep reflectors below the reservoir level, although the presence of gas in the overburden. The quality of the FWI results is controlled by time-domain modeling and source wavelet estimation. Next step is the application of multi-parameter FWI with second-order optimization algorithms, which can be efficiently implemented with our frequency-domain modeling engine.
Published: 2015
Full Text: View/download PDF

42. Access-averse framework for computing low-rank matrix approximations

Author: Jakub Kurzak, Stanimire Tomov, Jack Dongarra, Théo Mary, Ichitaro Yamazaki, Innovative Computing Laboratory [Knoxville] (ICL), The University of Tennessee [Knoxville], Université Toulouse III - Paul Sabatier (UT3), and Université Fédérale Toulouse Midi-Pyrénées
Subjects: Numerical linear algebra, Theoretical computer science, Computer science, Low-rank approximation, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], computer.software_genre, QR decomposition, Matrix decomposition, Matrix (mathematics), Range (mathematics), Lanczos resampling, Power iteration, Singular value decomposition, computer, ComputingMilieux_MISCELLANEOUS
Abstract: Low-rank matrix approximations play important roles in many statistical, scientific, and engineering applications. To compute such approximations, different algorithms have been developed by researchers from a wide range of areas including theoretical computer science, numerical linear algebra, statistics, applied mathematics, data analysis, machine learning, and physical and biological sciences. In this paper, to combine these efforts, we present an “access-averse” framework which encapsulates some of the existing algorithms for computing a truncated singular value decomposition (SVD). This framework not only allows us to develop software whose performance can be tuned based on domain specific knowledge, but it also allows a user from one discipline to test an algorithm from another, or to combine the techniques from different algorithms. To demonstrate this potential, we implement the framework on multicore CPUs with multiple GPUs and compare the performance of two representative algorithms, blocked variants of matrix power and Lanczos methods. Our performance studies with large-scale graphs from real applications demonstrate that, when combined with communication-avoiding and thick-restarting techniques, the Lanczos method can be competitive with the power method, which is one of the most popular methods currently used for these applications. In addition, though we only focus on the truncated SVDs, the two computational kernels used in our studies, the sparse-matrix dense-matrix multiply and tall-skinny QR factorization, are fundamental building blocks for computing low-rank approximations with other objectives. Hence, our studies may have a greater impact beyond the truncated SVDs.
Published: 2014
Full Text: View/download PDF

43. Stochastic rounding: implementation, error analysis and applications

Author: Matteo Croci, Massimiliano Fasi, Nicholas J. Higham, Theo Mary, and Mantas Mikaitis
Subjects: floating-point arithmetic, rounding error analysis, IEEE 754, binary16, bfloat16, machine learning, Science
Abstract: Stochastic rounding (SR) randomly maps a real number x to one of the two nearest values in a finite precision number system. The probability of choosing either of these two numbers is 1 minus their relative distance to x. This rounding mode was first proposed for use in computer arithmetic in the 1950s and it is currently experiencing a resurgence of interest. If used to compute the inner product of two vectors of length n in floating-point arithmetic, it yields an error bound with constant [Formula: see text] with high probability, where u is the unit round-off. This is not necessarily the case for round to nearest (RN), for which the worst-case error bound has constant nu. A particular attraction of SR is that, unlike RN, it is immune to the phenomenon of stagnation, whereby a sequence of tiny updates to a relatively large quantity is lost. We survey SR by discussing its mathematical properties and probabilistic error analysis, its implementation, and its use in applications, with a focus on machine learning and the numerical solution of differential equations.
Published: 2022
Full Text: View/download PDF

44. Combining sparse approximate factorizations with mixed precision iterative refinement

Author: Patrick Amestoy, Alfredo Buttari, Nicholas Higham, Excellent, Jean-Yves L., Théo Mary, Bastien Vieublé, Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT), École normale supérieure de Lyon (ENS de Lyon), Mumps Technologies [Lyon], Algorithmes Parallèles et Optimisation (IRIT-APO), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Centre National de la Recherche Scientifique (CNRS), University of Manchester [Manchester], Department of Mathematics [Manchester] (School of Mathematics), Performance et Qualité des Algorithmes Numériques (PEQUAN), LIP6, Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Olympe supercomputer of the CALMIP center (project P0989), European Project: 676629,H2020 Pilier Excellent Science,H2020-EINFRA-2015-1,EoCoE(2015), Vieublé, Bastien, and Energy oriented Centre of Excellence for computer applications - EoCoE - - H2020 Pilier Excellent Science2015-10-01 - 2018-09-30 - 676629 - VALID
Subjects: multifrontal method, parallelism, mixed precision, floating-point arithmetic, GMRES, [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA], rounding error analysis, [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF], [INFO.INFO-MS] Computer Science [cs]/Mathematical Software [cs.MS], [INFO.INFO-PF] Computer Science [cs]/Performance [cs.PF], preconditioning, [INFO.INFO-NA] Computer Science [cs]/Numerical Analysis [cs.NA], sparse direct solver, iterative refinement, linear system, multiple precision, [INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS]
Abstract: International audience; The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so we first develop a new error analysis for LU-and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank (BLR) factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that the proposed approach can lead to considerable reductions of both the time and memory consumption.

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

44 results on '"Théo Mary"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources