584 results on "Darve, Eric"
Search Results
202. Low-Rank Factorizations in Data Sparse Hierarchical Algorithms for Preconditioning Symmetric Positive Definite Matrices
- Author
-
Agullo, Emmanuel, primary, Darve, Eric, additional, Giraud, Luc, additional, and Harness, Yuval, additional
- Published
- 2018
- Full Text
- View/download PDF
203. Optimal estimation and scheduling in aquifer management using the rapid feedback control method
- Author
-
Ghorbanidehno, Hojat, primary, Kokkinaki, Amalia, additional, Kitanidis, Peter K., additional, and Darve, Eric, additional
- Published
- 2017
- Full Text
- View/download PDF
204. An efficient preconditioner for the fast simulation of a 2D Stokes flow in porous media
- Author
-
Quaife, Bryan, primary, Coulier, Pieter, additional, and Darve, Eric, additional
- Published
- 2017
- Full Text
- View/download PDF
205. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm
- Author
-
Ahn, Surl-Hee, primary, Grate, Jay W., additional, and Darve, Eric F., additional
- Published
- 2017
- Full Text
- View/download PDF
206. Smoothing-based compressed state Kalman filter for joint state-parameter estimation: Applications in reservoir characterization and CO2 storage monitoring
- Author
-
Li, Y. J., primary, Kokkinaki, Amalia, additional, Darve, Eric F., additional, and Kitanidis, Peter K., additional
- Published
- 2017
- Full Text
- View/download PDF
207. Sparse Supernodal Solver Using Block Low-Rank Compression
- Author
-
Pichon, Gregoire, primary, Darve, Eric, additional, Faverge, Mathieu, additional, Ramet, Pierre, additional, and Roman, Jean, additional
- Published
- 2017
- Full Text
- View/download PDF
208. Fast hierarchical algorithms for generating Gaussian random fields
- Author
-
Blanchard, Pierre, Coulaud, Olivier, Darve, Eric, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Institute for Computational and Mathematical Engineering [Stanford] (ICME), Stanford University, Department of Mechanical Engineering [Stanford], Inria Bordeaux Sud-Ouest, and FastLA-PLAFRIM
- Subjects
randomized SVD ,FMM ,H²-methods ,[INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA] ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,multivariate Gaussian random variables ,[STAT.CO]Statistics [stat]/Computation [stat.CO] ,covariance kernel matrices ,FFT ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] - Abstract
Low-rank approximation (LRA) techniques have become crucial tools in scientific computing for reducing the cost of storing matrices and of performing common matrix operations. Since standard techniques like the SVD do not scale well with the problem size N, there has recently been growing interest in alternative methods such as randomized LRAs. These methods are usually cheap and easy to implement and optimize, since they involve only very basic operations such as matrix-vector products (MVPs) and orthogonalizations. More precisely, randomization reduces the cubic cost of a standard matrix factorization to the quadratic cost of applying a few MVPs, namely O(r × N^2) operations, where r is the numerical rank of the matrix. First, we present a new efficient algorithm, called the Uniform FMM (ufmm), for performing MVPs in O(N) operations. It is based on a hierarchical (data sparse) representation of a kernel matrix combined with polynomial interpolation of the kernel on equispaced grids. The latter feature allows for FFT acceleration and consequently reduces both running time and memory footprint, but has implications for accuracy and stability. The ufmm is then used to speed up the MVPs involved in the randomized SVD, reducing its cost to O(r^2 × N) and exhibiting very competitive performance when the distribution of points is large and highly heterogeneous. Finally, we make use of this algorithm to efficiently generate spatially correlated multivariate Gaussian random variables.
- Published
- 2015
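The randomized SVD in entry 208 above touches the matrix only through matrix-vector products, which is why a fast summation scheme (the paper's Uniform FMM) can be substituted for the dense product. Below is a minimal numpy sketch of that matvec-only randomized SVD; the Gaussian covariance kernel, point count, rank, and oversampling are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def randomized_svd_matvec(matvec, rmatvec, n, rank, n_oversample=10, seed=0):
    """Randomized SVD that accesses the matrix only through mat-vec products.

    matvec(v) -> A @ v and rmatvec(v) -> A.T @ v for an n-by-n matrix A.
    With a dense A each product costs O(n^2), giving the O(r * N^2) cost quoted
    above; a fast scheme such as an FMM would bring each product down to O(N).
    """
    rng = np.random.default_rng(seed)
    k = rank + n_oversample
    Omega = rng.standard_normal((n, k))                            # random probes
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(k)])   # sample the range
    Q, _ = np.linalg.qr(Y)                                         # orthonormal range basis
    B = np.column_stack([rmatvec(Q[:, j]) for j in range(k)]).T    # B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]

# Toy usage: a smooth Gaussian covariance kernel on random points (illustrative only).
pts = np.random.default_rng(1).random((500, 2))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2 / 0.1)
U, s, Vt = randomized_svd_matvec(lambda v: A @ v, lambda v: A.T @ v, len(pts), rank=30)
print("relative error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```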
209. Algorithmes hiérarchiques rapides pour la génération de champs aléatoires Gaussiens (Fast hierarchical algorithms for generating Gaussian random fields)
- Author
-
Blanchard, Pierre, Coulaud, Olivier, Darve, Eric, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Institute for Computational and Mathematical Engineering [Stanford] (ICME), Stanford University, Department of Mechanical Engineering [Stanford], Inria Bordeaux Sud-Ouest, FastLA-PLAFRIM, Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
randomized SVD ,FMM ,H²-methods ,[INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA] ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,multivariate Gaussian random variables ,[STAT.CO]Statistics [stat]/Computation [stat.CO] ,covariance kernel matrices ,FFT ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] - Abstract
Low-rank approximation (LRA) techniques have become crucial tools in scientific computing for reducing the cost of storing matrices and of performing common matrix operations. Since standard techniques like the SVD do not scale well with the problem size N, there has recently been growing interest in alternative methods such as randomized LRAs. These methods are usually cheap and easy to implement and optimize, since they involve only very basic operations such as matrix-vector products (MVPs) and orthogonalizations. More precisely, randomization reduces the cubic cost of a standard matrix factorization to the quadratic cost of applying a few MVPs, namely O(r × N^2) operations, where r is the numerical rank of the matrix. First, we present a new efficient algorithm, called the Uniform FMM (ufmm), for performing MVPs in O(N) operations. It is based on a hierarchical (data sparse) representation of a kernel matrix combined with polynomial interpolation of the kernel on equispaced grids. The latter feature allows for FFT acceleration and consequently reduces both running time and memory footprint, but has implications for accuracy and stability. The ufmm is then used to speed up the MVPs involved in the randomized SVD, reducing its cost to O(r^2 × N) and exhibiting very competitive performance when the distribution of points is large and highly heterogeneous. Finally, we make use of this algorithm to efficiently generate spatially correlated multivariate Gaussian random variables.
- Published
- 2015
210. H-matrix techniques for parallel hybrid solvers
- Author
-
Agullo, Emmanuel, Darve, Eric, Harness, Yuval, Giraud, Luc, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), Inria@SiliconValley, FastLA, and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
- Subjects
[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,ComputingMilieux_MISCELLANEOUS ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] - Abstract
International audience
- Published
- 2015
211. Data sparse techniques for parallel hybrid solvers
- Author
-
Agullo, Emmanuel, Darve, Eric, Giraud, Luc, Harness, Yuval, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), Inria@SiliconValley, FastLA, and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
- Subjects
Preconditioning techniques ,MathematicsofComputing_NUMERICALANALYSIS ,data-sparse ,Schur complement ,[INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA] ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] - Abstract
International audience; In this talk we will describe how H-matrix data-sparse techniques can be implemented in a parallel hybrid sparse linear solver based on an algebraic non-overlapping domain decomposition approach. Strong-hierarchical matrix arithmetic and various clustering techniques to approximate the local Schur complements will be investigated, aiming at reducing workload and memory consumption while complying with the structures of the local interfaces of the sub-domains. The use of these techniques to form an effective global preconditioner will be presented.
- Published
- 2015
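A numpy sketch of the two ingredients named in entry 211 above: the local Schur complement obtained after eliminating subdomain interiors, and the numerically low-rank coupling between separated pieces of the interface that the H-matrix compression exploits. The 2D Laplacian, grid size, and tolerance are illustrative stand-ins for the paper's purely algebraic setting.

```python
import numpy as np

m = 31                                              # illustrative grid size
T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))   # 2D Laplacian on an m-by-m grid

idx = np.arange(m * m).reshape(m, m)
sep = idx[:, m // 2]                                # a separator column splits the grid in two
interior = np.setdiff1d(idx.ravel(), sep)

Aii = A[np.ix_(interior, interior)]
Ais = A[np.ix_(interior, sep)]
Asi = A[np.ix_(sep, interior)]
Ass = A[np.ix_(sep, sep)]

# Local Schur complement on the separator after eliminating the subdomain interiors.
S = Ass - Asi @ np.linalg.solve(Aii, Ais)

# The block of S coupling the two halves of the separator is numerically low-rank;
# approximating such blocks is exactly what the data-sparse (H-matrix) format does.
B = S[: m // 2, m // 2 + 1 :]
sv = np.linalg.svd(B, compute_uv=False)
print("block", B.shape, "numerical rank at 1e-8:", int(np.sum(sv > 1e-8 * sv[0])))
```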
212. ScalFMM: A Generic Parallel Fast Multipole Library
- Author
-
Blanchard, Pierre, Bramas, Bérenger, Coulaud, Olivier, Darve, Eric, Dupuy, Laurent, Etcheverry, Arnaud, Sylvand, Guillaume, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), Service des Recherches Métallurgiques Appliquées (SRMA), Département des Matériaux pour le Nucléaire (DMN), CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Airbus Group Innovations [Suresnes], Airbus [France], HiePACS, FastLA, Optidis, Plafrim, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)
- Subjects
ScalFMM ,Time Domain ,BEM ,HPC ,Dislocation Dynamics ,FMM ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] - Abstract
International audience; ScalFMM (Parallel Fast Multipole Library for Large Scale Simulations) offers all the functionality needed to perform large parallel simulations while enabling easy customization of the simulation components: kernels, particles, and cells. We will present how we use our library in two kinds of applications involving boundary integral representations of physical fields. The first implements isotropic dislocation kernels for Dislocation Dynamics and the second a time-dependent kernel for acoustic problems.
- Published
- 2015
213. BLOCK BASIS FACTORIZATION FOR SCALABLE KERNEL EVALUATION.
- Author
-
WANG, RUOXI, LI, YINGZHOU, MAHONEY, MICHAEL W., and DARVE, ERIC
- Subjects
ALGORITHMS ,RADIAL basis functions ,LOW-rank matrices ,APPROXIMATION algorithms ,KERNEL functions - Abstract
Kernel methods are widespread in machine learning; however, they are limited by the quadratic complexity of the construction, application, and storage of kernel matrices. Low-rank matrix approximation algorithms are widely used to address this problem and reduce the arithmetic and storage cost. However, we observed that for some datasets with wide intraclass variability, the optimal kernel parameter for smaller classes yields a matrix that is less well-approximated by low-rank methods. In this paper, we propose an efficient structured low-rank approximation method, the block basis factorization (BBF), and its fast construction algorithm to approximate radial basis function kernel matrices. Our approach has linear memory cost and floating point operations for many machine learning kernels. BBF works for a wide range of kernel bandwidth parameters and extends the domain of applicability of low-rank approximation methods significantly. Our empirical results demonstrate the stability and superiority over the state-of-the-art kernel approximation algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
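A rough numpy sketch of the block-basis idea behind the BBF in entry 213 (and its arXiv version, entry 218): cluster the points, compute one low-rank column basis per cluster from its block row of the kernel matrix, and couple the clusters through small core matrices, K ≈ U C U^T. The slab clustering, SVD-based bases, and fixed rank below are simplified stand-ins for the paper's fast construction algorithm.

```python
import numpy as np

def rbf_kernel(X, Y, h=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

rng = np.random.default_rng(0)
X = rng.random((600, 3))
K = rbf_kernel(X, X)

# Crude clustering into k groups (the paper uses a proper clustering of the points).
k, r = 4, 20
clusters = np.array_split(np.argsort(X[:, 0]), k)      # slabs along one coordinate

# One orthonormal basis per cluster, taken from the SVD of its block row of K.
bases = [np.linalg.svd(K[idx, :], full_matrices=False)[0][:, :r] for idx in clusters]

# Core matrices C_ij = U_i^T K_ij U_j give the block factorization K ~ U C U^T.
K_bbf = np.zeros_like(K)
for i, Ii in enumerate(clusters):
    for j, Ij in enumerate(clusters):
        C_ij = bases[i].T @ K[np.ix_(Ii, Ij)] @ bases[j]
        K_bbf[np.ix_(Ii, Ij)] = bases[i] @ C_ij @ bases[j].T

print("relative error:", np.linalg.norm(K - K_bbf) / np.linalg.norm(K))
```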
214. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm.
- Author
-
Ahn, Surl-Hee, Grate, Jay W., and Darve, Eric F.
- Abstract
Molecular dynamics simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules, but they are limited by the time scale barrier. That is, we may not obtain properties efficiently because we need to run simulations of microseconds or longer using femtosecond time steps. To overcome this time scale barrier, we can use the weighted ensemble (WE) method, a powerful enhanced sampling method that efficiently samples thermodynamic and kinetic properties. However, the WE method requires an appropriate partitioning of phase space into discrete macrostates, which can be problematic when we have a high-dimensional collective space or when little is known a priori about the molecular system. Hence, we developed a new WE-based method, called the "Concurrent Adaptive Sampling (CAS) algorithm," to tackle these issues. The CAS algorithm is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and adaptive macrostates to enhance the sampling in the high-dimensional space. This is especially useful for systems in which we do not know what the right reaction coordinates are, in which case we can use many collective variables to sample conformations and pathways. In addition, a clustering technique based on the committor function is used to accelerate sampling the slowest process in the molecular system. In this paper, we introduce the new method and show results from two-dimensional models and bio-molecules, specifically penta-alanine and a triazine trimer.
- Published
- 2017
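Entry 214 builds on the weighted ensemble idea of splitting and merging weighted walkers inside macrostates. Below is a minimal, generic sketch of one such resampling step with fixed bins and a target number of walkers per bin; it only illustrates the WE bookkeeping and weight conservation, not the CAS algorithm's adaptive macrostates or committor-based clustering.

```python
import numpy as np

def resample_bin(positions, weights, target, rng):
    """Split / merge walkers in one bin so ~`target` remain; total weight is conserved."""
    positions, weights = list(positions), list(weights)
    # Merge: while too many walkers, absorb the lightest into a randomly chosen partner.
    while len(positions) > target:
        i = int(np.argmin(weights))
        w, x = weights.pop(i), positions.pop(i)
        j = int(rng.integers(len(positions)))
        if rng.random() < w / (w + weights[j]):   # survivor chosen proportionally to weight
            positions[j] = x
        weights[j] += w
    # Split: while too few, duplicate the heaviest walker and halve its weight.
    while 0 < len(positions) < target:
        i = int(np.argmax(weights))
        weights[i] /= 2.0
        positions.append(positions[i])
        weights.append(weights[i])
    return positions, weights

# Toy usage: 50 walkers on a line, bins of width 0.25, 4 walkers kept per bin.
rng = np.random.default_rng(0)
xs = rng.random(50)
ws = np.full(50, 1.0 / 50)
bin_ids = (xs / 0.25).astype(int)
for b in np.unique(bin_ids):
    kept_x, kept_w = resample_bin(xs[bin_ids == b], ws[bin_ids == b], target=4, rng=rng)
    print(f"bin {b}: {len(kept_x)} walkers, total weight {sum(kept_w):.3f}")
```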
215. On the numerical rank of radial basis function kernels in high dimension
- Author
-
Wang, Ruoxi, Li, Yingzhou, and Darve, Eric
- Abstract
Low-rank approximations are popular methods to reduce the high computational cost of algorithms involving large-scale kernel matrices. The success of low-rank methods hinges on the matrix rank of the kernel matrix, and in practice, these methods are effective even for high-dimensional datasets. Their practical success motivates our analysis of the function rank, an upper bound of the matrix rank. In this paper, we consider radial basis functions (RBF), approximate the RBF kernel with a low-rank representation that is a finite sum of separate products and provide explicit upper bounds on the function rank and the $L_\infty$ error for such approximations. Our three main results are as follows. First, for a fixed precision, the function rank of RBFs, in the worst case, grows polynomially with the data dimension. Second, precise error bounds for the low-rank approximations in the $L_\infty$ norm are derived in terms of the function smoothness and the domain diameters. Finally, a group pattern in the magnitude of singular values for RBF kernel matrices is observed and analyzed, and is explained by a grouping of the expansion terms in the kernel's low-rank representation. Empirical results verify the theoretical results.
- Published
- 2017
- Full Text
- View/download PDF
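A small numpy experiment in the spirit of entry 215: measure the numerical rank, at a fixed tolerance, of a Gaussian RBF kernel matrix as the data dimension grows. The point count, bandwidth, and tolerance are arbitrary illustrative choices, and the matrix rank observed here is only a lower bound on the function rank analyzed in the paper.

```python
import numpy as np

def numerical_rank(K, tol=1e-6):
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
n, h = 500, 1.0                                     # points and bandwidth (illustrative)
for d in (1, 2, 4, 8):
    X = rng.random((n, d))                          # n points in [0, 1]^d
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * h * h))                 # Gaussian RBF kernel matrix
    print(f"dimension {d}: numerical rank {numerical_rank(K)} of {n}")
```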
216. A distributed-memory hierarchical solver for general sparse linear systems
- Author
-
Chen, Chao, Pouransari, Hadi, Rajamanickam, Sivasankaran, Boman, Erik G., and Darve, Eric
- Abstract
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
- Published
- 2017
217. A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems
- Author
-
Wong, Jonathan, Kuhl, Ellen, and Darve, Eric
- Subjects
Computer Science::Performance ,Computational Engineering, Finance, and Science (cs.CE) ,FOS: Computer and information sciences ,65Y10 ,Computer Science::Graphics ,Computer Science::Mathematical Software ,Computer Science - Mathematical Software ,Computer Science - Computational Engineering, Finance, and Science ,Mathematical Software (cs.MS) - Abstract
Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and the newly proposed algorithms are measured and analyzed for 15 sparse matrices of varying sizes and varying sparsity structures. The effects of optimization and differences between the new GPU algorithm and its variants are then subsequently studied. Lastly, both new and current SPMV GPU algorithms are utilized in the GPU CG Solver in GPU finite element simulations of the heart. These results are then compared against parallel PETSc finite element implementation results. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices and can yield very notable benefits. GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis, and also show that the proposed algorithms can yield speedup factors up to 12-fold for real finite element applications., Comment: 35 pages, 22 figures
- Published
- 2015
- Full Text
- View/download PDF
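The operation accelerated in entry 217 is the sparse matrix-vector product for finite-element matrices. As a point of reference, here is the plain row-wise CSR product in numpy; on a GPU each row (or small group of rows) would typically be handled by a thread or warp, and the paper's contribution is precisely the data layout and access pattern for unstructured meshes, which this sketch does not reproduce.

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x for A stored in CSR (value / column-index / row-pointer arrays)."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):                    # one GPU thread (or warp) per row in a typical kernel
        start, end = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# Toy usage: a 4x4 tridiagonal stiffness-like matrix in CSR form.
data    = np.array([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
indices = np.array([0, 1, 0, 1, 2, 1, 2, 3, 2, 3])
indptr  = np.array([0, 2, 5, 8, 10])
print(spmv_csr(data, indices, indptr, np.ones(4)))   # -> [3. 2. 2. 3.]
```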
218. Block Basis Factorization for Scalable Kernel Matrix Evaluation
- Author
-
Wang, Ruoxi, Li, Yingzhou, Mahoney, Michael W., and Darve, Eric
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,FOS: Mathematics ,Machine Learning (stat.ML) ,Mathematics - Numerical Analysis ,Numerical Analysis (math.NA) ,Machine Learning (cs.LG) - Abstract
Kernel methods are widespread in machine learning; however, they are limited by the quadratic complexity of the construction, application, and storage of kernel matrices. Low-rank matrix approximation algorithms are widely used to address this problem and reduce the arithmetic and storage cost. However, we observed that for some datasets with wide intra-class variability, the optimal kernel parameter for smaller classes yields a matrix that is less well approximated by low-rank methods. In this paper, we propose an efficient structured low-rank approximation method -- the Block Basis Factorization (BBF) -- and its fast construction algorithm to approximate radial basis function (RBF) kernel matrices. Our approach has linear memory cost and floating-point operations for many machine learning kernels. BBF works for a wide range of kernel bandwidth parameters and extends the domain of applicability of low-rank approximation methods significantly. Our empirical results demonstrate the stability and superiority over the state-of-the-art kernel approximation algorithms., Comment: 16 pages, 5 figures
- Published
- 2015
- Full Text
- View/download PDF
219. The Inverse Fast Multipole Method
- Author
-
Ambikasaran, Sivaram and Darve, Eric
- Subjects
FOS: Mathematics ,Computer Science - Numerical Analysis ,Numerical Analysis (math.NA) ,Mathematics - Numerical Analysis - Abstract
This article introduces a new fast direct solver for linear systems arising out of a wide range of applications: integral equations, multivariate statistics, radial basis interpolation, etc., to name a few. The highlight of this new fast direct solver is that it scales linearly in the number of unknowns in all dimensions. The solver, termed the Inverse Fast Multipole Method (abbreviated as IFMM), works on the same data structure as the Fast Multipole Method (abbreviated as FMM). More generally, the solver can be immediately extended to the class of hierarchical matrices, denoted as $\mathcal{H}^2$ matrices with strong admissibility criteria (weak low-rank structure), i.e., the interaction between neighboring clusters of particles is full-rank whereas the interaction between particles corresponding to well-separated clusters can be efficiently represented as a low-rank matrix. The algorithm departs from existing approaches in that throughout the algorithm the interactions corresponding to neighboring clusters are always treated as full-rank interactions. Our approach relies on two major ideas: (i) the $N \times N$ matrix arising out of the FMM (from now on termed the FMM matrix) can be represented as an extended sparser matrix of size $M \times M$, where $M \approx 3N$; (ii) while solving the larger extended sparser matrix, the fill-ins that arise in the matrix blocks corresponding to well-separated clusters are hierarchically compressed. The ordering of the equations and the unknowns in the extended sparser matrix is strongly related to the local and multipole coefficients in the FMM [Greengard and Rokhlin, 1987], and the order of elimination is different from the usual nested dissection approach. Numerical benchmarks on a 2D manifold confirm the linear scaling of the algorithm., 25 pages, 28 figures
- Published
- 2014
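Entry 219 rests on idea (i): a dense FMM-type matrix can be rewritten as a larger but sparse extended system by introducing auxiliary multipole/local-like variables for the low-rank couplings. The numpy sketch below checks that equivalence on a toy two-cluster matrix; the sizes, ranks, and random blocks are illustrative, and the hierarchical compression of fill-in (idea (ii)) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 8                                   # cluster size and coupling rank (illustrative)

# Two-cluster model: full-rank "near field" diagonal blocks, rank-r coupling blocks.
D1 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
D2 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
U1, V1 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U2, V2 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
B12, B21 = 0.01 * rng.standard_normal((r, r)), 0.01 * rng.standard_normal((r, r))
K = np.block([[D1, U1 @ B12 @ V2.T], [U2 @ B21 @ V1.T, D2]])

# Extended system in [x1, x2, y1, y2, z1, z2], with y_i = V_i^T x_i and
# z1 = B12 y2, z2 = B21 y1: every block is either small or sparse.
Znn, Znr, Zrn, Zrr, Ir = np.zeros((n, n)), np.zeros((n, r)), np.zeros((r, n)), np.zeros((r, r)), np.eye(r)
E = np.block([
    [D1,   Znn,  Znr,  Znr,  U1,   Znr],
    [Znn,  D2,   Znr,  Znr,  Znr,  U2 ],
    [V1.T, Zrn,  -Ir,  Zrr,  Zrr,  Zrr],
    [Zrn,  V2.T, Zrr,  -Ir,  Zrr,  Zrr],
    [Zrn,  Zrn,  Zrr,  B12,  -Ir,  Zrr],
    [Zrn,  Zrn,  B21,  Zrr,  Zrr,  -Ir],
])

b = rng.standard_normal(2 * n)
x_dense = np.linalg.solve(K, b)
x_ext = np.linalg.solve(E, np.concatenate([b, np.zeros(4 * r)]))[: 2 * n]
print("K is", K.shape, "extended system is", E.shape)
print("max difference between the two solutions:", np.abs(x_dense - x_ext).max())
```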
220. Task-based FMM for heterogeneous architectures
- Author
-
Agullo, Emmanuel, Bramas, Bérenger, Coulaud, Olivier, Darve, Eric, Messner, Matthias, Takahashi, Toru, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Institute for Computational and Mathematical Engineering [Stanford] (ICME), Stanford University, Department of Mechanical Engineering [Stanford], Department of Mechanical Science and Engineering, Nagoya University, Inria, plafrim, Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
heterogeneous architectures ,pipeline ,Fast multipole methods ,scheduling ,runtime system ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,graphics processing unit - Abstract
High performance FMM is crucial for the numerical simulation of many physical problems. In a previous study [Agullo et al., 2013], we have shown that task-based FMM provides the flexibility required to process a wide spectrum of particle distributions efficiently on multicore architectures. In this paper, we now show how such an approach can be extended to fully exploit heterogeneous platforms. For that, we design highly tuned GPU versions of the two dominant operators (P2P and M2L) as well as a scheduling strategy that dynamically decides which proportion of subsequent tasks is processed on regular CPU cores and on GPU accelerators. We assess our method with the StarPU runtime system for executing the resulting task flow on an Intel X5650 Nehalem multicore processor possibly enhanced with one, two or three Nvidia Fermi M2070 or M2090 GPUs. A detailed experimental study on two 30 million particle distributions (a cube and an ellipsoid) shows that the resulting software consistently achieves high performance across architectures.
- Published
- 2014
221. Computing the non-Markovian coarse-grained interactions derived from the Mori–Zwanzig formalism in molecular systems: Application to polymer melts
- Author
-
Li, Zhen, primary, Lee, Hee Sun, additional, Darve, Eric, additional, and Karniadakis, George Em, additional
- Published
- 2017
- Full Text
- View/download PDF
222. The Inverse Fast Multipole Method: Using a Fast Approximate Direct Solver as a Preconditioner for Dense Linear Systems
- Author
-
Coulier, Pieter, primary, Pouransari, Hadi, additional, and Darve, Eric, additional
- Published
- 2017
- Full Text
- View/download PDF
223. A fast, memory efficient and robust sparse preconditioner based on a multifrontal approach with applications to finite‐element matrices
- Author
-
Aminfar, AmirHossein, primary and Darve, Eric, additional
- Published
- 2016
- Full Text
- View/download PDF
224. OptiDis: A parallel Fast Multipole Dislocation Dynamics code
- Author
-
Blanchard, Pierre, Etcheverry, Arnaud, Coulaud, Olivier, Dupuy, Laurent, Bletry, Marc, Darve, Eric, Blanchard, Pierre, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Service des Recherches Métallurgiques Appliquées (SRMA), Département des Matériaux pour le Nucléaire (DMN), CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-CEA-Direction des Energies (ex-Direction de l'Energie Nucléaire) (CEA-DES (ex-DEN)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay, Institut de Chimie et des Matériaux Paris-Est (ICMPE), Institut de Chimie du CNRS (INC)-Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12)-Centre National de la Recherche Scientifique (CNRS), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), PLAFRIM, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), and Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12)-Centre National de la Recherche Scientifique (CNRS)-Institut de Chimie du CNRS (INC)
- Subjects
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,Fast Multipole Method ,Fast Fourier Transform ,[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,[PHYS.MECA.MSMECA] Physics [physics]/Mechanics [physics]/Materials and structures in mechanics [physics.class-ph] ,[INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation ,[PHYS.MECA.MSMECA]Physics [physics]/Mechanics [physics]/Materials and structures in mechanics [physics.class-ph] ,[INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation ,Dislocation Dynamics Simulations - Abstract
International audience; Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is by far the most resource-consuming. However, since these are long-ranged and fast-decreasing interactions, hierarchical algorithms like the Fast Multipole Method (FMM) are well suited for their fast evaluation. The relatively low accuracy required for the interaction forces between dislocations led us to develop a more efficient approximation method for the far field. On the other hand, the near-field interactions are still evaluated analytically, which required a rather performant implementation (AVX, GPU...). Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-conscious data structure. Finally, an accurate handling of topological elements intersecting the structure of the octree was considered. The latter feature implied careful modifications of the P2M/L2P operators in order to deal with the shared-memory model of parallelism.
- Published
- 2014
225. Task-based FMM for multicore architectures
- Author
-
Agullo, Emmanuel, Bramas, Bérenger, Coulaud, Olivier, Darve, Eric, Messner, Matthias, Takahashi, Toru, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), Department of Mechanical Science and Engineering, Nagoya University, Equipe associée FastLA, INRIA, Plafrim, FastLA, Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest
- Subjects
shared memory paradigm ,pipeline ,Fast multipole methods ,runtime system ,78M16 ,68W10 ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,multicore architectures - Abstract
Preliminary version of a paper to appear in SIAM SISC; Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the Fast Multipole Method (FMM) algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, to process the tasks on the different computing units. We carefully design the task flow, the mathematical operators, their implementations, as well as the scheduling schemes. Potentials and forces on 200 million particles are computed in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and good scalability is shown.
- Published
- 2013
226. ON THE NUMERICAL RANK OF RADIAL BASIS FUNCTION KERNELS IN HIGH DIMENSIONS.
- Author
-
WANG, RUOXI, LI, YINGZHOU, and DARVE, ERIC
- Subjects
RADIAL basis functions ,KERNEL functions ,DIAMETER ,SMOOTHNESS of functions ,APPROXIMATION error ,DIMENSIONS - Abstract
Low-rank approximations are popular methods to reduce the high computational cost of algorithms involving large-scale kernel matrices. The success of low-rank methods hinges on the matrix rank, and in practice, these methods are effective even for high-dimensional datasets. The practical success has elicited the theoretical analysis of the function rank in this paper, which is an upper bound of the matrix rank. The concept of function rank will be introduced to define the number of terms in the minimal separate form of a kernel function. We consider radial basis functions (RBF) in particular, approximate the RBF kernel with a low-rank representation that is a finite sum of separate products, and provide explicit upper bounds on the function rank and the L∞ error for such approximations. Our three main results are as follows. First, for a fixed precision, the function rank of RBFs, in the worst case, grows polynomially with the data dimension. Second, precise error bounds for the low-rank approximations in the L∞ norm are derived in terms of the function smoothness and the domain diameters. And last, a group pattern in the magnitude of singular values for RBF kernel matrices is observed and analyzed, and is explained by a grouping of the expansion terms in the kernel's low-rank representation. Empirical results verify the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
227. Fast algorithms for dense linear algebra
- Author
-
Darve, Eric
- Published
- 2013
- Full Text
- View/download PDF
228. Method and Advantages of Genetic Algorithms in Parameterization of Interatomic Potentials: Metal-Oxides
- Author
-
Solomon, Jose, Chung, Peter, Srivastava, Deepak, and Darve, Eric
- Subjects
Condensed Matter - Materials Science ,Materials Science (cond-mat.mtrl-sci) ,FOS: Physical sciences - Abstract
The method and the advantages of an evolutionary computing based approach using a steady state genetic algorithm (GA) for the parameterization of interatomic potentials for metal oxides within the shell model framework are developed and described. We show that the GA based methodology for the parameterization of interatomic force field functions (a) is capable of simultaneous optimization of the multiple phases or properties of a material in a single run, (b) facilitates the incremental re-optimization of the whole system as more data become available for either additional phases or material properties not included in previous runs, and (c) achieves successful global optimization in the presence of multiple local minima in the parameter space. As an example, we apply the method to the simultaneous optimization of four distinct crystalline phases of Barium Titanate (BaTiO3 or BTO) using an ab initio density functional theory (DFT) based reference dataset. We find that the optimized force field function is capable of predicting the two phases not used in the optimization procedure, and that many derived physical properties such as the equilibrium lattice constants, unit cell volume, elastic properties, coefficient of thermal expansion, and average electronic polarization are in good agreement with the experimental results available from the literature., Comment: 31 pages, 16 figures
- Published
- 2013
- Full Text
- View/download PDF
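A bare-bones sketch of a steady-state genetic algorithm of the kind entry 228 uses: a persistent population in which, at each step, two tournament-selected parents produce one child by blended crossover plus Gaussian mutation, and the child replaces the worst individual only if it is fitter. The least-squares misfit to a Lennard-Jones-like reference curve below is a generic stand-in; the paper's shell-model potential, crystalline phases, and DFT reference data are not modeled.

```python
import numpy as np

def steady_state_ga(objective, bounds, pop_size=40, steps=2000, seed=0):
    """Minimize objective(params) with a steady-state GA (one replacement per step)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = lo + (hi - lo) * rng.random((pop_size, len(bounds)))
    fit = np.array([objective(p) for p in pop])
    for _ in range(steps):
        a, b = rng.choice(pop_size, size=(2, 2))       # two size-2 tournaments
        p1 = pop[a[np.argmin(fit[a])]]
        p2 = pop[b[np.argmin(fit[b])]]
        w = rng.random(len(bounds))                    # blend crossover ...
        child = w * p1 + (1 - w) * p2
        child += 0.02 * (hi - lo) * rng.standard_normal(len(bounds))   # ... plus mutation
        child = np.clip(child, lo, hi)
        f = objective(child)
        worst = int(np.argmax(fit))
        if f < fit[worst]:                             # steady state: replace only the worst
            pop[worst], fit[worst] = child, f
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Toy usage: recover two parameters of a pair potential from a reference curve.
x = np.linspace(0.8, 3.0, 50)
ref = 4.0 * ((1.2 / x) ** 12 - (1.2 / x) ** 6)
misfit = lambda p: np.sum((p[0] * ((p[1] / x) ** 12 - (p[1] / x) ** 6) - ref) ** 2)
best, err = steady_state_ga(misfit, bounds=[(0.1, 10.0), (0.5, 2.0)])
print("best parameters:", best, "misfit:", err)
```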
229. Computing reaction rates in bio-molecular systems using discrete macro-states
- Author
-
Darve, Eric and Ryu, Ernest
- Subjects
Quantitative Biology::Biomolecules ,FOS: Biological sciences ,FOS: Mathematics ,Dynamical Systems (math.DS) ,Mathematics - Dynamical Systems ,Quantitative Biology - Quantitative Methods ,Quantitative Methods (q-bio.QM) - Abstract
Computing reaction rates in biomolecular systems is a common goal of molecular dynamics simulations. The reactions considered often involve conformational changes in the molecule, either changes in the structure of a protein or the relative position of two molecules, for example when modeling the binding of a protein and ligand. Here we consider the general problem of computing the rate of transfer from a subset A of the conformational space Omega to a subset B of Omega. It is assumed that A and B are associated with minimum energy basins and are long-lived states. Rates can be obtained using many different methods, and we review some of the most popular approaches. We organize the different approaches roughly in chronological order and under four main categories: reactive flux, transition path sampling, conformation dynamics, and a fourth class of methods, to which we do not give any specific name, that in some sense attempts to combine features from transition path sampling and conformation dynamics. The latter include non-equilibrium umbrella sampling (Warmflash et al. [2007], Dickson et al. [2009b]) and weighted ensemble dynamics (Huber and Kim [1996]).
- Published
- 2013
- Full Text
- View/download PDF
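A tiny illustration of the quantity whose estimators entry 229 reviews: the rate of transfer from a set A to a set B. The sketch uses the simplest possible estimator on a long brute-force trajectory (label each frame by the last of A or B visited, then divide the number of A-to-B recolorings by the time spent carrying the A label); the double-well dynamics and the cutoff values are illustrative, and the reviewed methods exist precisely because such brute-force trajectories are infeasible for real biomolecules.

```python
import numpy as np

def rate_A_to_B(traj, in_A, in_B, dt=1.0):
    """k_AB estimate: (# of A -> B transitions) / (time spent since last visiting A)."""
    color, events, time_in_A = None, 0, 0.0
    for x in traj:
        if in_A(x):
            color = "A"
        elif in_B(x) and color == "A":
            events += 1                        # an A -> B transition just completed
            color = "B"
        elif in_B(x):
            color = "B"
        if color == "A":
            time_in_A += dt
    return events / time_in_A if time_in_A > 0 else float("nan")

# Toy usage: overdamped dynamics in the double well U(x) = (x^2 - 1)^2, A and B the two wells.
rng = np.random.default_rng(0)
n_steps, dt, beta = 1_000_000, 1e-3, 4.0
noise = np.sqrt(2.0 * dt / beta) * rng.standard_normal(n_steps)
traj = np.empty(n_steps)
x = -1.0
for i in range(n_steps):
    x += -4.0 * x * (x * x - 1.0) * dt + noise[i]   # Euler-Maruyama step, drift = -U'(x)
    traj[i] = x
print("estimated k_AB:", rate_A_to_B(traj, lambda s: s < -0.8, lambda s: s > 0.8, dt))
```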
230. Chapter 16 - Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
- Author
-
Cecka, Cris, Lew, Adrian, and Darve, Eric
- Published
- 2012
- Full Text
- View/download PDF
231. Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method
- Author
-
Messner, Matthias, Bramas, Bérenger, Coulaud, Olivier, Darve, Eric, High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest
- Subjects
Fast Multipole Method ,[INFO.INFO-CE]Computer Science [cs]/Computational Engineering, Finance, and Science [cs.CE] ,Computer Science::Mathematical Software ,black-box method ,oscillatory kernels ,asymptotically smooth kernels ,Chebyshev interpolation ,Computer Science::Numerical Analysis ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] - Abstract
A fast multipole method (FMM) for asymptotically smooth kernel functions (1/r, 1/r^4, Gauss and Stokes kernels, radial basis functions, etc.) based on a Chebyshev interpolation scheme was introduced in [Fong et al., 2009]. The method has been extended to oscillatory kernels (e.g., the Helmholtz kernel) in [Messner et al., 2012]. Besides its generality, this FMM turns out to be favorable due to its easy implementation and its high performance based on intensive use of highly optimized BLAS libraries. However, one of its bottlenecks is the precomputation of the multipole-to-local (M2L) operator, and its higher number of floating point operations (flops) compared to other FMM formulations. Here, we present several optimizations for that operator, which is known to be the costliest FMM operator. The most efficient ones not only reduce the precomputation time by a factor of up to 340 but also speed up the matrix-vector product. We conclude with comparisons and numerical validations of all presented optimizations.
- Published
- 2012
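A 1D numpy sketch of the interpolation-based low-rank structure that the M2L operator in entry 231 relies on: between two well-separated boxes the kernel is replaced by interpolation at a few Chebyshev nodes, K(x, y) ≈ Σ_m Σ_n S_m(x) K(x̄_m, ȳ_n) S_n(y), and the small node-to-node matrix plays the role of the M2L operator. The 1/|x−y| kernel, interval placement, and interpolation order are illustrative; the report's actual optimizations (symmetries, compression of the precomputed M2L matrices) are not reproduced.

```python
import numpy as np

def cheb_nodes(a, b, p):
    """p Chebyshev nodes mapped to the interval [a, b]."""
    t = np.cos((2 * np.arange(p) + 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def interp_matrix(x, nodes):
    """S[i, m] = Lagrange basis polynomial on the Chebyshev nodes, evaluated at x[i]."""
    S = np.ones((len(x), len(nodes)))
    for m, xm in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != m:
                S[:, m] *= (x - xk) / (xm - xk)
    return S

kernel = lambda x, y: 1.0 / np.abs(x[:, None] - y[None, :])    # illustrative 1/r kernel

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, 300)            # targets in [0, 1]
ys = rng.uniform(3.0, 4.0, 300)            # sources in the well-separated interval [3, 4]
p = 8                                      # interpolation order (illustrative)

xb, yb = cheb_nodes(0.0, 1.0, p), cheb_nodes(3.0, 4.0, p)
M2L = kernel(xb, yb)                       # p x p transfer matrix between the two boxes
K_low = interp_matrix(xs, xb) @ M2L @ interp_matrix(ys, yb).T
K_exact = kernel(xs, ys)
print("relative error:", np.linalg.norm(K_exact - K_low) / np.linalg.norm(K_exact))
```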
232. Pipelining the Fast Multipole Method over a Runtime System
- Author
-
Agullo, Emmanuel, Bramas, Bérenger, Coulaud, Olivier, Darve, Eric, Messner, Matthias, Takahashi, Toru, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Mechanical Engineering [Stanford], Stanford University, Institute for Computational and Mathematical Engineering [Stanford] (ICME), Department of Mechanical Science and Engineering, Nagoya University, Equipe associée FastLA, INRIA, PlaFRIM, FastLA, Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, FastLA-PLAFRIM, HiePACS, and Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
FOS: Computer and information sciences ,heterogeneous architectures ,Computer Science - Distributed, Parallel, and Cluster Computing ,pipeline ,Fast multipole methods graphics processing unit ,FMM ,Distributed, Parallel, and Cluster Computing (cs.DC) ,runtime system ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] - Abstract
Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, in order to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as the scheduling schemes. We compute the potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs., No. RR-7981 (2012)
- Published
- 2012
233. Extension and optimization of the FIND algorithm: computing Green's and less-than Green's functions (with technical appendix)
- Author
-
Li, Song and Darve, Eric
- Subjects
FOS: Mathematics ,Mathematics - Numerical Analysis ,Numerical Analysis (math.NA) - Abstract
The FIND algorithm is a fast algorithm designed to calculate certain entries of the inverse of a sparse matrix. Such a calculation is critical in many applications, e.g., quantum transport in nano-devices. We extended the algorithm to other matrix inverse related calculations. These are required, for example, to calculate the less-than Green's function and the current density through the device. For a 2D device discretized as an N_x x N_y mesh, the best known algorithms have a running time of O(N_x^3 N_y), whereas FIND only requires O(N_x^2 N_y). Even though this complexity has been reduced by an order of magnitude, the matrix inverse calculation is still the most time consuming part in the simulation of transport problems. We could not reduce the order of complexity, but we were able to significantly reduce the constant factor involved in the computation cost. By exploiting the sparsity and symmetry, the size of the problem beyond which FIND is faster than other methods typically decreases from a 130x130 2D mesh down to a 40x40 mesh. These improvements make the optimized FIND algorithm even more competitive for real-life applications.
- Published
- 2011
- Full Text
- View/download PDF
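To make the quantity FIND computes concrete, here is a brute-force reference in Python using SciPy's sparse LU: it extracts the diagonal of the inverse of a small sparse matrix with one solve per unit vector. This is emphatically not the FIND algorithm (its cost scales far worse), and the 2D Laplacian test matrix is an illustrative assumption.

```python
# Brute-force reference for "selected entries of the inverse of a sparse matrix"
# (here, the diagonal). Not the FIND algorithm: one triangular solve per column.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

nx, ny = 16, 16                        # small Nx x Ny mesh
T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(nx, nx))
A = (sp.kron(sp.identity(ny), T)
     + sp.kron(sp.diags([-1.0, -1.0], [-1, 1], shape=(ny, ny)), sp.identity(nx))).tocsc()

lu = splu(A)                           # one sparse LU factorization
n = A.shape[0]
diag_inv = np.empty(n)
for j in range(n):                     # one solve per requested entry of A^{-1}
    e = np.zeros(n)
    e[j] = 1.0
    diag_inv[j] = lu.solve(e)[j]       # keep only the (j, j) entry

# sanity check against a dense inverse (only feasible for tiny meshes)
assert np.allclose(diag_inv, np.diag(np.linalg.inv(A.toarray())))
print("first diagonal entries of A^{-1}:", np.round(diag_inv[:4], 4))
```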
234. A fast block low-rank dense solver with applications to finite-element matrices
- Author
-
Aminfar, AmirHossein, primary, Ambikasaran, Sivaram, additional, and Darve, Eric, additional
- Published
- 2016
- Full Text
- View/download PDF
235. Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation
- Author
-
Pouransari, Hadi, Coulier, Pieter, and Darve, Eric
- Abstract
Inversion of sparse matrices with standard direct solve schemes is robust but computationally expensive. Iterative solvers, on the other hand, demonstrate better scalability but need to be used with an appropriate preconditioner (e.g., ILU, AMG, Gauss-Seidel) for proper convergence. The choice of an effective preconditioner is highly problem dependent. We propose a novel, fully algebraic sparse solver whose complexity is linear in the problem size. Our scheme is based on Gaussian elimination. For a given matrix, we approximate the LU factorization with a tunable accuracy determined a priori. This method can be used as a stand-alone direct solver with linear complexity and tunable accuracy, or as a black-box preconditioner in conjunction with iterative methods such as GMRES (a toy example of this usage pattern follows this record). The proposed solver is based on the low-rank approximation of fill-ins generated during the elimination. Similar to H-matrices, fill-ins corresponding to blocks that are well-separated in the adjacency graph are represented via a hierarchical structure. The linear complexity of the algorithm is guaranteed if the blocks corresponding to well-separated clusters of variables are numerically low-rank.
- Published
- 2015
- Full Text
- View/download PDF
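The "black-box preconditioner" usage pattern described in the abstract can be sketched as follows. SciPy's incomplete LU stands in for the paper's hierarchical low-rank factorization, so only the wiring (approximate factorization, wrapped as a LinearOperator, passed to GMRES) is illustrated; the 2D Laplacian test matrix is arbitrary, and exact keyword arguments of scipy.sparse.linalg.gmres vary slightly across SciPy versions.

```python
# Approximate factorization used as a black-box preconditioner for GMRES.
# spilu is a stand-in for the paper's hierarchical low-rank factorization.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, LinearOperator, gmres

n = 40                                            # 2D Laplacian test matrix
T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T)
     + sp.kron(sp.diags([-1.0, -1.0], [-1, 1], shape=(n, n)), sp.identity(n))).tocsc()
b = np.ones(A.shape[0])

ilu = spilu(A, drop_tol=1e-4)                     # tunable-accuracy factorization
M = LinearOperator(A.shape, matvec=ilu.solve)     # exposed as a preconditioner

counts = {}
def counter(name):
    counts[name] = 0
    def cb(_):
        counts[name] += 1
    return cb

gmres(A, b, M=M, callback=counter("preconditioned"), callback_type="pr_norm")
gmres(A, b, callback=counter("unpreconditioned"), callback_type="pr_norm")
print(counts)      # the preconditioned run should converge in far fewer iterations
```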
236. The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems
- Author
-
Coulier, Pieter, Pouransari, Hadi, and Darve, Eric
- Abstract
Although some preconditioners are available for solving dense linear systems, there are still many matrices for which preconditioners are lacking, particularly when the size of the matrix $N$ becomes very large. Hence, there remains a great need to develop general-purpose preconditioners whose cost scales well with the matrix size $N$. In this paper, we propose a preconditioner with broad applicability and with cost $\mathcal{O}(N)$ for dense matrices, when the matrix is given by a smooth kernel. Extending the method within the same framework to general $\mathcal{H}^2$-matrices is relatively straightforward. These preconditioners have a controlled accuracy (machine accuracy can be achieved if needed) and scale linearly with $N$. They are based on an approximate direct solve of the system. The linear scaling of the algorithm is achieved by means of two key ideas. First, the $\mathcal{H}^2$-structure of the dense matrix is exploited to obtain an extended sparse system of equations. Second, fill-ins arising during the elimination are compressed as low-rank matrices if they correspond to well-separated interactions (this block-level compression step is illustrated after this record). This ensures that the sparsity pattern of the extended sparse matrix is preserved throughout the elimination, resulting in a very efficient algorithm with $\mathcal{O}(N \log(1/\varepsilon)^2)$ computational cost and $\mathcal{O}(N \log(1/\varepsilon))$ memory requirement, for an error tolerance $0 < \varepsilon < 1$. The solver is inexact, although the error can be controlled and made as small as needed. These solvers are related to ILU in the sense that the fill-in is controlled. However, in ILU most of the fill-in is simply discarded, whereas here it is approximated using low-rank blocks with a prescribed tolerance. Numerical examples are discussed to demonstrate the linear scaling of the method and to illustrate its effectiveness as a preconditioner., Comment: Revised version submitted to the SIAM Journal on Scientific Computing. 35 pages, 29 figures
- Published
- 2015
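The block-level compression step the abstract relies on, namely that an interaction between two well-separated clusters under a smooth kernel is numerically low-rank, can be illustrated with a plain truncated SVD. The kernel, cluster geometry, and tolerance below are toy assumptions; the extended sparse system and the elimination ordering of the inverse fast multipole method are not reproduced.

```python
# A well-separated interaction block is numerically low-rank and can be
# replaced by a truncated SVD at a prescribed tolerance.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((400, 3))                # source cluster
Y = rng.random((400, 3)) + 5.0          # target cluster, well separated from X

K = 1.0 / np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)   # smooth kernel block

eps = 1e-8
U, s, Vt = np.linalg.svd(K, full_matrices=False)
r = int(np.sum(s > eps * s[0]))         # numerical rank at relative tolerance eps
K_lr = (U[:, :r] * s[:r]) @ Vt[:r, :]

rel_err = np.linalg.norm(K - K_lr) / np.linalg.norm(K)
print(f"block {K.shape}, numerical rank {r}, relative error {rel_err:.1e}")
```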
237. Task‐based FMM for heterogeneous architectures
- Author
-
Agullo, Emmanuel, primary, Bramas, Berenger, additional, Coulaud, Olivier, additional, Darve, Eric, additional, Messner, Matthias, additional, and Takahashi, Toru, additional
- Published
- 2015
- Full Text
- View/download PDF
238. The compressed state Kalman filter for nonlinear state estimation: Application to large‐scale reservoir monitoring
- Author
-
Li, Judith Yue, primary, Kokkinaki, Amalia, additional, Ghorbanidehno, Hojat, additional, Darve, Eric F., additional, and Kitanidis, Peter K., additional
- Published
- 2015
- Full Text
- View/download PDF
239. Real-time data assimilation for large-scale systems: The spectral Kalman filter
- Author
-
Ghorbanidehno, Hojat, primary, Kokkinaki, Amalia, additional, Li, Judith Yue, additional, Darve, Eric, additional, and Kitanidis, Peter K., additional
- Published
- 2015
- Full Text
- View/download PDF
240. A comparison of weighted ensemble and Markov state model methodologies
- Author
-
Feng, Haoyun, primary, Costaouec, Ronan, additional, Darve, Eric, additional, and Izaguirre, Jesús A., additional
- Published
- 2015
- Full Text
- View/download PDF
241. N-Body Simulations on GPUs
- Author
-
Elsen, Erich, Vishal, V., Houston, Mike, Pande, Vijay, Hanrahan, Pat, and Darve, Eric
- Subjects
Computational Engineering, Finance, and Science (cs.CE) ,FOS: Computer and information sciences ,Computer Science - Distributed, Parallel, and Cluster Computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Computer Science - Computational Engineering, Finance, and Science - Abstract
Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general-purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current-generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations (a vectorized CPU reference of this pairwise kernel is sketched after this record). In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.
- Published
- 2007
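A vectorized CPU reference of the O(N^2) pairwise force computation that the paper maps onto GPUs; the softening length, masses, and units are arbitrary toy choices, and nothing here reflects the GPU implementation itself.

```python
# O(N^2) pairwise (softened gravitational) forces -- CPU reference of the kernel only.
import numpy as np

rng = np.random.default_rng(0)
N = 1024
pos = rng.standard_normal((N, 3))
mass = rng.random(N) + 0.1
eps2 = 1e-3                              # softening, avoids the r -> 0 singularity

r = pos[None, :, :] - pos[:, None, :]    # r_ij = x_j - x_i, shape (N, N, 3)
inv_r3 = (np.sum(r * r, axis=-1) + eps2) ** -1.5
np.fill_diagonal(inv_r3, 0.0)            # no self-interaction

# F_i = m_i * sum_j m_j (x_j - x_i) / (|r_ij|^2 + eps^2)^{3/2}
force = mass[:, None] * np.einsum("ij,ijk->ik", mass[None, :] * inv_r3, r)

# Newton's third law: the forces should sum to (nearly) zero
print("|sum_i F_i| =", np.linalg.norm(force.sum(axis=0)))
```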
242. An efficient preconditioner for the fast simulation of a 2D Stokes flow in porous media.
- Author
-
Quaife, Bryan, Coulier, Pieter, and Darve, Eric
- Subjects
STOKES flow ,POROUS materials ,BOUNDARY element methods ,PROBLEM solving ,ITERATIVE methods (Mathematics) - Abstract
We consider an efficient preconditioner for a boundary integral equation (BIE) formulation of the two-dimensional Stokes equations in porous media. While BIEs are well-suited for resolving the complex porous geometry, they lead to a dense linear system of equations that is computationally expensive to solve for large problems. This expense is further amplified when a significant number of iterations is required in an iterative Krylov solver such as the generalized minimal residual method (GMRES). In this paper, we apply a fast inexact direct solver, the inverse fast multipole method, as an efficient preconditioner for GMRES. This solver is based on the framework of [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
243. Smoothing-based compressed state Kalman filter for joint state-parameter estimation: Applications in reservoir characterization and CO2 storage monitoring.
- Author
-
Li, Y. J., Kokkinaki, Amalia, Darve, Eric F., and Kitanidis, Peter K.
- Subjects
KALMAN filtering ,STOCHASTIC systems ,CARBON dioxide - Abstract
The operation of most engineered hydrogeological systems relies on simulating physical processes using numerical models with uncertain parameters and initial conditions. Predictions by such uncertain models can be greatly improved by Kalman-filter techniques that sequentially assimilate monitoring data. Each assimilation constitutes a nonlinear optimization, which is solved by linearizing an objective function about the model prediction and applying a linear correction to this prediction. However, if model parameters and initial conditions are uncertain, the optimization problem becomes strongly nonlinear and a linear correction may yield unphysical results. In this paper, we investigate the utility of one-step-ahead smoothing, a variant of the traditional filtering process, to eliminate nonphysical results and reduce estimation artifacts caused by nonlinearities. We present the smoothing-based compressed state Kalman filter (sCSKF), an algorithm that combines one-step-ahead smoothing, in which current observations are used to correct the state and parameters one step back in time (a toy linear-Gaussian illustration of this idea follows this record), with a nonensemble covariance compression scheme that reduces the computational cost by efficiently exploring the high-dimensional state and parameter space. Numerical experiments show that when model parameters are uncertain and the states exhibit hyperbolic behavior with sharp fronts, as in CO2 storage applications, one-step-ahead smoothing reduces overshooting errors and, by design, gives physically consistent state and parameter estimates. We compared sCSKF with commonly used data assimilation methods and showed that for the same computational cost, combining one-step-ahead smoothing and nonensemble compression is advantageous for real-time characterization and monitoring of large-scale hydrogeological systems with sharp moving fronts. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
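The "one-step-ahead smoothing" idea, where the observation at time k also corrects the state at time k-1, can be illustrated on a toy linear-Gaussian model using the standard state-augmentation trick z_k = [x_k, x_{k-1}] and an ordinary dense Kalman filter. The model matrices below are arbitrary, and none of sCSKF's covariance compression or reservoir physics is reproduced.

```python
# Lag-1 smoothing via state augmentation: filtering z_k = [x_k, x_{k-1}]
# yields E[x_{k-1} | y_1..k], i.e. the previous state corrected by the newest data.
import numpy as np

rng = np.random.default_rng(3)
n, m, steps = 4, 2, 200
A = 0.95 * np.eye(n)                          # toy dynamics
A[0, 1] = 0.1
H = rng.standard_normal((m, n))               # toy observation operator
Q, R = 0.05 * np.eye(n), 0.1 * np.eye(m)

F = np.block([[A, np.zeros((n, n))], [np.eye(n), np.zeros((n, n))]])
Ha = np.hstack([H, np.zeros((m, n))])
Qa = np.block([[Q, np.zeros((n, n))], [np.zeros((n, n)), np.zeros((n, n))]])

x_true = rng.standard_normal(n)
z, P = np.zeros(2 * n), np.eye(2 * n)
filt_err, smooth_err = [], []
for _ in range(steps):
    prev_filtered = z[:n].copy()              # E[x_{k-1} | y_1..k-1]
    x_prev = x_true.copy()
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(n), Q)
    y = H @ x_true + rng.multivariate_normal(np.zeros(m), R)

    z, P = F @ z, F @ P @ F.T + Qa            # predict
    S = Ha @ P @ Ha.T + R                     # update with the observation at time k
    K = P @ Ha.T @ np.linalg.inv(S)
    z = z + K @ (y - Ha @ z)
    P = (np.eye(2 * n) - K @ Ha) @ P

    filt_err.append(np.linalg.norm(prev_filtered - x_prev))   # x_{k-1} without y_k
    smooth_err.append(np.linalg.norm(z[n:] - x_prev))          # x_{k-1} corrected by y_k

print("mean error, filtered x_{k-1}           :", np.mean(filt_err))
print("mean error, one-step-ahead smoothed x_{k-1}:", np.mean(smooth_err))
```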
244. Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation.
- Author
-
Pouransari, Hadi, Coulier, Pieter, and Darve, Eric
- Subjects
SPARSE matrices ,LU factorization ,GENERALIZED minimal residual method - Abstract
Inversion of sparse matrices with standard direct solve schemes is robust but computationally expensive. Iterative solvers, on the other hand, demonstrate better scalability but need to be used with an appropriate preconditioner (e.g., ILU, AMG, Gauss-Seidel) for proper convergence. The choice of an effective preconditioner is highly problem dependent. We propose a novel fully algebraic sparse matrix solve algorithm. The computational complexity is linear under the assumption that fill-in blocks have bounded rank. Our scheme is based on Gaussian elimination. For a given matrix, we approximate the LU factorization with a tunable accuracy determined a priori. This method can be used as a stand-alone direct solver, or it can be used as a black-box preconditioner in conjunction with iterative methods such as GMRES. The proposed solver is based on the low-rank approximation of fill-ins generated during the elimination. Similar to H-matrices, fill-ins corresponding to blocks that are well-separated in the adjacency graph are represented via a hierarchical structure. The linear complexity of the algorithm is guaranteed if the blocks corresponding to well-separated clusters of variables are numerically low-rank. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
245. Optimizing the Adaptive Fast Multipole Method for Fractal Sets
- Author
-
Pouransari, Hadi, primary and Darve, Eric, additional
- Published
- 2015
- Full Text
- View/download PDF
246. A Fast and Memory Efficient Sparse Solver with Applications to Finite-Element Matrices
- Author
-
Aminfar, AmirHossein, and Darve, Eric
- Abstract
In this article, we introduce a fast and memory efficient solver for sparse matrices arising from the finite element discretization of elliptic partial differential equations (PDEs). We use a fast direct (but approximate) multifrontal solver as a preconditioner and use an iterative solver to achieve a desired accuracy. This approach combines the advantages of direct and iterative schemes to arrive at a fast, robust and accurate solver. We will show that this solver is faster ($\sim$ 2x) and more memory efficient ($\sim$ 2--3x) than a conventional direct multifrontal solver. Furthermore, we will demonstrate that the solver is both a faster and a more effective preconditioner than alternatives such as the incomplete LU preconditioner. Specific speed-ups depend on the matrix size and improve as the size of the matrix increases. The solver can be applied to both structured and unstructured meshes in a similar manner. We build on our previous work and utilize the fact that dense frontal and update matrices in the multifrontal algorithm can be represented as hierarchically off-diagonal low-rank (HODLR) matrices. Using this idea, we replace all large dense matrix operations in the multifrontal elimination process with $O(N)$ HODLR operations to arrive at a faster and more memory efficient solver., Comment: 25 pages
- Published
- 2014
247. A Kalman filter powered by $\mathcal{H}^2$-matrices for quasi-continuous data assimilation problems
- Author
-
Li, Judith Y., Ambikasaran, Sivaram, Darve, Eric F., and Kitanidis, Peter K.
- Abstract
Continuously tracking the movement of a fluid or a plume in the subsurface is a challenge that is often encountered in applications, such as tracking a plume of injected CO$_2$ or of a hazardous substance. Advances in monitoring techniques have made it possible to collect measurements at a high frequency while the plume moves, which has the potential advantage of providing continuous high-resolution images of fluid flow with the aid of data processing. However, the applicability of this approach is limited by the high computational cost associated with having to analyze large data sets within the time constraints imposed by real-time monitoring. Existing data assimilation methods have computational requirements that increase super-linearly with the number of unknowns $m$. In this paper, we present the HiKF, a new Kalman filter (KF) variant powered by the hierarchical matrix approach that dramatically reduces the computational and storage cost of the standard KF from $\mathcal{O}(m^2)$ to $\mathcal{O}(m)$, while producing practically the same results. The version of HiKF that is presented here takes advantage of the so-called random walk dynamical model, which is tailored to a class of data assimilation problems in which measurements are collected quasi-continuously (a small dense toy version of this setup is sketched after this record). The proposed method has been applied to a realistic CO$_2$ injection model and compared with the ensemble Kalman filter (EnKF). Numerical results show that HiKF can provide estimates that are more accurate than EnKF, and also demonstrate the usefulness of modeling the system dynamics as a random walk in this context., Comment: 18 pages, 7 figures. Water Resources Research, 2014
- Published
- 2014
- Full Text
- View/download PDF
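The "random walk dynamical model" the abstract refers to (x_{k+1} = x_k + w_k) makes the Kalman prediction step trivial: the mean is unchanged and the covariance is simply inflated by the increment covariance. The small dense sketch below illustrates that setup on a 1D grid; the hierarchical-matrix machinery that makes HiKF scale linearly is not reproduced, and the covariance kernel, grid and sensor layout are toy assumptions.

```python
# Dense toy Kalman filter with random-walk dynamics x_{k+1} = x_k + w_k.
# The H^2-matrix acceleration of HiKF is not reproduced; everything is small and dense.
import numpy as np

rng = np.random.default_rng(7)
m_grid = 200
grid = np.linspace(0.0, 1.0, m_grid)

# exponential covariance for the correlated random-walk increments (plus tiny jitter)
Q = 0.01 * np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.1) + 1e-10 * np.eye(m_grid)
sensors = np.arange(10, m_grid, 20)                 # a few point measurements
H = np.zeros((len(sensors), m_grid))
H[np.arange(len(sensors)), sensors] = 1.0
R = 1e-4 * np.eye(len(sensors))

x_true = np.zeros(m_grid)
x_est, P = np.zeros(m_grid), Q.copy()
for k in range(30):
    x_true = x_true + rng.multivariate_normal(np.zeros(m_grid), Q)   # truth drifts
    y = H @ x_true + rng.multivariate_normal(np.zeros(len(sensors)), R)

    P = P + Q                          # predict: mean unchanged, covariance inflated
    S = H @ P @ H.T + R                # update
    K = P @ H.T @ np.linalg.inv(S)
    x_est = x_est + K @ (y - H @ x_est)
    P = (np.eye(m_grid) - K @ H) @ P

print("relative estimation error:",
      np.linalg.norm(x_est - x_true) / np.linalg.norm(x_true))
```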
248. A Fast Block Low-Rank Dense Solver with Applications to Finite-Element Matrices
- Author
-
Aminfar, Amirhossein, Ambikasaran, Sivaram, and Darve, Eric
- Abstract
This article presents a fast solver for the dense "frontal" matrices that arise from the multifrontal sparse elimination process of 3D elliptic PDEs. The solver relies on the fact that these matrices can be efficiently represented as hierarchically off-diagonal low-rank (HODLR) matrices (a minimal HODLR compression sketch follows this record). To construct the low-rank approximation of the off-diagonal blocks, we propose a new pseudo-skeleton scheme, the boundary distance low-rank approximation, that picks rows and columns based on the location of their corresponding vertices in the sparse matrix graph. We compare this new low-rank approximation method to the adaptive cross approximation (ACA) algorithm and show that it achieves better speedups, especially for unstructured meshes. Using the HODLR direct solver as a preconditioner (with a low tolerance) for the GMRES iterative scheme, we can reach machine accuracy much faster than with a conventional LU solver. Numerical benchmarks are provided for frontal matrices arising from 3D finite element problems corresponding to a wide range of applications.
- Published
- 2014
- Full Text
- View/download PDF
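A minimal sketch of the HODLR representation the abstract builds on: split the matrix 2x2, recurse on the diagonal blocks, and store each off-diagonal block as a truncated SVD. The paper's boundary-distance pseudo-skeleton scheme (and ACA) are replaced here by a plain SVD, and the kernel matrix is a smooth 1D toy rather than a finite-element frontal matrix.

```python
# HODLR storage sketch: dense diagonal blocks at the leaves, truncated-SVD
# factors for every off-diagonal block. Plain SVD replaces the paper's
# boundary-distance scheme; this only illustrates the matrix format.
import numpy as np

def lowrank(B, tol):
    """Truncated SVD factors (U, V) of B at relative tolerance tol."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :r] * s[:r], Vt[:r, :]

def hodlr_storage(A, tol=1e-6, leaf=64):
    """Number of floats stored by a HODLR compression of A."""
    n = A.shape[0]
    if n <= leaf:
        return n * n                                   # leaf diagonal block kept dense
    h = n // 2
    U1, V1 = lowrank(A[:h, h:], tol)                   # upper off-diagonal block
    U2, V2 = lowrank(A[h:, :h], tol)                   # lower off-diagonal block
    return (U1.size + V1.size + U2.size + V2.size
            + hodlr_storage(A[:h, :h], tol, leaf)
            + hodlr_storage(A[h:, h:], tol, leaf))

x = np.linspace(0.0, 1.0, 2048)                        # smooth 1D kernel matrix
A = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

stored = hodlr_storage(A)
print(f"dense: {A.size} floats, HODLR: {stored} floats "
      f"({A.size / stored:.1f}x compression)")
```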
249. Numerical Methods for Calculating the Potential of Mean Force
- Author
-
Darve, Eric, primary
- Full Text
- View/download PDF
250. AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble
- Author
-
Abdul-Wahid, Badi’, primary, Feng, Haoyun, additional, Rajan, Dinesh, additional, Costaouec, Ronan, additional, Darve, Eric, additional, Thain, Douglas, additional, and Izaguirre, Jesús A., additional
- Published
- 2014
- Full Text
- View/download PDF