60 results for "Dense matrices"
Search Results
2. Accelerating Revised Simplex Method Using GPU-Based Basis Update
- Author
-
Usman Ali Shah, Suhail Yousaf, Iftikhar Ahmad, Safi Ur Rehman, and Muhammad Ovais Ahmad
- Subjects
Dense matrices, GPU, GPGPU, linear programming, revised simplex method, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Optimization problems lie at the core of scientific and engineering endeavors, and solving them is often compute-intensive. Graphics processing unit (GPU) technology is a promising way to meet their compute-resource requirements. To this end, we focus on solving linear programming (LP) problems on GPUs using the revised simplex method (RSM). This method offers potentially GPU-friendly tasks when applied to large dense problems. Basis update (BU) is one such task; it is performed in every iteration to update a matrix called the basis-inverse matrix. The contribution of this paper is two-fold. First, we experimentally analyzed the performance of existing GPU-based BU techniques. We discovered that the performance of a relatively old technique, in which each GPU thread computes one element of the basis-inverse matrix, could be significantly improved by introducing a vector-copy operation into its implementation with a sophisticated programming framework. Second, we extended the adapted element-wise technique into a new BU technique that uses three inexpensive vector operations. This allowed us to reduce the number of floating-point operations and the conditional processing performed by GPU threads. A comparison of BU techniques implemented in double precision showed that our proposed technique achieved 17.4% and 13.3% average speed-ups over its closest competitor for randomly generated and well-known sets of problems, respectively. Furthermore, the new technique successfully updated the basis-inverse matrix in relatively large problems that the competitor was unable to handle. These results strongly indicate that our proposed BU technique is not only efficient for dense RSM implementations but also scalable.
- Published
- 2020
- Full Text
- View/download PDF
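The basis-update step described in the abstract above is, in essence, the classic product-form update of the basis inverse. As a minimal pure-Python sketch of the textbook element-wise rule (illustrative only; the function name and list-of-lists representation are assumptions, and this is not the paper's GPU implementation):

```python
def basis_update(Binv, alpha, r):
    """Product-form update of the basis inverse.

    Binv  : current basis-inverse matrix (m x m, list of lists)
    alpha : entering column expressed in the current basis (alpha = Binv @ A_q)
    r     : index of the leaving (pivot) row

    Each output element depends only on one row of Binv plus the pivot
    row -- the per-element independence that makes this step GPU-friendly.
    """
    m = len(Binv)
    piv = alpha[r]
    new = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i == r:
                new[i][j] = Binv[r][j] / piv                      # scale pivot row
            else:
                new[i][j] = Binv[i][j] - alpha[i] / piv * Binv[r][j]
    return new
```

Starting from the identity inverse with entering column [2, 1] and pivot row 0, the update yields [[0.5, 0.0], [-0.5, 1.0]], whose product with the new basis [[2, 0], [1, 1]] is again the identity. Replacing the inner loops with a handful of whole-row vector operations is the kind of reformulation the abstract describes.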
3. Study of the efficiency of the direct integration method using central differences (DIMCD).
- Author
-
Urkullu, Gorka, Fernández-de-Bustos, Igor, Olabarrieta, Ander, and Ansola, Rubén
- Subjects
SPARSE matrices, MULTIBODY systems, MATRICES (Mathematics), EQUATIONS, QUATERNIONS
- Published
- 2021
- Full Text
- View/download PDF
4. Targeting the Microtubule-Network Rescues CTL Killing Efficiency in Dense 3D Matrices.
- Author
-
Zhao, Renping, Zhou, Xiangda, Khan, Essak S., Alansary, Dalia, Friedmann, Kim S., Yang, Wenjuan, Schwarz, Eva C., del Campo, Aránzazu, Hoth, Markus, and Qu, Bin
- Subjects
CYTOTOXIC T cells, EXTRACELLULAR matrix, PROTEIN expression
- Abstract
Efficacy of cytotoxic T lymphocyte (CTL)-based immunotherapy is still unsatisfactory against solid tumors, which are frequently characterized by condensed extracellular matrix. Here, using a unique 3D killing assay, we identify that the killing efficiency of primary human CTLs is substantially impaired in dense collagen matrices. Although the expression of cytotoxic proteins in CTLs remained intact in dense collagen, CTL motility was largely compromised. Using light-sheet microscopy, we found that persistence and velocity of CTL migration was influenced by the stiffness and porosity of the 3D matrix. Notably, 3D CTL velocity was strongly correlated with their nuclear deformability, which was enhanced by disruption of the microtubule network especially in dense matrices. Concomitantly, CTL migration, search efficiency, and killing efficiency in dense collagen were significantly increased in microtubule-perturbed CTLs. In addition, the chemotherapeutically used microtubule inhibitor vinblastine drastically enhanced CTL killing efficiency in dense collagen. Together, our findings suggest targeting the microtubule network as a promising strategy to enhance efficacy of CTL-based immunotherapy against solid tumors, especially stiff solid tumors.
- Published
- 2021
- Full Text
- View/download PDF
6. Accuracy Directly Controlled Fast Direct Solution of General H²-Matrices and Its Application to Solving Electrodynamic Volume Integral Equations.
- Author
-
Ma, Miaomiao and Jiao, Dan
- Subjects
ELECTRODYNAMICS, INTEGRAL equations, MATRIX mechanics, MAXWELL equations, FACTORIZATION
- Abstract
The dense matrix resulting from an integral equation (IE)-based solution of Maxwell's equations can be compactly represented by an H²-matrix. Given a general dense H²-matrix, prevailing fast direct solutions involve approximations whose accuracy can only be indirectly controlled. In this paper, we propose new direct solution algorithms, including both factorization and inversion, whose accuracy is directly controlled, for solving general H²-matrices. Different from the recursive inverse performed in existing H²-based direct solutions, this new direct solution is a one-way traversal of the cluster tree from the leaf level all the way up to the root level. The underlying multiplications and additions are carried out as they are, without using formatted multiplications and additions whose accuracy cannot be directly controlled. The cluster bases and their ranks in the original matrix are also updated level by level based on prescribed accuracy, without increasing computational complexity, to take into account the contributions of fill-ins generated during the direct solution procedure. For constant-rank H²-matrices, the proposed direct solution has a strict O(N) complexity in both time and memory. For rank that grows linearly with the electrical size, the complexity of the proposed direct solution is O(N log N) in factorization and inversion time, and O(N) in solution time and memory for solving volume IEs (VIEs). Rapid direct solutions of electrodynamic VIEs involving millions of unknowns have been obtained on a single CPU core with directly controlled accuracy. Comparisons with state-of-the-art H²-based direct VIE solvers have also demonstrated the advantages of the proposed direct solution in accuracy control, as well as in achieving better accuracy with much less CPU time.
- Published
- 2018
- Full Text
- View/download PDF
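The complexity claims above rest on the defining trick of H²-matrices: admissible (far-field) blocks are stored in low-rank form. As a toy illustration of the basic saving (not the paper's algorithm; the function name is an assumption), a rank-k block stored as U·Vᵀ supports a matrix-vector product in O(k(m+n)) rather than O(mn) operations:

```python
def lowrank_matvec(U, V, x):
    """Compute y = (U @ V^T) @ x as U @ (V^T @ x), never forming U @ V^T.

    U : m x k factor, V : n x k factor (lists of lists), x : length-n vector.
    Cost is O(k*(m+n)) versus O(m*n) for the dense product -- the
    block-wise saving that H^2-matrix factorizations exploit.
    """
    k = len(U[0])
    # t = V^T @ x  (length k)
    t = [sum(V[i][j] * x[i] for i in range(len(V))) for j in range(k)]
    # y = U @ t    (length m)
    return [sum(U[i][j] * t[j] for j in range(k)) for i in range(len(U))]
```

For U = [[1, 0], [0, 1], [1, 1]] and V = [[1, 2], [3, 4]], multiplying by x = [1, 1] gives [4, 6, 10], identical to forming the dense 3 × 2 product first. An H²-matrix additionally shares ("nests") the U and V bases across levels of the cluster tree, which is what the abstract's level-by-level basis updates maintain.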
7. Fast 3D FEM-BEM coupling for dynamic soil-structure interaction.
- Author
-
Schepers, Winfried
- Subjects
SOIL structure ,FINITE element method ,BOUNDARY element methods ,LINEAR equations ,SPARSE matrices - Abstract
We propose a fast method for solving soil-structure interaction problems in the frequency domain. Finite elements are applied to discretize the structure, while boundary elements are applied to discretize the interface between the structure and the unbounded, horizontally layered soil. Some mild restrictions imposed on the boundary element mesh allow for storing the fully populated N × N soil flexibility matrix in only O(N) storage and in negligible time. For solving the soil-structure interaction problem, a hybrid linear equation system is solved with direct and iterative solvers. Though our current implementation of the method in a commercial FE code does not allow for fully exploiting the superior properties of the proposed method, it is shown that iterative solvers are superior to direct solvers for the soil-structure interaction problems investigated here.
- Published
- 2017
- Full Text
- View/download PDF
8. Targeting the Microtubule-Network Rescues CTL Killing Efficiency in Dense 3D Matrices
- Author
-
Essak S. Khan, Kim S. Friedmann, Xiangda Zhou, Aránzazu del Campo, Bin Qu, Eva C. Schwarz, Wenjuan Yang, Dalia Alansary, Markus Hoth, and Renping Zhao
- Subjects
collagen ,Cytotoxicity, Immunologic ,dense matrices ,nuclear deformation ,medicine.medical_treatment ,Immunology ,Motility ,chemical and pharmacologic phenomena ,migration ,Vinblastine ,Immunotherapy, Adoptive ,Microtubules ,Collagen Type I ,Extracellular matrix ,CTLs ,Microtubule ,Cell Movement ,Cell Line, Tumor ,Neoplasms ,medicine ,Immunology and Allergy ,Cytotoxic T cell ,Humans ,Cytotoxicity ,Original Research ,3D killing ,Chemistry ,hemic and immune systems ,Hydrogels ,Immunotherapy ,RC581-607 ,Coculture Techniques ,Elasticity ,Tubulin Modulators ,Cell biology ,Extracellular Matrix ,CTL ,Immunologic diseases. Allergy ,Porosity ,medicine.drug ,T-Lymphocytes, Cytotoxic - Abstract
Efficacy of cytotoxic T lymphocyte (CTL)-based immunotherapy is still unsatisfactory against solid tumors, which are frequently characterized by condensed extracellular matrix. Here, using a unique 3D killing assay, we identify that the killing efficiency of primary human CTLs is substantially impaired in dense collagen matrices. Although the expression of cytotoxic proteins in CTLs remained intact in dense collagen, CTL motility was largely compromised. Using light-sheet microscopy, we found that persistence and velocity of CTL migration was influenced by the stiffness and porosity of the 3D matrix. Notably, 3D CTL velocity was strongly correlated with their nuclear deformability, which was enhanced by disruption of the microtubule network especially in dense matrices. Concomitantly, CTL migration, search efficiency, and killing efficiency in dense collagen were significantly increased in microtubule-perturbed CTLs. In addition, the chemotherapeutically used microtubule inhibitor vinblastine drastically enhanced CTL killing efficiency in dense collagen. Together, our findings suggest targeting the microtubule network as a promising strategy to enhance efficacy of CTL-based immunotherapy against solid tumors, especially stiff solid tumors.
- Published
- 2021
9. A multilevel H2-based preconditioner for the electric field integral equation
- Author
-
Ventre, S., Carpentieri, B., Scalera, V., Karaosmanoglu, B., Giovinco, G., Rubinacci, G., Tamburrino, A., and Villone, F.
- Subjects
Dense matrices, preconditioning, boundary element method, H²-matrix arithmetic, LU factorization
- Published
- 2021
15. AN EFFICIENT COLLOCATION METHOD FOR A NON-LOCAL DIFFUSION MODEL.
- Author
-
HAO TIAN, HONG WANG, and WENQIA WANG
- Subjects
COLLOCATION methods, NUMERICAL solutions to differential equations, EIGENFUNCTIONS, NUMERICAL solutions to integral equations, SOLID mechanics
- Abstract
The non-local diffusion model provides an appropriate description of the deformation of a continuous body involving discontinuities or other singularities, which cannot be described properly by the classical theory of solid mechanics. However, because of the non-local nature of the diffusion operator, numerical methods for the non-local diffusion model generate dense or even full stiffness matrices. A direct solver typically requires O(N³) operations and O(N²) memory, where N is the number of unknowns. We develop a fast collocation method for the non-local diffusion model which has the following features: (i) it reduces the computational cost from O(N³) to O(N log² N) and the memory requirement from O(N²) to O(N); (ii) it requires only one-fold integration in the evaluation of the stiffness matrix. Numerical experiments show the utility of the method.
- Published
- 2013
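Fast methods of this kind typically rest on the observation that, on a uniform grid, the dense stiffness matrix of a translation-invariant non-local operator is Toeplitz, so it is determined by O(N) entries. A minimal pure-Python sketch of the storage idea (illustrative only, assumed function name; the further O(N log N) product cost would use an FFT-based circulant embedding, omitted here):

```python
def toeplitz_matvec(first_col, first_row, x):
    """Multiply a Toeplitz matrix by x using only its first column and row.

    A Toeplitz matrix satisfies A[i][j] = t[i - j], so the full N x N
    matrix is reconstructed on the fly from 2N - 1 stored values --
    O(N) memory instead of O(N^2).  first_col[0] and first_row[0] must
    agree (both are the main-diagonal value t[0]).
    """
    n = len(x)

    def t(d):
        # entry on diagonal offset d: below the diagonal for d > 0
        return first_col[d] if d >= 0 else first_row[-d]

    return [sum(t(i - j) * x[j] for j in range(n)) for i in range(n)]
```

Multiplying by a unit vector recovers the corresponding matrix column, e.g. with first_col = [1, 2, 3] and first_row = [1, 4, 5], x = [1, 0, 0] returns [1, 2, 3]. The one-fold integration claim in the abstract similarly reflects that only the O(N) generating entries of the stiffness matrix need to be assembled.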
16. A fast Galerkin method with efficient matrix assembly and storage for a peridynamic model
- Author
-
Wang, Hong and Tian, Hao
- Subjects
MATRICES (Mathematics), GALERKIN methods, DYNAMIC models, DEFORMATIONS (Mechanics), STIFFNESS (Mechanics), NUMERICAL analysis, MATHEMATICAL singularities
- Abstract
Abstract: Peridynamic theory provides an appropriate description of the deformation of a continuous body involving discontinuities or other singularities, which cannot be described properly by the classical theory of solid mechanics. However, the operators in peridynamic models are nonlocal, so the resulting numerical methods generate dense or full stiffness matrices. Gaussian types of direct solvers were traditionally used to solve these problems, requiring O(N³) operations and O(N²) memory, where N is the number of spatial nodes. This imposes a significant computational and memory challenge for a peridynamic model, especially for problems in multiple space dimensions. A simplified model, which places an assumption on the horizon of the material, was proposed to reduce the computational cost and memory requirement. However, the drawback is that the corresponding error estimate becomes one order suboptimal. Furthermore, the assumption does not seem physically reasonable, since the horizon represents a physical property of the material that should not depend on the computational mesh size. We develop a fast Galerkin method for the (non-simplified) peridynamic model by exploiting the structure of the stiffness matrix. The new method reduces both the computational work and the memory requirement of the traditional methods without using any lossy compression. The significant computational and memory reduction of the fast method is better reflected in numerical experiments. When solving a one-dimensional peridynamic model, the traditional method consumed 6 days and 11 h of CPU time while the fast method used only 3.3 s. In addition, on the same computer (with 128 GB of memory), the traditional method with Gaussian elimination or a conjugate gradient method ran out of memory on larger problems, which the fast method was able to solve using 3 days and 11 h of CPU time. This shows the benefit of the significantly reduced memory requirement of the fast method. [Copyright Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
17. On diagonally structured matrix computation
- Abstract
In this thesis, we propose efficient implementations of linear algebra kernels, such as matrix-vector and matrix-matrix multiplication, by formulating the arithmetic in terms of diagonals, thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride 1, and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures, such as banded or symmetric, in a uniform manner. Test results from numerical experiments with an OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance, and we present two alternative implementations of the matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
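The diagonal-wise scheme summarized in the thesis abstract can be sketched in a few lines (a minimal pure-Python illustration with hypothetical helper names, not the thesis code): the matrix is stored one diagonal at a time, each diagonal is traversed with stride 1, and the transposed product needs only a sign flip on the diagonal offsets, with no data movement.

```python
def diag_store(A):
    """Store a square matrix by diagonals.

    diags[d] holds the diagonal with offset d - (n - 1), so negative
    offsets are sub-diagonals and d = n - 1 is the main diagonal.
    """
    n = len(A)
    return [[A[i][i + off] for i in range(max(0, -off), n - max(0, off))]
            for off in range(-(n - 1), n)]

def diag_matvec(diags, x, transpose=False):
    """Compute y = A @ x (or A^T @ x) from the diagonal storage.

    Elements of each diagonal are visited with stride 1, and the
    transposed product just flips the sign of each offset -- mirroring
    the orientation-neutral, transpose-for-free property of the scheme.
    """
    n = len(x)
    y = [0.0] * n
    for d, diag in enumerate(diags):
        off = d - (n - 1)
        if transpose:
            off = -off
        for k, a in enumerate(diag):
            i = k + max(0, -off)      # row index of this element
            y[i] += a * x[i + off]
    return y
```

For A = [[1, 2], [3, 4]] the storage is [[3], [1, 4], [2]]; multiplying by [1, 1] gives [3, 7], and the transposed product gives [4, 6] from the same storage. Banded matrices fall out naturally by simply storing fewer diagonals.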
18. On diagonally structured matrix computation
- Abstract
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
19. On diagonally structured matrix computation
- Abstract
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
20. On diagonally structured matrix computation
- Abstract
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
21. On diagonally structured matrix computation
- Abstract
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
22. On diagonally structured matrix computation
- Abstract
In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplications by formulating arithmetic calculations in terms of diagonals and thereby giving an orientation-neutral (column-/row-major layout) computational scheme. Matrix elements are accessed with stride-1 and no indirect referencing is involved. Access to the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structures such as banded, symmetric in a uniform manner. Test results from numerical experiments with OpenMP implementation are promising. We also show that, using our diagonal framework, Java native arrays can yield superior computational performance. We present two alternative implementations for matrix-matrix multiplication operation in Java. The results from numerical testing demonstrate the advantage of our proposed methods.
- Published
- 2019
26. A distributed, scaleable simplex method.
- Author
-
Yarmish, Gavriel and Slyke, Richard
- Subjects
- *
SIMPLEXES (Mathematics) , *LINEAR programming , *DIGITAL filters (Mathematics) , *IMAGE processing , *COMPUTER networks - Abstract
We present a simple, scaleable, distributed simplex implementation for large linear programs. It is designed for coarse-grained computation, particularly, readily available networks of workstations. Scalability is achieved by using the standard form of the simplex rather than the revised method. Virtually all serious implementations are based on the revised method because it is much faster for sparse LPs, which are most common. However, there are advantages to the standard method as well. First, the standard method is effective for dense problems. Although dense problems are uncommon in general, they occur frequently in some important applications such as wavelet decomposition, digital filter design, text categorization, and image processing. Second, the standard method can be easily and effectively extended to a coarse grained, distributed algorithm. Such an implementation is presented here. The effectiveness of the approach is supported by experiment and analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
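The standard-method pivot that this abstract contrasts with the revised method is simply a dense row operation applied to every row of the tableau; since the row updates are independent, they distribute naturally across workers. A toy single-pivot sketch (hypothetical names, not the authors' implementation):

```python
def pivot(T, r, c):
    """One dense standard-simplex pivot on tableau T at pivot row r, column c.
    After the pivot, column c becomes a unit vector. Each non-pivot row update
    is an independent dense vector operation, which is what makes the standard
    method easy to parallelize coarsely across a network of workstations."""
    m, n = len(T), len(T[0])
    p = T[r][c]
    T[r] = [v / p for v in T[r]]          # scale pivot row
    for i in range(m):                    # eliminate column c from other rows
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [T[i][j] - f * T[r][j] for j in range(n)]
    return T
```

In a distributed setting, each worker would own a block of columns and apply the same two loops to its block after receiving the pivot column.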
27. A new implementation of the CMRH method for solving dense linear systems
- Author
-
Heyouni, M. and Sadok, H.
- Subjects
- *
LINEAR systems , *ALGORITHMS , *NUMERICAL analysis , *SYSTEMS theory - Abstract
Abstract: The CMRH method [H. Sadok, Méthodes de projections pour les systèmes linéaires et non linéaires, Habilitation thesis, University of Lille1, Lille, France, 1994; H. Sadok, CMRH: A new method for solving nonsymmetric linear systems based on the Hessenberg reduction algorithm, Numer. Algorithms 20 (1999) 303–321] is an algorithm for solving nonsymmetric linear systems in which the Arnoldi component of GMRES is replaced by the Hessenberg process, which generates Krylov basis vectors which are orthogonal to standard unit basis vectors rather than mutually orthogonal. The iterate is formed from these vectors by solving a small least squares problem involving a Hessenberg matrix. Like GMRES, this method requires one matrix–vector product per iteration. However, it can be implemented to require half as much arithmetic work and less storage. Moreover, numerical experiments show that this method performs accurately and reduces the residual about as fast as GMRES. With this new implementation, we show that the CMRH method is the only method with long-term recurrence which requires not storing at the same time the entire Krylov vectors basis and the original matrix as in the GMRES algorithm. A comparison with Gaussian elimination is provided. [Copyright © Elsevier]
- Published
- 2008
- Full Text
- View/download PDF
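A minimal sketch of the Hessenberg process the abstract describes, with pivoting omitted for brevity (illustrative code, not the authors' implementation): each new basis vector is made orthogonal to the leading unit vectors, i.e. its leading entries are eliminated, rather than orthogonalized against all previous basis vectors as in Arnoldi.

```python
def hessenberg_process(A, b, m):
    """Pivot-free Hessenberg process underlying CMRH: build Krylov basis
    vectors l_1..l_m where l_k has zeros in its first k-1 entries (so it is
    orthogonal to e_1..e_{k-1}), plus the (m+1) x m Hessenberg matrix H.
    CMRH then solves a small least-squares problem with H."""
    n = len(b)
    L = [[x / b[0] for x in b]]               # l_1, scaled so l_1[0] = 1
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        # u = A @ l_k
        u = [sum(A[i][j] * L[k][j] for j in range(n)) for i in range(n)]
        for j in range(k + 1):                # zero out components 0..k of u
            H[j][k] = u[j]
            u = [u[i] - H[j][k] * L[j][i] for i in range(n)]
        if k + 1 < m:
            H[k + 1][k] = u[k + 1]            # breakdown if zero; real CMRH pivots
            L.append([x / H[k + 1][k] for x in u])
    return L, H
```

Because the elimination uses unit vectors, each step touches only previously computed entries, which is the source of the halved arithmetic and storage relative to GMRES.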
28. A short survey on preconditioning techniques for large-scale dense complex linear systems in electromagnetics.
- Author
-
Wang, Yin, Lee, Jeonghwa, and Zhang, Jun
- Subjects
- *
MATHEMATICAL statistics , *MATRICES (Mathematics) , *ELECTROMAGNETISM , *COMPUTER algorithms , *SCATTERING (Mathematics) , *EQUATIONS , *COMPUTATIONAL mathematics - Abstract
In solving systems of linear equations arising from practical scientific and engineering modelling and simulations such as electromagnetics applications, it is important to choose a fast and robust solver. Due to the large scale of those problems, preconditioned Krylov subspace methods are most suitable. In electromagnetics simulations, the use of preconditioned Krylov subspace methods in the context of multilevel fast multipole algorithms (MLFMA) is particularly attractive. In this paper, we present a short survey of a few preconditioning techniques in this application. We also compare several preconditioning techniques combined with the Krylov subspace methods to solve large dense linear systems arising from electromagnetic scattering problems and present some numerical results. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
29. On why an algorithmic time complexity measure can be system invariant rather than system independent
- Author
-
Chakraborty, Soubhik and Sourabh, Suman Kumar
- Subjects
- *
MATRICES (Mathematics) , *UNIVERSAL algebra , *ALGEBRA , *ALGORITHMS - Abstract
Abstract: The present paper argues that it suffices for an algorithmic time complexity measure to be system invariant rather than system independent (which means predicting from the desk). [Copyright © Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
30. Efficient parallel factorization and solution of structured and unstructured linear systems
- Author
-
Reif, John H.
- Subjects
- *
LINEAR systems , *MATRICES (Mathematics) , *ALGORITHMS , *COMPUTER science - Abstract
Abstract: This paper gives improved parallel methods for several exact factorizations of some classes of symmetric positive definite (SPD) matrices. Our factorizations also provide us similarly efficient algorithms for exact computation of the solution of the corresponding linear systems (which need not be SPD), and for finding rank and determinant magnitude. We assume the input matrices have entries that are rational numbers expressed as a ratio of integers with at most a polynomial number of bits β. We assume a parallel random access machine (PRAM) model of parallel computation, with unit cost arithmetic operations, including division, over a finite field Z_p, where p is a prime number whose binary representation is linear in the size of the input matrix and is randomly chosen by the algorithm. We require only bit precision O(n(β + log n)), which is the asymptotically optimal bit precision for β ≥ log n. Our algorithms are randomized, giving the outputs with high likelihood ≥ 1 - 1/n^Ω(1). We compute LU and QR factorizations for dense matrices, and LU factorizations of sparse matrices which are s(n)-separable, reducing the known parallel time bounds for these factorizations from Ω(log³ n) to O(log² n), without an increase in processors (matching the best known work bounds of known parallel algorithms with polylog time bounds). Using the same parallel algorithm specialized to structured matrices, we compute LU factorizations for Toeplitz matrices and matrices of bounded displacement rank in time O(log² n) with n log log n processors, reducing by a nearly linear factor the best previous processor bounds for polylog times (however, these prior works did not generally require unit cost division over a finite field). We use this result to solve in the same bounds: polynomial resultant; and Padé approximants of rational functions; and in a factor O(log n) more time: polynomial greatest common divisors (GCD) and extended GCD; again reducing the best processor bounds by a nearly linear factor. [Copyright © Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
31. COMBINING KRONECKER PRODUCT APPROXIMATION WITH DISCRETE WAVELET TRANSFORMS TO SOLVE DENSE, FUNCTION-RELATED LINEAR SYSTEMS.
- Author
-
Ford, Judith M. and Tyrtyshnikov, Eugene E.
- Subjects
- *
LINEAR systems , *KRONECKER products , *MATRICES (Mathematics) , *INTEGRAL equations , *WAVELETS (Mathematics) , *PERSONAL computers - Abstract
A new solution technique is proposed for linear systems with large dense matrices of a certain class including those that come from typical integral equations of potential theory. This technique combines Kronecker product approximation and wavelet sparsification for the Kronecker product factors. The user is only required to supply a procedure for computation of each entry of the given matrix. The main sources of efficiency are the incomplete cross approximation procedure adapted from the mosaic-skeleton method of the second author and data-sparse preconditioners (the incomplete LU decomposition with dynamic choice of the fill-in structure with a prescribed threshold and the inverse Kronecker product preconditioner) constructed for the sum of Kronecker products of sparsified finger-like matrices computed by the discrete wavelet transform. In some model, but quite representative, examples the new technique allowed us to solve dense systems with more than 1 million unknowns in a few minutes on a personal computer with 1 Gbyte operative memory. [ABSTRACT FROM AUTHOR]
- Published
- 2003
- Full Text
- View/download PDF
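The core saving behind Kronecker product approximation, as in the abstract above, is that (A ⊗ B)x can be applied without ever forming A ⊗ B. A small pure-Python sketch under row-major flattening (illustrative names; the paper's method additionally sparsifies the factors with wavelets):

```python
def matmul(A, B):
    """Plain dense matrix product on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def kron_matvec(A, B, x):
    """Apply (A ⊗ B) to x without forming the Kronecker product: reading x
    row-major as a matrix X, (A ⊗ B) x equals the row-major flattening of
    A @ X @ B^T. For n x n factors this costs O(n^3) instead of the O(n^4)
    of an explicit n^2 x n^2 dense matvec."""
    p, q = len(B), len(B[0])
    nA = len(A[0])
    X = [x[j * q:(j + 1) * q] for j in range(nA)]
    Bt = [list(col) for col in zip(*B)]
    Y = matmul(matmul(A, X), Bt)
    return [v for row in Y for v in row]
```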
32. ON A RECURSIVE SCHUR PRECONDITIONER FOR ITERATIVE SOLUTION OF A CLASS OF DENSE MATRIX PROBLEMS.
- Author
-
Ford, Judith M., KE CHEN, Judith M., and Evans, David
- Subjects
- *
MATRICES (Mathematics) , *WAVELETS (Mathematics) , *ITERATIVE methods (Mathematics) , *SCHUR functions , *LINEAR systems - Abstract
There are currently several distinct preconditioning methods for dense matrices based on applying a wavelet transform to obtain a matrix with a large number of small entries. A sparse preconditioner for this transformed matrix can be formed by setting to zero entries that are assumed to be unimportant. The effectiveness of the preconditioner depends on retaining the most important entries and on ensuring that they are positioned conveniently within the transformed matrix. In this paper we present a new, recursive preconditioning strategy that takes into account more of the significant entries without greatly increasing cost and outperforms existing methods in certain cases. [ABSTRACT FROM AUTHOR]
- Published
- 2003
- Full Text
- View/download PDF
33. Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems
- Author
-
Seher Acer, Cevdet Aykanat, Oguz Selvitopi, and Aykanat, Cevdet
- Subjects
Hypergraph ,Matrix partitioning ,Recursive bipartitioning ,Large scale systems ,Computer Networks and Communications ,Computer science ,Sparse matrix dense matrix multiplication ,Communication volume balancing ,010103 numerical & computational mathematics ,02 engineering and technology ,Parallel computing ,Matrix algebra ,01 natural sciences ,Theoretical Computer Science ,Big data ,Matrix (mathematics) ,Kernel (linear algebra) ,Combinatorial scientific computing ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Irregular applications ,Linear algebra ,0101 mathematics ,Resource allocation ,Sparse matrix ,020203 distributed computing ,Hypergraph partitioning ,Graph partitioning ,Graph partition ,Dense matrices ,Load balancing (computing) ,Computer Graphics and Computer-Aided Design ,Partition (database) ,Graph ,Graph theory ,Networking and Information Technology R&D (NITRD) ,Hardware and Architecture ,Sparse matrices ,Cognitive Science ,Cognitive Sciences ,Distributed Computing ,Load balancing ,Software - Abstract
Highlights: a generic model to scale sparse matrix dense matrix multiplication (SpMM); SpMM suffers from high communication-volume overhead; different volume-based metrics, such as maximum volume, besides total volume; simultaneous minimization of volume-based communication cost metrics; portable models based on graph and hypergraph partitioning.
We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized by its high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (they can be realized using any graph and hypergraph partitioning tool) and can simultaneously optimize different cost metrics besides total volume, such as maximum send/receive volume, maximum sum of send and receive volumes, etc., in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. Experiments on a wide range of about a thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models respectively achieve reductions of 14% and 22% in runtime, on average. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5× faster and achieves an average improvement of 19% in partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that scalability can be significantly improved with our models compared to a recently proposed two-dimensional partitioning model.
- Published
- 2016
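The volume-based metrics the abstract optimizes can be illustrated with a toy row-parallel setting (a sketch under simplifying assumptions: square matrix, rows of the dense input partitioned conformally with the rows of A; names are hypothetical, not the paper's code):

```python
def comm_volumes(rows_of, part_of, nparts):
    """Toy receive-volume metrics for row-parallel SpMM (C = A @ B): the part
    owning row i of A needs the rows of B indexed by the nonzero columns of
    row i. Each off-part row fetched counts one unit of receive volume.
    Standard partitioners minimize the total; the paper's models also bound
    the maximum per-part volume, which governs the parallel bottleneck."""
    need = [set() for _ in range(nparts)]
    for i, cols in rows_of.items():
        p = part_of[i]
        for j in cols:
            if part_of[j] != p:
                need[p].add(j)
    vols = [len(s) for s in need]
    return sum(vols), max(vols)
```

A partition with a slightly larger total volume but a smaller maximum per-part volume can still run faster, which is the motivation for optimizing several metrics at once.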
34. An algorithm for accelerated computation of DWTPer-based band preconditioners.
- Author
-
Ford, Judith and Chen, Ke
- Abstract
We present a new algorithm for computing DWT-based preconditioners at a reduced cost, and we illustrate the savings that can be achieved with examples taken from the solution of a nonlinear problem by a Newton–Krylov method. [ABSTRACT FROM AUTHOR]
- Published
- 2001
- Full Text
- View/download PDF
35. Improving Iterative Solutions of the Electric-Field Integral Equation Via Transformations Into Normal Equations
- Author
-
Levent Gurel, Ozgur Ergul, Gürel, Levent, and Ergül, Özgür
- Subjects
Number of iterations ,Electromagnetics ,Discretization ,MathematicsofComputing_NUMERICALANALYSIS ,General Physics and Astronomy ,Electric-field integral equation ,Problem size ,Matrix (mathematics) ,Matrix equations ,Electrical and Electronic Engineering ,Integral equations ,GMRES algorithm ,Mathematics ,Sparse matrix ,Generalized minimal residual algorithms ,Conducting objects ,Independent equation ,Mathematical analysis ,Dense matrices ,Multi-level fast multi-pole algorithm ,Computer Science::Numerical Analysis ,Integral equation ,Generalized minimal residual method ,Iterative solutions ,Electronic, Optical and Magnetic Materials ,Discretizations ,Antennas ,Normal equations ,Algorithms - Abstract
We consider the solution of electromagnetics problems involving perfectly conducting objects formulated with the electric-field integral equation (EFIE). Dense matrix equations obtained from the discretization of EFIE are solved iteratively by the generalized minimal residual (GMRES) algorithm accelerated with a parallel multilevel fast multipole algorithm. We show that the number of iterations is halved by transforming the original matrix equations into normal equations. This way, memory required for the GMRES algorithm is reduced by more than 50%, which is significant when the problem size is large.
- Published
- 2010
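The transformation described in the abstract above amounts to iterating with the normal-equations operator A^H A, applied as two successive matvecs rather than by forming A^H A. A real-valued toy sketch (illustrative, not the authors' code):

```python
def matvec(A, x):
    """Dense y = A @ x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def transpose_matvec(A, y):
    """Dense z = A^T @ y (A^H for the complex EFIE matrices in the paper)."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def normal_matvec(A, x):
    """One application of the normal-equations operator A^T A as two matvecs.
    Each iteration costs two matvecs instead of one, but if the iteration
    count is halved, the Krylov-vector storage of GMRES drops by over 50%,
    which matters when the dense system is large."""
    return transpose_matvec(A, matvec(A, x))
```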
36. FAST AND ACCURATE ANALYSIS OF LARGE METAMATERIAL STRUCTURES USING THE MULTILEVEL FAST MULTIPOLE ALGORITHM
- Author
-
Levent Gurel, Ozgur Ergul, Tahir Malas, Alper Unal, Gürel, Levent, and Ergül, Özgür
- Subjects
Fast and accurate simulations ,Discretization ,Fast multipole method ,Basis function ,Preconditioners ,Split-ring resonator ,Resonance ,Accurate analysis ,Matrix vector multiplication ,Electrical and Electronic Engineering ,Sparse matrix ,Mathematics ,Radiation ,Preconditioning techniques ,Metamaterial structures ,Preconditioner ,Dense matrices ,Condensed Matter Physics ,Multi-level fast multi-pole algorithm ,Scattering problems ,Integral equation ,Matrix multiplication ,Unit cells ,Thin wires ,Homogenization approximation ,Metamaterials ,Multipole expansion ,Algorithm ,Rao-Wilton-Glisson basis functions - Abstract
We report fast and accurate simulations of metamaterial structures constructed with large numbers of unit cells containing split-ring resonators and thin wires. Scattering problems involving various metamaterial walls are formulated rigorously using the electric-field integral equation, discretized with the Rao-Wilton-Glisson basis functions. Resulting dense matrix equations are solved iteratively, where the matrix-vector multiplications are performed efficiently with the multilevel fast multipole algorithm. For rapid solutions at resonance frequencies, convergence of the iterations is accelerated by using robust preconditioning techniques, such as the sparse-approximate-inverse preconditioner. Without resorting to homogenization approximations and periodicity assumptions, we are able to obtain accurate solutions of realistic metamaterial problems discretized with millions of unknowns.
- Published
- 2009
37. A new implementation of the CMRH method for solving dense linear systems
- Author
-
Hassane Sadok and Mohammed Heyouni
- Subjects
Hessenberg process ,Numerical linear algebra ,Krylov method ,Applied Mathematics ,Numerical analysis ,education.educational_degree ,Dense matrices ,computer.software_genre ,Generalized minimal residual method ,Orthogonal basis ,Matrix multiplication ,Habilitation ,Hessenberg matrix ,Computational Mathematics ,symbols.namesake ,Gaussian elimination ,Linear system ,symbols ,education ,Algorithm ,computer ,Mathematics - Abstract
The CMRH method [H. Sadok, Méthodes de projections pour les systèmes linéaires et non linéaires, Habilitation thesis, University of Lille1, Lille, France, 1994; H. Sadok, CMRH: A new method for solving nonsymmetric linear systems based on the Hessenberg reduction algorithm, Numer. Algorithms 20 (1999) 303–321] is an algorithm for solving nonsymmetric linear systems in which the Arnoldi component of GMRES is replaced by the Hessenberg process, which generates Krylov basis vectors which are orthogonal to standard unit basis vectors rather than mutually orthogonal. The iterate is formed from these vectors by solving a small least squares problem involving a Hessenberg matrix. Like GMRES, this method requires one matrix–vector product per iteration. However, it can be implemented to require half as much arithmetic work and less storage. Moreover, numerical experiments show that this method performs accurately and reduces the residual about as fast as GMRES. With this new implementation, we show that the CMRH method is the only method with long-term recurrence which requires not storing at the same time the entire Krylov vectors basis and the original matrix as in the GMRES algorithm. A comparison with Gaussian elimination is provided.
- Published
- 2008
38. Approximate Inverse Preconditioners for Some Large Dense Random Electrostatic Interaction Matrices
- Author
-
Johan Helsing
- Subjects
dense matrices ,integral equations ,Mathematical optimization ,Computer Networks and Communications ,Iterative method ,Diagonal ,Inverse ,computer.software_genre ,Least squares ,Mathematics::Numerical Analysis ,preconditioners ,potential theory ,inverses ,Applied mathematics ,Mathematics ,Sparse matrix ,Numerical linear algebra ,Preconditioner ,Applied Mathematics ,Linear system ,Computer Science::Numerical Analysis ,Computational Mathematics ,sparse approximate ,Computer Science::Mathematical Software ,iterative methods ,computer ,Software - Abstract
A sparse mesh-neighbour based approximate inverse preconditioner is proposed for a type of dense matrices whose entries come from the evaluation of a slowly decaying free space Green's function at randomly placed points in a unit cell. By approximating distant potential fields originating at closely spaced sources in a certain way, the preconditioner is given properties similar to, or better than, those of a standard least squares approximate inverse preconditioner while its setup cost is only that of a diagonal block approximate inverse preconditioner. Numerical experiments on iterative solutions of linear systems with up to four million unknowns illustrate how the new preconditioner drastically outperforms standard approximate inverse preconditioners of otherwise similar construction, and especially so when the preconditioners are very sparse.
- Published
- 2006
39. Efficient parallel factorization and solution of structured and unstructured linear systems
- Author
-
John H. Reif
- Subjects
Polynomial ,Displacement rank ,Rank (linear algebra) ,Parallel algorithms ,Computer Networks and Communications ,Linear systems ,010103 numerical & computational mathematics ,0102 computer and information sciences ,01 natural sciences ,Theoretical Computer Science ,law.invention ,Resultant ,Combinatorics ,Matrix (mathematics) ,law ,Newton iteration ,GCD ,0101 mathematics ,Structured matrices ,Padé approximation ,Mathematics ,Discrete mathematics ,Applied Mathematics ,Dense matrices ,Polynomial greatest common divisors ,Toeplitz matrix ,LU decomposition ,Finite field ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Toeplitz matrices ,LU factorization ,Sparse matrices ,Parallel random-access machine ,Extended Euclidean algorithm - Abstract
This paper gives improved parallel methods for several exact factorizations of some classes of symmetric positive definite (SPD) matrices. Our factorizations also provide us similarly efficient algorithms for exact computation of the solution of the corresponding linear systems (which need not be SPD), and for finding rank and determinant magnitude. We assume the input matrices have entries that are rational numbers expressed as a ratio of integers with at most a polynomial number of bits β. We assume a parallel random access machine (PRAM) model of parallel computation, with unit cost arithmetic operations, including division, over a finite field Z_p, where p is a prime number whose binary representation is linear in the size of the input matrix and is randomly chosen by the algorithm. We require only bit precision O(n(β + log n)), which is the asymptotically optimal bit precision for β ≥ log n. Our algorithms are randomized, giving the outputs with high likelihood ≥ 1 - 1/n^Ω(1). We compute LU and QR factorizations for dense matrices, and LU factorizations of sparse matrices which are s(n)-separable, reducing the known parallel time bounds for these factorizations from Ω(log³ n) to O(log² n), without an increase in processors (matching the best known work bounds of known parallel algorithms with polylog time bounds). Using the same parallel algorithm specialized to structured matrices, we compute LU factorizations for Toeplitz matrices and matrices of bounded displacement rank in time O(log² n) with n log log n processors, reducing by a nearly linear factor the best previous processor bounds for polylog times (however, these prior works did not generally require unit cost division over a finite field).
We use this result to solve in the same bounds: polynomial resultant; and Padé approximants of rational functions; and in a factor O(log n) more time: polynomial greatest common divisors (GCD) and extended GCD; again reducing the best processor bounds by a nearly linear factor.
- Published
- 2005
40. Approximate Inverse Preconditioners for Some Large Dense Random Electrostatic Interaction Matrices
- Author
-
Helsing, Johan
- Published
- 2006
- Full Text
- View/download PDF
41. Solving linear systems using wavelet compression combined with Kronecker product approximation
- Author
-
Ford, Judith M. and Tyrtyshnikov, Eugene E.
- Published
- 2005
- Full Text
- View/download PDF
42. Wavelet-based Preconditioners for Dense Matrices with Non-Smooth Local Features
- Author
-
Ford, Judith and Chen, Ke
- Published
- 2001
- Full Text
- View/download PDF
43. Key concepts for parallel out-of-core LU factorization
- Author
-
Jack Dongarra, David W. Walker, and Sven Hammarling
- Subjects
Parallel computing ,Computer Networks and Communications ,Computer science ,Performance ,Out-of-core LU factorization ,Theoretical Computer Science ,law.invention ,Artificial Intelligence ,Simple (abstract algebra) ,law ,Modelling and Simulation ,Performance model ,Intel Paragon ,ScaLAPACK ,Dense matrices ,Incomplete LU factorization ,Computer Graphics and Computer-Aided Design ,Parallel I/O ,LU decomposition ,Computational Mathematics ,Computational Theory and Mathematics ,Hardware and Architecture ,Modeling and Simulation ,Key (cryptography) ,Out-of-core algorithm ,Software - Abstract
This paper considers key ideas in the design of out-of-core dense LU factorization routines. A left-looking variant of the LU factorization algorithm is shown to require less I/O to disk than the right-looking variant, and is used to develop a parallel, out-of-core implementation. This implementation makes use of a small library of parallel I/O routines, together with ScaLAPACK and PBLAS routines. Results for runs on an Intel Paragon are presented and interpreted using a simple performance model.
- Published
- 1998
- Full Text
- View/download PDF
44. Computational analysis of complicated metamaterial structures using MLFMA and nested preconditioners
- Author
-
Levent Gurel, Ozgur Ergul, Tahir Malas, Alper Unal, C. Yavuz, Gürel, Levent, and Ergül, Özgür
- Subjects
Discretization ,Clustering algorithms ,Parallel algorithms ,Parallel algorithm ,Basis function ,Electric-field integral equation ,Preconditioners ,Split-ring resonator ,Computational science ,Matrix (mathematics) ,Matrix vector multiplication ,Electronic equipment ,Ill-conditioned ,Matrix equations ,Computational analysis ,Integral equations ,Mathematics ,Sparse matrix ,Electromagnetic wave scattering ,Multilevel fast multipole Algorithm ,Problem solving ,Metamaterial structures ,Mathematical analysis ,Dense matrices ,Multilevel fast multipole algorithms ,Computer simulation ,Integral equation ,Scattering problems ,Matrix multiplication ,Iterative solutions ,Rapid convergence ,Unit cells ,Personal computers ,Thin wires ,Metamaterials ,Electromagnetic scattering ,Iterative solvers ,Antennas ,Nested preconditioners ,Rao-Wilton-Glisson basis functions - Abstract
Date of Conference: 11-16 Nov. 2007. Conference name: 2nd European Conference on Antennas and Propagation (EuCAP 2007).
We consider accurate solution of scattering problems involving complicated metamaterial (MM) structures consisting of thin wires and split-ring resonators. The scattering problems are formulated by the electric-field integral equation (EFIE) discretized with the Rao-Wilton-Glisson basis functions defined on planar triangles. The resulting dense matrix equations are solved iteratively, where the matrix-vector multiplications that are required by the iterative solvers are accelerated with the multilevel fast multipole algorithm (MLFMA). Since EFIE usually produces matrix equations that are ill-conditioned and difficult to solve iteratively, we employ nested preconditioners to achieve rapid convergence of the iterative solutions. To further accelerate the simulations, we parallelize our algorithm and perform the solutions on a cluster of personal computers. This way, we are able to solve problems of MMs involving thousands of unit cells.
- Published
- 2007
45. Fast and accurate solutions of scattering problems involving dielectric objects with moderate and low contrasts
- Author
-
Levent Gurel, Ozgur Ergul, Gürel, Levent, and Ergül, Özgür
- Subjects
preconditioner ,Aircraft ,Discretization ,Iterative methods ,Iterative method ,Combined field integral equation (CFIE) ,Convergence (mathematics) ,Scattering ,Electromagnetism ,Electromagnetic scattering problems ,Surface integral equations ,Convergence (routing) ,Integral equations ,Electromagnetic wave scattering ,Mathematics ,Sparse matrix ,Three-dimensional (3-D) objects ,Preconditioner ,Three dimensional ,Multilevel fast multipole algorithm (MLFMA) ,Mathematical analysis ,Magnetism ,Dense matrices ,Dielectric objects ,Rao-Wilton-Glisson (RWG) functions ,Integral equation ,Function evaluation ,Solutions ,Breakdown (BD) ,Magnetic currents ,Surface formulations ,Problem sizes ,Scattering problem (SP) ,Computational electromagnetics (CEM) ,Multipole expansion ,In order - Abstract
Date of Conference: 30-31 Aug. 2007. Conference name: 2007 Computational Electromagnetics Workshop.
We consider the solution of electromagnetic scattering problems involving relatively large dielectric objects with moderate and low contrasts. Three-dimensional objects are discretized with Rao-Wilton-Glisson functions and the scattering problems are formulated with surface integral equations. The resulting dense matrix equations are solved iteratively by employing the multilevel fast multipole algorithm. We compare the accuracy and efficiency of the results obtained by employing various integral equations for the formulation of the problem. If the problem size is large, we show that a combined formulation, namely, the electric-magnetic current combined-field integral equation, provides faster iterative convergence compared to other formulations when it is accelerated with an efficient block preconditioner. For low-contrast problems, we introduce various stabilization procedures in order to avoid the numerical breakdown encountered in the conventional surface formulations. © 2007 IEEE.
- Published
- 2007
46. Algèbre linéaire exacte efficace : le calcul du polynôme caractéristique (Efficient exact linear algebra: computing the characteristic polynomial)
- Author
-
Pernet, Clément, Laboratoire de Modélisation et Calcul (LMC - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Université Joseph-Fourier - Grenoble I, and Dominique Duval(dominique.duval@imag.fr)
- Subjects
Exact computation ,Polynômecaractéristique ,Algorithmes de Keller-Gehrig ,Dense matrices ,[INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE] ,Calcul exact ,Characteristic polynomial ,Black Box ,Forme normale de Frobenius ,BLAS ,Arithmétique matricielle rapide ,Matrice en boîte noire ,Linear algebra ,Matrice dense ,Keller-Gehrig algorithms - Abstract
Linear algebra is a fundamental building block of scientific computation. Initially dominated by numerical computation, it has seen major breakthroughs in exact computation over the last decade. These algorithmic advances make the exact approach feasible, so it has become necessary to consider these algorithms from the viewpoint of practicality. We present the construction of a set of basic exact linear algebra subroutines whose efficiency over a finite field approaches that of the numerical BLAS. Beyond applications in exact computation, we show that they offer an alternative to multiprecision numerical methods for solving ill-conditioned problems. The computation of the characteristic polynomial is one of the classic problems in linear algebra. Its exact computation makes it possible, for example, to decide the similarity of two matrices via the Frobenius normal form, or the cospectrality of two graphs. Improving its theoretical complexity remains an open problem, for both dense and black-box methods. We address the problem from the viewpoint of practical efficiency: adaptive algorithms for dense and black-box matrices are derived from the best existing algorithms to ensure high efficiency in practice. This makes it possible to handle problems whose dimensions were previously unreachable.
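The Frobenius-form and Keller-Gehrig algorithms studied in the thesis do not fit in a few lines, but the flavour of exact characteristic-polynomial computation over a finite field can be sketched with the classical Faddeev-LeVerrier recurrence. This is an illustrative stand-in chosen only for brevity, not the thesis's method, and it assumes a prime modulus p larger than the matrix dimension so that division by k = 1..n is invertible mod p:

```python
# Sketch: exact characteristic polynomial of A over GF(p) via the
# Faddeev-LeVerrier recurrence (illustrative only; NOT the
# Keller-Gehrig / Frobenius-form algorithm of the thesis).
# Assumes p is prime and p > n so that each k = 1..n is invertible mod p.
def charpoly_mod_p(A, p):
    n = len(A)
    assert p > n, "need k = 1..n invertible mod p"

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p
                 for j in range(n)] for i in range(n)]

    M = [[int(i == j) for j in range(n)] for i in range(n)]  # M_1 = I
    coeffs = [1]                                             # monic leading term
    for k in range(1, n + 1):
        AM = matmul(A, M)
        trace = sum(AM[i][i] for i in range(n))
        c = (-pow(k, -1, p) * trace) % p      # c_k = -tr(A M_k) / k  (mod p)
        coeffs.append(c)
        M = [[(AM[i][j] + (c if i == j else 0)) % p           # M_{k+1} = A M_k + c_k I
              for j in range(n)] for i in range(n)]
    return coeffs  # [1, c_1, ..., c_n] for x^n + c_1 x^{n-1} + ... + c_n
```

For A = [[2, 1], [1, 1]] over GF(7), the trace is 3 and the determinant is 1, so the routine returns [1, 4, 1], i.e. x² - 3x + 1 ≡ x² + 4x + 1 (mod 7). All arithmetic stays in exact integers mod p, which is the point of the exact approach.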
- Published
- 2006
48. Improved upper bounds for the expected circuit complexity of dense systems of linear equations over GF(2).
- Author
-
Visconti A, Schiavo CV, and Peralta R
- Abstract
Minimizing the Boolean circuit implementation of a given cryptographic function is an important problem. A number of papers [1], [2], [3], [4] consider only cancellation-free straight-line programs for producing small circuits over GF(2). Cancellation is allowed by the Boyar-Peralta (BP) heuristic [5, 6], which makes it a valuable tool for practical applications such as building fast software and low-power circuits for cryptographic primitives, e.g. AES [5, 7], HMAC-SHA-1 [8], PRESENT [9], GOST [9], and so on. However, the BP heuristic does not take matrix density into account. In a dense linear system, the rows can be computed by adding or removing a few elements from a "common path" that is "close" to almost all rows. The new heuristic described in this paper merges the ideas of "cancellation" and "common path". Extensive experiments comparing the new heuristic against the BP heuristic show that the Boyar-Peralta results are not optimal on dense systems.
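The "common path" idea can be made concrete with a toy gate count. The sketch below is an illustration of the general principle, not the paper's heuristic: each dense row of the GF(2) system is a bitmask, one shared XOR chain over all inputs is built once, and each output is derived by cancelling (x ^ x = 0) the few inputs where its row differs from the all-ones pattern:

```python
# Toy illustration of the "common path" idea for dense GF(2) systems.
# Each output y_i is the XOR of the inputs selected by a dense 0/1 row.
# Naively, each row costs (popcount - 1) XOR gates on its own. With
# cancellation, we may build one common path c = x_0 ^ ... ^ x_{n-1}
# and obtain each row by XOR-ing away the inputs where it differs from c.
def gate_counts(rows, n):
    naive = sum(bin(r).count("1") - 1 for r in rows)
    common = (1 << n) - 1          # all-ones row used as the common path
    path_cost = n - 1              # XORs to build the common path once
    shared = path_cost + sum(bin(r ^ common).count("1") for r in rows)
    return naive, shared
```

For five rows of a dense 8-variable system that each omit a single input (rows = [0xFF ^ (1 << i) for i in range(5)]), the naive count is 5 × 6 = 30 XORs, while the common-path count is 7 + 5 = 12; the denser and more numerous the rows, the larger the saving.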
- Published
- 2018
49. Approximate inverse preconditioners for some large dense random electrostatic interaction matrices
- Abstract
A sparse mesh-neighbour based approximate inverse preconditioner is proposed for a type of dense matrices whose entries come from the evaluation of a slowly decaying free space Green's function at randomly placed points in a unit cell. By approximating distant potential fields originating at closely spaced sources in a certain way, the preconditioner is given properties similar to, or better than, those of a standard least squares approximate inverse preconditioner while its setup cost is only that of a diagonal block approximate inverse preconditioner. Numerical experiments on iterative solutions of linear systems with up to four million unknowns illustrate how the new preconditioner drastically outperforms standard approximate inverse preconditioners of otherwise similar construction, and especially so when the preconditioners are very sparse.
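As a much-simplified illustration of the setting (pure Python, far from the paper's mesh-neighbour construction and its million-unknown scale), one can build a small dense matrix from a slowly decaying 1/r kernel at random points in a unit cell, take its diagonal as a crude approximate inverse, and watch a preconditioned Richardson iteration drive the residual down; the paper's preconditioner plays the same role but with a sparse near-neighbour sparsity pattern:

```python
import random

# Toy stand-in: a dense matrix whose off-diagonal entries come from a
# slowly decaying kernel 1/|r_i - r_j| at random points in a unit cell,
# with a dominant diagonal "self" term (an assumption made here so the
# simple iteration converges). A diagonal approximate inverse M ≈ A^{-1}
# then preconditions the Richardson iteration x <- x + M (b - A x).
random.seed(0)
n = 30
pts = [(random.random(), random.random(), random.random()) for _ in range(n)]

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

A = [[n if i == j else 1.0 / dist(pts[i], pts[j]) / n**2
      for j in range(n)] for i in range(n)]
b = [1.0] * n

M = [1.0 / A[i][i] for i in range(n)]      # diagonal approximate inverse
x = [0.0] * n
for _ in range(50):
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    x = [x[i] + M[i] * r[i] for i in range(n)]

residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))
```

A better approximate inverse (e.g. one that also captures near-neighbour interactions, as in the paper) shrinks the residual in fewer iterations; the diagonal version above is the cheapest end of that trade-off.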
- Published
- 2006
50. Exposing and Exploiting Internal Parallelism in MEMS-Based Storage
- Abstract
MEMS-based storage has interesting access-parallelism features. Specifically, subsets of a MEMStore's thousands of tips can be used in parallel, and the particular subset can be chosen dynamically. This paper describes how such access parallelism can be exposed to system software -- with minimal changes to system interfaces -- and utilized cleanly for two classes of applications. First, background tasks can use unused parallelism to access media locations with no impact on foreground activity. Second, two-dimensional data structures, such as dense matrices and relational database tables, can be accessed in both row order and column order with maximum efficiency. With proper table layout, unwanted portions of a table can be skipped while scanning at full speed. Using simulation, the authors explore the performance of using this device parallelism for an example application from each class. Sponsored in part by the Defense Advanced Research Projects Agency (DARPA).
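Whether both row order and column order run at full speed depends on how table elements are spread across the parallel tips. One classic way to achieve this is a skewed (diagonal) layout; the sketch below illustrates the general idea, not the paper's actual MEMStore mapping, by assigning element (i, j) to tip (i + j) mod T so that any row scan and any column scan both cycle through all T tips:

```python
# Skewed layout sketch: map table element (i, j) to tip (i + j) mod T.
# A plain row-major assignment (j mod T) would serialize column scans
# on a single tip; skewing lets both row-order and column-order scans
# touch all T tips, so either order can run at full device parallelism.
T = 4  # number of tips usable concurrently (hypothetical value)

def tip(i, j):
    return (i + j) % T

row_tips = {tip(5, j) for j in range(T)}   # one T-wide stripe of a row scan
col_tips = {tip(i, 5) for i in range(T)}   # one T-wide stripe of a column scan
```

Both stripes hit all four tips, so neither scan direction bottlenecks on a single tip; the same skewing idea underlies row/column-efficient layouts in interleaved memories and RAID-style striping.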
- Published
- 2003