40 results on '"Coppersmith–Winograd algorithm"'
Search Results
2. On the arithmetic complexity of Strassen-like matrix multiplications
- Author
-
Murat Cenk and M. Anwar Hasan
- Subjects
Algebra and Number Theory ,Computational complexity theory ,Dimension (graph theory) ,010103 numerical & computational mathematics ,0102 computer and information sciences ,Symbolic computation ,01 natural sciences ,Matrix multiplication ,Matrix chain multiplication ,Computational Mathematics ,010201 computation theory & mathematics ,Strassen algorithm ,Arithmetic circuit complexity ,0101 mathematics ,Arithmetic ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
The Strassen algorithm for multiplying 2×2 matrices requires seven multiplications and 18 additions. The recursive use of this algorithm for matrices of dimension n yields a total arithmetic complexity of 7n^2.81 - 6n^2 for n = 2^k. Winograd showed that using seven multiplications for this kind of matrix multiplication is optimal. Therefore, any algorithm for multiplying 2×2 matrices with seven multiplications is called a Strassen-like algorithm. Winograd also discovered an additively optimal Strassen-like algorithm with 15 additions. This algorithm is called Winograd's variant, whose arithmetic complexity is 6n^2.81 - 5n^2 for n = 2^k and 3.73n^2.81 - 5n^2 for n = 8·2^k, which is the best-known bound for Strassen-like multiplications. This paper proposes a method that reduces the complexity of Winograd's variant to 5n^2.81 + 0.5n^2.59 + 2n^2.32 - 6.5n^2 for n = 2^k. It is also shown that the total arithmetic complexity can be improved to 3.55n^2.81 + 0.148n^2.59 + 1.02n^2.32 - 6.5n^2 for n = 8·2^k, which, to the best of our knowledge, improves the best-known bound for a Strassen-like matrix multiplication algorithm.
- Published
- 2017
- Full Text
- View/download PDF
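A minimal Python/NumPy sketch of the scheme counted in the abstract above, Strassen's seven block products and 18 block additions. It assumes for simplicity that n is a power of two; this is the textbook recursion, not the paper's improved variant:

```python
import numpy as np

def strassen(A, B):
    """Strassen's recursion: 7 half-size products, 18 block additions."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)   # 7 multiplications in place
    M2 = strassen(A21 + A22, B11)         # of the classical 8 give the
    M3 = strassen(A11, B12 - B22)         # O(n^log2 7) = O(n^2.81) bound
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7         # 10 additions above and 8 here:
    C[:h, h:] = M3 + M5                   # 18 in total per level
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```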
3. Determining the Effects of Cross-over Point on the Running Time of Strassen Matrix Multiplication Algorithm
- Author
-
T. A. Atabong and S. C. Agu
- Subjects
Divide and conquer algorithms ,Freivalds' algorithm ,Multiplication algorithm ,MathematicsofComputing_NUMERICALANALYSIS ,Square matrix ,Matrix multiplication ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Computer Science::Mathematical Software ,General Earth and Planetary Sciences ,Multiplication ,Arithmetic ,General Environmental Science ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
This paper studies Strassen’s algorithm for fast multiplication of two finite-dimensional matrices. One pertinent issue that has deterred Strassen’s scheme from being considered for practical use is determining the cross-over point. In this light, large matrices of different sizes were randomly generated, on which Strassen’s and the conventional matrix multiplication algorithm were implemented in MATLAB R2008b. Two MATLAB built-in functions, nextpow2 and pow2, were used to implement padding so that the matrix dimensions are a power of two. Three experiments were carried out, using five, four and three levels of recursion (divide and conquer) respectively, to determine the suitable cut-off point, which was then used to evaluate the optimal running time for Strassen’s algorithm. For each experiment, eight finite-dimensional square matrices of real numbers were generated and iteratively multiplied. The experiments reveal that the cut-off point with five levels of recursion optimized Strassen’s running time.
- Published
- 2015
- Full Text
- View/download PDF
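The padding-and-cut-off setup described in the abstract above can be sketched as follows. This is a hypothetical Python/NumPy illustration, not the authors' MATLAB code: pad to the next power of two (the role played by nextpow2/pow2), apply a fixed number of Strassen recursion levels, and fall back to the conventional product below the cross-over point.

```python
import numpy as np

def strassen_levels(A, B, levels):
    """Strassen recursion limited to a fixed number of levels; below
    the cross-over point the conventional product takes over."""
    if levels == 0 or A.shape[0] <= 1:
        return A @ B
    h = A.shape[0] // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    f = lambda X, Y: strassen_levels(X, Y, levels - 1)
    M1 = f(A11 + A22, B11 + B22); M2 = f(A21 + A22, B11)
    M3 = f(A11, B12 - B22);       M4 = f(A22, B21 - B11)
    M5 = f(A11 + A12, B22);       M6 = f(A21 - A11, B11 + B12)
    M7 = f(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7; C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4;           C[h:, h:] = M1 - M2 + M3 + M6
    return C

def padded_strassen(A, B, levels=5):
    """Pad n x n inputs up to the next power of two, multiply, crop."""
    n = A.shape[0]
    m = 1 << (n - 1).bit_length()          # next power of two >= n
    Ap = np.zeros((m, m)); Ap[:n, :n] = A
    Bp = np.zeros((m, m)); Bp[:n, :n] = B
    return strassen_levels(Ap, Bp, levels)[:n, :n]
```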
4. Matrix Multiplication, a Little Faster
- Author
-
Elaye Karstadt and Oded Schwartz
- Subjects
Freivalds' algorithm ,Multiplication algorithm ,Generalization ,MathematicsofComputing_NUMERICALANALYSIS ,010103 numerical & computational mathematics ,0102 computer and information sciences ,01 natural sciences ,Matrix chain multiplication ,Matrix (mathematics) ,Artificial Intelligence ,Cuthill–McKee algorithm ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Computer Science::Symbolic Computation ,0101 mathematics ,Base (exponentiation) ,Mathematics ,Coppersmith–Winograd algorithm ,Discrete mathematics ,Basis (linear algebra) ,Block matrix ,Matrix multiplication ,Hardware and Architecture ,Control and Systems Engineering ,010201 computation theory & mathematics ,Computer Science::Mathematical Software ,Constant (mathematics) ,Change of basis ,Algorithm ,Software ,Information Systems - Abstract
Strassen’s algorithm (1969) was the first sub-cubic matrix multiplication algorithm. Winograd (1971) improved the leading coefficient of its complexity from 7 to 6. There have been many subsequent asymptotic improvements. Unfortunately, most of these have the disadvantage of very large, often gigantic, hidden constants. Consequently, Strassen-Winograd’s O(n^{log2 7}) algorithm often outperforms other fast matrix multiplication algorithms for all feasible matrix dimensions. The leading coefficient of Strassen-Winograd’s algorithm has been generally believed to be optimal for matrix multiplication algorithms with a 2×2 base case, due to the lower bounds of Probert (1976) and Bshouty (1995). Surprisingly, we obtain a faster matrix multiplication algorithm, with the same base case size and asymptotic complexity as Strassen-Winograd’s algorithm, but with the leading coefficient reduced from 6 to 5. To this end, we extend Bodrato’s (2010) method for matrix squaring, and transform matrices to an alternative basis. We also prove a generalization of Probert’s and Bshouty’s lower bounds that holds under change of basis, showing that for matrix multiplication algorithms with a 2×2 base case, the leading coefficient of our algorithm cannot be further reduced, and is therefore optimal. We apply our method to other fast matrix multiplication algorithms, improving their arithmetic and communication costs by significant constant factors.
- Published
- 2017
- Full Text
- View/download PDF
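For context, the Strassen-Winograd schedule this abstract starts from, with seven multiplications and the additively optimal 15 additions (hence the leading coefficient 6), can be written out as follows. This is a standard presentation; the paper's contribution, the alternative-basis variant with leading coefficient 5, is not reproduced here.

```python
def winograd_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    """Strassen-Winograd form: 7 multiplications, 15 additions.
    Written for scalars; in the recursive algorithm the eight inputs
    are matrix blocks and each '*' becomes a recursive product."""
    s1 = a21 + a22; s2 = s1 - a11; s3 = a11 - a21; s4 = a12 - s2
    t1 = b12 - b11; t2 = b22 - t1; t3 = b22 - b12; t4 = t2 - b21
    p1 = a11 * b11; p2 = a12 * b21; p3 = s4 * b22; p4 = a22 * t4
    p5 = s1 * t1;   p6 = s2 * t2;   p7 = s3 * t3
    u1 = p1 + p6; u2 = u1 + p7; u3 = u1 + p5
    return p1 + p2, u3 + p3, u2 - p4, u2 + p5   # c11, c12, c21, c22
```

A quick check: with A = [[1,2],[3,4]] and B = [[5,6],[7,8]], the function returns (19, 22, 43, 50), matching A·B.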
5. MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication
- Author
-
Prakash Ramanan and Minhao Deng
- Subjects
Reducer ,Logarithm ,Computer science ,Recursion (computer science) ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Matrix multiplication ,Matrix (mathematics) ,020204 information systems ,Strassen algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Multiplication ,Algorithm ,Sequential algorithm ,Coppersmith–Winograd algorithm - Abstract
Consider the multiplication of two n×n matrices. A straightforward sequential algorithm for computing the product takes Θ(n^3) time. Strassen [20] presented an algorithm that takes Θ(n^{lg 7}) time; lg denotes logarithm to the base 2; lg 7 is about 2.81. Now, consider the implementation of these two algorithms (straightforward and Strassen) in the MapReduce framework. Several papers have studied MapReduce implementations of the straightforward algorithm; this algorithm can be implemented using a constant number (typically, one or two) of MapReduce passes. In this paper, we study the MapReduce implementation of Strassen's algorithm. If we unwind the recursion, Strassen's algorithm consists of three parts, Parts I-III. Direct MapReduce implementations of the three parts take lg n, 1 and lg n passes, respectively; the total number of passes is 2 lg n + 1. We show that Part I can be implemented in 2 passes, with total work Θ(n^{lg 7}), and reducer size and reducer workload o(n). We believe that Part III can also be implemented, with small reducer size and workload, in a constant number of passes; this is future work.
- Published
- 2017
- Full Text
- View/download PDF
6. Fast matrix decomposition in F2
- Author
-
Enrico Bertolazzi and Anna Rimoldi
- Subjects
Computational Mathematics ,Applied Mathematics ,Strassen algorithm ,Recursion (computer science) ,Multiplication ,Row ,Column (database) ,Algorithm ,Matrix multiplication ,Coppersmith–Winograd algorithm ,Matrix decomposition ,Mathematics - Abstract
In this work an efficient algorithm to perform a block decomposition for large dense rectangular matrices with entries in F2 is presented. Matrices are stored as column blocks of row-major matrices in order to facilitate row operations and matrix multiplications with blocks of columns. One of the major bottlenecks of matrix decomposition is the pivoting, which involves both row and column exchanges. Since row swaps are cheap and column swaps are orders of magnitude slower, the number of column swaps should be reduced as much as possible. Here an algorithm that completely avoids column permutations is presented. An asymptotically fast algorithm is obtained by combining the Four Russians algorithm and recursion with the Strassen algorithm for matrix-matrix multiplication. Moreover, optimal parameters for the tuning of the algorithm are theoretically estimated and then experimentally verified. A comparison with the state-of-the-art public domain software SAGE shows that the proposed algorithm is generally faster.
- Published
- 2014
- Full Text
- View/download PDF
7. Communication costs of Strassen's matrix multiplication
- Author
-
James Demmel, Oded Schwartz, Grey Ballard, and Olga Holtz
- Subjects
Multiplication algorithm ,Analysis of parallel algorithms ,General Computer Science ,Computer science ,Computation ,Parallel algorithm ,010103 numerical & computational mathematics ,0102 computer and information sciences ,Parallel computing ,01 natural sciences ,Matrix multiplication ,010201 computation theory & mathematics ,Strassen algorithm ,Multiplication ,0101 mathematics ,Coppersmith–Winograd algorithm - Abstract
Algorithms have historically been evaluated in terms of the number of arithmetic operations they performed. This analysis is no longer sufficient for predicting running times on today's machines. Moving data through memory hierarchies and among processors requires much more time (and energy) than performing computations. Hardware trends suggest that the relative costs of this communication will only increase. Proving lower bounds on the communication of algorithms and finding algorithms that attain these bounds are therefore fundamental goals. We show that the communication cost of an algorithm is closely related to the graph expansion properties of its corresponding computation graph. Matrix multiplication is one of the most fundamental problems in scientific computing and in parallel computing. Applying expansion analysis to Strassen's and other fast matrix multiplication algorithms, we obtain the first lower bounds on their communication costs. These bounds show that the current sequential algorithms are optimal but that previous parallel algorithms communicate more than necessary. Our new parallelization of Strassen's algorithm is communication-optimal and outperforms all previous matrix multiplication algorithms.
- Published
- 2014
- Full Text
- View/download PDF
8. Matrix multiplication over word-size modular rings using approximate formulae
- Author
-
Brice Boyer and Jean-Guillaume Dumas
- Subjects
Discrete mathematics ,[INFO.INFO-SC]Computer Science [cs]/Symbolic Computation [cs.SC] ,Multiplication algorithm ,Ring (mathematics) ,Strassen-Winograd's algorithm ,Rank (linear algebra) ,Applied Mathematics ,Modulo ,010102 general mathematics ,matrix multiplication ,Bini's approximate bilinear algorithm ,010103 numerical & computational mathematics ,01 natural sciences ,Matrix multiplication ,Algebra ,efficient implementation ,exact linear algebra ,symbolic-numeric computing ,Strassen algorithm ,Diagonal matrix ,0101 mathematics ,memory placement and scheduling ,Software ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
The Bini-Capovani-Lotti-Romani approximate formula (or border rank) for matrix multiplication achieves a better complexity than Strassen’s matrix multiplication formula. In this article, we show a novel way to use the approximate formula in the special case where the ring is Z/pZ. In addition, we show an implementation à la FFLAS-FFPACK, where p is a word-size modulus, that improves on state-of-the-art Z/pZ matrix multiplication implementations.
- Published
- 2016
- Full Text
- View/download PDF
9. Automatic Reproduction of a Genius Algorithm: Strassen's Algorithm Revisited by Genetic Search
- Author
-
Seunghyun Oh and Byung-Ro Moon
- Subjects
Computer science ,Evolutionary algorithm ,Matrix multiplication ,Theoretical Computer Science ,Sammon mapping ,symbols.namesake ,Computational Theory and Mathematics ,Gaussian elimination ,Search algorithm ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Genetic algorithm ,symbols ,Algorithm ,Software ,Coppersmith–Winograd algorithm - Abstract
In 1968, Volker Strassen, a young German mathematician, announced a clever algorithm to reduce the asymptotic complexity of n×n matrix multiplication from the order of n^3 to n^2.81. It soon became one of the most famous scientific discoveries in the 20th century and provoked numerous studies by other mathematicians to improve upon it. Although a number of improvements have been made, Strassen's algorithm is still optimal in his original framework, the bilinear systems of 2×2 matrix multiplication, and people are still curious how Strassen developed his algorithm. We examined it to see if we could automatically reproduce Strassen's discovery using a search algorithm and find other algorithms of the same quality. In total, we found 608 algorithms that have the same quality as Strassen's, including Strassen's original algorithm. We partitioned the algorithms into nine different groups based on the way they are constructed. This paper was made possible by the combination of genetic search and linear-algebraic techniques. To the best of our knowledge, this is the first work that automatically reproduced Strassen's algorithm, and furthermore, discovered new algorithms with equivalent asymptotic complexity using a search algorithm.
- Published
- 2010
- Full Text
- View/download PDF
10. Fast multiplication of matrices over a finitely generated semiring
- Author
-
Klas Markström, Lars Hellström, and Daniel Andrén
- Subjects
Discrete mathematics ,Freivalds' algorithm ,Ring (mathematics) ,Matrix multiplication ,Computer Science Applications ,Theoretical Computer Science ,Semiring ,Combinatorics ,Strassen algorithm ,Signal Processing ,Multiplication ,Logical matrix ,Information Systems ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
In this paper we show that n×n matrices with entries from a semiring R which is generated additively by q generators can be multiplied in time O(q^2 n^ω), where n^ω is the complexity of matrix multiplication over a ring (Strassen: ω < 2.81).
- Published
- 2008
- Full Text
- View/download PDF
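The reduction in the abstract above can be illustrated for the simplest case q = 1: the Boolean semiring ({0,1}, OR, AND) is additively generated by a single generator, so the Boolean matrix product can be delegated to one integer (ring) matrix product, which a fast algorithm such as Strassen's could compute, followed by thresholding. A toy NumPy sketch of this idea:

```python
import numpy as np

def boolean_matmul(A, B):
    """Boolean (OR/AND) product via one ring product over the integers:
    compute the ordinary integer product, then map any positive entry
    to 1. This is the q = 1 case of the semiring-to-ring reduction."""
    C = A.astype(np.int64) @ B.astype(np.int64)  # ring multiplication
    return (C > 0).astype(np.uint8)

A = np.array([[1, 0], [1, 1]], dtype=np.uint8)
B = np.array([[0, 1], [1, 0]], dtype=np.uint8)
print(boolean_matmul(A, B))   # [[0 1], [1 1]]
```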
11. Improving the numerical stability of fast matrix multiplication
- Author
-
Benjamin Lipshitz, Austin R. Benson, Oded Schwartz, Grey Ballard, and Alex Druinsky
- Subjects
Diagonal scaling ,Mathematical optimization ,Freivalds' algorithm ,Numerical and Computational Mathematics ,Applied Mathematics ,Scalar (mathematics) ,Computer Science - Numerical Analysis ,Numerical & Computational Mathematics ,Numerical Analysis (math.NA) ,010103 numerical & computational mathematics ,0102 computer and information sciences ,01 natural sciences ,Matrix multiplication ,010201 computation theory & mathematics ,Strassen algorithm ,FOS: Mathematics ,Use case ,0101 mathematics ,Algorithm ,Analysis ,cs.NA ,Mathematics ,Numerical stability ,Coppersmith–Winograd algorithm - Abstract
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few fast algorithms have been efficiently implemented or used in practical applications. However, there exist many practical alternatives to Strassen's algorithm with varying performance and numerical properties. Fast algorithms are known to be numerically stable, but because their error bounds are slightly weaker than the classical algorithm, they are not used even in cases where they provide a performance benefit. We argue in this paper that the numerical sacrifice of fast algorithms, particularly for the typical use cases of practical algorithms, is not prohibitive, and we explore ways to improve the accuracy both theoretically and empirically. The numerical accuracy of fast matrix multiplication depends on properties of the algorithm and of the input matrices, and we consider both contributions independently. We generalize and tighten previous error analyses of fast algorithms and compare their properties. We discuss algorithmic techniques for improving the error guarantees from two perspectives: manipulating the algorithms, and reducing input anomalies by various forms of diagonal scaling. Finally, we benchmark performance and demonstrate our improved numerical accuracy.
- Published
- 2016
- Full Text
- View/download PDF
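One of the forms of diagonal scaling discussed in the abstract above can be sketched as outside scaling: equilibrate the rows of A and the columns of B, run the fast multiplication on the scaled operands, and undo the scaling afterwards. A simplified illustration, not the paper's exact procedure, with any fast routine pluggable as fast_mm:

```python
import numpy as np

def scaled_fast_matmul(A, B, fast_mm=np.matmul):
    """Outside diagonal scaling around a fast multiplication routine.
    dA equilibrates the rows of A and dB the columns of B, so the fast
    product runs on matrices whose entries have comparable magnitude;
    since A @ B = D_A (Ahat @ Bhat) D_B, the scaling is undone at the end."""
    dA = np.max(np.abs(A), axis=1); dA[dA == 0] = 1.0
    dB = np.max(np.abs(B), axis=0); dB[dB == 0] = 1.0
    C_scaled = fast_mm(A / dA[:, None], B / dB[None, :])
    return (C_scaled * dA[:, None]) * dB[None, :]
```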
12. Fast Matrix Multiplication
- Author
-
Andris Ambainis, Yuval Filmus, and François Le Gall
- Subjects
Class (set theory) ,Conjecture ,people.profession ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Identity (music) ,Matrix multiplication ,Running time ,Combinatorics ,010201 computation theory & mathematics ,Tensor (intrinsic definition) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Coppersmith ,people ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
Until a few years ago, the fastest known matrix multiplication algorithm, due to Coppersmith and Winograd (1990), ran in time O(n^2.3755). Recently, a surge of activity by Stothers, Vassilevska-Williams, and Le Gall has led to an improved algorithm running in time O(n^2.3729). These algorithms are obtained by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd. We show that this exact approach cannot result in an algorithm with running time O(n^2.3725), and identify a wide class of variants of this approach which cannot result in an algorithm with running time O(n^2.3078); in particular, this approach cannot prove the conjecture that for every ε > 0, two n×n matrices can be multiplied in time O(n^{2+ε}). We describe a new framework extending the original laser method, which is the method underlying the previously mentioned algorithms. Our framework accommodates the algorithms by Coppersmith and Winograd, Stothers, Vassilevska-Williams and Le Gall. We obtain our main result by analyzing this framework. The framework also explains why taking tensor powers of the Coppersmith-Winograd identity results in faster algorithms.
- Published
- 2015
- Full Text
- View/download PDF
13. A Vector Implementation of Gaussian Elimination over GF(2): Exploring the Design-Space of Strassen's Algorithm as a Case Study
- Author
-
Enric Morancho
- Subjects
symbols.namesake ,Gaussian elimination ,Computer science ,Strassen algorithm ,Linear algebra ,symbols ,Linear independence ,System of linear equations ,Algorithm ,GF(2) ,Row echelon form ,Coppersmith–Winograd algorithm - Abstract
Gaussian elimination is a key algorithm in linear algebra. It has many usages, for instance solving systems of linear equations and determining whether a set of vectors is linearly independent. This algorithm transforms an input matrix into a matrix in row (column) echelon form. The matrix entries and the transformations are defined over algebraic fields either infinite (e.g. the real numbers) or finite (e.g. GF(2)). This work discusses a vector implementation of this algorithm over GF(2). The evaluation develops a case study that searches exhaustively for algorithms over GF(2) similar to Strassen's algorithm (a matrix-multiply algorithm with sub-cubic complexity), because the search engine requires solving a huge number of Gaussian eliminations over GF(2). Our vector implementation allows the search engine to complete the exploration in less than nine hours on a commodity processor supporting AVX2, outperforming by 1.92X a scalar-SWAR implementation specialized for the case study and by 7.43X a generic scalar-SWAR implementation. Our results show that, over GF(2), there are 20 algorithms similar to Strassen's.
- Published
- 2015
- Full Text
- View/download PDF
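The huge number of GF(2) eliminations such a search engine performs is usually made cheap by packing each matrix row into machine words, so that adding two rows is a single XOR. A word-packed sketch in plain Python, the scalar analogue of the paper's AVX2 vectorization:

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are packed into Python
    ints; eliminating a pivot from a row is one whole-row XOR."""
    pivots = {}                      # leading-bit position -> pivot row
    rank = 0
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:   # row is independent: new pivot
                pivots[lead] = row
                rank += 1
                break
            row ^= pivots[lead]      # cancel the leading bit
    return rank

# Rows (1,0,1), (0,1,1), (1,1,0): the third is the sum of the first
# two over GF(2), so the rank is 2.
print(gf2_rank([0b101, 0b011, 0b110]))   # 2
```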
14. The aggregation and cancellation techniques as a practical tool for faster matrix multiplication
- Author
-
Igor E. Kaporin
- Subjects
Freivalds' algorithm ,General Computer Science ,Computational complexity theory ,Numerical stability ,010103 numerical & computational mathematics ,0102 computer and information sciences ,01 natural sciences ,Matrix multiplication ,Theoretical Computer Science ,Computational complexity ,010201 computation theory & mathematics ,Strassen algorithm ,Linear algebra ,Pan's aggregation/cancellation method ,Multiplication ,0101 mathematics ,Arithmetic ,Algorithm ,Winograd algorithm ,Mathematics ,Coppersmith–Winograd algorithm ,Fast matrix multiplication ,Computer Science(all) - Abstract
The main purpose of this paper is to present a fast matrix multiplication algorithm taken from the paper of Laderman et al. (Linear Algebra Appl. 162-164 (1992) 557) in a refined compact "analytical" form and to demonstrate that it can be implemented as quite efficient computer code. Our improved presentation enables us to simplify substantially the analysis of the computational complexity and numerical stability of the algorithm as well as its computer implementation. The algorithm multiplies two N×N matrices using O(N^2.7760) arithmetic operations. In the case where N = 18·48^k, for a positive integer k, the total number of flops required by the algorithm is 4.894N^2.7760 - 16.165N^2, which may be compared to a similar estimate for the Winograd algorithm, 3.732N^2.8074 - 5N^2 flops, N = 8·2^k, the latter being the current record bound among all known practical algorithms. Moreover, we present a pseudo-code of the algorithm which demonstrates its very moderate working memory requirements, much smaller than those of the best available implementations of the Strassen and Winograd algorithms. For matrices of medium-large size (say, 2000 ≤ N < 10,000) we consider one-level algorithms and compare them with the (multilevel) Strassen and Winograd algorithms. The results of numerical tests clearly indicate that our accelerated matrix multiplication routines implementing two or three disjoint product-based algorithms are comparable in computational time with an implementation of the Winograd algorithm and clearly outperform it with respect to working space and (especially) numerical stability. The tests were performed for matrices of order up to 7000, both in double and single precision.
- Published
- 2004
- Full Text
- View/download PDF
15. Rounding and Chaining LLL: Finding Faster Small Roots of Univariate Polynomial Congruences
- Author
-
Jean-Sébastien Coron, Guénaël Renault, Phong Q. Nguyen, Rina Zeitoun, Jingguo Bi, and Jean-Charles Faugère
- Subjects
Polynomial ,Speedup ,Small Roots of Polynomial Equations ,LLL ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Reduction (complexity) ,[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR] ,RSA ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,0202 electrical engineering, electronic engineering, information engineering ,Coppersmith's Algorithm ,Coppersmith ,Time complexity ,Mathematics ,Coppersmith–Winograd algorithm ,Discrete mathematics ,[INFO.INFO-SC]Computer Science [cs]/Symbolic Computation [cs.SC] ,Rounding ,people.profession ,020206 networking & telecommunications ,010201 computation theory & mathematics ,Lattice reduction ,people ,Algorithm - Abstract
In a seminal work at EUROCRYPT '96, Coppersmith showed how to find all small roots of a univariate polynomial congruence in polynomial time: this has found many applications in public-key cryptanalysis and in a few security proofs. However, the running time of the algorithm is a high-degree polynomial, which limits experiments: the bottleneck is an LLL reduction of a high-dimensional matrix with extra-large coefficients. We present in this paper the first significant speedups over Coppersmith's algorithm. The first speedup is based on a special property of the matrices used by Coppersmith's algorithm, which allows us to provably speed up the LLL reduction by rounding, and which can also be used to improve the complexity analysis of Coppersmith's original algorithm. The exact speedup depends on the LLL algorithm used: for instance, the speedup is asymptotically quadratic in the bit-size of the small-root bound if one uses the Nguyen-Stehlé L2 algorithm. The second speedup is heuristic and applies whenever one wants to enlarge the root size of Coppersmith's algorithm by exhaustive search. Instead of performing several LLL reductions independently, we exploit hidden relationships between these matrices so that the LLL reductions can be somewhat chained to decrease the global running time. When both speedups are combined, the new algorithm is in practice hundreds of times faster for typical parameters.
- Published
- 2014
- Full Text
- View/download PDF
16. A practical algorithm for faster matrix multiplication
- Author
-
Igor E. Kaporin
- Subjects
Freivalds' algorithm ,Multiplication algorithm ,Algebra and Number Theory ,Applied Mathematics ,Strassen algorithm ,Cuthill–McKee algorithm ,Workspace ,Arithmetic ,Algorithm ,Matrix multiplication ,Mathematics ,Coppersmith–Winograd algorithm ,Numerical stability - Abstract
The purpose of this paper is to present an algorithm for matrix multiplication based on a formula discovered by Pan [7]. For matrices of order up to 10 000, the nearly optimum tuning of the algorithm results in a rather clear non-recursive one- or two-level structure with the operation count comparable to that of the Strassen algorithm [9]. The algorithm takes less workspace and has better numerical stability as compared to the Strassen algorithm, especially in Winograd's modification [2]. Moreover, its clearer and more flexible structure is potentially more suitable for efficient implementation on modern supercomputers. Copyright © 1999 John Wiley & Sons, Ltd.
- Published
- 1999
- Full Text
- View/download PDF
17. Improving and estimating the accuracy of Strassen's algorithm
- Author
-
Bogdan Dumitrescu
- Subjects
Computational Mathematics ,Approximation error ,Applied Mathematics ,Numerical analysis ,Strassen algorithm ,Computer Science::Mathematical Software ,Estimator ,Multiplication ,Scaling ,Algorithm ,Matrix multiplication ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
Fast matrix multiplication algorithms of Strassen and Winograd are known to have weaker numerical accuracy than usual (inner product) multiplication. In this paper, we show that scaling usually improves accuracy when operands have elements of widely varying magnitude. We also propose estimators for numerical errors, based on samples of the result. All these estimators can be computed in O(n^2) operations. Experiments prove the effectiveness of the scaling idea and of the absolute error estimator.
- Published
- 1998
- Full Text
- View/download PDF
18. Breaking through memory limitation in GPU parallel processing using Strassen Algorithm
- Author
-
Robertus Hudi, Pujianto Yugopuspito, and Sutrisno
- Subjects
Divide and conquer algorithms ,Freivalds' algorithm ,Multiplication algorithm ,Computer science ,Cuthill–McKee algorithm ,Strassen algorithm ,Parallel algorithm ,Algorithm design ,Parallel computing ,Algorithm ,Coppersmith–Winograd algorithm - Abstract
Matrix multiplication is one of the basic operations in linear algebra and is widely used in computer science. For ages it has been performed with the naive algorithm, which has a standard complexity of O(n^3). Much research has been conducted to find a more efficient and effective algorithm for this operation, and Strassen found one that beats the naive algorithm's complexity with only O(n^2.8074). The basic concept of this algorithm is divide and conquer (DnC), adjusted here to break the memory limitation of GPU computation. The idea for overcoming this limit is to combine CPU and GPU processing in an implementation of the Strassen algorithm. This paper shows the result of breaking through the GPU memory limitation, checks the accuracy of the algorithm, compares running times, and, most importantly, verifies that this algorithm can process bigger matrices than the naive one. The charts and tables created for each performance run show that the maximum data-sample size processed by the Strassen algorithm reaches 32,768 for a (2^n × 2^n) matrix size, larger than the 8,192 the naive algorithm could process. Also, for all attempts on matrices larger than 2048, the running times are much faster, by about 0.04 to 0.2 seconds in the worst-case scenario, than with the naive algorithm.
- Published
- 2013
- Full Text
- View/download PDF
19. Work-efficient matrix inversion in polylogarithmic time
- Author
-
Raoul Steffen, Peter Sanders, and Jochen Speck
- Subjects
Recursion ,Computer science ,Parallel algorithm ,Recursion (computer science) ,Positive-definite matrix ,Parallel computing ,Matrix multiplication ,Computer Science Applications ,Matrix (mathematics) ,Computational Theory and Mathematics ,Hardware and Architecture ,Modeling and Simulation ,Strassen algorithm ,Linear algebra ,Algorithm ,Massively parallel ,Critical path method ,Software ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
We present an algorithm for matrix inversion that combines the practical requirement of an optimal number of arithmetic operations and the theoretical goal of a polylogarithmic critical path length. The algorithm reduces inversion to matrix multiplication. It uses Strassen's recursion scheme but on the critical path, it breaks the recursion early switching to an asymptotically inefficient yet fast use of Newton's method. We also show that the algorithm is numerically stable. Overall, we get a candidate for a massively parallel algorithm that scales to exascale systems even on relatively small inputs. Preliminary experiments on multicore machines give the surprising result that even on such moderately parallel machines the algorithm outperforms Intel's Math Kernel Library and that Strassen's algorithm seems to be numerically more stable than one might expect.
- Published
- 2013
- Full Text
- View/download PDF
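The "asymptotically inefficient yet fast" switch on the critical path that the abstract mentions is Newton's iteration for the matrix inverse, X ← X(2I - AX), which converges quadratically and consists of nothing but matrix multiplications, so any fast multiply accelerates it directly. A minimal sketch of the classical iteration, not the paper's full scheme:

```python
import numpy as np

def newton_inverse(A, steps=50):
    """Newton's iteration for the inverse of a nonsingular A:
    X <- X (2I - A X). The starting guess A^T / (||A||_1 * ||A||_inf)
    guarantees convergence; each step is two matrix products, which is
    where a fast multiplication routine pays off."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I2 = 2.0 * np.eye(A.shape[0])
    for _ in range(steps):
        X = X @ (I2 - A @ X)
    return X
```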
20. A HIGH PERFORMANCE PARALLEL STRASSEN IMPLEMENTATION
- Author
-
Brian C. Grayson, Robert A. van de Geijn, and Ajay P Shah
- Subjects
Reduction (complexity) ,Multiplication algorithm ,Computational complexity theory ,Hardware and Architecture ,Computer science ,Strassen algorithm ,Multiplication ,Parallel computing ,Software ,Matrix multiplication ,Theoretical Computer Science ,Intel Paragon ,Coppersmith–Winograd algorithm - Abstract
In this paper, we give what we believe to be the first high performance parallel implementation of Strassen's algorithm for matrix multiplication. We show how, under restricted conditions, this algorithm can be implemented plug compatible with standard parallel matrix multiplication algorithms. Results obtained on a large Intel Paragon system show a 10-20% reduction in execution time compared to what we believe to be the fastest standard parallel matrix multiplication implementation available at this time.
- Published
- 1996
- Full Text
- View/download PDF
21. Fast Multiplication of Interval Matrices (Interval Version of Strassen's Algorithm)
- Author
-
Vladik Kreinovich and Martine Ceberio
- Subjects
Discrete mathematics ,Applied Mathematics ,Interval arithmetic ,Computational Mathematics ,Strassen algorithm ,Computer Science::Mathematical Software ,Order (group theory) ,Interval (graph theory) ,Computer Science::Symbolic Computation ,Multiplication ,Algorithm ,Software ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
Strassen's algorithm multiplies two numerical matrices fast, but when applied to interval matrices, leads to excess width. We use Rump's interval arithmetic to propose an interval version of Strassen's algorithm whose only excess width is in second order terms.
- Published
- 2004
- Full Text
- View/download PDF
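Rump's arithmetic, on which the variant above is built, represents an interval matrix by a midpoint matrix and a radius matrix; interval multiplication then reduces to a handful of point matrix products, each of which a Strassen routine can accelerate. A sketch of the enclosure formula, omitting the directed rounding that a rigorous floating-point implementation must add:

```python
import numpy as np

def interval_matmul(mA, rA, mB, rB):
    """Midpoint-radius interval product: the result <mC, rC> encloses
    {A @ B : |A - mA| <= rA, |B - mB| <= rB} entrywise. Every
    operation below is an ordinary (point) matrix product."""
    mC = mA @ mB
    rC = np.abs(mA) @ rB + rA @ (np.abs(mB) + rB)
    return mC, rC
```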
22. Analysis of the Time Complexity of Strassen Algorithm
- Author
-
Xiang Wang
- Subjects
Average-case complexity ,Discrete mathematics ,Matrix (mathematics) ,Multiplication algorithm ,Strassen algorithm ,MathematicsofComputing_NUMERICALANALYSIS ,Time complexity ,Algorithm ,Matrix multiplication ,P-matrix ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
Algorithms for matrix multiplication are widely used in the software engineering field, and application of the Strassen algorithm makes a significant contribution to optimizing them. A thorough study of the time complexity of matrix multiplication algorithms is therefore important. This paper discusses the time complexity of the Strassen algorithm and of the general algorithm for matrix multiplication, and makes a comparison between the two algorithm routines so as to discuss the advantages and disadvantages of the Strassen algorithm. The rational utilization plan of matrix multiplication algorithms is also discussed.
- Published
- 2013
- Full Text
- View/download PDF
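For reference, the comparison in this abstract rests on the standard recurrence for Strassen's algorithm (seven half-size products plus quadratic combination work):

```latex
T(n) = 7\,T(n/2) + \Theta(n^2)
\;\Longrightarrow\;
T(n) = \Theta\!\left(n^{\log_2 7}\right) = \Theta\!\left(n^{2.807\ldots}\right),
\qquad \text{versus } \Theta(n^3) \text{ for the general algorithm.}
```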
23. Brief announcement
- Author
-
James Demmel, Oded Schwartz, Olga Holtz, Grey Ballard, and Benjamin Lipshitz
- Subjects
Combinatorics ,Discrete mathematics ,Multiplication algorithm ,Strassen algorithm ,Parallel algorithm ,Scale (descriptive set theory) ,Computer Science::Numerical Analysis ,Scaling ,Upper and lower bounds ,Algorithm ,Matrix multiplication ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales. We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms.
- Published
- 2012
- Full Text
- View/download PDF
24. Communication-optimal parallel algorithm for Strassen's matrix multiplication
- Author
-
Benjamin Lipshitz, Oded Schwartz, James Demmel, Olga Holtz, and Grey Ballard
- Subjects
FOS: Computer and information sciences ,Multiplication algorithm ,Freivalds' algorithm ,Computer science ,Parallel algorithm ,010103 numerical & computational mathematics ,0102 computer and information sciences ,Parallel computing ,Computational Complexity (cs.CC) ,01 natural sciences ,Strassen algorithm ,Computer Science - Data Structures and Algorithms ,FOS: Mathematics ,Mathematics - Combinatorics ,Data Structures and Algorithms (cs.DS) ,Mathematics - Numerical Analysis ,0101 mathematics ,Coppersmith–Winograd algorithm ,Computer Science - Numerical Analysis ,Numerical Analysis (math.NA) ,Supercomputer ,Computer Science::Numerical Analysis ,Matrix multiplication ,Computer Science - Computational Complexity ,Computer Science - Distributed, Parallel, and Cluster Computing ,010201 computation theory & mathematics ,Multiplication ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Combinatorics (math.CO) ,F.2.1 ,68W40, 68W10 - Abstract
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen's algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA'11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits perfect strong scaling within the maximum possible range. Benchmarking our implementation on a Cray XT4, we obtain speedups over classical and Strassen-based algorithms ranging from 24% to 184% for a fixed matrix dimension n=94080, where the number of nodes ranges from 49 to 7203. Our parallelization approach generalizes to other fast matrix multiplication algorithms.
- Published
- 2012
- Full Text
- View/download PDF
25. GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm
- Author
-
Roger M. Smith, Craig C. Douglas, Michael A. Heroux, and Gordon Slishman
- Subjects
Numerical Analysis ,ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,Physics and Astronomy (miscellaneous) ,Computer science ,Interface (Java) ,Applied Mathematics ,MathematicsofComputing_NUMERICALANALYSIS ,Parallel computing ,Matrix multiplication ,Computer Science Applications ,Computational Mathematics ,Modeling and Simulation ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Multiplication ,Arithmetic ,Algorithm ,Matrix method ,Coppersmith–Winograd algorithm ,S-matrix - Abstract
Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention of part of the BLAS. Unfortunately, the BLAS were designed with small matrices in mind. When huge, well-conditioned matrices are multiplied together, the BLAS perform like the blahs, even on vector machines. For matrices whose coefficients are well conditioned, Winograd's variant of Strassen's algorithm offers some relief, but it is rarely available in a quality form on most computers. We reconsider this method and offer a highly portable solution based on the Level 3 BLAS interface.
- Published
- 1994
- Full Text
- View/download PDF
26. KS ring theoretic approach for matrix multiplication
- Author
-
B. Karpagam and S. S. Dharwez
- Subjects
Multiplication algorithm ,Freivalds' algorithm ,Computer science ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Multiplication ,Scalar multiplication ,Algorithm ,Matrix multiplication ,Matrix chain multiplication ,Coppersmith–Winograd algorithm - Abstract
Matrix multiplication is a very common operation performed in many real-world applications, and much research has been done on reducing the time complexity of multiplying two matrices by researchers such as Strassen, Don Coppersmith, and Shmuel Winograd. All mathematical software also relies on the common matrix multiplication algorithm. But if the matrices to be multiplied have special properties in some applications, then the multiplication complexity can be reduced. So, in this paper a new algorithm is devised for multiplying a special kind of n×n matrices that are used in analyzing electrical circuits. This algorithm outperforms the previous fast recursive algorithm for matrix multiplication.
- Published
- 2011
- Full Text
- View/download PDF
27. Random search algorithm for 2×2 matrices multiplication problem
- Author
-
Yu-Ren Zhou, Sheng-Jie Deng, Jing-Hui Zhu, and Hua-Qing Min
- Subjects
Freivalds' algorithm ,Multiplication algorithm ,Theoretical computer science ,Search algorithm ,Cuthill–McKee algorithm ,Strassen algorithm ,Algorithm ,Matrix multiplication ,Matrix chain multiplication ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
Since Volker Strassen proposed a recursive matrix multiplication algorithm reducing the time complexity to n^2.81 in 1968, many scholars have done a great deal of research on this basis. In recent years, researchers have proposed using computer search algorithms to solve the fast matrix multiplication problem; using genetic algorithms, they have found Strassen's algorithm, or other algorithms with the same time complexity as Strassen's. In this paper, we used a random search algorithm to find matrix multiplication algorithms that require fewer multiplications. We combined it with Gaussian elimination for the first time to improve calculation speed; meanwhile, we improved the local search technology to enhance the local search capability of the algorithm. In numerical experiments on 2×2 matrices, the results verified the effectiveness of the algorithm. Compared with the existing genetic algorithm, the new method has an obvious advantage in search speed, and it found some new matrix multiplication algorithms.
- Published
- 2010
- Full Text
- View/download PDF
28. A Strassen-like matrix multiplication suited for squaring and higher power computation
- Author
-
Marco Bodrato
- Subjects
Multiplication algorithm ,Exponentiation ,Strassen algorithm ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Arithmetic ,Exponentiation by squaring ,Matrix multiplication ,Matrix chain multiplication ,Polynomial matrix ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
Strassen's method is not the asymptotically fastest known matrix multiplication algorithm, but it is the most widely used for large matrices. Since his manuscript was published, a number of variants have been proposed with different addition complexities. Here we describe a new one. The new variant is at least as good as those already known for simple matrix multiplication, but it can save operations either for chain products or for squaring. Moreover, it can be proved optimal for these tasks. The largest saving is shown for nth-power computation: in this scenario the additive complexity can be halved with respect to Strassen's original algorithm.
- Published
- 2010
- Full Text
- View/download PDF
29. Finding Small Roots of Bivariate Integer Polynomial Equations: A Direct Approach
- Author
-
Jean-Sébastien Coron
- Subjects
Discrete mathematics ,Polynomial ,Direct method ,people.profession ,Bivariate analysis ,law.invention ,Combinatorics ,Integer ,law ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,Computer Science::Symbolic Computation ,Lattice reduction ,Coppersmith ,people ,Cryptanalysis ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
Coppersmith described at Eurocrypt 96 an algorithm for finding small roots of bivariate integer polynomial equations, based on lattice reduction. A simpler algorithm was later proposed in [9], but it was asymptotically less efficient than Coppersmith's algorithm. In this paper, we describe an analogous simplification but with the same asymptotic complexity as Coppersmith. We illustrate our new algorithm with the problem of factoring RSA moduli with high-order bits known; in practical experiments our method is several orders of magnitude faster than [9].
- Published
- 2007
- Full Text
- View/download PDF
30. Adaptive Strassen's matrix multiplication
- Author
-
Paolo D'Alberto and Alexandru Nicolau
- Subjects
Speedup ,Adaptive algorithm ,Computer science ,Strassen algorithm ,Uniprocessor system ,CPU time ,Multiplication ,Parallel computing ,Matrix multiplication ,Coppersmith–Winograd algorithm - Abstract
Strassen's matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen's reduces the total number of operations. Strassen achieved this operation reduction by replacing computationally expensive MMs with matrix additions (MAs). For architectures with simple memory hierarchies, having fewer operations directly translates into an efficient utilization of the CPU and, thus, faster execution. However, for modern architectures with complex memory hierarchies, the operations introduced by the MAs have a limited in-cache data reuse and thus poor memory-hierarchy utilization, thereby overshadowing the (improved) CPU utilization, and making Strassen's algorithm (largely) useless on its own. In this paper, we investigate the interaction between Strassen's effective performance and the memory-hierarchy organization. We show how to exploit Strassen's full potential across different architectures. We present an easy-to-use adaptive algorithm that combines a novel implementation of Strassen's idea with the MM from automatically tuned linear algebra software (ATLAS) or GotoBLAS. An additional advantage of our algorithm is that it applies to any size and shape matrices and works equally well with row or column major layout. Our implementation consists of introducing a final step in the ATLAS/GotoBLAS-installation process that estimates whether or not we can achieve any additional speedup using our Strassen's adaptation algorithm. Then we install our codes, validate our estimates, and determine the specific performance. We show that, by the right combination of Strassen's with ATLAS/GotoBLAS, our approach achieves up to 30%/22% speed-up versus ATLAS/GotoBLAS alone on modern high-performance single processors. We consider and present the complexity and the numerical analysis of our algorithm, and, finally, we show performance for 17 (uniprocessor) systems.
- Published
- 2007
- Full Text
- View/download PDF
31. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm
- Author
-
Jean-Guillaume Dumas, Brice Boyer, Wei Zhou, and Clément Pernet
- Subjects
FOS: Computer and information sciences ,Freivalds' algorithm ,Multiplication algorithm ,Computer science ,010102 general mathematics ,010103 numerical & computational mathematics ,Parallel computing ,01 natural sciences ,Matrix chain multiplication ,Matrix multiplication ,Scheduling (computing) ,Strassen algorithm ,Computer Science - Mathematical Software ,0101 mathematics ,Mathematical Software (cs.MS) ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] ,S-matrix ,Coppersmith–Winograd algorithm - Abstract
We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm; they reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices, or by using a first recursive level of classical multiplication. In particular, we show two fully in-place schedules: one having the same number of operations, if the input matrices can be overwritten; the other one, slightly increasing the constant of the leading term of the complexity, if the input matrices are read-only. Many of these schedules have been found by an implementation of an exhaustive search algorithm based on a pebble game.
- Published
- 2007
- Full Text
- View/download PDF
32. Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms
- Author
-
Frédéric Suter and Frédéric Desprez
- Subjects
020203 distributed computing ,Computer Networks and Communications ,Computer science ,ScaLAPACK ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,Context (language use) ,0102 computer and information sciences ,02 engineering and technology ,Parallel computing ,computer.software_genre ,01 natural sciences ,Matrix multiplication ,Computer Science Applications ,Theoretical Computer Science ,Computational Theory and Mathematics ,Grid computing ,010201 computation theory & mathematics ,Strassen algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Parallelism (grammar) ,Implementation ,computer ,Algorithm ,Software ,Coppersmith–Winograd algorithm - Abstract
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of those algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithms while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithms.
- Published
- 2004
- Full Text
- View/download PDF
33. A less recursive variant of Karatsuba-Ofman algorithm for multiplying operands of size a power of two
- Author
-
Çetin Kaya Koç and Serdar Süer Erdem
- Subjects
Freivalds' algorithm ,Binary GCD algorithm ,Ramer–Douglas–Peucker algorithm ,Karatsuba algorithm ,Output-sensitive algorithm ,Algorithm design ,Suurballe's algorithm ,Algorithm ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
We propose a new algorithm for fast multiplication of large integers having a precision of 2^k computer words, where k is an integer. The algorithm is derived from the Karatsuba-Ofman algorithm and has the same asymptotic complexity. However, the running time of the new algorithm is slightly better, and it makes one third as many recursive calls.
- Published
- 2004
- Full Text
- View/download PDF
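For context, here is the textbook Karatsuba-Ofman recursion that the variant above reorganizes: three half-size multiplications replace four, giving the O(n^{log2 3}) ≈ O(n^1.585) bound. A plain-Python sketch for nonnegative integers, not the authors' less-recursive variant:

```python
def karatsuba(x, y, base_bits=64):
    """Textbook Karatsuba on nonnegative integers: 3 recursive
    multiplications instead of 4."""
    if x < (1 << base_bits) or y < (1 << base_bits):
        return x * y                      # small enough: direct multiply
    h = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> h, x & ((1 << h) - 1)   # x = x1*2^h + x0
    y1, y0 = y >> h, y & ((1 << h) - 1)   # y = y1*2^h + y0
    lo = karatsuba(x0, y0)
    hi = karatsuba(x1, y1)
    mid = karatsuba(x0 + x1, y0 + y1) - lo - hi   # = x0*y1 + x1*y0
    return (hi << (2 * h)) + (mid << h) + lo
```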
34. Variants of matrix-matrix multiplication for Fortran-90
- Author
-
Craig C. Douglas and Gordon Slishman
- Subjects
Syntax (programming languages) ,Fortran ,Computer science ,Intrinsic function ,General Medicine ,Extension (predicate logic) ,Matrix multiplication ,Matrix (mathematics) ,Strassen algorithm ,Arithmetic ,Algorithm ,computer ,Coppersmith–Winograd algorithm ,computer.programming_language - Abstract
The Fortran-90 standard requires an intrinsic function matmul which multiplies two matrices together to produce a third as the result. However, the standard does not specify which algorithm to use. We consider an extension to the matmul syntax which allows a Winograd variant of Strassen's algorithm to be added. We discuss an implementation that is in a commercial Fortran-90 offering.
- Published
- 1994
- Full Text
- View/download PDF
35. On the 2×2 matrix multiplication
- Author
-
N.H. Bshouty
- Subjects
Discrete mathematics ,Computational complexity theory ,Matrix algebra ,Computer science ,Bilinear interpolation ,Field (mathematics) ,Matrix multiplication ,Matrix chain multiplication ,Coppersmith–Winograd algorithm ,Binary fields - Abstract
In SIAM J. Comput., vol. 5, p. 187-203 (1976), R.L. Probert proved that 15 additive operations are necessary and sufficient to multiply two 2×2 matrices over the binary field by a bilinear algorithm using seven non-scalar multiplications. The author proves this result for an arbitrary field. The algorithm of Winograd (S. Winograd, 1971) is used to classify all such algorithms.
- Published
- 2002
- Full Text
- View/download PDF
36. On the implementation of Strassen's fast multiplication algorithm
- Author
-
Martin Roth and Jacques Cohen
- Subjects
Scheme (programming language) ,Multiplication algorithm ,Computer Networks and Communications ,Matrix multiplication ,Strassen algorithm ,Theory of computation ,Overhead (computing) ,Paging ,Arithmetic ,Algorithm ,computer ,Software ,Information Systems ,Coppersmith–Winograd algorithm ,computer.programming_language ,Mathematics - Abstract
A detailed analysis of Strassen's multiplication algorithm is presented; the analysis consists in deriving a symbolic formula, called time-formula, expressing the time taken to perform matrix multiplications using a combination of Strassen's and the regular method. The variables in the formula represent the time taken to perform the basic operations that can be executed in present day computers; these include arithmetic operations; subscripting, iteration overhead, etc. By binding actual numerical values, corresponding to a given compiler-machine configuration, to the variables one can determine the execution time for that configuration. The derived time-formula corresponds to an optimized and non-recursive version of Strassen's algorithm. This version uses a special scheme for positioning the elements of the matrices, the scheme being particularly suitable for usage in a paging environment. It has been found that Strassen's method can be used in multiplying 40 × 40 matrices (on an Algol-PDP-10 configuration) in a slightly better time than the standard method. Curves expressing the speed-up achieved by Strassen's method are presented. This information is substantiated by benchmarks.
- Published
- 1976
- Full Text
- View/download PDF
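A tiny instance of the kind of time-formula the paper derives: counting arithmetic operations for Strassen recursion on top of the regular method makes the cross-over behaviour explicit. This sketch counts operations only; the paper's symbolic formula also weighs subscripting, iteration overhead, and other machine-dependent costs.

```python
def ops(n, cutoff):
    """Arithmetic operations for Strassen recursion down to `cutoff`,
    with the regular method below it (n and cutoff powers of two)."""
    if n <= cutoff:
        return 2 * n**3 - n**2            # n^3 mults + n^2 (n - 1) adds
    return 7 * ops(n // 2, cutoff) + 18 * (n // 2) ** 2

print(ops(1024, 1024))   # 2146435072: pure regular method
print(ops(1024, 64))     # 1301696512: four Strassen levels, then regular
```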
37. The connection between two multiplication algorithms
- Author
-
O.M. Makarov
- Subjects
Algebra ,Freivalds' algorithm ,Multiplication algorithm ,Product (mathematics) ,Strassen algorithm ,Computer Science::Mathematical Software ,General Engineering ,Computer Science::Symbolic Computation ,Algorithm ,Matrix multiplication ,Mathematics ,Connection (mathematics) ,Coppersmith–Winograd algorithm - Abstract
It is shown that Karatsuba's algorithm, applied to the problem of calculating the product of matrices, is transformed into Strassen's algorithm.
- Published
- 1975
- Full Text
- View/download PDF
38. On multiplication of 2 × 2 matrices
- Author
-
Shmuel Winograd
- Subjects
Combinatorics ,Numerical Analysis ,Algebra and Number Theory ,Discrete Mathematics and Combinatorics ,Multiplication ,Geometry and Topology ,Complex number ,Mathematics ,Coppersmith–Winograd algorithm - Abstract
The two main results of this note are: (i) the minimum number of multiplications required to multiply two 2×2 matrices is seven; (ii) the minimum number of multiplications/divisions required to multiply two complex numbers is three.
- Full Text
- View/download PDF
39. ALGORITHMS FOR MATRIX MULTIPLICATION
- Author
-
Richard P. Brent
- Subjects
Numerical analysis ,Strassen algorithm ,Algorithm ,Scaling ,Matrix multiplication ,Coppersmith–Winograd algorithm ,Mathematics - Abstract
Strassen's and Winograd's algorithms for matrix multiplication are investigated and compared with the normal algorithm. Floating-point error bounds are obtained, and it is shown that scaling is essential for numerical accuracy using Winograd's method. In practical cases Winograd's method appears to be slightly faster than the other two methods, but the gain is, at most, about 20%. Finally, an attempt to generalize Strassen's method is described.
- Published
- 1970
- Full Text
- View/download PDF
40. A simple proof of Strassen's result
- Author
-
Gideon Yuval
- Subjects
Discrete mathematics ,Computer science ,Simple (abstract algebra) ,Strassen algorithm ,Signal Processing ,Matrix multiplication ,Computer Science Applications ,Information Systems ,Theoretical Computer Science ,Coppersmith–Winograd algorithm - Published
- 1978
- Full Text
- View/download PDF