22 results for "*CUDA (Computer architecture)"
Search Results
2. Numerical Simulation of Compressible Flows on Heterogeneous Computational Architecture.
- Author
- Kashkovsky, Alexander V., Shershnev, Anton A., and Vashchenkov, Pavel V.
- Subjects
- *SUPERCOMPUTERS, *INTEL computers, *SOURCE code, *MOTHERBOARDS, *CUDA (Computer architecture)
- Abstract
- A technology is developed for adapting the HyCFS numerical code, which was originally developed for supercomputers with graphics processing units (GPUs), to other computational platforms, such as conventional CPU-based systems and new supercomputers based on Intel Xeon Phi coprocessors. The main idea of the adaptation is to use OpenMP threads instead of CUDA threads. This approach makes it possible to use a unified source code for different platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2018
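The single-source idea in this abstract can be illustrated with a minimal Python sketch (not the HyCFS code): the per-element update is written once as a function of a global index, and two interchangeable backends map that index space either onto a serial loop or onto a pool of worker threads, loosely mirroring how one kernel body can be driven by CUDA threads or by OpenMP threads. The names `axpy_kernel`, `run_serial`, and `run_threaded` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def axpy_kernel(i, a, x, y, out):
    """Per-element body, written once: out[i] = a * x[i] + y[i] (hypothetical example)."""
    out[i] = a * x[i] + y[i]


def run_serial(kernel, n, *args):
    """Backend 1: a plain sequential loop over the index space."""
    for i in range(n):
        kernel(i, *args)


def run_threaded(kernel, n, *args, workers=4):
    """Backend 2: contiguous chunks of the index space, one per worker thread,
    similar in spirit to an OpenMP static schedule."""
    chunk = max(1, (n + workers - 1) // workers)

    def work(start):
        for i in range(start, min(start + chunk, n)):
            kernel(i, *args)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(0, n, chunk)))  # list() re-raises worker errors


x = [float(i) for i in range(10)]
y = [1.0] * 10
out_serial, out_threaded = [0.0] * 10, [0.0] * 10
run_serial(axpy_kernel, 10, 2.0, x, y, out_serial)
run_threaded(axpy_kernel, 10, 2.0, x, y, out_threaded)
print(out_serial == out_threaded)  # True
```

Under CPython's global interpreter lock these threads only demonstrate the dispatch structure, not a real speedup; the point is that the kernel body does not change when the backend does.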
3. Iteration Methods Comparison in Parallel Implementation of the Two-Dimensional Liquid Convection Problem.
- Author
- Popov, V. N. and Tsivinskaya, Yu. S.
- Subjects
- *ITERATIVE methods (Mathematics), *CONVECTIVE flow, *JACOBI method, *LINEAR equations, *CUDA (Computer architecture)
- Abstract
- The two-dimensional liquid convection problem is implemented using parallel computations based on CUDA technology. The Jacobi method and the conjugate gradient method are used to solve the systems of linear equations (SLE) with sparse matrices obtained by approximating the initial equations. The calculation time on the PC's central processor is compared with that on GPUs, and the acceleration of the calculations as the number of unknowns increases is estimated. [ABSTRACT FROM AUTHOR]
- Published
- 2018
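The Jacobi method named in this abstract updates every unknown using only the previous iterate, so all grid points can be updated independently; that property is what makes it straightforward to assign one GPU thread per point. A minimal pure-Python sketch on a model 2D Poisson problem with the standard 5-point stencil follows. The model problem and the function name `jacobi_poisson_2d` are illustrative assumptions, not the authors' convection equations or code.

```python
def jacobi_poisson_2d(f, u0, h, iters):
    """Jacobi sweeps for -(u_xx + u_yy) = f with the 5-point stencil on a
    uniform grid of spacing h. Boundary values of u0 stay fixed (Dirichlet)."""
    ny, nx = len(u0), len(u0[0])
    u = [row[:] for row in u0]
    for _ in range(iters):
        new = [row[:] for row in u]  # every update reads only the old iterate
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                new[j][i] = 0.25 * (u[j][i - 1] + u[j][i + 1]
                                    + u[j - 1][i] + u[j + 1][i]
                                    + h * h * f[j][i])
        u = new
    return u


# Usage: u = x + y is harmonic, so with f = 0 and boundary values taken from it,
# the interior should converge to x + y.
n, h = 6, 1.0 / 5
f = [[0.0] * n for _ in range(n)]
u0 = [[(i * h + j * h) if i in (0, n - 1) or j in (0, n - 1) else 0.0
       for i in range(n)] for j in range(n)]
u = jacobi_poisson_2d(f, u0, h, iters=500)
print(round(u[2][3], 6))  # 1.0
```

The double buffer (`u` and `new`) is the essential design choice: overwriting in place would turn the method into Gauss-Seidel, whose updates depend on each other and are harder to parallelize.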
4. Numerical Modeling of Seismic Wave Propagation Generated by the Electromagnetic Pulse Source in Layered Medium.
- Author
- Varygina, M. P. and Chentsov, E. P.
- Subjects
- *ELECTROMAGNETIC pulses, *THEORY of wave motion, *RHEOLOGY, *FINITE differences, *SEISMIC waves, *CUDA (Computer architecture)
- Abstract
- The propagation of seismic waves generated by the non-explosive electromagnetic pulse source "Yenisei" is investigated. The rheological properties of the layered medium, including fractured interlayers, are taken into account. For the numerical implementation, a computational algorithm is developed based on two-cyclic splitting with respect to the spatial variables combined with monotone finite-difference schemes. Parallel software is designed for analysing the velocity and stress fields in the layered medium. The computations are parallelised with CUDA technology for supercomputers with graphics accelerators. Results of numerical computations of seismic wave propagation are shown. [ABSTRACT FROM AUTHOR]
- Published
- 2018
5. Comparison of Parallel Data Processing and its Performance.
- Author
- Botor, Tomáš and Habiballa, Hashim
- Subjects
- *PARALLEL processing, *INFORMATION storage & retrieval systems, *CUDA (Computer architecture), *C++, *SOURCE code, *GRAPHICS processing units
- Abstract
- The research focuses on parallelizing C++ code using the TBB library, OpenMP directives, and CUDA technology, and includes a comparison of the results. Based on these results, we recommend the best technology for optimizing source code. We present experimental results on the computational efficiency of several types of parallelization techniques, including GPU execution units. [ABSTRACT FROM AUTHOR]
- Published
- 2017
6. CUDA Application on Two-Dimensional CFD Problems.
- Author
- Tsivinskaya, Yu. S.
- Subjects
- *CUDA (Computer architecture), *COMPUTATIONAL fluid dynamics, *MEASUREMENT of flow velocity, *LINEAR equations, *APPROXIMATION theory
- Abstract
- Parallel computations based on CUDA technology are applied to two-dimensional problems: heat distribution and velocity field calculation in a liquid. Various iterative methods are used to solve systems of simultaneous linear equations with sparse matrices obtained by approximating the initial equations. The efficiency of using GPUs to accelerate the calculations is estimated; it grows as the number of unknowns increases. [ABSTRACT FROM AUTHOR]
- Published
- 2017
7. Parallelized Implementation of Dynamical Particle System.
- Author
- Mašek, Jan, Frantík, Petr, and Vořechovský, Miroslav
- Subjects
- *DYNAMICAL systems, *SIMULATION methods & models, *CUDA (Computer architecture), *APPLIED mathematics, *COMPUTER architecture
- Abstract
- The paper presents approaches to implementing the solution of a discrete dynamical system of mutually repelling particles. Two platforms, a single-threaded Java process and a parallelized CUDA C solution, are employed for the dynamical simulation. The qualities of both platforms are discussed and explained, and their performance is compared when solving two proposed interaction laws. [ABSTRACT FROM AUTHOR]
- Published
- 2017
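The abstract does not state the two interaction laws. As a hedged illustration only, the sketch below assumes an inverse-square repulsion in 2D and a semi-implicit (symplectic) Euler integrator. The all-pairs force sum for each particle is independent of the sums for the other particles, which is the piece a CUDA C version would typically parallelize with one thread per particle. The names `repulsive_forces` and `step` are hypothetical.

```python
import math


def repulsive_forces(pos, k=1.0, eps=1e-12):
    """All-pairs 2D forces under an ASSUMED law |F| = k / r^2, directed away
    from each neighbour. Each particle's sum is independent of the others."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx, dy = xi - pos[j][0], yi - pos[j][1]
            r = math.hypot(dx, dy) + eps  # eps guards against division by zero
            mag = k / (r * r)
            forces[i][0] += mag * dx / r
            forces[i][1] += mag * dy / r
    return forces


def step(pos, vel, dt, k=1.0, mass=1.0):
    """One semi-implicit (symplectic) Euler step: velocities first, then positions."""
    f = repulsive_forces(pos, k)
    vel = [[vx + dt * fx / mass, vy + dt * fy / mass]
           for (vx, vy), (fx, fy) in zip(vel, f)]
    pos = [[px + dt * vx, py + dt * vy] for (px, py), (vx, vy) in zip(pos, vel)]
    return pos, vel


pos = [[0.0, 0.0], [1.0, 0.0]]
vel = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(100):
    pos, vel = step(pos, vel, dt=0.01)
print(pos[1][0] - pos[0][0] > 1.0)  # True
```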
8. Study of Homogeneity and Inhomogeneity Phantom in CUDA EGS for Small Field Dosimetry.
- Author
- Yani, Sitti, Rhani, Mohamad Fahdillah, Haryanto, Freddy, and Arif, Idam
- Subjects
- *IMAGING phantoms, *RADIATION dosimetry, *CUDA (Computer architecture), *X-ray imaging, *MONTE Carlo method
- Abstract
- CUDA EGS is a CUDA implementation that simulates photon transport in a material with a Monte Carlo algorithm for X-ray imaging. The objective of this study was to investigate the effect of inhomogeneities in an inhomogeneous phantom for small field dosimetry (1×1, 2×2, 3×3, 4×4, and 5×5 cm²). Two phantoms, a homogeneous and an inhomogeneous one, were used. The interactions in both phantoms were dominated by Compton interaction and multiple scattering. CUDA EGS can represent the inhomogeneity effect in small field dosimetry by combining the grayscale curves of the homogeneous and inhomogeneous phantoms. The grayscale curve in the inhomogeneous phantom is not asymmetric because of the presence of different materials in the phantom. [ABSTRACT FROM AUTHOR]
- Published
- 2017
9. On Some Questions in Computer Modeling of the Reachability Sets Constructing Problems.
- Author
- Ushakov, V. N., Parshikov, G. V., and Matviychuk, A. R.
- Subjects
- *REACHABLE sets (Set theory), *PROBLEM solving, *APPROXIMATION theory, *GRAPHICS processing units, *CUDA (Computer architecture)
- Abstract
- The research considers the problem of constructing the reachability sets of a nonlinear dynamical system in n-dimensional Euclidean space on a fixed time interval. Approximate methods for constructing the reachability sets are considered, and an accuracy estimate for these methods is given. The research contains computational experiments on computer modeling of the described methods, using algorithms implemented for two computation technologies: CPU and GPU (using CUDA technology). Approaches to the computer modeling of the problem are described and compared, and the CPU-based modeling results are compared with those obtained on the GPU with CUDA technology. In addition, the research discusses some side issues that appeared during computer modeling and during implementation of the algorithms, as well as ways to eliminate these issues or reduce their impact. [ABSTRACT FROM AUTHOR]
- Published
- 2016
10. Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel.
- Author
- Zhu, Changsheng, Liu, Jieqiong, Feng, Li, and Deng, Xin
- Subjects
- *SIMULATION methods & models, *LEAN body mass, *GRANULAR materials, *CUDA (Computer architecture), *MAGNETIC particle imaging
- Abstract
- A microstructure numerical model is a computationally intensive problem whose simulation time is too long and whose simulation scale is too small. To solve these two problems, this article uses MPI+CUDA hybrid-granularity heterogeneous parallel computing to implement the dendrite growth simulation of a PF-LBM phase-field 3D model. The Message Passing Interface (MPI) is used for coarse-grained division, to break through the limit on simulation scale imposed by a single machine. Within each node, fine-grained division is implemented in parallel with the Compute Unified Device Architecture (CUDA) to achieve full intra-node parallelism and improve overall computational efficiency. In addition, the article introduces a "pseudo three-dimensional array" programming method for CUDA and improves the CUDA random number generation method, in order to simplify CUDA array programming and reduce CUDA random number generation time. Experiments show that at the same simulation scale, the speed-up ratio of 21-node MPI+CUDA was 57, an increase of 54% over 21-node MPI. At comparable computing efficiency, the largest simulation scale with 21-node MPI+CUDA was 420³, 13 times that of a single GPU. Therefore, the MPI+CUDA hybrid-granularity parallel method proposed in this paper combines the high computational efficiency of the GPU with the ability of MPI to expand the simulation scale. [ABSTRACT FROM AUTHOR]
- Published
- 2018
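The abstract does not define its "pseudo three-dimensional array" method. One common reading, assumed here and not confirmed by the source, is to store a 3D field in a single flat buffer and compute the linear index explicitly, as is usual when working with a CUDA device pointer. A minimal sketch with hypothetical helper names `flat_index` and `unflatten`:

```python
def flat_index(x, y, z, nx, ny):
    """Row-major linear index of (x, y, z) in an nx * ny * nz buffer, x fastest.
    A CUDA kernel would do the same arithmetic on a flat device pointer."""
    return (z * ny + y) * nx + x


def unflatten(idx, nx, ny):
    """Inverse mapping from a linear index back to (x, y, z)."""
    x = idx % nx
    y = (idx // nx) % ny
    z = idx // (nx * ny)
    return x, y, z


nx, ny, nz = 4, 3, 2
field = [0.0] * (nx * ny * nz)  # one contiguous buffer instead of nested lists
field[flat_index(1, 2, 1, nx, ny)] = 5.0
print(flat_index(1, 2, 1, nx, ny))  # 21
```

Keeping x as the fastest-varying index means that threads with consecutive x touch consecutive addresses, which favours coalesced global-memory access on the GPU.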
11. Accelerating large-scale phase-field simulations with GPU.
- Author
- Xiaoming Shi, Houbing Huang, Guoping Cao, and Xingqiao Ma
- Subjects
- *SIMULATION methods & models, *GRAPHICS processing units, *FOURIER transforms, *HOMOGENEITY, *CUDA (Computer architecture)
- Abstract
- A new package for accelerating large-scale phase-field simulations on the GPU was developed, based on the semi-implicit Fourier method. The package can solve a variety of equilibrium equations with different inhomogeneities, including long-range elastic, magnetostatic, and electrostatic interactions. Using a specific algorithm in the Compute Unified Device Architecture (CUDA), the Fourier spectral iterative perturbation method was integrated into the GPU package. The Allen-Cahn equation, the Cahn-Hilliard equation, and a phase-field model with long-range interaction were each solved with the algorithm running on the GPU to test the performance of the package. A comparison of the results of the solver executed on a single CPU with those of the GPU solver showed that the computation on the GPU is 50 times faster. The present study therefore contributes to the acceleration of large-scale phase-field simulations and provides guidance for experiments to design large-scale functional devices. [ABSTRACT FROM AUTHOR]
- Published
- 2017
12. Multispectral Image Segmentation Using Parallel Mean Shift Algorithm and CUDA Technology.
- Author
- Zghidi, Hafedh, Walczak, Maksym, and Świtońskia, Adam
- Subjects
- *IMAGE segmentation, *ALGORITHMS, *CUDA (Computer architecture), *MULTISPECTRAL imaging, *NOISE
- Abstract
- We present a parallel mean shift algorithm running on CUDA and its possible application to the segmentation of multispectral images. The aim of this paper is to present a method of analyzing highly noisy multispectral images of various objects so that important features are enhanced and easier to identify. The algorithm finds applications in the analysis of multispectral images of eyes, where it makes certain features that are visible only at specific wavelengths clearly visible despite a high level of noise, a task for which processing time is very long. [ABSTRACT FROM AUTHOR]
- Published
- 2016
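A minimal sketch of the basic mean shift iteration on feature vectors (for multispectral segmentation these would be per-pixel spectra, possibly augmented with pixel coordinates). Each starting point's trajectory toward its mode is independent of the others, which is the property a CUDA version can exploit by giving each point its own thread. The Gaussian kernel, the bandwidth, and the function name `mean_shift` are assumptions for illustration, not details taken from the paper.

```python
import math


def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Gaussian-kernel mean shift: each point is repeatedly moved to the
    kernel-weighted mean of the ORIGINAL data until its shift falls below tol.
    Returns one mode per input point."""
    modes = []
    for p in points:
        m = list(p)
        for _ in range(iters):
            wsum = 0.0
            acc = [0.0] * len(m)
            for q in points:
                d2 = sum((a - b) ** 2 for a, b in zip(m, q))
                w = math.exp(-d2 / (2.0 * bandwidth ** 2))
                wsum += w
                for k in range(len(m)):
                    acc[k] += w * q[k]
            new_m = [a / wsum for a in acc]
            shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_m, m)))
            m = new_m
            if shift < tol:
                break
        modes.append(m)
    return modes


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes = mean_shift(pts, bandwidth=0.5)
print([round(m[0], 2) for m in modes])  # [0.03, 0.03, 0.03, 5.03, 5.03, 5.03]
```

Points whose modes coincide (up to a tolerance) are then assigned to the same segment.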
13. Some Features of the CUDA Implementation of the Semi-Lagrangian Method for the Advection Problem.
- Author
- Efremov, A., Karepova, E., and Vyatkin, A.
- Subjects
- *CUDA (Computer architecture), *LAGRANGE equations, *ADVECTION-diffusion equations, *COMPUTER algorithms, *FEATURE extraction
- Abstract
- In this paper, the semi-Lagrangian method is considered in the context of its implementation with CUDA technology. We scrutinized the bottleneck of our sequential algorithm, studied its parallel versions in detail, and clarified the main reason for poor CUDA performance. As a result, we revised the computation of partial integrals to improve the efficiency of the algorithm. Numerical experiments demonstrate good CUDA performance of the revised version of the algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2015
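For orientation, a textbook semi-Lagrangian step for 1D constant-velocity advection, u_t + c u_x = 0, on a periodic grid is sketched below. It traces each grid point back along its characteristic to the departure point x_i - c*dt and linearly interpolates the old field there. This generic interpolating form, and the function name `semi_lagrangian_step`, are assumptions for illustration; the paper's own scheme is built on partial integrals and is not reproduced here.

```python
import math


def semi_lagrangian_step(u, c, dt, dx):
    """One semi-Lagrangian step for u_t + c u_x = 0 on a periodic 1D grid,
    using linear interpolation at the departure point."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        xd = i - c * dt / dx          # departure point, in grid units
        j = math.floor(xd)            # left neighbour of the departure point
        frac = xd - j                 # position inside the cell, in [0, 1)
        out[i] = (1.0 - frac) * u[j % n] + frac * u[(j + 1) % n]
    return out


u = [float(i) for i in range(8)]
print(semi_lagrangian_step(u, c=1.0, dt=1.0, dx=1.0))
# [7.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Because each new value is a convex combination of two old values, the step stays bounded even when c*dt/dx exceeds 1, which is the usual motivation for semi-Lagrangian methods. The interpolation at a departure point that is generally not on the grid is also where memory access can become less regular on a GPU.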
14. GPU Accelerated Flow Computation by the Streamfunction-velocity (ψ-ν) Formulation.
- Author
- Kalita, Jiten C., Upadhyaya, Parikshit, and Gupta, Murli M.
- Subjects
- *MATHEMATICAL optimization, *ITERATIVE methods (Mathematics), *INCOMPRESSIBLE flow, *INTEGRATED circuits, *BIHARMONIC equations, *NAVIER-Stokes equations, *CUDA (Computer architecture)
- Abstract
- In this work, we present an optimization strategy for implementing the BiCGStab iterative solver on graphics processing units (GPUs) for computing incompressible viscous flows governed by the unsteady Navier-Stokes (N-S) equations on a CUDA platform. A recently developed ψ-ν formulation is used to discretize the biharmonic form of the N-S equations, and we obtain a remarkable speedup of 40 times on finer grids for the lid-driven square cavity flow. The GPU implementation enabled us to compute the flow on extremely fine grids, and very small scales were resolved with remarkable accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2015
15. Multi-GPU Kinetic Solvers using MPI and CUDA.
- Author
- Zabelok, Sergey, Arslanbekov, Robert, and Kolobov, Vladimir
- Subjects
- *GRAPHICS processing units, *CUDA (Computer architecture), *LATTICE Boltzmann methods, *MESSAGE passing (Computer science), *KERNEL operating systems, *HETEROGENEOUS computing
- Abstract
- This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. The main challenge of porting UFS to graphics processing units (GPUs) comes from the dynamically adapted mesh, which causes irregular data access. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using the discrete velocity method (DVM), the DSMC module, and the Lattice Boltzmann Method (LBM) solver, all using an octree Cartesian mesh with adaptive mesh refinement (AMR). Double-digit speedup on a single GPU and good multi-GPU scaling have been demonstrated. [ABSTRACT FROM AUTHOR]
- Published
- 2014
16. A GPU Algorithm for Minimum Vertex Cover Problems.
- Author
- Kouta Toume, Daiki Kinjo, and Morikazu Nakamura
- Subjects
- *GRAPHICS processing units, *GRAPH theory, *CUDA (Computer architecture), *COMPUTER algorithms, *PARALLEL computers, *DATA mining
- Abstract
- The minimum vertex cover problem is one of the fundamental problems in graph theory and is known to be NP-hard. For data mining in large-scale structured systems, we propose a GPU algorithm for the minimum vertex cover problem. The algorithm is designed to extract sufficient parallelism from the problem for the GPU architecture and to arrange data in device memory for efficient coalesced access. Through experimental evaluation, we demonstrate that our GPU algorithm is considerably faster than CPU programs and that the speedup becomes more evident as the graph size increases. [ABSTRACT FROM AUTHOR]
- Published
- 2014
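The abstract does not describe how the GPU algorithm extracts parallelism. As a hedged illustration of the general idea only, an exact brute-force search checks candidate vertex subsets of increasing size, and the checks for a given size are independent of each other, so each could in principle be handed to its own GPU thread. This is exponential in the number of vertices and suitable only for tiny graphs; it is not the authors' method, and `min_vertex_cover` is a hypothetical name.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in cover."""
    return all(u in cover or v in cover for u, v in edges)


def min_vertex_cover(n, edges):
    """Exact minimum vertex cover of a graph on vertices 0..n-1 by exhaustive
    search over subset sizes k = 0, 1, ..., n. The candidate checks for one k
    are mutually independent (the parallelizable part)."""
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            if is_vertex_cover(edges, set(cand)):
                return set(cand)
    return set(range(n))  # unreachable: the full vertex set is always a cover


triangle = [(0, 1), (1, 2), (0, 2)]
print(len(min_vertex_cover(3, triangle)))  # 2
star = [(0, k) for k in range(1, 5)]
print(min_vertex_cover(5, star))  # {0}
```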
17. Real-time maximum a-posteriori image reconstruction for fluorescence microscopy.
- Author
- Jabbar, Anwar A., Dilipkumar, Shilpa, C. K., Rasmi, Rajan, K., and Mondal, Partha P.
- Subjects
- *IMAGE reconstruction, *FLUORESCENCE microscopy, *THREE-dimensional imaging, *GRAPHICS processing units, *CUDA (Computer architecture), *IMAGE reconstruction algorithms, *CENTRAL processing units
- Abstract
- Rapid reconstruction of multidimensional images is crucial for enabling real-time 3D fluorescence imaging. This becomes a key factor for imaging rapidly occurring events in the cellular environment. To facilitate real-time imaging, we have developed a graphics processing unit (GPU) based real-time maximum a-posteriori (MAP) image reconstruction system. The parallel processing capability of the GPU device, which consists of a large number of tiny processing cores, and the adaptability of the image reconstruction algorithm to parallel processing (employing multiple independent computing modules called threads) result in high temporal resolution. Moreover, the proposed quadratic-potential-based MAP algorithm effectively deconvolves the images and suppresses noise. The multi-node, multi-threaded GPU and the Compute Unified Device Architecture (CUDA) efficiently execute the iterative image reconstruction algorithm, which is ≈200-fold faster (for large datasets) than existing CPU-based systems. [ABSTRACT FROM AUTHOR]
- Published
- 2015
18. CUDA Memory Limitation in Finite Element Optimization to Reconstruct Cracks.
- Author
- Sivasuthan, Sivamayam, Karthik, Victor U., and Hoole, S. Ratnajeevan H.
- Subjects
- *CUDA (Computer architecture), *IMAGE reconstruction algorithms, *FINITE element method, *NONDESTRUCTIVE testing, *MATHEMATICAL optimization
- Abstract
- In the nondestructive evaluation (NDE) of steel plates, besides detecting cracks, it is also important to characterize them. Characterization is necessary for determining whether a discovered crack demands withdrawal of the part from service. In eddy current crack identification, the response of a part to a coil is compared to the response without a crack; when they differ, the presence of a crack is flagged. To characterize the crack, however, the computed response from an eddy current analysis with a crack described by parameters is optimized to match measurements. This is heavy computation. Recently, graphics processing unit (GPU) computing has had great success in many very large numerical computations. In this work we discuss the often undiscussed GPU memory limitation in finite element optimization. In GPU computing, the memory of an NVIDIA GPU is limited (4 GB on a PC today). This paper assesses the memory limits in terms of matrix size in light of the various ways to store a large matrix in order to overcome these limits. We revive old element-by-element finite element solvers from the early 1980s, which worked on a highly memory-limited PC 282, to launch a GA kernel on thousands of CUDA threads exploiting the NVIDIA GPU architecture. [ABSTRACT FROM AUTHOR]
- Published
- 2014
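Element-by-element (EBE) solvers save memory because they never store the assembled global matrix: the product y = K x is formed by looping over elements, gathering each element's nodal values, multiplying by the small element matrix, and scatter-adding the result. A minimal 1D sketch with two-node linear elements for -u'' follows; the element type and the function names are assumptions for illustration, since the paper's eddy-current elements differ. A dense assembled product is included only as a reference for checking.

```python
def element_matrix(h):
    """Stiffness matrix of a 1D two-node linear element of length h (for -u'')."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def ebe_matvec(conn, lengths, x):
    """y = K x without assembling K: gather, multiply by the element matrix, scatter-add."""
    y = [0.0] * len(x)
    for (a, b), h in zip(conn, lengths):
        ke = element_matrix(h)
        xa, xb = x[a], x[b]
        y[a] += ke[0][0] * xa + ke[0][1] * xb
        y[b] += ke[1][0] * xa + ke[1][1] * xb
    return y


def assembled_matvec(conn, lengths, x):
    """Reference: assemble the dense global K, then multiply."""
    n = len(x)
    K = [[0.0] * n for _ in range(n)]
    for (a, b), h in zip(conn, lengths):
        ke = element_matrix(h)
        for p, gp in enumerate((a, b)):
            for q, gq in enumerate((a, b)):
                K[gp][gq] += ke[p][q]
    return [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]


conn = [(0, 1), (1, 2), (2, 3)]
lengths = [0.5, 0.25, 0.25]
x = [1.0, 2.0, 0.0, 3.0]
print(ebe_matvec(conn, lengths, x))  # [-2.0, 10.0, -20.0, 12.0]
```

On a GPU, elements that share a node write to the same entry of y, so a parallel EBE product needs atomic additions or an element colouring that keeps concurrently processed elements from sharing nodes.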
19. Software Optimization for Electrical Conductivity Imaging in Polycrystalline Diamond Cutters.
- Author
- Bogdanov, G., Wiggins, J., Bertagnolli, K., and Ludwig, R.
- Subjects
- *COMPUTER software, *MATHEMATICAL optimization, *ELECTRIC conductivity, *POLYCRYSTALS, *DIAMONDS, *CUDA (Computer architecture)
- Abstract
- We previously reported on an electrical conductivity imaging instrument developed for measurements on polycrystalline diamond cutters. These cylindrical cutters for oil and gas drilling feature a thick polycrystalline diamond layer on a tungsten carbide substrate. The instrument uses electrical impedance tomography to profile the conductivity in the diamond table. Conductivity images must be acquired quickly, on the order of 5 s per cutter, to be useful in the manufacturing process. This paper reports on successful efforts to optimize the conductivity reconstruction routine by porting major portions of it to NVIDIA GPUs, including a custom CUDA kernel for Jacobian computation. [ABSTRACT FROM AUTHOR]
- Published
- 2014
20. High Speed Finite Element Simulations on the Graphics Card.
- Author
- Huthwaite, P. and Lowe, M. J. S.
- Subjects
- *FINITE element method, *SIMULATION methods & models, *CUDA (Computer architecture), *MOTHERBOARDS, *ELASTODYNAMICS
- Abstract
- A software package is developed to perform explicit time-domain finite element simulations of ultrasonic propagation on the graphical processing unit, using Nvidia's CUDA. Of critical importance for this problem is the arrangement of nodes in memory, which allows data to be loaded efficiently and minimises communication between the independently executed blocks of threads. The initial stage of memory arrangement is partitioning the mesh; both a well-established 'greedy' partitioner and a new, more efficient 'aligned' partitioner are investigated. A method is then developed to efficiently arrange the memory within each partition. The technique is compared to a commercial CPU equivalent, demonstrating an overall speedup of at least 100 for a non-destructive testing weld model. [ABSTRACT FROM AUTHOR]
- Published
- 2014
21. Graphics processing unit accelerated three-dimensional model for the simulation of pulsed low-temperature plasmas.
- Author
- Fierro, Andrew, Dickens, James, and Neuber, Andreas
- Subjects
- *LOW temperature plasmas, *GRAPHICS processing units, *THREE-dimensional modeling, *BOLTZMANN'S equation, *CUDA (Computer architecture), *SIMULATION methods & models
- Abstract
- A three-dimensional particle-in-cell/Monte Carlo collision simulation that is fully implemented on a graphics processing unit (GPU) is described and used to determine low-temperature plasma characteristics at high reduced electric field, E/n, in nitrogen gas. Details of the implementation on the GPU using the NVIDIA Compute Unified Device Architecture framework are discussed with respect to efficient code execution. The software is capable of tracking around 10 × 10⁶ particles with dynamic weighting and a total mesh size larger than 10⁸ cells. The simulation is verified by comparing the electron energy distribution function and plasma transport parameters with those from known Boltzmann equation (BE) solvers. Under the assumption of a uniform electric field and neglecting the build-up of positive ion space charge, the simulation agrees well with the BE solvers. The model is used to calculate plasma characteristics of a pulsed, parallel-plate discharge. A photoionization model provides the simulation with additional electrons after the initial seeded electron density has drifted towards the anode. The performance of the GPU implementation is compared with that of a CPU implementation, and a speed-up factor of 13 is obtained for a 3D relaxation Poisson solver. Furthermore, a speed-up factor of 60 is realized for the parallelization of the electron processes. [ABSTRACT FROM AUTHOR]
- Published
- 2014
22. A CUBLAS-CUDA Implementation of PCG Method of an Ocean Circulation Model.
- Author
- Farina, R., Cuomo, S., and De Michele, P.
- Subjects
- *CUDA (Computer architecture), *OCEAN circulation, *CONJUGATE gradient methods, *NUMERICAL solutions to equations, *LAPLACE'S equation
- Abstract
- A numerical model of global ocean circulation is presented. It consists of discretizing Laplace's problem with a second-order finite difference scheme, which gives a linear system solved by the Preconditioned Conjugate Gradient (PCG) method. In this work, we observe that the performance and accuracy of the PCG solver depend on the grid resolution and on the ratio of Laplace's coefficients. Moreover, a case study of an implementation of the PCG solver with a diagonal preconditioner on a multi-core GPU architecture based on the CUBLAS library is proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2011
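Each PCG iteration is built from one matrix-vector product, a few dot products, and vector updates, that is, BLAS level-1 and level-2 style operations, which is why a CUBLAS-based implementation is a natural fit. A minimal dense pure-Python sketch with the diagonal (Jacobi) preconditioner mentioned in the abstract follows; the helper names `dot`, `matvec`, and `pcg` are illustrative, and the small test matrix is an arbitrary symmetric positive definite example rather than the ocean model's system.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A,
    with the diagonal (Jacobi) preconditioner M = diag(A)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # r = b - A x for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    inv_d = [1.0 / A[i][i] for i in range(n)]     # applying M^-1 is elementwise
    z = [d * ri for d, ri in zip(inv_d, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [d * ri for d, ri in zip(inv_d, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)  # exact solution is [2/9, 1/9, 13/9]
```

The diagonal preconditioner is the cheapest choice to parallelize, since applying it is a single elementwise multiply with no coupling between unknowns.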
Discovery Service for Jio Institute Digital Library