A high-performance connected components implementation for GPUs

Authors :: Jayadharini Jaiganesh
Martin Burtscher
Source :: HPDC
Publication Year :: 2018
Publisher :: ACM, 2018.
Abstract: Computing connected components is an important graph algorithm that is used, for example, in medicine, image processing, and biochemistry. This paper presents a fast connected-components implementation for GPUs called ECL-CC. It builds upon the best features of prior algorithms and augments them with GPU-specific optimizations. For example, it incorporates a parallelism-friendly version of pointer jumping to speed up union-find operations and uses several compute kernels to exploit the multiple levels of hardware parallelism. The resulting CUDA code is asynchronous and lock free, employs load balancing, visits each edge exactly once, and only processes edges in one direction. It is 1.8 times faster on average than the fastest prior GPU implementation running on a Titan X and faster on most of the eighteen real-world and synthetic graphs we tested.

Subjects :: Connected component
Speedup
Computer science
020207 software engineering
02 engineering and technology
Parallel computing
Load balancing (computing)
Pointer jumping
CUDA
Titan (supercomputer)
Asynchronous communication
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Non-blocking algorithm

Database :: OpenAIRE
Journal :: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
Accession number :: edsair.doi...........1d2251e747cf8dc2e4114dac5db0b490

Tools