SIMD-Constrained Lookup Table for Accelerating Variable-Weighted Convolution on x86/64 CPUs

Authors :: Yuki Naganawa
Hirokazu Kamei
Yamato Kanetaka
Haruki Nogami
Yoshihiro Maeda
Norishige Fukushima
Source :: IEEE Access, Vol 12, Pp 15800-15819 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract: Convolution is the inner product of the neighborhood signal and weights and plays a fundamental role in image processing; thus, acceleration of convolution is essential. Among convolutions, variable-weighted convolution is used in adaptive filters and edge-preserving smoothing to realize various applications. Some weights are replaced with lookup tables (LUTs) to accelerate these filters. LUT reference is a classical acceleration method. However, the difference between the growth rate in computing speed and memory I/O speed has limited the scope of utilization of LUT references. Speedup would be possible if registers could be used as LUTs, but their small size makes them difficult to utilize. Therefore, this study proposes a downsampling method to fit LUTs into SIMD registers, which are relatively large, and an efficient reference method for register-LUTs. Experimental results show that the proposed method can reproduce an accuracy in PSNR of 65.52 (+25.11) dB, while a simple full-size LUT in the register size can only reproduce 40.41 dB. Using a wider register width, the PSNR was 78.63 (+38.22) dB with AVX-512 and 84.5 (+44.09) dB with bfloat16. The fastest proposed method was on average 4.82/3.72 times faster than direct vector computing, 2.99/3.10 times faster than vector addressing, and 3.79/7.80 times faster than scalar addressing on the AVX2/AVX-512 computers while exceeding the display limit of 60 dB for 8-bit displays. Taking into account these speed/accuracy trade-offs, the performance of the proposed method was superior. This paper shows that LUT references can be realized with small SIMD registers in convolution. The proposed method is expected to be extended to adaptive filters, convolutional neural networks, and other image processing applications by accelerating the approximation with this register-LUT. Our code is available at https://fukushimalab.github.io/registerLUT4conv/.
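
To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation (their code is at the URL above). It assumes a Gaussian range weight as in bilateral filtering, a 1-D signal, uniform downsampling of the 256-entry weight table to 8 entries (the number of 32-bit floats that fit in one 256-bit AVX2 register), and nearest-entry lookup. The abstract does not specify the downsampling scheme, so these choices, all function names, and the value of sigma are illustrative assumptions. A plain Python list stands in for the SIMD register.

```python
import math

# Assumption: one 256-bit AVX2 register holds 8 float32 lanes.
LANES = 8


def range_weight(diff, sigma=30.0):
    """Gaussian weight for an intensity difference (hypothetical kernel choice)."""
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))


def build_full_lut(sigma=30.0):
    """One entry per possible 8-bit intensity difference (0..255)."""
    return [range_weight(d, sigma) for d in range(256)]


def build_small_lut(sigma=30.0, lanes=LANES):
    """Uniformly downsample the weight function to `lanes` entries.

    Illustrative only: the paper's downsampling method may differ.
    """
    step = 255.0 / (lanes - 1)
    return [range_weight(i * step, sigma) for i in range(lanes)]


def lookup_small(lut, diff):
    """Nearest-entry lookup into the downsampled table."""
    step = 255.0 / (len(lut) - 1)
    return lut[int(diff / step + 0.5)]


def weighted_conv(signal, radius, weight_of):
    """1-D variable-weighted convolution with clamped borders.

    The weight of each neighbor depends on its intensity difference
    from the center sample, so the kernel changes per position.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)
            w = weight_of(abs(signal[j] - signal[i]))
            num += w * signal[j]
            den += w
        out.append(num / den)  # den >= 1 because the center weight is 1
    return out


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit data."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    signal = [(i * 37) % 256 for i in range(64)]  # synthetic test data
    full = build_full_lut()
    small = build_small_lut()
    ref = weighted_conv(signal, 2, range_weight)
    via_full = weighted_conv(signal, 2, lambda d: full[d])
    via_small = weighted_conv(signal, 2, lambda d: lookup_small(small, d))
    print("full LUT PSNR (dB):", psnr(ref, via_full))
    print("8-entry LUT PSNR (dB):", psnr(ref, via_small))
```

On real hardware, an 8-entry float table held in a single AVX2 register could be indexed with a lane permute such as `_mm256_permutevar8x32_ps`; whether the paper uses that particular instruction is not stated in the abstract.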

Subjects :: Approximate computing
bilateral filtering
high-dimensional kernel filtering
high-performance computing
image filtering
nonlinear filters
Electrical engineering. Electronics. Nuclear engineering
TK1-9971

Details

Language :: English
ISSN :: 2169-3536
Volume :: 12
Database :: Directory of Open Access Journals
Journal :: IEEE Access
Publication Type :: Academic Journal
Accession number :: edsdoj.82f5f42a4b724fa295220c97c71ff39b
Document Type :: article
Full Text :: https://doi.org/10.1109/ACCESS.2024.3354720
