Author: "Deng, Dazheng" / Topic: energy consumption - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Deng, Dazheng"' showing total 2 results

Start Over Author "Deng, Dazheng" Topic energy consumption

2 results on '"Deng, Dazheng"'

1. Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.

Author: Wang, Yang, Qin, Yubin, Deng, Dazheng, Wei, Jingchuan, Chen, Tianbao, Lin, Xinhan, Liu, Leibo, Wei, Shaojun, and Yin, Shouyi
Subjects: FEEDFORWARD neural networks, WEIGHT training, RANDOM access memory, KNOWLEDGE transfer, ENERGY consumption
Abstract: Transfer learning, which transfers knowledge from source datasets to target datasets, is practical for adaptive deep neural network (DNN) applications. When considering user privacy and communication bandwidth issues, edge devices’ training is essential for transfer learning. Nevertheless, training requires repeating feedforward (FF), backpropagation (BP), and weight gradient (WG) millions of times, introducing prohibitive computation for edge devices. A promising method to reduce training computation is sparse DNN training (SDT), which dynamically prunes weights during training iterations and performs FF, BP, and WG only with unpruned weights. However, SDT suffers implicit redundancy and reuse imbalance for convolution layers. Besides, it turns bottlenecks into batch normalization (BN) layers. Therefore, it is challenging to achieve energy-efficient SDT computing. This article proposes a processor, Trainer, solving the above challenges with three features. First, a speculation mechanism removes implicit redundant operations, which have nonzeros’ input, weight, or output, but are ineffective for training. Second, a dynamic sparsity adaptive dataflow tackles the reuse imbalance, improving energy efficiency (EE) for dynamic sparse convolution in SDT. Third, a computational dependence decoupled BN unit eliminates BN’s repeated data access to reduce training energy and time. Trainer is fabricated in 28-nm CMOS technology and occupies 20.96 mm2 of area. It achieves a peak EE of 173.28TFLOPS/W@FP16 (276.55TFLOPS/W@FP8) for a 90% activation sparsity and 90% weight sparsity. The sparsity to EE conversion ratio is 80.9, outperforming the previous work by 1.55 $\times $. When training a ResNet18 model with SDT, Trainer reduces energy by 2.23 $\times $ and time by 1.76 $\times $ than the state-of-the-art sparse training processor. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

2. PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.

Author: Wang, Yang, Deng, Dazheng, Liu, Leibo, Wei, Shaojun, and Yin, Shouyi
Subjects: *ENERGY consumption, *RANDOM access memory, *ENERGY transfer
Abstract: Edge device deep neural network (DNN) training is practical to improve model adaptivity for unfamiliar datasets while avoiding privacy disclosure and huge communication cost. Nevertheless, apart from feed-forward (FF) as inference, DNN training still requires back-propagation (BP) and weight gradient (WG), introducing power-consuming floating-point computing requirements, hardware underutilization, and energy bottleneck from excessive memory access. This paper proposes a DNN training processor named PL-NPU to solve the above challenges with three innovations. First, a posit-based logarithm-domain processing element (PE) adapts to various training data requirements with a low bit-width format and reduces energy by transferring complicated arithmetics into simple logarithm domain operation. Second, a reconfigurable inter-intra-channel-reuse dataflow dynamically adjusts the PE mapping with a regrouping omega network to improve the operands reuse for higher hardware utilization. Third, a pointed-stake-shaped codec unit adaptively compresses small values to variable-length data format while compressing large values to fixed-length 8b posit format, reducing the memory access for breaking the training energy bottleneck. Simulated with 28nm CMOS technology, the proposed PL-NPU achieves a maximum frequency of 1040MHz with 343mW and 5.28mm $\mathbf {^{2}}$. The peak energy efficiency is 3.87TFLOPS/W for 0.6V at 60MHz. Compared with the state-of-the-art training processor, PL-NPU reaches $3.75\times $ higher energy efficiency and offers $1.68\times $ speedup when training ResNet18. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

2 results on '"Deng, Dazheng"'

1. Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.

2. PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

2 results on '"Deng, Dazheng"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources