Author: "Hu, Jianhao" / Publication Type: Magazines - Searchworks@Jio Institute Digital Library Search Results

1. Low Complexity State Metric Memory Reduction for Turbo Decoding With Stochastic Quantization

Author: Hu, Shuai, Han, Kaining, and Hu, Jianhao
Abstract: The size of the state metrics cache (SMC) has a predominant impact on the overall hardware consumption of the Turbo decoder. This brief presents a low complexity SMC reduction algorithm based on the proposed stochastic quantization (SQ) technique, which reduces the size of the SMC by randomly quantizing the state metrics to different small bit-width numbers. The selection of the random source and the updating method of the extrinsic information are further explored to minimize the performance loss caused by bit-width reduction. The simulation and synthesis results show that the proposed algorithm can achieve the best bit error rate (BER) performance with the lowest hardware consumption among the compared SMC reduction algorithms.
Published: 2024
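The abstract does not give the exact SQ rule. A common form of random quantization is stochastic rounding: drop the low-order bits and round up with probability equal to the dropped fraction, so the stored value is unbiased on average. The sketch below assumes that rule and non-negative integer state metrics. `stochastic_truncate` and `restore` are hypothetical names, not the paper's.

```python
import random

def stochastic_truncate(metric: int, drop_bits: int, rng: random.Random) -> int:
    """Drop `drop_bits` LSBs of a non-negative integer with stochastic rounding.

    The result is in units of 2**drop_bits. It rounds up with probability
    (discarded LSBs) / 2**drop_bits, so its expected value, rescaled,
    equals the input.
    """
    step = 1 << drop_bits
    base, remainder = divmod(metric, step)
    # randrange(step) is uniform on 0..step-1, so this is true with prob remainder/step
    return base + 1 if rng.randrange(step) < remainder else base

def restore(q: int, drop_bits: int) -> int:
    """Map a stored narrow value back to the original scale."""
    return q << drop_bits

rng = random.Random(0)
metric = 45  # 6-bit value 0b101101, stored with 3 bits dropped
samples = [restore(stochastic_truncate(metric, 3, rng), 3) for _ in range(20_000)]
print(sorted(set(samples)))         # the two neighbouring grid points, 40 and 48
print(sum(samples) / len(samples))  # close to 45
```

A real cache would also need to saturate when rounding up overflows the narrower word; the sketch omits that.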

2. High Throughput and Hardware Efficient Hybrid LDPC Decoder Using Bit-Serial Stochastic Updating

Author: Hu, Shuai, Han, Kaining, Zhu, Yubin, Shen, Guodong, Wang, Fujie, and Hu, Jianhao
Abstract: Hybrid low-density parity-check (LDPC) decoding combines the conventional belief-propagation (BP) algorithm with stochastic decoding to achieve high performance and low complexity simultaneously. However, lossy and inefficient stochastic-to-binary (S2B) conversion introduces extra performance degradation and decoding latency. In this paper, bit-serial stochastic updating based hybrid decoding (BSSU-HD) is proposed, which employs fully correlated stochastic (FCS) check nodes (CNs) and probability-tracer-assisted variable nodes (VNs) to accomplish accurate and efficient S2B conversion. Two strategies, random source selection and tracing speed switching, are proposed to further improve performance and convergence. A BSSU LDPC decoder for IEEE 802.3an is designed in a 65-nm CMOS process; it occupies 4.6 mm² of silicon area and achieves a throughput of 200.8 Gb/s at Eb/N0 = 4.4 dB with a 500 MHz clock frequency and a 1.2 V supply voltage. The power and energy efficiency are 2.933 W and 14.61 pJ/bit, respectively. To the best of our knowledge, it achieves the best decoding performance and the highest throughput and hardware efficiency among state-of-the-art IEEE 802.3an LDPC decoders. We also verify that BSSU-HD achieves better performance than the conventional algorithm for multi-rate 5th generation (5G) New Radio (NR) LDPC codes, which greatly extends the application of stochastic decoding.
Published: 2023

3. A Nonlinear Function Logic Computing Architecture With Low Switching Activity

Author: Zhu, Yubin, Zhang, Yanyan, Han, Kaining, and Hu, Jianhao
Abstract: Nonlinear functions are widely used in modern digital signal processing systems and are usually calculated by polynomial approximation, the CORDIC algorithm, or look-up tables. Because their logic computing architectures are quite complex, these methods suffer from high switching activity of logic elements, resulting in high dynamic power consumption. Reducing switching activity is believed to be an efficient way to save dynamic power. In this brief, we first analyze and prove the low switching activity of the conventional unary number representation. Afterward, a novel multi-hot unary number representation and the corresponding low-switching-activity logic computing architecture are proposed for nonlinear functions, significantly reducing switching activity and power consumption. Moreover, the proposed architecture is extended to single-input multiple-output nonlinear function calculation, which shares the multi-hot encoder to further reduce hardware cost. According to post-synthesis power evaluation with the PrimePower tool, the proposed architecture achieves significant reductions in switching activity and dynamic power compared with conventional computing architectures.
Published: 2023
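The starting point of the abstract, that unary codes switch less than binary, is easy to check for the conventional thermometer code: changing the value by d toggles exactly |d| bits, whereas a single binary increment can flip several (7 → 8 flips four). The sketch counts bit toggles (Hamming distance between successive codewords, a simple proxy for switching activity) along a 0-to-15 ramp. It models only conventional unary coding, not the authors' multi-hot representation; `thermometer` and `toggles` are illustrative names.

```python
def thermometer(value: int, width: int) -> int:
    """Unary (thermometer) code of width `width`: the lowest `value` bits are set."""
    assert 0 <= value <= width
    return (1 << value) - 1

def toggles(codes: list[int]) -> int:
    """Total bit flips between successive codewords (sum of Hamming distances)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

values = list(range(16))  # a ramp 0, 1, ..., 15
binary_toggles = toggles(values)
unary_toggles = toggles([thermometer(v, 15) for v in values])
print(binary_toggles, unary_toggles)  # 26 15
```

The price is width: the thermometer code needs 15 bits for the values 0 to 15, versus 4 bits in binary.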

4. Hybrid Stochastic LDPC Decoder With Fully Correlated Stochastic Computation.

Author: Hu, Shuai, Han, Kaining, Wang, Fujie, and Hu, Jianhao
Subjects: LOW density parity check codes, BINARY sequences, ITERATIVE decoding
Abstract: The ultra-low hardware consumption of stochastic decoding has made it a potential candidate for implementing low-density parity-check (LDPC) decoders. However, existing stochastic LDPC decoders still suffer from performance degradation and a relatively high number of decoding cycles caused by the correlation among stochastic bit streams. In this paper, we propose Hybrid Stochastic (HS) decoding, which achieves high performance, high throughput, and high hardware efficiency by jointly using our proposed novel stochastic check node (CN) and Two’s Complement (TCS) variable node (VN) to realize the Min-Sum Algorithm (MSA) and its enhancements. Fully correlated stochastic bit streams are used to entirely eliminate the indeterminacy caused by correlation, which yields high performance and fast convergence while inheriting the low complexity of stochastic decoders. We demonstrate HS decoding by designing a (2048,1723) decoder in a 65 nm process, which achieves the best Bit-Error-Ratio (BER) performance, the highest throughput, and top hardware efficiency among existing stochastic LDPC decoders. We also demonstrate that HS decoding achieves excellent decoding performance for 5G New Radio (NR) LDPC codes of different code rates and lengths. Thus, HS decoding can be adopted in a wide range of applications. [ABSTRACT FROM AUTHOR]
Published: 2022
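In stochastic computing a value p in [0, 1] is carried as a bit stream whose bits are 1 with probability p. The sketch below, a generic illustration rather than the HS decoder's check node, shows why correlation matters: ANDing streams drawn from independent random sources approximates the product p1·p2, while generating both streams from one shared random source (full correlation) makes the AND bit-exactly the stream for min(p1, p2), so the result no longer depends on the random draws. The probabilities, seed, and stream length are arbitrary choices.

```python
import random

def to_stream(p: float, randoms: list[float]) -> list[int]:
    """Encode probability p as a unipolar stochastic bit stream (bit = 1 if r < p)."""
    return [1 if r < p else 0 for r in randoms]

def mean(bits: list[int]) -> float:
    return sum(bits) / len(bits)

N = 100_000
rng = random.Random(42)
pa, pb = 0.7, 0.4

# Independent random sources: AND approximates the product pa * pb
ra = [rng.random() for _ in range(N)]
rb = [rng.random() for _ in range(N)]
and_indep = [a & b for a, b in zip(to_stream(pa, ra), to_stream(pb, rb))]

# One shared random source (fully correlated streams): (r < pa) & (r < pb) == (r < min)
shared = [rng.random() for _ in range(N)]
and_corr = [a & b for a, b in zip(to_stream(pa, shared), to_stream(pb, shared))]

print(mean(and_indep))                             # approximately 0.28 = 0.7 * 0.4
print(and_corr == to_stream(min(pa, pb), shared))  # True: bit-exact minimum
```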

5. Symbol detection based on temporal convolutional network in optical communications

Author: Luo, Yingzhe and Hu, Jianhao
Abstract: Deep learning (DL) is one of the fastest-developing areas of artificial intelligence and has recently been studied and applied in computer vision, autonomous driving, automatic speech recognition, and communication. This paper uses DL to design a symbol detection algorithm for the receiver of optical communication systems. The proposed method is implemented with a non-causal temporal convolutional network (ncTCN), a convolutional neural network suited to sequence processing. We adopt three methods to carry out training over multiple signal-to-noise ratios of the AWGN channel. Furthermore, we apply two nonlinear activation functions to the proposed ncTCN to improve its noise robustness. Without loss of generality, we apply the ncTCN-based receiver to a 16-ary quadrature amplitude modulation (16-QAM) optical communication system in simulation. According to the simulation results, the proposed method obtains a bit error rate performance gain over some conventional receivers.
Published: 2022

6. Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator.

Author: Xia, Zihan, Chen, Jienan, Huang, Qiu, Luo, Jinting, and Hu, Jianhao
Subjects: CONVOLUTIONAL neural networks, NATURAL language processing, NEUROPLASTICITY, MATRIX multiplications, LOGIC circuits
Abstract: Deep convolutional neural networks (DCNNs) have achieved state-of-the-art performance in classification, natural language processing (NLP), and regression tasks. However, there is still a large gap between DCNNs and the human brain in terms of computational efficiency. Inspired by neural synaptic plasticity and stochastic computing (SC), we propose neural synaptic plasticity-inspired computing (NSPC) to simulate the neural network activity of the human brain for inference tasks with simple logic gates. In NSPC, the multiplication and accumulation (MAC) operation is transformed into wire connectivity, which requires only bundles of wires and small-width adders. In this way, NSPC imitates the structure of neural synaptic plasticity from the perspective of circuit wire connections. Furthermore, based on the principle of NSPC, we use a data mapping method to convert convolution operations into matrix multiplications. Based on the NSPC methodology, a fully pipelined, low-latency architecture is designed. The proposed NSPC accelerator exhibits high hardware efficiency while maintaining comparable network accuracy. The NSPC-based DCNN accelerator (NSPC-CNN) processes DCNNs at 1.5625M images/s with a power dissipation of 15.42 W and an area of 36.4 mm². The NSPC-based deep neural network (DNN) accelerator (NSPC-DNN), which implements a DNN with three fully connected layers, consumes only 6.6 mm² of area and 2.93 W of power and achieves a throughput of 400M images/s. Compared with conventional fixed-point implementations, NSPC-CNN achieves 2.77× area efficiency and 2.25× power efficiency, and NSPC-DNN exhibits 2.31× area efficiency and 2.09× power efficiency. [ABSTRACT FROM AUTHOR]
Published: 2021
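The abstract mentions a data mapping that turns convolutions into matrix multiplications. The best-known such mapping is im2col, which unrolls each input patch into a row so that all filters are applied by one matrix product. The abstract does not say NSPC uses exactly this mapping, so the sketch below is a generic illustration (single input channel, stride 1, no padding, CNN-style cross-correlation), checked against a direct sliding-window loop.

```python
import numpy as np

def im2col(x: np.ndarray, k: int) -> np.ndarray:
    """Unroll every k x k patch of a 2-D input into one row (stride 1, no padding)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_as_matmul(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply F kernels of shape (F, k, k) with a single matrix multiplication."""
    f, k, _ = kernels.shape
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    weights = kernels.reshape(f, k * k)   # one row per filter
    result = weights @ im2col(x, k).T     # shape (F, out_h * out_w)
    return result.reshape(f, out_h, out_w)

def conv2d_direct(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Reference sliding-window implementation for comparison."""
    f, k, _ = kernels.shape
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((f, out_h, out_w), dtype=x.dtype)
    for n in range(f):
        for i in range(out_h):
            for j in range(out_w):
                out[n, i, j] = np.sum(x[i:i + k, j:j + k] * kernels[n])
    return out

rng = np.random.default_rng(0)
x = rng.integers(-4, 5, size=(6, 6))            # small integer input, exact arithmetic
kernels = rng.integers(-2, 3, size=(2, 3, 3))   # two 3 x 3 filters
print(np.array_equal(conv2d_as_matmul(x, kernels), conv2d_direct(x, kernels)))  # True
```

Integer data keeps the comparison exact; the payoff of the mapping is that the whole layer becomes one regular matrix product, which is easier to pipeline in hardware.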

7. Low-Cost Implementation Techniques for Interleave Division Multiple Access

Author: Hu, Yang, Liang, Chulong, Hu, Jianhao, and Ping, Li
Abstract: We consider a low-cost code shift division multiple-access (CSDMA) scheme, in which user-specific shifting is used to replace user-specific interleaving in interleave division multiple access (IDMA). We also outline a low-cost Gaussian approximation-based linear minimum mean square error message passing detection technique for CSDMA. We show that CSDMA can offer almost the same performance as the original IDMA in low-density parity-check or turbo coded systems, but with considerably lower implementation cost.
Published: 2018

8. A pseudo-random sequence generation scheme based on RNS and permutation polynomials

Author: Ma, Shang, Liu, Jianfeng, Yang, Zeguo, Zhang, Yan, and Hu, Jianhao
Abstract: Long-period pseudo-random sequences play an important role in modern information processing systems. Based on the residue number system (RNS) and permutation polynomials over finite fields, a pseudo-random sequence generation scheme is proposed in this paper. It extends several short-period random sequences to a long-period pseudo-random sequence by using RNS. The short-period random sequences are generated in parallel by iterating permutation polynomials over finite fields. Because the dynamic range of each iterative calculation is small, the bit width of the hardware implementation is reduced. As a result, a full look-up table (LUT) architecture can be used to achieve high-speed sequence output. Methods for finding proper permutation polynomials that generate long-period sequences and an optimization algorithm for the Chinese remainder theorem (CRT) mapping are also proposed. The period of the generated pseudo-random sequence can easily exceed 2^100 on commonly used field programmable gate array (FPGA) chips. Meanwhile, the scheme offers extensive freedom in choosing permutation polynomials. For example, 10905 permutation polynomials meet the long-period requirement over the finite field F_q with q ≢ 1 (mod 3) and q ≤ 503. The hardware implementation architecture is simple and multiplier-free. Using a Xilinx XC7020 FPGA chip, we implement a sequence generator with a period over 2^50, which costs only 20 18-Kb BRAMs (block RAMs) and a small amount of logic and reaches a speed of 449.236 Mbps. National Institute of Standards and Technology (NIST) test results show that the sequence has good random properties.
Published: 2018
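A toy version of the period-extension idea: run one short iterative generator per modulus in parallel, then merge the residues with the CRT. Because the CRT map is a bijection for pairwise coprime moduli, the merged sequence repeats exactly when every component does, so its period is the least common multiple of the component periods. The sketch assumes primes p with p ≢ 1 (mod 3), for which f(x) = x^3 + c permutes F_p (since gcd(3, p − 1) = 1); the moduli, constants, and seeds are arbitrary, and the paper's polynomial search and optimized CRT mapping are not reproduced.

```python
from math import lcm, prod

# Toy parameters (hypothetical): primes p with p % 3 == 2, so gcd(3, p - 1) == 1
# and f(x) = x^3 + c is a permutation polynomial over F_p.
MODULI = [5, 11, 17, 23]
CONSTANTS = [1, 2, 3, 5]   # arbitrary offsets c
SEEDS = [0, 1, 2, 3]       # arbitrary starting values

def step(x: int, c: int, p: int) -> int:
    """One iteration of f(x) = x^3 + c over F_p."""
    return (pow(x, 3, p) + c) % p

def cycle_length(seed: int, c: int, p: int) -> int:
    """Period of the seed's orbit; f is a permutation, so the orbit is a pure cycle."""
    x, n = step(seed, c, p), 1
    while x != seed:
        x, n = step(x, c, p), n + 1
    return n

def crt(residues: list[int], moduli: list[int]) -> int:
    """Chinese remainder theorem reconstruction for pairwise coprime moduli."""
    m = prod(moduli)
    total = 0
    for r, mi in zip(residues, moduli):
        mi_hat = m // mi
        total += r * mi_hat * pow(mi_hat, -1, mi)
    return total % m

def generate(n: int) -> list[int]:
    """Emit n outputs in [0, prod(MODULI)) by CRT-merging the parallel residue streams."""
    state = list(SEEDS)
    out = []
    for _ in range(n):
        out.append(crt(state, MODULI))
        state = [step(x, c, p) for x, c, p in zip(state, CONSTANTS, MODULI)]
    return out

periods = [cycle_length(s, c, p) for s, c, p in zip(SEEDS, CONSTANTS, MODULI)]
combined = lcm(*periods)
seq = generate(2 * combined)
print(periods, combined)  # [4, 7, 8, 17] 952
```

The merged period (952) is far longer than any component cycle, but short cycles here also waste most of the 21505 available states; choosing polynomials and seeds with long, pairwise coprime cycles is the part the paper addresses. This toy generator is not cryptographically secure and has not been run through the NIST tests.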

9. A Low Complexity Sparse Code Multiple Access Detector Based on Stochastic Computing.

Author: Han, Kaining, Hu, Jianhao, Chen, Jienan, and Lu, Hao
Subjects: *MULTIPLE access protocols (Computer network protocols), *MIMO systems, *WIRELESS communications, *COMPUTING platforms, *COMPUTER architecture
Abstract: Sparse code multiple access (SCMA) is a promising multiple access technology candidate for next-generation communication systems, which can dramatically improve spectral efficiency. However, the major challenge of SCMA is its very high detection complexity. Stochastic computing is a new number representation that can carry out complex computations with very simple logic. In this paper, we extend the application of stochastic computing to SCMA detection and propose a low complexity stochastic SCMA detector. We also design three novel stochastic logic architectures: a low-hardware-cost bit stream generation architecture, a low-hardware-cost stochastic function node update architecture, and a fast-converging stochastic variable node update architecture. Analysis and simulation results show that the proposed stochastic SCMA detector saves 69% of the complexity of traditional SCMA detectors with comparable bit error rate performance. Synthesis results with SIMC 65-nm CMOS technology show that the proposed stochastic SCMA detector achieves 640 Mbps total system throughput with a cell area of only 1.45 mm². [ABSTRACT FROM PUBLISHER]
Published: 2018

10. Hardware Efficient Massive MIMO Detector Based on the Monte Carlo Tree Search Method

Author: Chen, Jienan, Fei, Chao, Lu, Hao, Sobelman, Gerald E., and Hu, Jianhao
Abstract: A Monte Carlo tree search (MCTS)-based large-scale multiple-input, multiple-output (MIMO) detector is proposed. We describe how the MCTS algorithm, which has been successfully used in decision-making and game-playing problems, can be applied to MIMO detection. In particular, we discuss how the tree policy, default policy, simulation, and backpropagation steps of MCTS can be adapted to MIMO detection. We also describe some optimizations that reduce both the bit error rate and the computational complexity. The proposed MCTS MIMO detector exhibits performance that is comparable to existing methods while having a lower computational load. The design has been implemented in a 65-nm CMOS technology. For a 64 × 8 MIMO system, it achieves a throughput of 665 Mbps with a core area of 1.43 mm², and it exhibits higher hardware efficiency than previous MIMO detector designs in the literature.
Published: 2017

11. An Intra-Iterative Interference Cancellation Detector for Large-Scale MIMO Communications Based on Convex Optimization.

Author: Chen, Jienan, Zhang, Zhenbing, Lu, Hao, Hu, Jianhao, and Sobelman, Gerald E.
Subjects: CONVEX functions, MEAN square algorithms, ERROR rates, COMPLEMENTARY metal oxide semiconductors, MIMO systems
Abstract: This paper proposes an intra-iterative interference cancellation (IIC) detector based on convex optimization for large-scale MIMO systems. By utilizing Newton's method to solve the optimization problem, a hardware-friendly detector is implemented in a 65 nm CMOS technology. The proposed detector has a throughput of 3.6 Gb/s with a 600 MHz operating frequency. The simulation results indicate that the block-error rate (BLER) performance of the proposed method can approach that of the minimum mean square error (MMSE) detector. The design is found to be more efficient than other recently proposed MIMO detector implementations. [ABSTRACT FROM PUBLISHER]
Published: 2016

12. Stochastic Iterative MIMO Detection System: Algorithm and Hardware Design.

Author: Chen, Jienan, Hu, Jianhao, and Sobelman, Gerald E.
Subjects: *MIMO systems, *WIRELESS communications, *STOCHASTIC approximation, *ITERATIVE methods (Mathematics), *MARKOV chain Monte Carlo, *GIBBS phenomenon, *LOGIC circuits
Abstract: In this paper, we propose a stochastic iterative multiple-input multiple-output (SIM) detection system based on the Markov chain Monte Carlo (MCMC) method. To improve detection performance, the Gibbs sampler of the MCMC detector in the SIM is updated directly by the decoded bits from a channel decoder. The channel decoder is part of the updating unit that generates new samples in the MCMC updating process. We also implement the SIM in a fully parallel scheme, which achieves a high detection speed. As a case study, we have designed and synthesized a 128-parallel 4 × 4 16-QAM SIM system in a 130 nm CMOS technology with a core area of 1.98 mm² and 457K logic gates. The SIM detection system achieves a throughput of 787.5 Mbps with a frame error rate (FER) of 10^-3 at Eb/N0 = 7 dB, matching the FER of traditional iterative MIMO detection with four outer iterations. [ABSTRACT FROM AUTHOR]
Published: 2015

13. Prediction of low-LET ion induced single event upset cross sections for advanced SRAM

Author: Zhou, Wanting, Hu, Jianhao, and Li, Lei
Abstract: This paper describes a simple circuit-level, simulation-based approach to predicting single event upset cross sections induced by low linear energy transfer (LET) ions in advanced bulk static random access memory (SRAM). A basic Simulation Program with Integrated Circuit Emphasis (SPICE) model that takes the effective collection depth into account is developed to perform single event analysis quickly and efficiently. With this circuit-level simulation model, radiation effects can be expressed as a SPICE-simulated curve of LET versus the corresponding affected distance, which is used for upset cross-section prediction. Furthermore, a fine-grain geometric model that includes a fine sensitivity coefficient is used in the cross-section prediction. The results calculated with this method are in good agreement with experimentally measured results reported for six-transistor SRAMs fabricated in 90 nm and 65 nm process technologies.
Published: 2013

14. A 2^n scaling scheme for signed RNS integers and its VLSI implementation

Author: Ma, Shang, Hu, JianHao, Ye, YanLong, Zhang, Lin, and Ling, Xiang
Abstract: Efficient implementation of scaling in the residue number system (RNS) is one of the critical issues for applying RNS in digital signal processing (DSP) systems. In this paper, an efficient scaling algorithm for signed integers in RNS is first proposed by introducing a correction constant into the scaling procedure for negative integers. Based on the proposed scaling algorithm, an efficient RNS 2^n scaling implementation method is presented, in which the Chinese remainder theorem (CRT) and a redundant modulus are used to perform the base extension that obtains the least significant n bits of RNS integers. With the redundant modulus, RNS sign detection can be achieved by parity detection. An approach to updating the residue digit of the redundant channel is also proposed, together with a method of computing the correction constant of the redundant channel in negative integer scaling. The analysis results indicate that the complexity of the proposed scaling algorithm grows linearly with the word length of the RNS dynamic range without using a look-up table (LUT). Furthermore, the proposed algorithm is applied to 2^n scaling for a specific moduli set. The synthesis results show that, compared with the existing cascaded 2^n scaling method for very large scale integration (VLSI) implementation under the same constraints, the critical path of the proposed algorithm is shortened by 12% and the area and power consumption are improved by about 35%. In addition, the VLSI layout shows that the parallel structure is simpler.
Published: 2010
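The identity behind RNS scaling by 2^n: if r = X mod 2^n, then floor(X / 2^n) = (X − r) · (2^n)^(−1), and for odd moduli the inverse of 2^n exists in every channel, so each residue channel can be scaled independently once r is known. The paper obtains r by CRT-based base extension with a redundant modulus; this sketch simply reconstructs X with the CRT to get r, handles only non-negative integers (no correction constant for signed values), and uses a hypothetical odd moduli set.

```python
from math import prod

MODULI = [7, 9, 11, 13]  # hypothetical pairwise coprime odd moduli; M = 9009
M = prod(MODULI)

def to_rns(x: int) -> list[int]:
    """Residue representation of 0 <= x < M."""
    return [x % m for m in MODULI]

def from_rns(res: list[int]) -> int:
    """CRT reconstruction (used here only to obtain the low bits)."""
    total = 0
    for r, m in zip(res, MODULI):
        m_hat = M // m
        total += r * m_hat * pow(m_hat, -1, m)
    return total % M

def scale_pow2(res: list[int], n: int) -> list[int]:
    """Return the RNS form of floor(X / 2**n) for the non-negative X encoded by res."""
    low = from_rns(res) % (1 << n)   # X mod 2**n, the least significant n bits
    scaled = []
    for r, m in zip(res, MODULI):
        inv = pow(1 << n, -1, m)     # exists because every modulus is odd
        scaled.append(((r - low) * inv) % m)
    return scaled

print(from_rns(scale_pow2(to_rns(5000), 4)))  # 312, i.e. floor(5000 / 16)
```

In hardware, a full CRT reconstruction per scaling step would defeat the purpose; the contribution described in the abstract is obtaining just the low n bits cheaply and handling negative numbers with a correction constant, neither of which is modelled here.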

14 results on '"Hu, Jianhao"'