801 results on '"Vlsi architecture"'
Search Results
152. VLSI Architectures for Digital Video Signal Processing
- Author
Pirsch, P., Dewilde, Patrick, editor, and Vandewalle, Joos, editor
- Published
- 1992
153. A High Parallelism hardware architecture design of the H.264/AVC integer motion estimation for application in real-time DTTV transmissions.
- Author
Santos Lunarejo, José Luis and Cárdenas, Carlos Silva
- Subjects
DIGITAL video standards, HIGH definition video recording, COMPUTER input-output equipment design & construction, MOTION estimation (Signal processing), ALGORITHMS, FIELD programmable gate arrays
- Abstract
H.264/AVC is the standard video format used by SBTVD (Sistema Brasileiro de Televisão Digital), which is present in almost all South American countries and allows transmissions in Full High Definition (Full HD) video quality. This work therefore presents a hardware architecture for the Motion Estimation algorithm used in the standard, since most of the computational load is located in this part; it exploits the high parallelism of hardware designs to achieve faster processing and hence real-time broadcasts. The design was described in VHDL and synthesized for the Altera Cyclone II FPGA, and it can process Full HD video (1920x1080 pixels) in real time. The results establish a maximum operating frequency of 183.55 MHz, at which it can process 35 frames per second. [ABSTRACT FROM AUTHOR]
- Published
- 2014
154. A Cascadable VLSI Architecture for the Realization of Large Binary Associative Networks
- Author
Poechmueller, Werner, Glesner, Manfred, Delgado-Frias, José G., editor, and Moore, William R., editor
- Published
- 1991
155. Hardware Efficient Pseudo-Random Number Generator Using Chen Chaotic System on FPGA
- Author
Mangal Deep Gupta and R. K. Chauhan
- Subjects
Pseudorandom number generator, Vlsi architecture, biology, business.industry, Computer science, Chaotic, Latency (audio), Cryptography, General Medicine, biology.organism_classification, Computer Science::Hardware Architecture, Chen, Hardware and Architecture, NIST, Electrical and Electronic Engineering, business, Field-programmable gate array, Computer hardware, Computer Science::Cryptography and Security
- Abstract
This paper introduces an FPGA implementation of a pseudo-random number generator (PRNG) using Chen’s chaotic system. It focuses mainly on developing an efficient VLSI architecture for the PRNG in terms of bit rate, area resources, latency, maximum length sequence, and randomness. First, we analyze the dynamic behavior of the chaotic trajectories of Chen’s system and set the parameter values to keep the hardware design complexity low. A circuit realization of the proposed PRNG is presented using hardwired shifting, additions, subtractions, and multiplexing schemes. The benefit of this architecture is that all binary multiplications (except the [Formula: see text] and [Formula: see text] operations) are performed using hardwired shifting. Moreover, the generated sequences pass all 15 NIST statistical tests, while pseudo-random numbers are generated at a uniform clock rate with minimal hardware complexity. The proposed PRNG architecture is realized in Verilog HDL, prototyped on the Virtex-5 FPGA (XC5VLX50T) device, and analyzed using Matlab. Performance analysis confirms that the proposed Chen chaotic attractor-based PRNG scheme is simple, secure, and hardware efficient, with high potential for adoption in cryptography applications.
- Published
- 2021
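The shift-based idea in entry 155 can be illustrated with a small fixed-point sketch. It assumes the classic Chen parameters a = 35, b = 3, c = 28, forward-Euler integration with a step of $2^{-10}$ (so multiplying by the step becomes a right shift), 16 fractional bits, and a hypothetical output rule that emits one bit of the x state per step. None of these choices are taken from the paper, and the sketch is not a vetted cryptographic generator.

```python
"""Fixed-point Chen-system PRNG sketch: constant multiplications done by shifts and adds.

Assumptions (not from the paper): a=35, b=3, c=28, Euler step 2**-STEP_SHIFT,
FRAC_BITS fractional bits, and a hypothetical output-bit tap OUT_BIT.
"""
from typing import List, Tuple

FRAC_BITS = 16     # fixed-point fractional bits
STEP_SHIFT = 10    # Euler step dt = 2**-10, so "* dt" becomes ">> STEP_SHIFT"
OUT_BIT = 4        # hypothetical tap: which bit of x is emitted each step


def mul35(v: int) -> int:
    return (v << 5) + (v << 1) + v   # 32v + 2v + v


def mul28(v: int) -> int:
    return (v << 5) - (v << 2)       # 32v - 4v


def mul7(v: int) -> int:
    return (v << 3) - v              # 8v - v


def mul3(v: int) -> int:
    return (v << 1) + v              # 2v + v


def chen_step(x: int, y: int, z: int) -> Tuple[int, int, int]:
    """One Euler step of dx=a(y-x), dy=(c-a)x - xz + cy, dz=xy - bz in fixed point."""
    xz = (x * z) >> FRAC_BITS        # the genuine state-by-state products
    xy = (x * y) >> FRAC_BITS
    dx = mul35(y - x)
    dy = -mul7(x) - xz + mul28(y)    # c - a = -7
    dz = xy - mul3(z)
    return (x + (dx >> STEP_SHIFT),
            y + (dy >> STEP_SHIFT),
            z + (dz >> STEP_SHIFT))


def chen_bits(n: int, seed: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[int]:
    """Iterate the map n times and emit one (hypothetically chosen) bit of x per step."""
    x, y, z = (int(s * (1 << FRAC_BITS)) for s in seed)
    bits = []
    for _ in range(n):
        x, y, z = chen_step(x, y, z)
        bits.append((x >> OUT_BIT) & 1)
    return bits
```

Only the two state products (xz and xy) remain true multiplications here; every constant coefficient is a shift-and-add, which mirrors the motivation in the abstract. The output tap is a placeholder, and any real generator would need its own statistical evaluation.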
156. Power-Efficient Sum of Absolute Differences Hardware Architecture Using Adder Compressors for Integer Motion Estimation Design.
- Author
Silveira, Bianca, Paim, Guilherme, Abreu, Brunno, Grellert, Mateus, Diniz, Claudio Machado, da Costa, Eduardo Antonio Cesar, and Bampi, Sergio
- Subjects
ADDERS (Digital electronics), MOTION estimation (Signal processing), VIDEO coding, ELECTRONIC equipment design
- Abstract
Sum of absolute differences (SAD) calculation is one of the most time-consuming operations in video encoders compatible with the High Efficiency Video Coding standard. SAD hardware architectures employ an adder tree to accumulate the absolute differences between two video blocks. This paper applies different adder compressor structures to the SAD hardware architecture. The architectures were synthesized to 45-nm CMOS standard cells. Synthesis results show that the SAD architecture using an 8–2 compressor built from 4–2 compressors, with a Kogge–Stone adder in the recombination line, reduces power dissipation by 25.5% on average compared with the SAD architecture using conventional adders from a state-of-the-art synthesis tool. Our throughput analysis shows that the designed SAD units can encode full HD ($1920\times 1080$) video in real time at 30 frames/s. [ABSTRACT FROM AUTHOR]
- Published
- 2017
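To make the compressor-tree idea in entry 156 concrete, the sketch below computes a block SAD by reducing the absolute differences with 3:2 carry-save compressors (a 4:2 compressor is commonly built from two cascaded 3:2 stages) until two operands remain, then performs a single carry-propagate addition that stands in for the paper's Kogge–Stone recombination adder. The function names are hypothetical, and the full-adder identity $a + b + c = (a \oplus b \oplus c) + 2\,\mathrm{maj}(a, b, c)$ is modeled on Python integers rather than on hardware wires.

```python
from typing import List, Sequence, Tuple


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 carry-save compressor: returns (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_to_two(values: Sequence[int]) -> List[int]:
    """Reduce operands with layers of 3:2 compressors until at most two remain."""
    ops = list(values)
    while len(ops) > 2:
        nxt: List[int] = []
        i = 0
        while i + 3 <= len(ops):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
            i += 3
        nxt.extend(ops[i:])          # 0-2 leftover operands pass through to the next layer
        ops = nxt
    return ops


def sad(cur: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> int:
    """Block SAD: absolute differences, carry-save reduction, one final add."""
    diffs = [abs(c - r) for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r)]
    return sum(compress_to_two(diffs))   # final carry-propagate "recombination" add
```

Because no carries propagate inside the compressor layers, only the last addition has a long carry chain, which is why the choice of recombination adder matters for power and delay.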
157. Occam and the transputer
- Author
May, David, Shepherd, Roger, Goos, G., editor, Hartmanis, J., editor, Barstow, D., editor, Brauer, W., editor, Brinch Hansen, P., editor, Gries, D., editor, Luckham, D., editor, Moler, C., editor, Pnueli, A., editor, Seegmüller, G., editor, Stoer, J., editor, Wirth, N., editor, and Rozenberg, Grzegorz, editor
- Published
- 1990
158. An Area-Efficient Variable-Size Fixed-Point DCT Architecture for HEVC Encoding
- Author
Maurizio Martina, Guido Masera, and Maurizio Masera
- Subjects
Standards, Complexity theory, Computer architecture, DCT, Discrete Cosine Transform, Discrete cosine transforms, Hardware, High Efficiency Video Coding, Throughput, VLSI Architecture, Computer science, Variable size, 02 engineering and technology, Fixed point, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Discrete cosine transform, 020201 artificial intelligence & image processing, Electrical and Electronic Engineering, Architecture, Encoder, Algorithm
- Abstract
This paper proposes an area-efficient fixed-point architecture for computing the discrete cosine transform (DCT) of multiple sizes in high efficiency video coding (HEVC). This result is obtained by comparing different DCT factorizations in order to find the one most suitable for implementation in the HEVC encoder. The recursive structure of fast algorithms, which decompose the $N$-point DCT by means of two $N/2$-point DCTs, is exploited to execute small-size DCT computations in parallel, thus maximizing hardware reusability while maintaining a constant throughput. The simulation results show that the proposed solution features reduced rate-distortion losses, with relevant complexity savings compared with state-of-the-art implementations. Finally, the proposed architecture is used to design two families of architectures for the 2D-DCT, namely folded and full-parallel.
- Published
- 2020
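The recursive even/odd split mentioned in entry 158 can be checked numerically. In this floating-point sketch (unnormalized DCT-II, power-of-two lengths assumed), the even-indexed outputs of an $N$-point DCT are exactly an $N/2$-point DCT of the folded sums $x[n] + x[N-1-n]$, while the odd-indexed outputs depend only on the folded differences $x[n] - x[N-1-n]$. The odd half is computed here with a direct matrix product rather than any particular factorization from the paper, and HEVC's integer transform matrices are not modeled.

```python
import math
from typing import List, Sequence


def dct2_direct(x: Sequence[float]) -> List[float]:
    """Reference unnormalized DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / 2N)."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
            for k in range(n_pts)]


def dct2_recursive(x: Sequence[float]) -> List[float]:
    """Even outputs via an N/2-point DCT of folded sums; odd outputs from folded differences."""
    n_pts = len(x)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("length must be a power of two")
    if n_pts == 1:
        return [float(x[0])]
    half = n_pts // 2
    sums = [x[n] + x[n_pts - 1 - n] for n in range(half)]
    diffs = [x[n] - x[n_pts - 1 - n] for n in range(half)]
    even = dct2_recursive(sums)                      # X[2k]: reuses the smaller DCT
    odd = [sum(diffs[n] * math.cos(math.pi * (2 * n + 1) * (2 * k + 1) / (2 * n_pts))
               for n in range(half))
           for k in range(half)]                     # X[2k+1]: direct product on differences
    out = [0.0] * n_pts
    out[0::2] = even
    out[1::2] = odd
    return out
```

The hardware benefit described in the abstract comes from the fact that the same $N/2$-point core can either serve the even half of a larger transform or compute independent small transforms in parallel.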
159. A Universal Approximation Method and Optimized Hardware Architectures for Arithmetic Functions Based on Stochastic Computing
- Author
Dong Hongxi, Zhongfeng Wang, Zhonghai Lu, Qiu Yu'ou, Hongbing Pan, Zidi Qin, and Muhan Zheng
- Subjects
010302 applied physics, Polynomial, Stochastic computing, VLSI architecture, General Computer Science, Computer science, General Engineering, 02 engineering and technology, arithmetic functions, 01 natural sciences, 020202 computer hardware & architecture, Computational science, Computer Science::Hardware Architecture, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Arithmetic function, General Materials Science, lcsh:Electrical engineering. Electronics. Nuclear engineering, Complex number, approximation, lcsh:TK1-9971
- Abstract
Stochastic computing (SC) has been applied to the implementation of complex arithmetic functions. Complicated polynomial-based approximations lead to high hardware complexity in previous SC circuits for arithmetic functions. In this paper, a novel piecewise approximation method based on Taylor series expansion is proposed for complex arithmetic functions. Efficient implementations based on unipolar stochastic logic are presented for monotonic functions. Furthermore, detailed optimization schemes are provided for non-monotonic functions. Using NAND and AND gates as the main computing elements, the optimized hardware architectures have extremely low complexity. The experimental results show that a broad range of arithmetic functions can be implemented with the proposed SC circuits, with mean absolute errors on the order of $1\times 10^{-3}$. Compared with state-of-the-art works, the approximation precision for some typical functions can be increased by more than 8× with our method. In addition, the proposed circuits significantly outperform previous methods in hardware complexity and critical path.
- Published
- 2020
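The gate-level arithmetic behind entry 159 rests on a standard stochastic-computing fact: for independent unipolar bitstreams encoding probabilities $a$ and $b$, a bitwise AND encodes $a\cdot b$ and a bitwise NAND encodes $1 - a\cdot b$. The sketch below only demonstrates these two primitives with software-generated streams; the paper's Taylor-series piecewise approximation and its optimized circuits are not reproduced, and the stream length and seed are arbitrary choices.

```python
import random
from typing import List


def to_stream(p: float, length: int, rng: random.Random) -> List[int]:
    """Unipolar encoding: each bit is 1 with probability p (a software stand-in for an SNG)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(stream: List[int]) -> float:
    """The encoded value is the fraction of ones in the stream."""
    return sum(stream) / len(stream)


def sc_and(a: List[int], b: List[int]) -> List[int]:
    return [x & y for x, y in zip(a, b)]          # encodes a * b


def sc_nand(a: List[int], b: List[int]) -> List[int]:
    return [1 - (x & y) for x, y in zip(a, b)]    # encodes 1 - a * b


rng = random.Random(2020)
sa = to_stream(0.5, 100_000, rng)
sb = to_stream(0.4, 100_000, rng)
print(decode(sc_and(sa, sb)), decode(sc_nand(sa, sb)))  # approximately 0.2 and 0.8
```

Precision grows only with the square root of the stream length, which is why SC designs trade latency (long streams) for the very small gate count the abstract emphasizes.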
160. Efficient VLSI Architecture for Modified Blowfish Algorithm for Military Applications
- Author
Mushfiqua Yamen
- Subjects
Vlsi architecture, Blowfish algorithm, Computer architecture, Computer science
- Published
- 2019
161. VLSI Architecture for Systolic-Like Modular Multipliers over GF($2^m$) Build on Irreducible All-One Polynomials
- Author
Soumya. M
- Subjects
Vlsi architecture, Computer science, business.industry, Arithmetic, Modular design, business
- Published
- 2019
162. VLSI Architecture for Optimization Transform Technique based on Compression of ECG Signals
- Author
M Hatem, Ashraf Mohamed, and Walied Elnahel
- Subjects
Vlsi architecture, business.industry, Computer science, Compression (functional analysis), Ecg signal, business, Computer hardware
- Published
- 2019
163. Memory Efficient VLSI Implementation of Real-Time Motion Detection System Using FPGA Platform
- Author
Sanjay Singh, Atanendu Sekhar Mandal, Chandra Shekhar, and Anil Vohra
- Subjects
real-time motion detection, VLSI architecture, FPGA implementation, video surveillance system, smart camera system, Photography, TR1-1050, Computer applications to medicine. Medical informatics, R858-859.7, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Motion detection is the heart of a potentially complex automated video surveillance system intended to be used as a standalone system. Therefore, in addition to being accurate and robust, a successful motion detection technique must also be economical in its use of computational resources on the selected FPGA development platform, because many other complex algorithms of an automated video surveillance system also run on the same platform. With this key requirement as the main focus, this paper presents a memory-efficient VLSI architecture for real-time motion detection and its implementation on an FPGA platform. This is accomplished by proposing a new memory-efficient motion detection scheme and designing its VLSI architecture. The complete real-time motion detection system, using the proposed memory-efficient architecture along with proper input/output interfaces, is implemented on the Xilinx ML510 (Virtex-5 FX130T) FPGA development platform and can operate at a 154.55 MHz clock frequency. The memory requirement of the proposed architecture is reduced by 41% compared with the standard clustering-based motion detection architecture. The new memory-efficient system robustly and automatically detects motion in real-world scenarios (both for static backgrounds and pseudo-stationary backgrounds) in real time for standard PAL (720 × 576) size color video.
- Published
- 2017
164. Real-Time FPGA-Based Object Tracker with Automatic Pan-Tilt Features for Smart Video Surveillance Systems
- Author
Sanjay Singh, Chandra Shekhar, and Anil Vohra
- Subjects
real-time object tracking, VLSI architecture, FPGA implementation, video surveillance system, smart camera system, Photography, TR1-1050, Computer applications to medicine. Medical informatics, R858-859.7, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The design of smart video surveillance systems is an active research field in the computer vision community because of their ability to perform automatic scene analysis by selecting and tracking objects of interest. In this paper, we present the design and implementation of an FPGA-based standalone working prototype system for real-time tracking of an object of interest in live video streams for such systems. In addition to real-time tracking of the object of interest, the implemented system can also provide purposive automatic camera movement (pan-tilt) in the direction determined by the movement of the tracked object. The complete system, including the camera interface, DDR2 external memory interface controller, designed object tracking VLSI architecture, camera movement controller, and display interface, has been implemented on the Xilinx ML510 (Virtex-5 FX130T) FPGA board. The implemented system robustly tracks the target object present in the scene in real time for standard PAL (720 × 576) resolution color video and automatically controls camera movement in the direction determined by the movement of the tracked object.
- Published
- 2017
165. High-Throughput VLSI Architecture for GRAND Markov Order
- Author
Abbas, Syed Mohsin, Jalaleddine, Marwan, and Gross, Warren J.
- Abstract
Guessing Random Additive Noise Decoding (GRAND) is a recently proposed Maximum Likelihood (ML) decoding technique. Irrespective of the structure of the error-correcting code, GRAND tries to guess the noise that corrupted the codeword in order to decode any linear error-correcting block code. GRAND Markov Order (GRAND-MO) is a variant of GRAND that is useful for decoding error-correcting codes transmitted over communication channels with memory, which are vulnerable to burst noise. Interleavers and de-interleavers are usually used in communication systems to mitigate the effects of channel memory, but interleaving and de-interleaving introduce undesirable latency, which increases with channel memory. To avoid this latency penalty, GRAND-MO can be applied directly to the hard-demodulated channel signals. This work reports the first GRAND-MO hardware architecture, which achieves an average throughput of up to 52 Gbps and 64 Gbps for code lengths of 128 and 79, respectively. Compared with GRANDAB, the hard-input variant of GRAND, the proposed architecture achieves a 3 dB gain in decoding performance for a target FER of $10^{-5}$. Similarly, comparing the GRAND-MO decoder with a decoder tailored for a (79, 64) BCH code showed that the proposed architecture achieves 33% higher worst-case throughput and a 2 dB gain in decoding performance. © 2021 IEEE.
- Published
- 2021
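The guessing principle described in entry 165 can be shown with a toy decoder. This sketch uses a (7,4) Hamming code chosen only for illustration and tests noise patterns in order of increasing Hamming weight up to an abandonment threshold, which corresponds to the hard-input GRANDAB-style ordering. GRAND-MO instead orders patterns by their likelihood under a Markov (burst) channel model, which is not implemented here; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Parity-check matrix of a (7,4) Hamming code: column j is the binary form of j + 1.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def is_codeword(word: Sequence[int]) -> bool:
    """A word is in the code iff its syndrome H * word (mod 2) is all zeros."""
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def grand_decode(received: Sequence[int], max_weight: int = 3
                 ) -> Tuple[Optional[List[int]], Optional[Tuple[int, ...]]]:
    """Guess noise patterns from lightest to heaviest; stop at the first that yields a codeword."""
    n = len(received)
    for weight in range(max_weight + 1):
        for flips in combinations(range(n), weight):
            candidate = list(received)
            for pos in flips:
                candidate[pos] ^= 1          # remove the guessed noise
            if is_codeword(candidate):
                return candidate, flips
    return None, None                        # abandon: no pattern within max_weight
```

Note that the decoder never uses the code's structure beyond a membership test, which is what lets GRAND-family decoders work for any linear block code; the hardware challenge is generating and testing many patterns per cycle.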
166. Algorithm and Architecture Design of the H.265/HEVC Intra Encoder.
- Author
Pastuszak, Grzegorz and Abramowski, Andrzej
- Subjects
VIDEO coding, VIDEO compression, FIELD programmable gate arrays, VERY large scale circuit integration, ALGORITHM research
- Abstract
Improved video coding techniques introduced in the H.265/High Efficiency Video Coding (HEVC) standard allow video encoders to achieve better compression efficiencies. On the other hand, the increased complexity requires a new design methodology able to face challenges associated with ever higher spatiotemporal resolutions. This paper presents a computationally scalable algorithm and its hardware architecture able to support intra encoding up to 2160p@30 frames/s resolution. The scalability allows a tradeoff between the throughput and the compression efficiency. In particular, the encoder is able to check a variable number of candidate modes. The rate estimation based on bin counting and the distortion estimation in the transform domain simplify the rate–distortion analysis and enable the evaluation of a great number of candidate intra modes. The encoder preselects candidate modes by the processing of $8\times 8$ predictions computed from original samples. The preselection shares hardware resources used for the processing of predictions generated from reconstructed samples. To support intra $4\times 4$ modes for the 2160p@30 frames/s resolution, the encoder incorporates a separate reconstruction loop. The processing of blocks with different sizes is interleaved to compensate for the delay of reconstruction loops. Implementation results show that the encoder utilizes 1086k gates and 52-kB on-chip memories for TSMC 90 nm. The main reconstruction loop can operate at 400 MHz, whereas the remaining modules work at 200 MHz. For 2160p@30 frames/s videos, the average BD-rate is 5.46% compared with that of the HM software. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
167. A fast VLSI architecture of a hierarchical block matching algorithm for motion estimation.
- Author
Ghosh, Kausik and Dhar, Anindya
- Abstract
In this paper, an efficient VLSI architecture of a hierarchical block matching algorithm is proposed for motion estimation. At the lowest resolution level, two motion vector (MV) candidates are selected to improve performance. At the next search level, these two candidates provide the center points for local searches that yield one MV candidate. Then, at the next level and at the finest level, one MV candidate is chosen from one local search area (LSA), defined by the MV candidate obtained from the lower resolution level. This architecture requires nine processing elements, and data are processed in such a way that the computation of frames at different resolutions overlaps with the MV calculation. Simulation results indicate that this architecture is more area-efficient and faster than many full-search, three-step-search, and multiresolution architectures, which makes it suitable for SD and HD video. To avoid the delay due to pipelining, the MVs of all macroblocks are calculated for one resolution level and stored in RAM to obtain the LSA for the next resolution level. This architecture, with about 16K gates, is implemented for a search range of [−15, +15]. As it requires only two-port memory, which is very common in consumer electronics systems, it can easily be integrated into any existing system at the expense of a very small area. [ABSTRACT FROM AUTHOR]
- Published
- 2016
168. VLSI Architecture for Configurable and Low-Complexity Design of Hard-Decision Viterbi Decoding Algorithm.
- Author
Wicaksana Putra, Rachmad Vidya and Adiono, Trio
- Subjects
VERY large scale circuit integration, VITERBI decoding, DECODING algorithms, DATA encryption, ERROR correction (Information theory)
- Abstract
Convolutional encoding and data decoding are fundamental processes in convolutional error correction. One of the most popular decoding methods is the Viterbi algorithm, which is extensively implemented in digital communication applications. Its VLSI design challenges concern area, speed, power, complexity, and configurability. In this research, we propose a VLSI architecture for a configurable and low-complexity design of a hard-decision Viterbi decoding algorithm. The configurable and low-complexity design is achieved by designing a generic VLSI architecture, optimizing each processing element (PE) at the logical operation level, and designing a conditional adapter. The proposed design can be configured for any predefined number of trace-backs simply by changing the trace-back parameter value. Its computation needs only N + 2 clock cycles of latency, where N is the number of trace-backs. Its configurability has been demonstrated for N = 8, N = 16, N = 32, and N = 64. Furthermore, the proposed design was synthesized and evaluated on Xilinx and Altera FPGA target boards for area consumption and speed performance. [ABSTRACT FROM AUTHOR]
- Published
- 2016
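A compact software model of hard-decision Viterbi decoding helps relate entry 168 to the algorithm it implements. The sketch assumes a rate-1/2, constraint-length-3 code with the common (7, 5) octal generators and a zero-terminated trellis. It keeps full survivor paths (register exchange) instead of the fixed trace-back depth N that the paper makes configurable, so it illustrates the add-compare-select recursion rather than the paper's architecture.

```python
from typing import List, Sequence

GENERATORS = (0b111, 0b101)   # (7, 5) in octal, constraint length 3
N_STATES = 4                  # two memory bits


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def branch_output(state: int, bit: int) -> List[int]:
    """Two coded bits produced when `bit` enters an encoder in `state`."""
    reg = (bit << 2) | state  # newest input bit in the MSB
    return [parity(reg & g) for g in GENERATORS]


def encode(bits: Sequence[int]) -> List[int]:
    state, out = 0, []
    for b in list(bits) + [0, 0]:         # two tail bits flush the encoder to state 0
        out.extend(branch_output(state, b))
        state = ((b << 2) | state) >> 1
    return out


def viterbi_decode(received: Sequence[int], n_bits: int) -> List[int]:
    """Hard-decision Viterbi: Hamming branch metrics, add-compare-select per state."""
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)
    paths: List[List[int]] = [[] for _ in range(N_STATES)]
    for t in range(n_bits + 2):            # include the tail bits
        r = received[2 * t: 2 * t + 2]
        new_metrics = [inf] * N_STATES
        new_paths: List[List[int]] = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for b in (0, 1):
                nxt = ((b << 2) | state) >> 1
                bm = sum(e != x for e, x in zip(branch_output(state, b), r))  # add
                if metrics[state] + bm < new_metrics[nxt]:                   # compare-select
                    new_metrics[nxt] = metrics[state] + bm
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0][:n_bits]              # terminated trellis ends in state 0; drop tail bits
```

A hardware decoder replaces the growing survivor lists with a fixed-depth trace-back memory, which is where a configurable trace-back parameter such as the paper's N comes in.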
169. An Efficient Architecture for Modified Lifting-Based Discrete Wavelet Transform
- Author
Pinto, Rohan and Shama, Kumara
- Published
- 2020
170. $2^n$ RNS Scalers for Extended 4-Moduli Sets.
- Author
Sousa, Leonel
- Subjects
VERY large scale circuit integration, NUMBER systems, SET theory, MODULI theory, ALGORITHMS, FIELD programmable gate arrays
- Abstract
Scaling is a key arithmetic operation that is difficult to perform in Residue Number Systems (RNS). This paper proposes a comprehensive approach for designing efficient and accurate $2^n$ RNS scalers for important classes of moduli sets that have large dynamic ranges. These classes include the traditional 3-moduli set, but with the exponent of the power-of-two modulus augmented by a variable value $x$ ($\{2^n-1, 2^{n+x}, 2^n+1\}$ […] and 146 percent when the energy required per scaling is measured. The proposed scalers are not only flexible and cost-effective, but also suitable for designing and implementing energy-constrained devices, particularly mobile systems. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
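For the traditional set $\{2^n-1, 2^n, 2^n+1\}$ that entry 170 generalizes, scaling by $2^n$ can be expressed with modular identities: since $2^n \equiv 1 \pmod{2^n-1}$ and $2^n \equiv -1 \pmod{2^n+1}$, the residues of $\lfloor X/2^n \rfloor$ in those two channels need only a subtraction. In this sketch the residue modulo $2^n$ is recovered with a plain Chinese Remainder Theorem step, purely for clarity; the paper's scalers are designed to avoid such a full reconstruction, and the extended $2^{n+x}$ and 4-moduli cases are not covered. Function names are hypothetical.

```python
from typing import Tuple


def to_rns(x: int, n: int) -> Tuple[int, int, int]:
    """Residues of x for the moduli set {2**n - 1, 2**n, 2**n + 1}."""
    return (x % (2**n - 1), x % 2**n, x % (2**n + 1))


def scale_by_2n(res: Tuple[int, int, int], n: int) -> Tuple[int, int, int]:
    """Residues of floor(X / 2**n) for the moduli set {2**n - 1, 2**n, 2**n + 1}."""
    r1, r2, r3 = res
    m1, m2, m3 = 2**n - 1, 2**n, 2**n + 1
    # Y = (X - r2) / 2**n exactly, because r2 = X mod 2**n.
    y1 = (r1 - r2) % m1            # 2**n ≡ 1 (mod m1): dividing by 2**n changes nothing
    y3 = (r2 - r3) % m3            # 2**n ≡ -1 (mod m3): dividing by 2**n negates
    # Y < m1 * m3, so (y1, y3) determine Y uniquely; rebuild it with CRT (Python 3.8+ pow).
    y = y1 + m1 * (((y3 - y1) * pow(m1, -1, m3)) % m3)
    return (y1, y % m2, y3)
```

The two odd-modulus channels are cheap in hardware (a modular subtraction each); the power-of-two channel is the costly one, which is the part dedicated scaler designs work to simplify.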
171. High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding.
- Author
He, Gang, Zhou, Dajiang, Li, Yunsong, Chen, Zhixiang, Zhang, Tianruo, and Goto, Satoshi
- Subjects
VIDEO compression, INTERPOLATION, COMPLEMENTARY metal oxide semiconductors, METAL oxide semiconductors, COMPUTATIONAL complexity, DIGITAL electronics
- Abstract
Fractional motion estimation (FME) significantly enhances video compression efficiency, but its high computational complexity also limits real-time processing capability. In this brief, we present a VLSI implementation of an FME design in High Efficiency Video Coding for ultrahigh-definition video applications. We first propose a bilinear quarter-pixel approximation, together with a search pattern based on it, to reduce the complexity of the interpolation and fractional search process. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of the transform. In addition, using the considered pixel parallelism and a dedicated memory access pattern, we fully pipeline the computation and achieve high hardware utilization. This design has been implemented as a 65-nm CMOS chip and verified. The measured throughput reaches 995 Mpixels/s for $7680\times 4320$ video at 30 frames/s at 188 MHz, at least 4.7 times faster than prior art. The corresponding power dissipation is 198.6 mW, with a power efficiency of 0.2 nJ/pixel. Owing to the optimization, our work achieves more than 52% improvement in power efficiency relative to previous works in H.264. [ABSTRACT FROM AUTHOR]
- Published
- 2015
172. Fast Motion Estimation Algorithm and Design for Real Time QFHD High Efficiency Video Coding.
- Author
Jou, Shiaw-Yu, Chang, Shan-Jung, and Chang, Tian-Sheuan
- Subjects
MOTION estimation (Signal processing), VIDEO coding, DATA transmission systems, BANDWIDTH compression, CODING theory
- Abstract
Motion estimation (ME) in the latest High Efficiency Video Coding standard adopts the quadtree coding structure and prediction unit (PU) sizes of up to $64\times 64$ to improve the coding gain. However, compared with previous standards, these techniques raise serious design problems regarding complexity, data dependency, external memory bandwidth, and on-chip buffer size, especially for real-time ultrahigh-definition video coding. To solve these problems, this paper proposes an efficient ME design with joint algorithm and architecture optimization. To reduce complexity, we propose a predictive integer ME (IME) algorithm that selects the most probable search directions and steps through a statistical analysis, reducing the number of search points by 90.5%. We also employ a PU size-dependent fractional ME (FME) algorithm that reduces interpolation filtering by 62.4% compared with the reference software. To resolve the corresponding dependency, we cascade the IME and FME computations via interlaced scheduling and propose an early motion vector prediction candidate approach. We use this scheduling with a $16\times 16$ processing unit to compute the partial matching cost of all PUs within the same $16\times 16$ current block in an interlaced order, sharing their common reference block to reduce the on-chip buffer size and off-chip memory bandwidth. The bandwidth is further reduced by a cache with double Z-scan indexed addressing that simplifies the cache controller. Implementation in a Taiwan Semiconductor Manufacturing Company 90-nm CMOS process supports real-time encoding of QFHD ($3840\times 2160$) video at 60 frames/s when operating at 270 MHz, with 778.7k logic gates and 17.4 KB of on-chip memory. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
173. High‐speed low‐power very‐large‐scale integration architecture for dual‐standard deblocking filter.
- Author
Srinivasarao, Batta Kota Naga, Chakrabarti, Indrajit, and Ahmad, Mohammad Nawaz
- Abstract
H.264/AVC is regarded as a popular video coding standard and is widely used in multimedia applications. However, with an increasing demand for better-quality video, high efficiency video coding (HEVC) is set to succeed H.264/AVC for higher-resolution video applications. Since a majority of multimedia devices already operate on the H.264/AVC standard, it may not be worthwhile to completely replace their existing software and hardware components with different modules in order to adopt HEVC. There is therefore a need for a decoder that supports both H.264/AVC and HEVC, rather than separate individual designs. This paper introduces a new dual‐standard deblocking filter architecture that supports both the H.264/AVC and HEVC standards. Algorithmic verification has been done in Matlab, and the corresponding VLSI architecture has been implemented on FPGA as well as in the ASIC domain. The proposed architecture takes 26 clock cycles for H.264/AVC and 14 cycles for HEVC to complete the filtering of a 16 × 16 pixel block. It consumes 5.80 mW of normalised power and occupies an area equivalent to 70.1k gates at a frequency of 100 MHz. The proposed architecture takes 8.42 ms to filter a 4K ultra high definition (UHD) (3840 × 2160) frame in the H.264 standard and 18 ms to filter an 8K UHD (7680 × 4320) frame in the HEVC standard. [ABSTRACT FROM AUTHOR]
- Published
- 2015
174. An Efficient Adaptive Binary Range Coder and Its VLSI Architecture.
- Author
Belyaev, Evgeny, Liu, Kai, Gabbouj, Moncef, and Li, YunSong
- Subjects
ENCODING, ENTROPY (Information theory), VERY large scale circuit integration, PROBABILITY theory, DATA compression, ALGORITHMS, FIELD programmable gate arrays, VIDEO codecs
- Abstract
In this paper, we propose a new hardware-efficient adaptive binary range coder (ABRC) and its very-large-scale integration (VLSI) architecture. To achieve this, we follow an approach that reduces the bit width of the multiplication needed in the interval division part and show how to avoid a loop in the renormalization part of the ABRC. The probability estimation in the proposed ABRC is based on a lookup-table-free virtual sliding window. To obtain higher compression performance, we propose a new adaptive window size selection algorithm. Compared with an ABRC with a single window, the proposed system provides faster probability adaptation at the initial encoding/decoding stage and more accurate probability estimation for very low entropy binary sources. We show that the VLSI architecture of the proposed ABRC attains a throughput of 105.92 MSymbols/s on the FPGA platform and consumes 18.15 mW of dynamic power. Compared with the state-of-the-art MQ-coder (used in the JPEG2000 standard) and the M-coder (used in the H.264/Advanced Video Coding and H.265/High Efficiency Video Coding standards), the proposed ABRC architecture provides comparable throughput and reduced memory and power consumption. Experimental results obtained for a wavelet video codec with a JPEG2000-like bit-plane entropy coder show that the proposed ABRC reduces the bit rate by 0.8%–8% compared with the MQ-coder and by 1.0%–24.2% compared with the M-coder. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
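The probability estimator mentioned in entry 174 can be pictured as a shift-only update. The sketch below assumes the exponential-smoothing form often associated with a virtual sliding window, $p \leftarrow p + (b \cdot 2^{R} - p)/2^{w}$ evaluated with an arithmetic right shift, a 15-bit probability register ($R = 15$), and clamping that keeps the estimate away from 0 and 1. The exact update, precision, and the paper's adaptive window-size selection rule may differ. It does show why a short window adapts faster at the start, which is the motivation the abstract gives for adapting the window size.

```python
from typing import Iterable

PREC = 15                         # assumed precision: P(bit = 1) = p / 2**PREC
ONE = 1 << PREC


def vsw_update(p: int, bit: int, w: int) -> int:
    """Shift-only update toward the observed bit; the window length is 2**w symbols."""
    p += ((bit << PREC) - p) >> w
    return min(max(p, 1), ONE - 1)   # clamp so neither symbol gets zero probability


def estimate(bits: Iterable[int], w: int, p0: int = ONE // 2) -> int:
    """Run the estimator over a bit sequence, starting from p0 (default 0.5)."""
    p = p0
    for b in bits:
        p = vsw_update(p, b, w)
    return p
```

A small `w` tracks a changing source quickly but noisily, while a large `w` gives a smoother, more accurate estimate for stationary or very skewed sources; switching between them is one way to get both behaviors.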
175. Low Power Motion Estimation Design Based on Non-Uniform Pixel Truncation.
- Author
Rong, Yaocheng, Yu, Quanhe, An, Da, and He, Yun
- Abstract
Motion estimation (ME) accounts for a large share of the hardware cost and power (50%–90%) in most video encoders. Reducing the power consumption of ME is thus a major concern in low-power video encoder design. Many fast ME approaches have been proposed, but many of them focus on algorithmic speedup rather than power consumption. Pixel truncation can effectively reduce hardware cost and power consumption. Classical pixel truncation is usually performed uniformly within the search window; however, it suffers significant compression efficiency degradation when a high power reduction ratio is required. This paper proposes a low-power motion estimation scheme based on non-uniform pixel truncation. Based on the unequal distribution of motion vectors, we divide the search window into different sub-regions and employ different numbers of truncated bits (NTB) in these sub-regions. NTB pairs are examined to achieve a better tradeoff between compression efficiency and hardware cost. A hardware architecture is then designed to evaluate the hardware cost of the algorithm, and a carefully designed threshold parameter is introduced to make the algorithm more hardware-friendly. Test results demonstrate that the proposed hardware-friendly algorithm achieves 21% and 49% power savings in the 2D and 1D ME architectures, respectively, both with negligible compression efficiency degradation. [ABSTRACT FROM AUTHOR]
- Published
- 2015
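Non-uniform pixel truncation, as described in entry 175, can be sketched by choosing the number of truncated bits (NTB) from a candidate's distance to the search center. The region boundary, the NTB pair (1, 3), and the left shift that rescales truncated SADs so that candidates from different regions stay comparable are all illustrative assumptions, not the NTB pairs or the threshold parameter examined in the paper.

```python
from typing import Sequence, Tuple

Block = Sequence[Sequence[int]]


def truncated_sad(cur: Block, ref: Block, ntb: int) -> int:
    """SAD on pixels with their ntb least significant bits dropped, rescaled by << ntb."""
    total = sum(abs((c >> ntb) - (r >> ntb))
                for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r))
    return total << ntb


def ntb_for(mv: Tuple[int, int], inner_radius: int = 4,
            ntb_pair: Tuple[int, int] = (1, 3)) -> int:
    """Hypothetical rule: fewer truncated bits near the center, where MVs are most likely."""
    inner, outer = ntb_pair
    return inner if max(abs(mv[0]), abs(mv[1])) <= inner_radius else outer
```

Dropping low-order bits shrinks every subtractor and adder in the SAD datapath, so spending full precision only where good matches are statistically likely is what lets the scheme save power with little coding loss.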
176. Efficient architecture of adaptive rood pattern search technique for fast motion estimation.
- Author
Biswas, Baishik, Mukherjee, Rohan, and Chakrabarti, Indrajit
- Subjects
ADAPTIVE routing (Computer network management), SEARCH engines, PARAMETER estimation, VERY large scale circuit integration, VIDEO processing
- Abstract
This paper presents an efficient VLSI architecture for fast Motion Estimation (ME) using the Adaptive Rood Pattern Search (ARPS) technique. The proposed architecture uses a single processing element (PE) and simplified memory addressing to reduce hardware complexity. The addressing logic, which is currently applied to 352 × 288 CIF frames, can easily be extended to higher-resolution frames. The proposed architecture minimizes area while satisfying the speed requirements of real-time video processing. Implemented in Verilog HDL and mapped to a Virtex 6 (XC6VLX75T-3) FPGA, the architecture uses only 165 slice registers and 273 slice LUTs. It can process 240 frames per second while operating at a maximum frequency of 320 MHz. [ABSTRACT FROM AUTHOR]
- Published
- 2015
177. High Performance VLSI Architecture for Three-Step Search Algorithm.
- Author
Mukherjee, Rohan, Sheth, Keyur, Dhar, Anindya, Chakrabarti, Indrajit, and Sengupta, Somnath
- Subjects
VERY large scale circuit integration, SEARCH algorithms, VIDEO coding, COMPUTATIONAL complexity, VIDEO processing
- Abstract
Motion estimation is the most computationally intensive part of any video coding standard. The three-step search algorithm is a popular fast search technique for reducing the complexity of motion estimation. In this paper, we propose a novel architecture for the three-step search technique that simplifies memory addressing and reduces hardware complexity. The proposed architecture minimizes area while meeting the speed requirements of real-time video processing. Implemented in Verilog HDL on Virtex-5 technology and synthesized using Xilinx ISE Design Suite 14.1, the design has a critical path of 6.536 ns and an equivalent area of 2.3K gates. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
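The classic three-step search referred to above can be sketched as follows: evaluate a 3×3 pattern with step 4, recentre on the best point, then repeat with steps 2 and 1, covering a ±7 window with 25 SAD evaluations instead of 225. This is a plain software model; the memory-addressing simplifications of the proposed architecture are not modelled.

```python
def sad(cur_blk, ref, x, y):
    """SAD between cur_blk and the reference block at top-left (x, y)."""
    n = len(cur_blk)
    return sum(abs(cur_blk[j][i] - ref[y + j][x + i]) for j in range(n) for i in range(n))


def three_step_search(cur_blk, ref, bx, by, first_step=4):
    """Three-step search around block (bx, by); returns ((dx, dy), sad)."""
    n = len(cur_blk)
    h, w = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur_blk, ref, bx, by)
    step = first_step
    while step >= 1:
        ox, oy = best_mv                      # centre of this step's 3x3 pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue                  # centre cost is already known
                x, y = bx + ox + dx, by + oy + dy
                if x < 0 or y < 0 or x + n > w or y + n > h:
                    continue                  # candidate leaves the frame
                c = sad(cur_blk, ref, x, y)
                if c < best_cost:             # strict: keep the centre on ties
                    best_mv, best_cost = (ox + dx, oy + dy), c
        step //= 2
    return best_mv, best_cost
```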
178. Computation-constrained dynamic search range control for real-time video encoder.
- Author
-
Ji, Xianghu, Jia, Huizhu, Liu, Jie, Xie, Xiaodong, and Gao, Wen
- Subjects
REAL-time control ,VIDEO codecs ,SEARCH algorithms ,COMPUTER vision ,COMPUTATIONAL complexity ,BOUNDARY value problems
- Abstract
Search range (SR) is a key parameter for search quality control in the motion estimation (ME) of a real-time video encoder. Dynamic search range (DSR) is a commonly employed algorithm for reducing the computational complexity of ME in a video encoder. In this paper, we model an effective predicted motion vector (PMV) deviation metric that predicts the relationship between SR and motion vector difference (MVD) from the prediction differences of both the temporal and spatial motions of neighboring blocks. In addition, a computation-constrained DSR (CDSR) control algorithm is proposed to manage computational complexity while maximizing video coding quality in a real-time, computation-constrained scenario. The SR is dynamically determined by three factors: motion complexity, a user-defined probability, and the computation budget. Compared with conventional DSR algorithms, the proposed CDSR is an effective and quantifiable algorithm that allocates more of the computation budget to blocks with high PMV deviations (such as motion object boundaries) and less to well-matched, motion-predicted blocks, while maintaining a constrained computation requirement. Experimental results show that the proposed CDSR control algorithm effectively manages the computation consumption of the DSR algorithm while keeping similar rate–distortion (RD) performance. It achieves about 0.1–0.3 dB average PSNR improvement over its equivalent fixed-SR algorithm when computation consumption is restricted to a specific level, and about 50–90% computation savings compared with the benchmarks. For ME with a high-performance processing element (PE) engine, the quality degradation caused by the proposed CDSR algorithm is negligible. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
179. A Low-Memory-Access Length-Adaptive Architecture for 2^n-Point FFT.
- Author
-
Chen, Kuan-Hung
- Subjects
FOURIER transforms ,ADAPTIVE control systems ,WIRELESS communications ,DIGITAL signal processing ,SYSTEMS design
- Abstract
The fast Fourier transform (FFT) is widely used in modern wireless communication and digital signal processing. Because memory access is a major cause of the power dissipated by long-length FFT architectures, this paper explores in detail the design space spanned by FFT size and radix number, and presents a novel low-memory-access, length-adaptive architecture for computing any long-length 2^n-point FFT. The proposed hardware solution has three features that distinguish it from existing designs. First, the authors identified that memory accounts for the major part of the energy dissipated by an FFT processor, and proposed reducing memory access by decreasing the number of FFT butterfly stages. Second, we adopt the design concept of programmable processors to provide the flexibility to dynamically configure the hardware for variable-length FFT computation without sacrificing hardware utilization, in contrast to the feed-forward architecture. Finally, a 16-bank memory organization is proposed to achieve conflict-free FFT operations for various radixes. This low-memory-access, length-adaptive architecture can reduce memory access by almost 70%, or power consumption by 30%, for FFT computation. Implemented in a 1P6M TSMC 0.18-µm CMOS technology, this work occupies a core area of only 4.49 mm² and meets the real-time FFT performance requirements of DVB-T2 systems when operated at 20 MHz. The proposed design consumes only 1.44 nJ of energy per sample for computing FFTs. By adopting the proposed low-memory-access algorithm, flexible length-adaptive architecture, and efficient 16-bank memory organization, 56% of the power dissipation of the whole FFT chip can be saved. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
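The link between radix and memory traffic described above can be made concrete with a first-order count: a 2^n-point FFT built from radix-r butterflies needs ceil(n / log2 r) stages. The cost model below, in which every stage reads and writes each sample once, is an assumption for illustration and is not the paper's measured memory-access figure.

```python
def butterfly_stages(n_points, radix):
    """Stages needed for a 2^n-point FFT built from radix-`radix` butterflies.

    If log2(radix) does not divide n, one final lower-radix stage absorbs the rest.
    """
    n = n_points.bit_length() - 1
    if n_points < 2 or n_points != 1 << n:
        raise ValueError("n_points must be a power of two >= 2")
    r = radix.bit_length() - 1
    if radix < 2 or radix != 1 << r:
        raise ValueError("radix must be a power of two >= 2")
    return (n + r - 1) // r  # integer ceiling of n / r


def memory_accesses(n_points, radix):
    """Assumed cost model: every stage reads and writes each sample once."""
    return 2 * n_points * butterfly_stages(n_points, radix)


if __name__ == "__main__":
    for radix in (2, 4, 8, 16):
        print(radix, butterfly_stages(32768, radix), memory_accesses(32768, radix))
```

Under this model a 32K-point transform drops from 15 radix-2 stages to 4 stages when radix-16 butterflies are used, which shows why raising the radix is an effective lever on memory access.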
180. VLSI Implementation of Enhanced Edge Preserving Impulse Noise Removal Technique.
- Author
-
Deepa, P. and Vasanthanayaki, C.
- Abstract
In many applications, image and video signals are corrupted by impulse noise during acquisition or transmission. Hence there is a need for an efficient and consumer-friendly impulse noise removal technique. In this paper, an efficient, low-cost VLSI architecture for an edge-preserving impulse noise removal technique is proposed. The architecture comprises two line buffers, register banks, an impulse noise detector, an edge-oriented noise filter, and an impulse arbiter. The proposed hardware requires storage for only two line buffers rather than a full frame memory. Moreover, the proposed algorithm uses only a fixed-size window instead of a variable window size. Together, these two features greatly reduce the storage requirement as well as the computational complexity. The impulse noise detector turns off the remaining circuitry if the current pixel is noise-free, thus reducing power consumption. Further, the four-stage pipeline architecture greatly improves the speed of operation. The implemented edge-preserving algorithm yields better visual quality for denoised images. Thus the proposed architecture offers lower complexity, a smaller storage requirement, low power consumption, and improved speed of operation. The architecture has been implemented in Xilinx 9.2i, and the results are tabulated for various images. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
181. FPGA implementation for generation of six phase pulse compression sequences.
- Author
-
Verma, M K and Premananda, B S
- Abstract
The pulse compression technique is widely used in radar signal processing applications. For better pulse compression, the peak-signal-to-side-lobe ratio, i.e., the merit factor, should be as high as possible so that unwanted clutter is suppressed. To achieve this, phase-coded pulse compression sequences are widely used. The simplest phase codes are obtained from binary pulse compression sequences, but matched filtering of radar signals creates unwanted side lobes that may mask important information. Polyphase pulse compression sequences are therefore studied, since they have low side lobes and are more Doppler tolerant. Moving from binary to ternary and from ternary to six-phase pulse compression VLSI systems increases the memory requirements and thereby the area, and real-time implementation requires optimization of speed, area, and power consumption. This paper concentrates on the design of an optimized model that relaxes these constraints. The proposed FPGA implementation efficiently generates six-phase pulse compression sequences while improving parameters such as area and speed compared with previous methods. The module is implemented on an FPGA because it provides the flexibility of reconfigurability and reprogrammability. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
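To make the merit-factor criterion concrete, the sketch below computes the aperiodic autocorrelation and Golay's merit factor F = E² / (2 Σ_{k≥1} |C_k|²), where E is the sequence energy. Note that this commonly used definition compares total energy with side-lobe energy, whereas the peak-side-lobe level is a separate metric. The helper `six_phase` is a hypothetical mapping of phase indices 0..5 to multiples of 60°; the same functions apply unchanged to binary, ternary (with zeros), and six-phase codes.

```python
import cmath
import math


def aperiodic_acf(seq):
    """C_k = sum_i a_i * conj(a_{i+k}) for k = 0 .. N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k].conjugate() for i in range(n - k)) for k in range(n)]


def merit_factor(seq):
    """Golay merit factor: E^2 / (2 * sum_{k>=1} |C_k|^2), with E = sum |a_i|^2.

    Undefined (division by zero) for sequences with no side-lobe energy, e.g. length 1.
    """
    energy = sum(abs(a) ** 2 for a in seq)
    sidelobe_energy = sum(abs(c) ** 2 for c in aperiodic_acf(seq)[1:])
    return energy ** 2 / (2 * sidelobe_energy)


def six_phase(indices):
    """Hypothetical helper: map phase indices 0..5 to samples exp(j*pi*k/3)."""
    return [cmath.exp(1j * math.pi * k / 3) for k in indices]


BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
```

For the length-13 Barker code every side lobe has magnitude 0 or 1, giving F = 169/12 ≈ 14.08; the same code written with six-phase indices 0 and 3 yields the same value.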
182. Efficient intra prediction VLSI architecture for HEVC standard.
- Author
-
Zhu, Hongxiang, Zhou, Wei, Qing, Dong, and Huang, Xiaodong
- Abstract
Intra prediction with fine directions is a critical feature of the new High Efficiency Video Coding (HEVC) standard because it provides a significant performance gain. Compared with intra prediction in H.264/AVC, this approach is more complicated in terms of computation and memory access, which makes VLSI design very difficult. In this paper, based on an analysis of the DC and planar prediction mode algorithms in HM, a VLSI architecture with reused adders for DC mode and a highly efficient VLSI architecture for planar mode are proposed for intra prediction. Implementation in TSMC 90 nm CMOS technology indicates that the proposed architecture can operate at 357 MHz with 12,970 logic gates for DC prediction, and at 308 MHz with 57,500 logic gates for planar prediction. The processing latency of the proposed VLSI architecture supports real-time processing of 4:2:0 format 4096×2048@30fps video sequences. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
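The two modes discussed above reduce to simple integer arithmetic on the reconstructed neighbours, which is what makes adder reuse attractive. The sketch follows the HEVC formulas as commonly stated: DC = (Σ top + Σ left + N) >> (log2 N + 1), and planar averages a horizontal and a vertical linear interpolation. It assumes N is a power of two and omits the DC boundary smoothing that HEVC applies to small luma blocks; argument names are illustrative.

```python
def dc_prediction(top, left):
    """HEVC-style DC value: (sum(top) + sum(left) + N) >> (log2(N) + 1).

    Boundary smoothing that HEVC applies to small luma blocks is omitted.
    """
    n = len(top)
    k = n.bit_length() - 1          # log2(N) for N = 4, 8, 16, 32
    dc = (sum(top) + sum(left) + n) >> (k + 1)
    return [[dc] * n for _ in range(n)]


def planar_prediction(top, left, top_right, bottom_left):
    """HEVC planar mode: average of a horizontal and a vertical linear interpolation."""
    n = len(top)
    k = n.bit_length() - 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horiz + vert + n) >> (k + 1)
    return pred
```

Every planar sample needs the same multiply-accumulate pattern with weights that change by one per row or column, so incremental adders can replace the multipliers in hardware.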
183. Architecture design framework for flexible and configurable WiMAX OFDMA baseband transceiver.
- Author
-
Adiono, Trio and Sutisna, Nana
- Abstract
This paper describes a design framework for the VLSI architecture of a WiMAX OFDMA baseband transceiver (physical layer, PHY). The architecture design emphasizes a balance between flexibility and implementation efficiency in terms of optimum area usage and data processing speed. Flexibility is obtained through several design approaches, such as partitioning the design to maintain modularity, providing software access for configuration, and designing a control unit. The design has been successfully implemented in 0.13 µm CMOS technology, with a chip area of about 13.4 mm². The implemented design can deliver a throughput of up to 41 Mbps. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
184. A high-performance architectural design for motion estimation in MPEG-4.
- Author
-
Guhagarkar, Nikhil R and Ahamed Shaik, Rafi
- Abstract
The key to the high performance of MPEG-4 video compression lies in the efficient reduction of spatial and temporal redundancy. The main idea of inter prediction techniques is to quickly check the entire search area with an efficient matching criterion, namely the sum of absolute differences (SAD), to eliminate impossible or poorly matched candidates, followed by a finer selection among the potentially best-matched candidates. The macroblock with the lowest SAD value determines the motion vector. Owing to the object-based nature of MPEG-4, this paper proposes a new SAD design with efficient computational ability, less area, and less power in 0.18 µm CMOS technology, operating at a frequency of 1.508 GHz. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
185. VLSI architectural design of zoomable real time spectrum analyzer.
- Author
-
Sarkar, Sumantra and Dhar, Anindya Sundar
- Abstract
The spectrum analyzer is perhaps the most useful instrument, finding applications in almost every branch of engineering. In this work, a pipelined architecture for a hardware-economical spectrum analyzer is presented that incorporates zooming capability with a nominal increase in hardware complexity. The sliding-window DFT algorithm has been chosen for this purpose, and a CORDIC (COordinate Rotation DIgital Computer) module has been employed to implement the architecture. Due to the recursive nature of the algorithm, the error accumulates with the number of iterations over time. To prevent the error from accumulating beyond the required accuracy, the proposed architecture operates two parallel pipes simultaneously in a time-skewed manner. External control is provided to select the zoom level. The design is implemented on a Xilinx Spartan-3 FPGA and tested with up to two levels of zooming for various real-time input signals. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
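The recursion behind the sliding-window DFT mentioned above is X_k(n) = e^{j2πk/N} (X_k(n−1) + x(n) − x(n−N)), which costs one complex multiply per bin per sample. The sketch below uses a complex multiply where the paper uses a CORDIC rotation, and compares the result with a direct DFT of the latest N samples; the class and function names are illustrative.

```python
import cmath


class SlidingDFTBin:
    """One DFT bin k over the latest N samples, updated in O(1) per sample:
    X_k(n) = exp(j*2*pi*k/N) * (X_k(n-1) + x(n) - x(n-N))."""

    def __init__(self, n, k):
        self.n = n
        self.twiddle = cmath.exp(2j * cmath.pi * k / n)
        self.window = [0.0] * n   # circular buffer; starts as an all-zero window
        self.pos = 0              # index of the oldest sample
        self.value = 0j           # DFT of the all-zero window

    def push(self, sample):
        oldest = self.window[self.pos]
        self.window[self.pos] = sample
        self.pos = (self.pos + 1) % self.n
        # Recursive update; rounding error accumulates over many calls.
        self.value = (self.value + sample - oldest) * self.twiddle
        return self.value


def direct_dft_bin(window, k):
    """Reference O(N) computation of bin k (oldest sample has index 0)."""
    n = len(window)
    return sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(window))
```

Because each output depends on the previous one, rounding errors never leave the loop; this is the accumulation problem that the two time-skewed pipes in the abstract are designed to bound.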
186. A high throughput turbo decoder VLSI architecture for 3GPP LTE standard.
- Author
-
Ahmed, Ashfaq, Awais, Muhammad, Rehman, Ata ur, Martina, Maurizio, and Masera, Guido
- Abstract
This paper presents a highly parallel turbo decoding architecture for the 3GPP LTE standard. High throughput is achieved by increasing the decoder parallelism and reducing window sizes. A Batcher-sorting-based permutation network is presented that can support multi-standard applications. The proposed solution supports all codes specified by the 3GPP LTE standard. High coding efficiency is achieved at low computational cost by exploiting an effective scheme for initializing the forward and backward state metrics. The decoder achieves a maximum throughput of 285 Mbps at 200 MHz, occupying an area of 210 mm² in 90-nm standard cell ASIC technology. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
187. Lifting-based VLSI Architectures for Two-Dimensional Discrete Wavelet Transform for Effective Image Compression.
- Author
-
Saeed, Ibrahim and Agustiawan, Herman
- Subjects
VERY large scale circuit integration ,WAVELETS (Mathematics) ,IMAGE compression ,COMPUTER network architectures ,MICROPROCESSORS ,COMPUTER storage devices ,COMPUTER input-output equipment
- Abstract
In this paper, a new approach for designing and implementing lifting-based VLSI architectures for the two-dimensional discrete wavelet transform (2-D DWT) is introduced. As a result, two high-performance VLSI architectures that perform the 2-D DWT for the lossless 5/3 and lossy 9/7 filters are proposed. In addition, the architectures implement symmetric extension for boundary treatment. First, two pipelined architectures consisting of two stages, the row and column processor stages, were developed for the 5/3 and 9/7 filters. The internal memory between the row and column processors is reduced to a few registers. Second, to speed up the computation, fully pipelined datapath architectures for the row and column processors were developed separately for the 5/3 and 9/7 filters; these can be incorporated into the two architectures developed in the first part. Finally, 100% hardware utilization is achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2008
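The lossless 5/3 filter mentioned above is usually realised with two integer lifting steps: predict, d[i] = x[2i+1] − ⌊(x[2i] + x[2i+2])/2⌋, and update, s[i] = x[2i] + ⌊(d[i−1] + d[i] + 2)/4⌋, with whole-sample symmetric extension at the borders. The sketch below implements one 1-D level for even-length signals only (a simplifying assumption); a 2-D level applies it to every row and then every column, mirroring the row and column processors described above. The 9/7 filter is not shown.

```python
def dwt53_forward(x):
    """One level of the integer 5/3 lifting DWT on an even-length 1-D signal.

    Whole-sample symmetric extension: x[N] is mirrored to x[N-2], d[-1] to d[0].
    Returns (lowpass s, highpass d).
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and >= 2")
    half = n // 2
    # Predict: detail = odd sample minus floor of the neighbouring-even average.
    d = [x[2 * i + 1] - ((x[2 * i] + x[min(2 * i + 2, n - 2)]) >> 1) for i in range(half)]
    # Update: approximation = even sample plus rounded quarter of adjacent details.
    s = [x[2 * i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def dwt53_inverse(s, d):
    """Exact integer inverse: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        x[2 * i] = s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
    for i in range(half):
        x[2 * i + 1] = d[i] + ((x[2 * i] + x[min(2 * i + 2, n - 2)]) >> 1)
    return x
```

Python's `>>` floors negative values, matching the floor operations in the lifting equations, so the inverse reconstructs the input exactly.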
188. FPGA Implementation of the Ternary Pulse Compression Sequences.
- Author
-
Balaji, N., Rao, M. Srinivasa, Rao, K. Subba, Singh, S. P., and Reddy, N. Madhusudhana
- Subjects
COMPUTER architecture ,FIELD programmable gate arrays ,TERNARY system ,PULSE compression radar ,STOCHASTIC convergence
- Abstract
Ternary codes have been widely used in radar and communication areas, but the synthesis of ternary codes with a good merit factor is a nonlinear multivariable optimization problem that is usually difficult to tackle. To solve this problem, many global optimization algorithms, such as genetic algorithms, simulated annealing, and tunneling algorithms, have been reported in the literature. However, none of them guarantees reaching the global optimum. In this paper, a novel and efficient VLSI architecture is proposed to design ternary pulse compression sequences with a good merit factor. The VLSI architecture is implemented on a field-programmable gate array (FPGA) because it provides the flexibility of reconfigurability and reprogrammability. The implemented architecture overcomes the drawback of non-guaranteed convergence of the earlier optimization algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2008
189. VLSI Architecture for Combined R2B, R4B and R8B FFT using SDF and Modified CSLA
- Author
-
Devendra Kumar Patel and Shivraj Singh
- Subjects
Very-large-scale integration ,Vlsi architecture ,Computer science ,Path delay ,Equipment use ,Fast Fourier transform ,Carry-select adder ,Point (geometry) ,Arithmetic ,Constant (mathematics)
- Abstract
The FFT computes the DFT, and the DFT is computed sequentially; a pipelined FFT supports continuous processing when data is continuously fed through the processor. In this paper, radix-2 butterfly (R2B), R4B, and R8B elements are combined using a single-path delay feedback (SDF) structure and a modified carry-select adder (MCSLA) technique to reduce the number of computational stages and the hardware utilization compared with the R2B and R4B FFTs. The implemented SDF technique uses a single delay commutator at every stage. Because of the delay element, N/2 points are processed sequentially. The proposed technique requires fewer multipliers, fewer computational stages, and fewer butterfly elements than the radix-2 and radix-4 FFTs.
- Published
- 2021
- Full Text
- View/download PDF
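The carry-select idea behind the adder mentioned above can be modelled bit-accurately in a few lines: each block is added twice in parallel, once assuming carry-in 0 and once assuming carry-in 1, and the real carry from the previous block only drives a multiplexer. This sketch is the basic CSLA; the modified variant used in the paper is not modelled, and the `width` and `block` parameters are illustrative.

```python
def ripple_add(a, b, cin, width):
    """Bit-level ripple-carry add of two `width`-bit values; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def carry_select_add(a, b, width=16, block=4):
    """Basic carry-select adder built from `block`-bit ripple blocks."""
    if width % block:
        raise ValueError("width must be a multiple of block")
    if not (0 <= a < 1 << width and 0 <= b < 1 << width):
        raise ValueError("operands must fit in `width` bits")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        blk_a, blk_b = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = ripple_add(blk_a, blk_b, 0, block)   # precomputed for carry-in 0
        sum1, c1 = ripple_add(blk_a, blk_b, 1, block)   # precomputed for carry-in 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # the multiplexer stage
        result |= s << lo
    return result, carry
```

In hardware both block sums are available in parallel, so the critical path becomes one block delay plus a chain of multiplexers instead of a full ripple chain; the cost is the duplicated adder, which modified CSLA designs aim to reduce.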
190. Performance Analysis of Booth Multiplier-Based FIR in DWT Image Processing Applications
- Author
-
R Ramesh, B. Nithya, K Hema Priya, and S. Tamilselvan
- Subjects
Vlsi architecture ,Finite impulse response ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,% area reduction ,Image processing ,Multiplier (economics) ,Booth's multiplication algorithm ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,business ,Computer hardware ,Convolution
- Abstract
The main objective of this work is to propose an FIR filter for convolution-based DWT in image processing. Filter-based image processing mainly suffers from the delay caused by the multiplier unit. We therefore propose a Booth-multiplier-based MAC architecture for the FIR filter design. The overall system achieves a maximum speed improvement of 13.83% and an area reduction of 22.39%. This work demonstrates a performance-improved VLSI architecture for image processing with DWT techniques.
- Published
- 2021
- Full Text
- View/download PDF
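A Booth-based MAC can be illustrated with radix-4 (modified) Booth recoding, which turns a two's-complement multiplier into digits in {−2, −1, 0, 1, 2} and halves the number of partial products. The abstract does not state which Booth radix is used, so radix-4 is an assumption here, and `fir_booth` is a hypothetical direct-form filter that routes every product through that multiplier.

```python
def booth_radix4_multiply(multiplicand, multiplier, bits=16):
    """Signed product via radix-4 Booth recoding of `multiplier` (bits must be even)."""
    if bits % 2:
        raise ValueError("bits must be even")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= multiplicand <= hi and lo <= multiplier <= hi):
        raise ValueError("operand out of range")
    m = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    product = 0
    prev = 0                             # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        y0 = (m >> i) & 1
        y1 = (m >> (i + 1)) & 1
        digit = -2 * y1 + y0 + prev      # Booth digit in {-2, -1, 0, 1, 2}
        product += (digit * multiplicand) << i   # one partial product per bit pair
        prev = y1
    return product


def fir_booth(coeffs, samples, bits=16):
    """Direct-form FIR y[n] = sum_k h[k] * x[n-k]; each product uses the Booth multiplier."""
    out = []
    for n in range(len(samples)):
        acc = 0                          # the MAC accumulator
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += booth_radix4_multiply(h, samples[n - k], bits)
        out.append(acc)
    return out
```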
191. A Kind of Design for CCSDS Standard GF(2^8) Multiplier
- Author
-
Dacheng Cao, Wei Zhang, Hao Zhang, and Aihua Dong
- Subjects
Vlsi architecture ,Resource (project management) ,Computer science ,Dual basis ,Multiplier (economics) ,Multiplication ,Field based ,Arithmetic ,GF(2) ,Electronic circuit
- Abstract
Through theoretical analysis, a method for computing dual-basis multiplication in the GF(2^8) field, based on the CCSDS Berlekamp representation, is given. Based on this method, VLSI architectures for parallel multiplication and for serial operation in circuits are proposed. The hardware resource occupation and timing performance of each VLSI architecture are analyzed in detail.
- Published
- 2021
- Full Text
- View/download PDF
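As a point of reference for the multiplier above, the sketch below multiplies in GF(2^8) using the conventional polynomial basis and the CCSDS field generator F(x) = x^8 + x^7 + x^2 + x + 1. It deliberately does not implement the Berlekamp dual-basis method the paper builds on; CCSDS symbols are exchanged in the dual basis, and the basis-conversion matrix is not reproduced here. Each loop iteration corresponds to one cycle of a bit-serial multiplier, and a parallel multiplier unrolls all eight.

```python
CCSDS_FIELD_POLY = 0x187  # F(x) = x^8 + x^7 + x^2 + x + 1


def gf256_mul(a, b, poly=CCSDS_FIELD_POLY):
    """Shift-and-add multiply in GF(2^8), conventional polynomial basis.

    Each loop iteration handles one bit of b, like one clock cycle of a
    bit-serial multiplier; a parallel multiplier unrolls all eight iterations.
    """
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be 8-bit")
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= poly            # reduce modulo F(x) when the degree reaches 8
    return result
```

For example, x^7 · x = x^8, which reduces to x^7 + x^2 + x + 1, so `gf256_mul(0x80, 0x02)` gives `0x87`.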
192. A Scalable VLSI Architecture for Real-Time and Energy-Efficient Sparse Approximation in Compressive Sensing Systems
- Author
-
Ren, Fengbo
- Subjects
Electrical engineering ,Computer engineering ,Compressive Sensing ,Energy-Efficient Design ,Integrated Circuit ,Sparse Approximation ,VLSI Architecture ,Wireless Health
- Abstract
The digital electronics industry today relies on the Nyquist sampling theorem, which requires sampling at twice the highest frequency of the signal's Fourier representation to avoid information loss. However, most natural signals have very sparse representations on some other orthogonal (non-Fourier) basis. This mismatch implies a large redundancy in Nyquist-sampled data, making compression a necessity prior to storage or transmission. Recent advances in compressive sensing (CS) theory offer an alternative data acquisition framework that can greatly benefit power-starved applications such as wireless sensors. CS techniques provide a universal approach to sampling compressible signals at a rate significantly below the Nyquist rate with limited information loss. Therefore, CS is a promising technology for realizing configurable, cost-effective, miniaturized, and ultra-low-power data acquisition devices for mobile and wearable applications. However, the digital signal processing of compressively sampled data involves solving a sparse approximation problem, which requires iterative search algorithms with high computational complexity and intensive memory access. As a result, existing software solutions are neither energy-efficient nor cost-effective for real-time processing of compressively sampled data, especially when the processing is performed on energy-limited devices. To solve this problem, this dissertation presents a scalable VLSI architecture that can be implemented on field-programmable gate arrays (FPGAs) or systems-on-chip (SoCs) to perform dedicated-hardware-driven sparse approximation. A VLSI soft-IP core of the sparse approximation engine is developed in Verilog HDL; it supports a floating-point data format with 10 design parameters, providing a high dynamic range and flexibility for application-specific user customization.
Taking advantage of an algorithm-architecture co-design that leverages algorithm reformulations, configurable architectures, and efficient memory mapping schemes, the proposed VLSI architecture achieves 100% utilization of the computing resources and is scalable in terms of computation parallelism and memory capacity. The hardware emulation of the soft-IP core on a 28-nm Xilinx Kintex-7 FPGA shows that our design achieves the same level of accuracy as a double-precision C program running on an Intel Core i7-4700MQ mobile processor, while providing a 47-147x speed-up for ECG signal reconstruction. Furthermore, a 12-237 KS/s, 12.8 mW sparse approximation engine chip is realized in 40-nm CMOS technology to enable mobile data aggregation of compressively sampled biomedical signals in CS-based wireless health monitoring systems. The measurement results show that the sparse approximation engine chip operating at the minimum energy point achieves real-time throughput for reconstructing 61-237 channels of biomedical signals simultaneously with
- Published
- 2014
193. A High Parallelism Hardware Architecture Design of the H.264/AVC Integer Motion Estimation for Applications in Real-time DTTV Transmissions.
- Author
-
Lunarejo, José Luis Santos and Cárdenas, Carlos Silva
- Abstract
H.264/AVC is the standard video format used by the SBTVD (Sistema Brasileiro de Televisão Digital), which is present in almost all South American countries and allows transmissions in Full High Definition (Full HD) video quality. Because most of the computational load of the standard lies in motion estimation, this work presents a hardware architecture for the motion estimation algorithm that exploits the high parallelism available in hardware designs to achieve faster processing and hence real-time broadcasts. The design was described in VHDL and synthesized to an Altera Cyclone II FPGA, and it can process Full HD video (1920x1080 pixels) in real time. The results show a maximum operating frequency of 183.55 MHz, at which the design can process 35 frames per second. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
194. VLSI architecture for LGXP texture for face recognition.
- Author
-
Kannan, P. and Shantha Selva Kumari, R.
- Subjects
VERY large scale circuit integration ,GABOR filters ,LOGIC circuits ,VERILOG (Computer hardware description language) ,FIELD programmable gate arrays ,FEATURE extraction
- Abstract
This paper presents a VLSI architecture for a face recognition system based on the Local Gabor XOR Pattern (LGXP) feature extraction method. LGXP encodes Gabor phase variations and extracts features with the help of a Gabor filter and the Local XOR Pattern (LXP) operator. A VLSI architecture for the Gabor filter and a behavioral model of the LXP operator for feature extraction are investigated. A behavioral model for similarity matching is also designed in Verilog. Similarity matching for face recognition is performed using the L1 distance measure. Our approach thus explores the effectiveness of Gabor phase information on an FPGA platform, addressing computational and hardware complexity by mapping the algorithms to hardware. The proposed approach is designed on a Virtex-5 device using Verilog HDL in the Xilinx ISE tool; logic utilization results are generated with the synthesis tool, and the power consumption report is analyzed with the XPower analysis tool. The effectiveness of the design is also evaluated with FAR, FRR, and accuracy plots in the MATLAB simulation environment. The proposed face recognition system achieves 72.225% accuracy on the UPC face database for a distance-matching threshold of '5'. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
195. A novel real-time resource efficient implementation of Sobel operator-based edge detection on FPGA.
- Author
-
Singh, Sanjay, Saini, Anil K, Saini, Ravi, Mandal, A.S., Shekhar, Chandra, and Vohra, Anil
- Subjects
REAL-time computing ,EDGE detection (Image processing) ,OPERATOR theory ,FIELD programmable gate arrays ,COMPUTER input-output equipment ,COMPUTER architecture
- Abstract
A new resource-efficient FPGA-based hardware architecture for real-time edge detection using the Sobel operator for video surveillance applications is proposed. The Sobel operator was chosen because it counteracts the noise sensitivity of the simple gradient operator. An FPGA was chosen for this implementation because it allows algorithmic changes at a later stage of system development and provides real-time performance that is hard to achieve with a general-purpose processor or digital signal processor, while avoiding the extensive design work, time, and cost required for an application-specific integrated circuit. The proposed architecture uses a single processing element for both horizontal and vertical gradient computation of the Sobel operator and uses approximately 38% fewer FPGA resources than a standard Sobel edge detection architecture, while maintaining real-time frame rates for high-definition video (1920 × 1080 image size). The complete system is implemented on a Xilinx ML510 (Virtex-5 FX130T) FPGA board. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
196. A 0.79 pJ/K-Gate, 83% Efficient Unified Core and Voltage Regulator Architecture for Sub/Near-Threshold Operation in 130 nm CMOS.
- Author
-
Zhang, Sai, Tu, Jane S., Shanbhag, Naresh R., and Krein, Philip T.
- Subjects
VOLTAGE regulators ,COMPLEMENTARY metal oxide semiconductors ,ENERGY consumption ,CAPACITOR switching ,ELECTRIC controllers ,SYNCHRONOUS capacitors
- Abstract
This paper presents the compute voltage regulator module (C-VRM), an architecture that embeds the information processing subsystem into the energy delivery subsystem for ultra-low-power (ULP) platforms. The C-VRM employs multiple voltage domain stacking and core swapping to achieve high total system energy efficiency in the near/sub-threshold region. Energy models for the C-VRM are derived and employed in system simulations to compare the energy efficiency benefits of the C-VRM with those of a switched-capacitor VRM (SC-VRM). A prototype IC incorporating a C-VRM and an SC-VRM supplying energy to an 8-tap fully folded FIR filter core is implemented in a 1.2 V, 130 nm CMOS process. Measured results indicate that the C-VRM achieves up to 44.8% savings in system-level energy per operation (Eop) compared to the SC-VRM system, and an efficiency η ranging from 79% to 83% over an output voltage range of 0.52 V to 0.6 V. The measured values of Eop and η match those predicted by system simulations, thereby validating the energy models. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
197. An Efficient NLMS-based VLSI Architecture for Robust FECG Extraction and FHR Processing
- Author
-
Sergio Bampi, Patricia Ucker Leleu da Costa, Eduardo Costa, Guilherme Paim, Sergio Almeida, and Leandro M. G. Rocha
- Subjects
Very-large-scale integration ,Adaptive filter ,Vlsi architecture ,Fetal heart rate ,Computer science ,business.industry ,embryonic structures ,Pattern recognition ,Artificial intelligence ,Fetal electrocardiogram ,Ecg signal ,business
- Abstract
This work presents an efficient NLMS-based VLSI architecture to extract the fetal electrocardiogram (FECG) and detect the fetal heart rate (FHR) using an adaptive filtering strategy. The investigated NLMS-based architecture can robustly cancel the highly noisy maternal ECG signals, enabling FHR measurement. We used the Improved Fetal Pan and Tompkins Algorithm (IFPTA) to detect fetal R-peaks and calculate the FHR. Our NLMS-based VLSI architecture detects the R-peaks in the extracted FECG with 93.2% accuracy and only 2.4 mW of total power dissipation.
- Published
- 2020
- Full Text
- View/download PDF
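The normalized LMS update at the core of the architecture above is w ← w + μ·e·u / (ε + uᵀu), where u holds the latest input samples and e = d − wᵀu. In a typical adaptive noise-cancellation setup for FECG, the input x would be a maternal reference lead and d an abdominal lead, so the error e approximates the fetal component; that mapping, the tap count, μ, and ε are illustrative assumptions rather than the paper's parameters.

```python
def nlms(x, d, taps=8, mu=0.5, eps=1e-6):
    """Normalized LMS adaptive FIR filter.

    x: reference input, d: desired signal. Returns (final weights, error signal).
    """
    w = [0.0] * taps
    errors = []
    for n in range(len(x)):
        # Tap-delay line: newest sample first, zeros before the signal starts.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))       # filter output
        e = d[n] - y                                   # estimation error
        norm = eps + sum(ui * ui for ui in u)          # input power normalization
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
        errors.append(e)
    return w, errors
```

Normalizing by the input power makes the convergence speed largely independent of signal amplitude, which matters when ECG amplitudes vary between subjects; in hardware the division is the costly part of this update.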
198. A Regular VLSI Architecture of Motion Vector Prediction for Multiple-Standard MPEG-Like Video Codec.
- Author
-
Yin, Haibing, Li, Shizhong, Qi, Honggang, and Hu, Hongqi
- Abstract
Motion vector (MV) prediction and residue coding are adopted in the prevailing video standards to fully exploit motion field redundancy, and MV prediction is required in both the video encoder and decoder. The computational burden of MV prediction is not very high. However, the raw MV prediction algorithm is highly irregular, with two-stage and four-level hierarchical tree control flows, which makes efficient VLSI implementation challenging. This irregularity derives mainly from the abundant inter prediction modes, including variable block-size partitions and temporal prediction directions, as well as from the irregular control flow of the MV prediction algorithm. This paper proposes a highly regular architecture to implement MV prediction for a multi-standard video codec. Complex control logic is simplified by regular table look-up of control parameters that are predefined and stored in on-chip tables. The parameters of the current macroblock (MB) and its neighboring blocks are initialized and refreshed in a regular manner. Moreover, pipelining and parallelism are employed in the proposed architecture to improve throughput efficiency and to trade off hardware cost against efficiency. Simulation results verify the effectiveness of the proposed design. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
199. Performance Characterization of AES Datapath Architectures in 90-nm Standard Cell CMOS Technology.
- Author
-
Wang, Cheng and Heys, Howard
- Abstract
In this paper, we characterize the performance of datapath architectures for the Advanced Encryption Standard (AES). These architectures are parameterized by a datapath width of 8, 16, 32, 64, or 128 bits and, for the 128-bit width, an unrolling factor of 1, 2, 5, or 10. Composite field S-boxes are adopted for all the architectures, and shift-register-based ShiftRows and MixColumns components are used for architectures with datapath widths of less than 128 bits. Their performance in terms of area, peak power, and average energy is benchmarked using a 90-nm standard cell CMOS technology under a variety of throughput requirements. Through this characterization, the performance trade-offs affected by the architecture parameters are extensively explored, and the parameters leading to the best performance are identified. It is found that the 8-bit datapath, which is conventionally adopted for resource efficiency, has the worst energy efficiency and does not result in the minimal peak power among the architectures. In addition, the 16-, 32-, and 64-bit AES datapath architectures are newly considered or represent improvements over previous work. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
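The S-box that dominates these datapaths is a multiplicative inverse in GF(2^8) (modulo x^8 + x^4 + x^3 + x + 1) followed by an affine transform. The sketch below computes the inverse directly as a^254 by square-and-multiply. The composite-field S-boxes used in the paper compute the same inverse through GF((2^4)^2) arithmetic for a smaller circuit; that decomposition is not shown here.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul_aes(a, b):
    """Multiply two bytes in the AES field GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
    return r


def gf_inv_aes(a):
    """Multiplicative inverse as a^254 (since a^255 = 1); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:                      # square-and-multiply
        if e & 1:
            result = gf_mul_aes(result, base)
        base = gf_mul_aes(base, base)
        e >>= 1
    return result


def rotl8(v, s):
    """Rotate a byte left by s bits."""
    return ((v << s) | (v >> (8 - s))) & 0xFF


def aes_sbox(a):
    """AES SubBytes: field inverse followed by the affine transform."""
    b = gf_inv_aes(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
```

For example, `aes_sbox(0x53)` returns `0xED`, the value listed in the AES S-box table.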
200. Motion-sensor fusion-based gesture recognition and its VLSI architecture design for mobile devices.
- Author
-
Zhu, Wenping, Liu, Leibo, Yin, Shouyi, Hu, Siqi, Tang, Eugene Y., and Wei, Shaojun
- Subjects
GESTURE ,FUSION (Phase transformation) ,PATTERN recognition systems ,VERY large scale circuit integration ,HUMAN-computer interaction ,SMARTPHONES
- Abstract
With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human–computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations to date are user-dependent and rely only on the accelerometer. To achieve competitive accuracy, users are required to hold the device in a predefined manner during operation. In this paper, a high-accuracy human gesture recognition system based on multiple motion sensor fusion is proposed. Furthermore, to reduce the energy overhead resulting from frequent sensor sampling and data processing, a highly energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with a pure software implementation, an approximately 45-fold speed-up is achieved at 20 MHz. The experiments show that the average accuracy for 10 gestures reaches 93.98% in the user-independent case and 96.14% in the user-dependent case when subjects hold the device arbitrarily while performing the specified gestures. Although a few percent lower than the best conventional result, it still provides competitive accuracy acceptable for practical use. Most importantly, the proposed system allows users to hold the device arbitrarily while performing the predefined gestures, which substantially enhances the user experience. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library