153 results on '"FPGA implementations"'
2. When Bad News Become Good News
- Author
Davide Bellizia, Clément Hoffmann, Dina Kamel, Pierrick Méaux, and François-Xavier Standaert
- Subjects
Learning With Errors ,Physical Assumptions ,FPGA Implementations ,Computer engineering. Computer hardware ,TK7885-7895 ,Information technology ,T58.5-58.64
- Abstract
Hard physical learning problems have been introduced as an alternative option to implement cryptosystems based on hard learning problems. Their high-level idea is to use inexact computing to generate erroneous computations directly, rather than to first compute correctly and add errors afterwards. Previous works focused on the applicability of this idea to the Learning Parity with Noise (LPN) problem as a first step, and formalized it as Learning Parity with Physical Noise (LPPN). In this work, we generalize it to the Learning With Errors (LWE) problem, formalized as Learning With Physical Errors (LWPE). We first show that the direct application of the design ideas used for LPPN prototypes leads to a new source of (mathematical) data dependencies in the error distributions that can reduce the security of the underlying problem. We then show that design tweaks can be used to avoid this issue, making LWPE samples natively robust against such data dependencies. We additionally put forward that these ideas open a quite wide design space that could make hard physical learning problems relevant in various applications. And we conclude by presenting a first prototype FPGA design confirming our claims.
- Published
- 2022
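The conventional approach that LWPE aims to replace (compute the inner product exactly, then add a small error afterwards) can be sketched as below. This is a minimal illustration under stated assumptions: the modulus Q, dimension N and centered-binomial error parameter ETA are hypothetical toy values, the function names are illustrative, and nothing here models the physical (inexact-computing) error source studied in the paper.

```python
import random

# Hypothetical toy parameters, far too small for real security.
Q = 3329   # modulus
N = 16     # secret dimension
ETA = 2    # centered-binomial parameter: errors lie in [-ETA, ETA]


def keygen(rng: random.Random) -> list[int]:
    """Uniform secret vector in Z_Q^N."""
    return [rng.randrange(Q) for _ in range(N)]


def small_error(rng: random.Random) -> int:
    """Centered binomial sample: (sum of ETA coin flips) - (sum of ETA coin flips)."""
    return sum(rng.getrandbits(1) for _ in range(ETA)) - sum(rng.getrandbits(1) for _ in range(ETA))


def lwe_sample(secret: list[int], rng: random.Random) -> tuple[list[int], int]:
    """Return (a, b) with b = <a, s> + e mod Q."""
    a = [rng.randrange(Q) for _ in range(N)]
    exact = sum(ai * si for ai, si in zip(a, secret)) % Q  # step 1: compute correctly
    e = small_error(rng)                                   # step 2: add the error afterwards
    return a, (exact + e) % Q


def centered_error(secret: list[int], a: list[int], b: int) -> int:
    """With the secret, recover e as the centered representative of b - <a, s> mod Q."""
    r = (b - sum(ai * si for ai, si in zip(a, secret))) % Q
    return r - Q if r > Q // 2 else r


rng = random.Random(1)
s = keygen(rng)
a, b = lwe_sample(s, rng)
print(centered_error(s, a, b))  # some value in [-ETA, ETA]
```

In a hard physical learning problem the second step disappears: the error comes out of the computation itself, which is why the distribution of that error (and any dependency on the data) becomes the security-relevant question.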
3. FPGA Realizations of Chaotic Epidemic and Disease Models Including Covid-19
- Author
M. Elnawawy, F. Aloul, A. Sagahyroon, A. S. Elwakil, Wafaa S. Sayed, Lobna A. Said, S. M. Mohamed, and Ahmed G. Radwan
- Subjects
Chaos ,chaotic circuits ,epidemic models ,FPGA implementations ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971
- Abstract
The spread of epidemics and diseases is known to exhibit chaotic dynamics; a fact confirmed by many developed mathematical models. However, to the best of our knowledge, no attempt to realize any of these chaotic models in analog or digital electronic form has been reported in the literature. In this work, we report on the efficient FPGA implementations of three different virus spreading models and one disease progress model. In particular, the Ebola, Influenza, and COVID-19 virus spreading models in addition to a Cancer disease progress model are first numerically analyzed for parameter sensitivity via bifurcation diagrams. Subsequently and despite the large number of parameters and large number of multiplication (or division) operations, these models are efficiently implemented on FPGA platforms using fixed-point architectures. Detailed FPGA design process, hardware architecture and timing analysis are provided for three of the studied models (Ebola, Influenza, and Cancer) on an Altera Cyclone IV EP4CE115F29C7 FPGA chip. All models are also implemented on a high performance Xilinx Artix-7 XC7A100TCSG324 FPGA for comparison of the needed hardware resources. Experimental results showing real-time control of the chaotic dynamics are presented.
- Published
- 2021
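The fixed-point idea behind these implementations (integer registers with an implied binary point, full-width products truncated back to the working format, one forward-Euler update per clock) can be sketched in software. This is a hedged illustration, not the paper's datapath: the 24-fractional-bit format, the step size and the parameters are hypothetical, and a plain SIR model (which is not chaotic) stands in for the Ebola, Influenza, COVID-19 and Cancer models, whose equations are not given in the abstract.

```python
FRAC_BITS = 24          # hypothetical number of fractional bits
ONE = 1 << FRAC_BITS


def to_fix(x: float) -> int:
    return int(round(x * ONE))


def to_float(v: int) -> float:
    return v / ONE


def fmul(a: int, b: int) -> int:
    # Full-width product, then drop FRAC_BITS LSBs (arithmetic shift truncates toward -inf)
    return (a * b) >> FRAC_BITS


def sir_step_fixed(s, i, r, beta, gamma, h):
    """Forward-Euler step of dS=-beta*S*I, dI=beta*S*I-gamma*I, dR=gamma*I in fixed point."""
    infection = fmul(fmul(beta, s), i)
    recovery = fmul(gamma, i)
    return (s - fmul(h, infection),
            i + fmul(h, infection - recovery),
            r + fmul(h, recovery))


def sir_step_float(s, i, r, beta, gamma, h):
    """Floating-point reference for the same update."""
    infection = beta * s * i
    recovery = gamma * i
    return s - h * infection, i + h * (infection - recovery), r + h * recovery


# Hypothetical parameters (population fractions, rates per unit time)
params = dict(beta=0.5, gamma=0.1, h=0.05)
xf = (0.99, 0.01, 0.0)
xq = tuple(to_fix(v) for v in xf)
pq = {k: to_fix(v) for k, v in params.items()}
for _ in range(100):
    xf = sir_step_float(*xf, **params)
    xq = sir_step_fixed(*xq, **pq)
print([abs(a - to_float(b)) for a, b in zip(xf, xq)])  # small quantization drift
```

Comparing the fixed-point trajectory with a floating-point reference in this way is one simple check on whether the chosen fractional width is sufficient; for genuinely chaotic models the trajectories will eventually diverge regardless, so such a comparison is only meaningful over short horizons.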
4. Enabling real-time object detection on low cost FPGAs.
- Author
Jain, Vikram, Jadhav, Ninad, and Verhelst, Marian
- Abstract
Object detection using convolutional neural networks (CNNs) has garnered a lot of interest due to their high performance. Yet the large number of operations and memory fetches to both on-chip and external memory needed by such CNNs results in high latency and power dissipation on resource-constrained edge devices, impeding their real-time operation from a battery supply. In this paper, a resource- and cost-efficient hardware accelerator for CNNs is implemented on an FPGA. Using an existing metric, DSP efficiency, and a new metric, Cost efficiency, as the primary optimization variables, algorithms and hardware are explored with a design space exploration tool called ZigZag. An optimized architecture is implemented on a Xilinx XC7Z035 FPGA, and tiny-YOLOv2 is mapped onto it to demonstrate real-time object detection. Compared to the state of the art (SotA), the implementation results show that the hardware achieves the best DSP efficiency, at 90%, and the best Cost efficiency, at 0.146. [ABSTRACT FROM AUTHOR]
- Published
- 2022
5. Digital Implementation of Oscillatory Neural Network for Image Recognition Applications
- Author
Madeleine Abernot, Thierry Gil, Manuel Jiménez, Juan Núñez, María J. Avellido, Bernabé Linares-Barranco, Théophile Gonos, Tanguy Hardelin, and Aida Todri-Sanial
- Subjects
artificial intelligence ,auto-associative memory ,FPGA implementations ,learning rules ,oscillatory neural networks ,pattern recognition ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571
- Abstract
Computing paradigms based on von Neumann architectures cannot keep up with the ever-increasing data growth (also called the “data deluge gap”). This has led to the investigation of novel computing paradigms and design approaches at all levels, from materials to system-level implementations and applications. An alternative computing approach based on artificial neural networks uses oscillators to compute: Oscillatory Neural Networks (ONNs). ONNs can perform computations efficiently and can be used to build a more extensive neuromorphic system. Here, we address a fundamental problem: can we efficiently perform artificial intelligence applications with ONNs? We present a digital ONN implementation as a proof-of-concept of the ONN approach of “computing-in-phase” for pattern recognition applications. To the best of our knowledge, this is the first attempt to implement an FPGA-based fully digital ONN. We report ONN accuracy, training, inference, memory capacity, operating frequency, and hardware resources based on simulations and implementations of 5 × 3 and 10 × 6 ONNs. We present the digital ONN implementation on FPGA for pattern recognition applications such as digit recognition from a camera stream. We discuss practical challenges and future directions in implementing digital ONNs.
- Published
- 2021
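The abstract does not spell out how patterns are stored, but the record's subjects mention learning rules and auto-associative memory. A common abstraction, assumed here, maps an oscillator in phase with the reference to +1 and one in anti-phase to -1, which turns storage and recall into a Hopfield-style network trained with a Hebbian rule. The sketch below shows that binary abstraction only; it does not model oscillator phase dynamics, and the patterns and function names are hypothetical.

```python
def hebbian_weights(patterns):
    """W = sum_p p p^T with zero diagonal; patterns are lists of +1 (in phase) / -1 (anti-phase)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_iters=10):
    """Synchronous sign updates until the state stops changing (or max_iters is reached)."""
    for _ in range(max_iters):
        field = [sum(wij * sj for wij, sj in zip(row, state)) for row in w]
        # Keep the current value when the local field is exactly zero
        new = [1 if f > 0 else -1 if f < 0 else s for f, s in zip(field, state)]
        if new == state:
            break
        state = new
    return state


# Two hypothetical 8-element patterns (orthogonal, so they do not interfere)
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([p1, p2])
noisy = [-1] + p1[1:]           # p1 with its first element flipped
print(recall(W, noisy) == p1)   # True
```

Memory capacity, one of the metrics the abstract reports, corresponds here to how many patterns can be stored before recall from noisy inputs starts to fail; other learning rules (e.g. Storkey) change only how W is computed.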
6. Learning Parity with Physical Noise: Imperfections, Reductions and FPGA Prototype
- Author
Davide Bellizia, Clément Hoffmann, Dina Kamel, Hanlin Liu, Pierrick Méaux, François-Xavier Standaert, and Yu Yu
- Subjects
Learning Parity with Noise ,Physical Assumptions ,Physical Defaults ,Security Reductions ,FPGA Implementations ,Side-Channel Security ,Computer engineering. Computer hardware ,TK7885-7895 ,Information technology ,T58.5-58.64
- Abstract
Hard learning problems are important building blocks for the design of various cryptographic functionalities such as authentication protocols and post-quantum public key encryption. The standard implementations of such schemes add some controlled errors to simple (e.g., inner product) computations involving a public challenge and a secret key. Hard physical learning problems formalize the potential gains that could be obtained by leveraging inexact computing to directly generate erroneous samples. While they have good potential for improving the performance and physical security of more conventional samplers when implemented in specialized integrated circuits, it remains unknown whether the physical defaults that inevitably occur in their instantiation can lead to security losses, and whether their implementation can be viable on standard platforms such as FPGAs. We contribute to these questions in the context of the Learning Parity with Physical Noise (LPPN) problem by: (1) exhibiting new (output) data dependencies of the error probabilities that LPPN samples may suffer from; (2) formally showing that LPPN instances with such dependencies are as hard as the standard LPN problem; (3) analyzing an FPGA prototype of an LPPN processor that satisfies basic security and performance requirements.
- Published
- 2021
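Point (1) of the abstract, error probabilities that depend on the (noiseless) output, can be made concrete with a small simulation. This is a hedged sketch: the rates tau0 and tau1, the secret, and the function names are hypothetical, and the sampler simply flips a coin per sample rather than modelling any physical circuit. It only shows what an output-dependent error distribution looks like next to the standard LPN sample b = <a, s> xor e.

```python
import random


def lpn_sample_output_dependent(secret, tau0, tau1, rng):
    """Return (a, b, e): b = <a, s> xor e over GF(2), with Pr[e = 1] set by the exact output.

    tau0 / tau1 are hypothetical error rates used when the noiseless output is 0 / 1.
    e is returned only so that this simulation can measure the error rates.
    """
    a = [rng.getrandbits(1) for _ in secret]
    exact = sum(ai & si for ai, si in zip(a, secret)) & 1
    tau = tau1 if exact else tau0
    e = 1 if rng.random() < tau else 0
    return a, exact ^ e, e


def empirical_rates(secret, tau0, tau1, n, seed=0):
    """Measured error rate conditioned on the noiseless output bit (secret must be nonzero)."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # exact output -> [errors, total]
    for _ in range(n):
        a, b, e = lpn_sample_output_dependent(secret, tau0, tau1, rng)
        exact = b ^ e
        counts[exact][0] += e
        counts[exact][1] += 1
    return {k: errs / tot for k, (errs, tot) in counts.items()}


secret = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
print(empirical_rates(secret, 0.10, 0.20, 20000))  # roughly {0: 0.10, 1: 0.20}
```

With tau0 equal to tau1 this reduces to ordinary LPN; the paper's contribution (2) is a reduction showing that instances with such output dependencies remain as hard as standard LPN.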
7. Digital Implementation of Oscillatory Neural Network for Image Recognition Applications.
- Author
Abernot, Madeleine, Gil, Thierry, Jiménez, Manuel, Núñez, Juan, Avellido, María J., Linares-Barranco, Bernabé, Gonos, Théophile, Hardelin, Tanguy, and Todri-Sanial, Aida
- Subjects
ARTIFICIAL intelligence ,IMAGE recognition (Computer vision) ,ARTIFICIAL neural networks ,PATTERN recognition systems
- Abstract
Computing paradigms based on von Neumann architectures cannot keep up with the ever-increasing data growth (also called the "data deluge gap"). This has led to the investigation of novel computing paradigms and design approaches at all levels, from materials to system-level implementations and applications. An alternative computing approach based on artificial neural networks uses oscillators to compute: Oscillatory Neural Networks (ONNs). ONNs can perform computations efficiently and can be used to build a more extensive neuromorphic system. Here, we address a fundamental problem: can we efficiently perform artificial intelligence applications with ONNs? We present a digital ONN implementation as a proof-of-concept of the ONN approach of "computing-in-phase" for pattern recognition applications. To the best of our knowledge, this is the first attempt to implement an FPGA-based fully digital ONN. We report ONN accuracy, training, inference, memory capacity, operating frequency, and hardware resources based on simulations and implementations of 5 × 3 and 10 × 6 ONNs. We present the digital ONN implementation on FPGA for pattern recognition applications such as digit recognition from a camera stream. We discuss practical challenges and future directions in implementing digital ONNs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
8. Efficient FPGA architecture of optimized Haar wavelet transform for image and video processing applications.
- Author
Sarkar, Sayantam and Bhairannawar, Satish S.
- Abstract
The Discrete Wavelet Transform (DWT) is widely used in digital image and video processing due to its various advantages over other similar transform techniques. In this paper, an efficient hardware architecture for an Optimized Haar Wavelet Transform is proposed, modeled using Optimized Kogge–Stone Adder/Subtractor, Optimized Controller, Buffer, Shifter and D_FF blocks. The existing Kogge–Stone Adder architecture is optimized by using a Modified Carry Correction block, which uses a parallel architecture to reduce the computational delay. Similarly, the Controller block is optimized by using Clock Dividers and a Reset Counter interdependently. To preserve the accuracy of the processed data, a suitable number of intermediate fractional bits, expressed in Q-notation, is used. The comparison results show that the proposed architecture performs better than existing ones in terms of both hardware utilization and data accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2021
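A Haar stage needs only the blocks the abstract lists: an adder/subtractor, a shifter, and registers. The sketch below uses the integer lifting form (often called the S-transform), which is one common way to build a Haar stage from exactly those operations and which is perfectly invertible. It is an assumption for illustration: the paper keeps fractional bits in Q-notation, so its arithmetic will differ, and the function names are hypothetical.

```python
def haar_forward(x):
    """One level of the integer Haar (S-)transform via lifting: returns (approx, detail)."""
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a                     # subtractor
        approx.append(a + (d >> 1))   # adder + 1-bit arithmetic shift = floor((a + b) / 2)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the original samples exactly."""
    x = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        x.extend([a, a + d])
    return x


row = [5, 3, 8, 8, -2, 7]          # hypothetical pixel row (even length)
s, d = haar_forward(row)
print(s, d)                        # [4, 8, 2] [-2, 0, 9]
print(haar_inverse(s, d) == row)   # True
```

For 2-D images the same stage is applied to rows and then to columns; the lifting form avoids a multiplier entirely because the only scaling is a shift by one bit.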
9. Reducing risks through simplicity: high side-channel security for lazy engineers.
- Author
Bronchain, Olivier, Schneider, Tobias, and Standaert, François-Xavier
- Abstract
Countermeasures against side-channel attacks are in general expensive, and a lot of research has been devoted to optimizing their security versus performance trade-off. Besides, a wide literature has also shown that implementing such countermeasures is an error-prone task that requires dealing with various engineering challenges (e.g., physical defaults, compositional errors, ...). This work aims to contribute to this second item by evaluating the extent to which (almost) key-homomorphic primitives, and in particular a recent PRF instance based on the learning with rounding problem, can lead to easy-to-implement and easier-to-evaluate side-channel-secure designs. We confirm these properties by describing an FPGA implementation that does not require complex (compositional) reasoning in its analysis, can be masked securely under simple design conditions, and whose evaluation directly scales to an arbitrary number of shares. We provide a comprehensive performance and (worst-case) security analysis of our design and compare the obtained results with those of an AES implementation protected with the domain-oriented masking scheme. Results show that simplicity has a cost, which becomes less prohibitive as security requirements increase. [ABSTRACT FROM AUTHOR]
- Published
- 2021
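Why "(almost) key-homomorphic" helps masking can be seen with a toy learning-with-rounding function F_k(x) = round((P/Q) * <a(x), k>) mod P: evaluating it on each additive key share separately and summing the results gives F on the full key, up to a +/-1 rounding carry. The sketch below is an assumption-laden illustration, not the PRF instance from the paper: Q, P and N are hypothetical toy values, public_vector is a placeholder (not a secure expansion), and how the actual design handles the rounding carry is not described in the abstract.

```python
import random

# Hypothetical toy parameters; P divides Q so the argument below holds exactly
Q = 1 << 16
P = 1 << 8
N = 32


def public_vector(x: int) -> list[int]:
    """Placeholder expansion of input x into a public vector a(x); NOT cryptographically secure."""
    rng = random.Random(x)
    return [rng.randrange(Q) for _ in range(N)]


def lwr_round(v: int) -> int:
    """Round (P / Q) * v to the nearest integer and reduce mod P."""
    return ((P * v + Q // 2) // Q) % P


def prf(key: list[int], x: int) -> int:
    a = public_vector(x)
    return lwr_round(sum(ai * ki for ai, ki in zip(a, key)) % Q)


# Two additive shares of a key, each processed independently (as in a masked design)
rng = random.Random(0)
k1 = [rng.randrange(Q) for _ in range(N)]
k2 = [rng.randrange(Q) for _ in range(N)]
k = [(u + v) % Q for u, v in zip(k1, k2)]
diff = (prf(k, 42) - prf(k1, 42) - prf(k2, 42)) % P
print(diff in (0, 1, P - 1))  # True: shares recombine up to a +/-1 rounding error
```

Because each share goes through the same linear map and rounding independently, the computation never combines shares inside the datapath, which is the property that makes the security evaluation scale simply with the number of shares.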
10. A Review of Synthetic-Aperture Radar Image Formation Algorithms and Implementations: A Computational Perspective
- Author
Helena Cruz, Mário Véstias, José Monteiro, Horácio Neto, and Rui Policarpo Duarte
- Subjects
synthetic-aperture radar ,SAR algorithms ,SAR systems ,FPGA implementations ,GPU implementations ,many-core implementations ,Science
- Abstract
Designing synthetic-aperture radar (SAR) image formation systems can be challenging due to the numerous algorithms and devices that can be used. There are many SAR image formation algorithms, such as backprojection, matched-filter, polar format, Range–Doppler and chirp scaling algorithms. Each algorithm presents its own advantages and disadvantages in terms of efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them on these two aspects. Depending on the requirements of each individual system and implementation, there are many device options to choose from, for instance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging system implementations. We also compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used.
- Published
- 2022
11. Hardware architectures for PRESENT block cipher and their FPGA implementations.
- Author
Pandey, Jai Gopal, Goel, Tarun, and Karmakar, Abhijit
- Abstract
Data security is essential for the proliferation of Internet of Things and cyber-physical system technologies. Data security can be efficiently achieved by incorporating lightweight cryptography techniques. In this study, a set of high-performance hardware architectures for the PRESENT lightweight block cipher is proposed that perform encryption, decryption, and integrated encryption/decryption operations. The architectures have a 64-bit datapath and support the standard 80- and 128-bit key lengths. The architectures are synthesised on a Xilinx Virtex-5 XC5VLX110T (ff1136-1) field-programmable gate array device of the ML-505 platform. For functional verification, a large number of test vectors are used. Performance is measured by evaluating maximum frequency, throughput, power dissipation, and energy consumption. Experimentally, the proposed architectures are found to be resource-efficient, high-performance, and suitable for lightweight, latency-critical, and low-power applications in comparison with existing architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2019
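Functional verification with many test vectors needs a trusted reference model to produce expected outputs. Below is a minimal bit-level Python sketch of PRESENT with the 80-bit key schedule only (the 128-bit schedule is omitted), including decryption for checking integrated encrypt/decrypt cores. It is a software golden model under those assumptions, not the paper's hardware architecture, and the function names are illustrative.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]
MASK80 = (1 << 80) - 1


def p_pos(i):
    """pLayer: bit i moves to position 16*i mod 63 (bit 63 stays in place)."""
    return 63 if i == 63 else (16 * i) % 63


def sbox_layer(state, box):
    """Apply a 4-bit S-box to each of the 16 nibbles of a 64-bit state."""
    out = 0
    for i in range(16):
        out |= box[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state):
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << p_pos(i)
    return out


def inv_p_layer(state):
    out = 0
    for i in range(64):
        out |= ((state >> p_pos(i)) & 1) << i
    return out


def round_keys_80(key):
    """32 round keys: the leftmost 64 bits of the 80-bit key register, updated between rounds."""
    keys = [key >> 16]
    for counter in range(1, 32):
        key = ((key << 61) | (key >> 19)) & MASK80               # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on the top nibble
        key ^= counter << 15                                     # round counter into bits 19..15
        keys.append(key >> 16)
    return keys


def encrypt(block, key):
    k = round_keys_80(key)
    for r in range(31):
        block = p_layer(sbox_layer(block ^ k[r], SBOX))
    return block ^ k[31]


def decrypt(block, key):
    k = round_keys_80(key)
    block ^= k[31]
    for r in range(30, -1, -1):
        block = sbox_layer(inv_p_layer(block), INV_SBOX) ^ k[r]
    return block


print(hex(encrypt(0, 0)))  # 0x5579c1387b228445 (published PRESENT-80 test vector)
```

Random (plaintext, key) pairs run through this model give the expected ciphertexts for a hardware testbench; since the pLayer is pure wiring in hardware, only the S-box and round-key XOR cost logic there.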
12. True Random Number Generator based on RO-PUF
- Author
Rojas Muñoz, Luis Felipe, Sánchez-Solano, Santiago, Martínez Rodríguez, Macarena Cristina, Brox, Piedad, and European Commission
- Subjects
Randomness tests ,Ring Oscillator PUFs ,FPGA implementations ,True Random Number Generators
- Abstract
The implementation of true random number generators is of vital importance to preserve the reliability of cryptographic systems. A lack of entropy can compromise their integrity, affecting the security of the entire chain of applications. Ensuring the effectiveness of a random number generator can be understood as reducing the risk of information loss due to possible attacks by third parties. This paper presents a novel approach for a true random number generator based on a Ring Oscillator Physical Unclonable Function. Since the principle of operation of physical unclonable functions is based on the physical properties of each device, they can be used for security applications such as device identification, counterfeit prevention, and increasing the robustness of cryptographic functions. In addition, if the design is made versatile enough to use them as a source of entropy, they can also fulfill tasks such as generating initialization vectors, nonces, and keys for symmetric cryptography. The system incorporates multiple operating configurations, which allows a complete analysis of its performance to adapt it to different application scenarios. The randomness and correct operation of the proposed design have been evaluated online by incorporating it into a hybrid HW/SW embedded system able to run the official test suite published by the National Institute of Standards and Technology without any need for post-processing. The architecture has been designed for Xilinx Zynq-7000 family devices and implemented on the Pynq-Z2 development board.
SPIRS Project with Grant Agreement No. 952622 under the EU H2020 research and innovation program; ARES Project PID2020-116664RB-100 funded by MCIN/AEI/10.13039/501100011033 and the EU NextGenerationEU/PRTR. M.C.M.R. holds a Postdoc fellowship from the Andalusia Government with support from PO FSE of EU.
- Published
- 2022
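The first test in the NIST SP 800-22 suite mentioned in the abstract is the frequency (monobit) test, which checks whether ones and zeros are roughly balanced. A minimal sketch follows; the function name is hypothetical, and this is only one of the suite's tests, not the on-line evaluation setup the paper uses.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test: P-value for a sequence of 0/1 bits."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)      # map 0 -> -1, 1 -> +1 and sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


example = [int(c) for c in "1011010101"]
p = monobit_p_value(example)
print(round(p, 6))   # 0.527089
print(p >= 0.01)     # True: passes at the usual 0.01 significance level
```

A sequence is considered non-random for this test when the P-value falls below the chosen significance level (0.01 in SP 800-22); real evaluations use much longer sequences than this 10-bit example.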
13. Learning Parity with Physical Noise: Imperfections, Reductions and FPGA Prototype
- Author
Dina Kamel, Clément Hoffmann, Hanlin Liu, Yu Yu, Davide Bellizia, François-Xavier Standaert, Pierrick Méaux, and UCL - SST/ICTM/ELEN - Pôle en ingénierie électrique
- Subjects
Computer engineering. Computer hardware ,Physical Assumptions ,Computer science ,Learning Parity with Noise ,Side-Channel Security ,Information technology ,T58.5-58.64 ,FPGA Implementations ,Security Reductions ,TK7885-7895 ,Communication noise ,Physical Defaults ,Electronic engineering ,Parity (mathematics) ,FPGA prototype
- Abstract
Hard learning problems are important building blocks for the design of various cryptographic functionalities such as authentication protocols and post-quantum public key encryption. The standard implementations of such schemes add some controlled errors to simple (e.g., inner product) computations involving a public challenge and a secret key. Hard physical learning problems formalize the potential gains that could be obtained by leveraging inexact computing to directly generate erroneous samples. While they have good potential for improving the performance and physical security of more conventional samplers when implemented in specialized integrated circuits, it remains unknown whether the physical defaults that inevitably occur in their instantiation can lead to security losses, and whether their implementation can be viable on standard platforms such as FPGAs. We contribute to these questions in the context of the Learning Parity with Physical Noise (LPPN) problem by: (1) exhibiting new (output) data dependencies of the error probabilities that LPPN samples may suffer from; (2) formally showing that LPPN instances with such dependencies are as hard as the standard LPN problem; (3) analyzing an FPGA prototype of an LPPN processor that satisfies basic security and performance requirements.
- Published
- 2021
14. A narrowband interference suppressor in pulse signal detection and its FPGA implementations.
- Author
Yu Han
- Subjects
SIGNAL detection ,PULSE modulation ,INTERFERENCE (Telecommunication) ,INTERFERENCE suppression ,FIELD programmable gate arrays
- Published
- 2016
15. Slow-Envelope Shaping Function FPGA Implementation for 5G NR Envelope Tracking PA
- Author
W. Li, Nikolaos Bartzoudis, José Rubio, David López, G. Montoro, P. Gilabert, Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. CSC - Components and Systems for Communications Research Group
- Subjects
Economic and social effects ,FPGA implementations ,Power amplifiers ,Power amplifiers ,Rate reduction ,Field programmable gate arrays (FPGA) ,Shaping function ,Telecommunication engineering [UPC subject areas] ,Slew rate ,Envelope tracking power amplifiers ,5G mobile communication systems ,Shaping function 5G mobile communication systems ,High-level synthesis ,Integrated circuit design ,FPGAs implementation ,Envelope tracking ,Power amplifier linearization ,Computer hardware description languages ,Synthesis design ,FPGA ,High level synthesis
- Abstract
This paper focuses on the FPGA implementation of a slew-rate reduction (SR) shaping function for envelope tracking (ET) power amplifiers (PAs). The SR envelope has proved effective for trading off power efficiency and linearity in ET PA systems where the envelope tracking modulator (ETM) is bandwidth-limited. However, implementation issues must be addressed when targeting the high clock rates needed to cope with current 5G New Radio wide-band signals. This paper presents the FPGA implementation of the SR envelope generation. We explore the use of high-level synthesis (HLS) for the SR envelope generation to evaluate the performance and resource usage of the hardware architecture. The HLS design is also compared with a hand-written hardware description language (HDL) version. An in-depth analysis shows the strengths and limitations of the HLS design in meeting the timing constraints for a throughput of 614.4 MSa/s. © 2022 IEEE.
- Published
- 2022
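The core idea of a slew-rate-reduced envelope (a supply envelope whose sample-to-sample change is bounded so that a bandwidth-limited ETM can track it) can be illustrated offline. This sketch assumes, as a labelled assumption rather than something the abstract states, that the shaped envelope must never drop below the original envelope; it then computes the lowest envelope that stays at or above the input while changing by at most max_step per sample. It is a batch (non-causal) illustration, not the paper's streaming HLS or HDL algorithm, and the function name is hypothetical.

```python
def slew_limited_envelope(env, max_step):
    """Lowest envelope that stays >= env and changes by at most max_step per sample."""
    y = list(env)
    # Forward pass: limit how fast the output may fall
    for n in range(1, len(y)):
        y[n] = max(y[n], y[n - 1] - max_step)
    # Backward pass: limit how fast it may rise, without breaking the forward-pass bound
    for n in range(len(y) - 2, -1, -1):
        y[n] = max(y[n], y[n + 1] - max_step)
    return y


print(slew_limited_envelope([0, 0, 1, 0, 0], 0.25))  # [0.5, 0.75, 1, 0.75, 0.5]
```

The backward pass is what makes this non-causal: the output must start rising before a peak arrives. A real-time implementation would therefore need some look-ahead buffering, which adds latency and is one reason timing closure at high sample rates is demanding.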
16. Digital Implementation of Oscillatory Neural Network for Image Recognition Application
- Author
Manuel Jimenez, Thierry Gil, Aida Todri-Sanial, María J. Avellido, Théophile Gonos, Madeleine Abernot, Tanguy Hardelin, Bernabe Linares-Barranco, Juan Núñez, Smart Integrated Electronic Systems (SmartIES), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Instituto de Microelectrónica de Sevilla (IMSE-CNM), Universidad de Sevilla-Centro Nacional de Microelectronica [Spain] (CNM)-Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), A.I.Mergence [Paris], European Project: 871501,H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT),H2020-ICT-2019-2,NeurONN(2020), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Instituto de Microelectrónica de Sevilla (IMSE-CNM CSIC), Centro Nacional de Microelectronica [Spain] (CNM)-Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Universidad de Sevilla / University of Sevilla-Centro Nacional de Microelectronica [Spain] (CNM)-Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Universidad de Sevilla. Departamento de Electrónica y Electromagnetismo, and European Union (UE). H2020
- Subjects
Learning rules ,[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] ,Artificial intelligence ,Auto-associative memory ,FPGA implementations ,Computer science ,Neurosciences. Biological psychiatry. Neuropsychiatry ,02 engineering and technology ,[INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE] ,01 natural sciences ,Autoassociative memory ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Pattern recognition ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,[SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics ,Field-programmable gate array ,Implementation ,Original Research ,010302 applied physics ,oscillatory neural networks ,Oscillatory neural networks ,Artificial neural network ,Hebbian learning rule ,General Neuroscience ,Oscillatory neural network ,Neuromorphic engineering ,Computer engineering ,FPGA implementation ,Pattern recognition (psychology) ,020201 artificial intelligence & image processing ,Storkey learning rule ,Applications of artificial intelligence ,Unconventional computing ,RC321-571 ,Neuroscience
- Abstract
Computing paradigms based on von Neumann architectures cannot keep up with the ever-increasing data growth (also called the "data deluge gap"). This has led both the academic and industrial communities to investigate novel computing paradigms and design approaches at all levels: materials, devices, circuits, architectures, and all the way to system-level implementations and applications. For example, to improve performance, the community has been investigating solutions based on massively parallel and distributed systems that break with von Neumann architectures, such as artificial neural networks (ANNs) and deep neural networks (DNNs) trained on hundreds of graphics processing unit (GPU)-accelerated servers, where each GPU can have thousands of cores. The limitations of data processing imposed by the memory wall in von Neumann architectures have been overcome by bringing computing to the data, inspired by biological brain-like computing. An alternative computing approach based on ANNs uses oscillators to compute: oscillatory neural networks (ONNs). This approach differs from classical CMOS and classical von Neumann designs in that its building blocks are analog and perform computations efficiently. Moreover, data is encoded in the phase of the oscillator signals, a departure from classical voltage-level-based data encoding (such as an amplitude voltage representing a logical bit '1' or '0'). ONNs can perform computations efficiently and can be used to build a more extensive neuromorphic system. How a designer should optimally implement ONNs in analog is the focus of many ongoing research efforts. Here, however, we address a more fundamental problem: can we efficiently perform AI applications (such as image/pattern recognition) with ONNs? In other words, what are the advantages and limitations of the ONN computing paradigm for practical AI applications?
Here, we present a digital ONN implementation as a proof-of-concept of the ONN approach of "computing-in-phase" for pattern recognition applications. To the best of our knowledge, this is the first attempt to implement an FPGA-based fully digital ONN. We report ONN accuracy, training, inference, memory capacity, operating frequency, and hardware resources based on simulations and implementations of 5 × 3 and 10 × 6 ONNs. We present the digital ONN implementation on FPGA for pattern recognition applications such as digit recognition from a camera stream. We discuss practical challenges and future directions in implementing digital ONNs.
- Published
- 2021
17. A Low-Cost Bug Hunting Verification Methodology for RISC-V-Based Processors
- Author
Camilo Rojas, Hanssel Morales, and Elkim Roa
- Subjects
Software ,Unit testing ,Software bug ,Computer science ,business.industry ,Processor design ,Embedded system ,RISC-V ,Software development ,Fpga implementations ,business ,Agile software development
- Abstract
Agile hardware design strategies have been rapidly adopted in academia and industry by bringing in ideas from the software development side. However, the adopted design methodologies still rely on traditional verification scenarios based on handmade testbenches. Here we describe a verification methodology for RISC-V-based processors with human-independent testbench creation, employing high-effort verification methods throughout the entire processor design cycle. We demonstrate the methodology by performing verification tests on a single-issue in-order (SIIO) 32-bit RISC-V ISA-based processor described in Chisel. In contrast to standard verification methods, the proposed methodology can detect bugs that are hard to isolate even after final in-field FPGA implementation. The generated test programs show higher coverage metrics and use 30× fewer instructions than the official RISC-V torture unit tests.
- Published
- 2021
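Human-independent testbench creation generally pairs a random program generator with a reference model whose final state is compared against the processor under test. The sketch below shows that pattern for a tiny RV32I subset (ADDI plus five R-type ALU instructions). It is an assumption-based illustration, not the paper's tool flow (which targets a Chisel design); the function names and the 40/60 instruction mix are hypothetical, while the instruction encodings follow the RV32I base ISA.

```python
import random

MASK32 = 0xFFFFFFFF
# RV32I R-type ALU ops: (funct7, funct3); opcode is 0b0110011
R_OPS = {"add": (0x00, 0b000), "sub": (0x20, 0b000), "xor": (0x00, 0b100),
         "or": (0x00, 0b110), "and": (0x00, 0b111)}


def encode(op, rd, rs1, src2):
    """Encode one instruction; src2 is rs2 for R-type or a signed 12-bit immediate for addi."""
    if op == "addi":
        return ((src2 & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0010011
    funct7, funct3 = R_OPS[op]
    return (funct7 << 25) | (src2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | 0b0110011


def execute(program):
    """Reference model: returns the final register file (x0 is hardwired to zero)."""
    x = [0] * 32
    for op, rd, rs1, src2 in program:
        a = x[rs1]
        if op == "addi":
            val = a + src2  # immediate is already a signed Python int in [-2048, 2047]
        else:
            b = x[src2]
            val = {"add": a + b, "sub": a - b, "xor": a ^ b,
                   "or": a | b, "and": a & b}[op]
        if rd != 0:
            x[rd] = val & MASK32  # wrap to 32 bits like the hardware
    return x


def random_program(length, seed):
    """Seeded random instruction stream (hypothetical 40% addi / 60% R-type mix)."""
    rng = random.Random(seed)
    prog = []
    for _ in range(length):
        if rng.random() < 0.4:
            prog.append(("addi", rng.randrange(32), rng.randrange(32), rng.randrange(-2048, 2048)))
        else:
            prog.append((rng.choice(list(R_OPS)), rng.randrange(32), rng.randrange(32), rng.randrange(32)))
    return prog


prog = random_program(20, seed=42)
machine_code = [encode(*ins) for ins in prog]  # would be loaded into the design under test
expected = execute(prog)                       # compared against the design's register dump
```

Seeding the generator makes every failing program reproducible, which matters when a mismatch has to be traced back to a bug.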
18. FPGA Realizations of Chaotic Epidemic and Disease Models Including Covid-19
- Author
Assim Sagahyroon, Ahmed G. Radwan, Wafaa S. Sayed, Ahmed S. Elwakil, Lobna A. Said, Fadi Aloul, S. M. Mohamed, and Mohammed Elnawawy
- Subjects
0209 industrial biotechnology ,FPGA implementations ,General Computer Science ,Computer science ,Process (engineering) ,Chaotic ,02 engineering and technology ,01 natural sciences ,010305 fluids & plasmas ,020901 industrial engineering & automation ,Circuits and Systems ,0103 physical sciences ,General Materials Science ,Sensitivity (control systems) ,Field-programmable gate array ,Hardware architecture ,Science - General ,Mathematical model ,General Engineering ,Static timing analysis ,chaotic circuits ,TK1-9971 ,Computer engineering ,Chaos ,Multiplication ,epidemic models ,Electrical engineering. Electronics. Nuclear engineering
- Abstract
The spread of epidemics and diseases is known to exhibit chaotic dynamics; a fact confirmed by many developed mathematical models. However, to the best of our knowledge, no attempt to realize any of these chaotic models in analog or digital electronic form has been reported in the literature. In this work, we report on the efficient FPGA implementations of three different virus spreading models and one disease progress model. In particular, the Ebola, Influenza, and COVID-19 virus spreading models in addition to a Cancer disease progress model are first numerically analyzed for parameter sensitivity via bifurcation diagrams. Subsequently and despite the large number of parameters and large number of multiplication (or division) operations, these models are efficiently implemented on FPGA platforms using fixed-point architectures. Detailed FPGA design process, hardware architecture and timing analysis are provided for three of the studied models (Ebola, Influenza, and Cancer) on an Altera Cyclone IV EP4CE115F29C7 FPGA chip. All models are also implemented on a high performance Xilinx Artix-7 XC7A100TCSG324 FPGA for comparison of the needed hardware resources. Experimental results showing real-time control of the chaotic dynamics are presented.
- Published
- 2021
19. Allpass-based design, multiplierless realization and implementation of IIR wavelet filter banks with approximate linear phase.
- Author
Abdul-Jabbar, Jassim M. and Hmad, Rasha W.
- Abstract
In this paper, Bireciprocal Lattice Wave Digital Filters (BLWDFs) are used in an approximately linear-phase design of 9th-order IIR wavelet filter banks (FBs). Each of the two branches in the BLWDF structure realizes an allpass filter. The low coefficient sensitivity, good dynamic range, and good stability of such filters allow them to be realized with short coefficient wordlengths. Suitable coefficient wordlength representations are estimated to best meet some prescribed performance measures. The quantized coefficients are then realized in a multiplierless manner and implemented on a Xilinx FPGA device. As a result, less complex infinite impulse response (IIR) wavelet filter bank structures with approximately linear-phase processing are obtained. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
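"Multiplierless" realization usually means replacing each constant multiplication by a few shifts and additions/subtractions. One standard way to minimize those operations is the canonical signed-digit (CSD) representation, in which no two adjacent digits are nonzero. The abstract does not say which representation the authors used, so CSD here is an assumption; the 8-bit fractional wordlength and the example coefficient are hypothetical.

```python
def csd_digits(c: int) -> list[int]:
    """Canonical signed-digit (non-adjacent form) digits of integer c, least significant first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # c mod 4 == 1 -> +1, c mod 4 == 3 -> -1
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """x * c using only shifts and additions/subtractions driven by the CSD digits of c."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def quantize(coeff: float, frac_bits: int) -> int:
    """Round a real coefficient onto a 2**-frac_bits grid (hypothetical wordlength)."""
    return round(coeff * (1 << frac_bits))


c = quantize(0.4375, 8)   # hypothetical coefficient: 0.4375 * 256 = 112
print(csd_digits(c))      # [0, 0, 0, 0, -1, 0, 0, 1], i.e. 112 = 128 - 16
```

In hardware, 112·x then costs one subtractor, (x << 7) - (x << 4), instead of a multiplier; the product is shifted right by the fractional wordlength (8 bits here) to restore the original scaling. Shorter coefficient wordlengths generally mean fewer nonzero digits and therefore fewer adders.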
20. Simple true random number generator for any semi‐conductor technology.
- Author
Böhl, Eberhard
- Abstract
True random number generators (TRNGs) are needed in cryptography for key generation, in challenge-response authentication procedures, and for countermeasures against power analysis attacks. Such true randomness requires exploiting random physical hardware effects. The goal is to make the TRNG usable for different semiconductor technologies (including field-programmable gate arrays (FPGAs)). This approach is based on ring oscillators with multiple taps combined with simple post-processing by exclusive-OR (XOR, antivalence) compression. Verifications with a test chip and several FPGA implementations showed that standard digital library elements and the digital design flow can be used without any constraints for compilation and without special layout rules. A proper choice of sampling frequency and compression coefficient ensures a random output with extremely low bias for different technologies, which can easily be checked on-line. It was shown that when the on-line test is passed with a given bias limit, the generated random data also passes the statistical tests. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
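The XOR compression step named in entry 20 can be illustrated with a short Python sketch. It assumes the raw ring-oscillator samples are independent bits sharing a common bias, which real jitter-based sources only approximate; the function names and the example bias are hypothetical, and the paper's sampling frequency and compression coefficient are not reproduced. The bias estimate uses the piling-up lemma for independent bits.

```python
def xor_compress(raw_bits, k):
    """Fold each consecutive group of k raw bits into one output bit by XOR.

    Trailing bits that do not fill a complete group are discarded.
    """
    out = []
    for start in range(0, len(raw_bits) - k + 1, k):
        bit = 0
        for b in raw_bits[start:start + k]:
            bit ^= b
        out.append(bit)
    return out


def compressed_bias(raw_bias, k):
    """|P(1) - 1/2| after XOR-ing k independent bits that each have
    P(1) = 1/2 + raw_bias (piling-up lemma: 2**(k-1) * |raw_bias|**k)."""
    return 2 ** (k - 1) * abs(raw_bias) ** k


print(xor_compress([1, 0, 1, 1, 0, 0, 0, 1], k=4))  # [1, 1]
print(compressed_bias(0.1, k=4))  # hypothetical raw bias of 10 percent
```

Under this independence assumption, a 10 % raw bias drops to well under 0.1 % with k = 4, while the output bit rate falls by a factor of k; that trade-off is presumably what the compression coefficient in the abstract tunes.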
21. Exploring the FPGA Implementations of the LBlock, Piccolo, Twine, and Klein Ciphers
- Author
-
George Theodoridis, Odysseas Koufopavlou, D. Seitanidis, and S. Moraitis
- Subjects
Very-large-scale integration ,Loop unrolling ,business.industry ,Computer science ,Round function ,020206 networking & telecommunications ,02 engineering and technology ,Parallel computing ,Encryption ,Cipher ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fpga implementations ,business ,Field-programmable gate array ,Throughput (business) - Abstract
In this work, the implementations of the LBlock, Piccolo, Twine, and Klein lightweight ciphers in FPGA technology are studied in terms of area, frequency, throughput, and throughput/area. To accomplish this, loop unrolling and pipelining were employed in two phases. In the first phase, different loop unrolling factors were used to implement the round function of each cipher, while in the second phase, 2-stage pipelining with loop unrolling per stage was applied. The produced designs were implemented in Xilinx (Kintex-7) FPGA technology. Based on the implementation results, a detailed study on the above-mentioned design metrics was performed and important outcomes were derived.
- Published
- 2020
- Full Text
- View/download PDF
22. Compact Hardware Architectures of Enocoro-128v2 Stream Cipher for Constrained Embedded Devices
- Author
-
Paris Kitsos and Lampros Pyrgas
- Subjects
Hardware security module ,FPGA implementations ,Computer Networks and Communications ,Computer science ,Computation ,Data_MISCELLANEOUS ,lcsh:TK7800-8360 ,02 engineering and technology ,Enocoro-128v2 stream cipher ,Field (computer science) ,Datapath ,constrained embedded devices ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Field-programmable gate array ,Throughput (business) ,Stream cipher ,Implementation ,020203 distributed computing ,business.industry ,lcsh:Electronics ,TheoryofComputation_GENERAL ,020202 computer hardware & architecture ,Hardware and Architecture ,Control and Systems Engineering ,Embedded system ,Signal Processing ,lightweight cryptography ,hardware security ,business ,Computer hardware - Abstract
Lightweight cryptography is a vital and fast-growing field in today's world, where billions of constrained devices interact with each other. In this paper, two novel compact architectures of the Enocoro-128v2 stream cipher are presented. Enocoro-128v2 is part of the ISO/IEC 29192-3 standard. The first architecture has an 8-bit datapath, while the second has a 4-bit datapath. The proposed architectures were implemented on the BASYS3 board (Artix-7 XC7A35T) using the Verilog hardware description language. The hardware implementation of the proposed 8-bit architecture runs at a 189 MHz clock and reaches a throughput of 302 Mbps, while utilizing only 254 look-up tables (LUTs) and 330 flip-flops (FFs). Each round of computations requires 5 clock cycles. The 4-bit implementation has an operating frequency of 204 MHz and reaches a throughput of 181 Mbps, with each round requiring 9 clock cycles. The 4-bit implementation utilizes 249 LUTs and 343 FFs. To our knowledge, this is the first time that such implementations of Enocoro-128v2 have been presented. Both implementations utilize a very low number of resources (only 78 FPGA slices are required for the 8-bit architecture and only 83 for the 4-bit one), and the results demonstrate that they are suitable for area-constrained embedded devices.
- Published
- 2020
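The throughput figures in entry 22 are consistent with one byte of keystream per round: throughput = clock frequency × 8 bits / cycles per round. The small helper below (a hypothetical name) checks that arithmetic, assuming rounds are issued back to back with no extra overhead cycles.

```python
def throughput_mbps(clock_mhz, cycles_per_round, bits_per_round=8):
    """Keystream throughput in Mbit/s for an iterative cipher core.

    Assumes every round emits bits_per_round keystream bits (one byte here)
    and that rounds follow each other with no idle cycles.
    """
    return clock_mhz * bits_per_round / cycles_per_round


for name, f_mhz, cycles in [("8-bit datapath", 189, 5), ("4-bit datapath", 204, 9)]:
    print(f"{name}: {throughput_mbps(f_mhz, cycles):.1f} Mbit/s")
# 8-bit datapath: 302.4 Mbit/s
# 4-bit datapath: 181.3 Mbit/s
```

This also shows why the 4-bit core, despite its higher clock, is slower: it needs 9 instead of 5 cycles to produce each byte.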
23. FPGA Implementations of SVM Classifiers: A Review
- Author
-
Hamid GholamHosseini, Shereen Afifi, and Roopak Sinha
- Subjects
Computer science ,business.industry ,Computation ,General Medicine ,Machine learning ,computer.software_genre ,Support vector machine ,Embedded applications ,Gate array ,Fpga implementations ,Artificial intelligence ,Field-programmable gate array ,business ,computer ,Primary research - Abstract
Support vector machine (SVM) is a robust machine learning model with high classification accuracy. SVM is widely utilized for online classification in various real-time embedded applications. However, implementing the SVM classification algorithm on an embedded system is challenging due to the intensive and complicated computations required. Several works have attempted to optimize performance and cost by implementing SVM in hardware, especially on field-programmable gate arrays (FPGAs), a promising platform for meeting challenging embedded systems constraints. This article presents a comprehensive survey of hardware architectures used for implementing SVM on FPGA over the period 2010–2019. We performed a critical analysis and comparison of existing works with in-depth discussions of limitations, challenges, and research gaps. We concluded that the primary research gap is overcoming the challenging trade-off between meeting critical embedded systems constraints and achieving efficient and precise classification. Finally, some future research directions are proposed to address such research gaps.
- Published
- 2020
- Full Text
- View/download PDF
24. A Novel Flow Control Mechanism to Avoid Multi-Point Progressive Blocking in Hard Real-Time Priority-Preemptive NoCs
- Author
-
Leandro Soares Indrusiak, Alan Burns, J. Harrison, and N. Smirnov
- Subjects
010302 applied physics ,Flow control (data) ,Router ,Computer science ,Wormhole router ,Distributed computing ,02 engineering and technology ,01 natural sciences ,020202 computer hardware & architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Fpga implementations ,Wormhole ,Multi point - Abstract
The recently uncovered problem of multi-point progressive blocking (MPB) has significantly increased the complexity of schedulability analysis of priority-preemptive wormhole networks-on-chip. While state-of-the-art analysis is currently deemed safe, there is still significant inherent pessimism when it comes to considering backpressure issues caused by downstream indirect interference. In this paper, we attempt to simplify the problem by considering a novel flow control protocol that can avoid backpressure issues, enabling simpler schedulability analysis approaches to be used. Rather than construct the analysis to fit the protocol, we modify the protocol so that effective analysis applies. We describe the changes to a baseline wormhole router needed to implement the proposed protocol, and comment on the impact on hardware overheads. We also examine the number of routers that actually require these changes. Comparative analysis of FPGA implementations shows that the hardware overheads of the proposed NoC router are comparable to or lower than those of the baseline, while analytical comparison shows that the proposed approach can guarantee schedulability in up to 77% more cases.
- Published
- 2020
- Full Text
- View/download PDF
25. Different FPGA Implementations of Audio Processing Using Least Mean Square Adaptive Filtering: A Comparative Study
- Author
-
Anil Kumar Singh and R.K. Sharma
- Subjects
Audio signal ,Computer science ,business.industry ,Dissipation ,computer.software_genre ,Adaptive filter ,Least mean squares filter ,Audio codec ,Fpga implementations ,Audio signal processing ,Field-programmable gate array ,business ,computer ,Computer hardware - Abstract
This paper proposes an audio signal processing implementation on two different multimedia FPGA boards (ZYBO and ZedBoard). The audio signal coming from the audio codec is adaptively filtered using the least mean square (LMS) algorithm, which is implemented on the FPGA. The comparative study considers hardware resource utilization, static and dynamic power dissipation, and system speed. An interesting finding is that hardware resource utilization and delay are lower on the ZYBO than on the ZedBoard, and that the ZYBO board has lower static power dissipation but higher dynamic power dissipation.
- Published
- 2020
- Full Text
- View/download PDF
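The LMS adaptive filtering used in entry 25 follows the standard update w ← w + μ·e[n]·x[n]. Below is a floating-point sketch of that textbook algorithm; the tap count, step size and the "unknown" 3-tap system are hypothetical choices for a noise-free system-identification test, and the paper's FPGA datapath and audio setup are not reproduced.

```python
import numpy as np


def lms_filter(x, d, num_taps, mu):
    """Least-mean-square (LMS) adaptive FIR filter.

    x: input samples, d: desired response (same length), mu: step size.
    Returns (y, e, w): filter output, error signal and final tap weights.
    """
    w = np.zeros(num_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]  # [x[n], x[n-1], ...]
        y[n] = w @ x_vec                        # filter output
        e[n] = d[n] - y[n]                      # estimation error
        w = w + mu * e[n] * x_vec               # LMS weight update
    return y, e, w


# Hypothetical test: identify an unknown 3-tap FIR system from noise-free data.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2])
d = np.convolve(x, h)[:len(x)]
_, e, w = lms_filter(x, d, num_taps=3, mu=0.05)
print(np.round(w, 3))  # should approach h
```

An FPGA version would typically use fixed-point arithmetic, and choosing μ as a power of two (a common practice, not stated in the abstract) turns the step-size multiplication into a shift.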
26. MajorityNets: BNNs Utilising Approximate Popcount for Improved Efficiency
- Author
-
Lingli Wang, Philip H. W. Leong, Hao Zhou, Seyedramin Rasoulinezhad, David Boland, and Sean Fox
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Artificial neural network ,Computer science ,Computation ,05 social sciences ,010501 environmental sciences ,01 natural sciences ,Machine Learning (cs.LG) ,Computer engineering ,Gate array ,Fpga architecture ,0502 economics and business ,FOS: Electrical engineering, electronic engineering, information engineering ,Fpga implementations ,050207 economics ,Latency (engineering) ,Electrical Engineering and Systems Science - Signal Processing ,Field-programmable gate array ,Implementation ,0105 earth and related environmental sciences - Abstract
Binarized neural networks (BNNs) have shown exciting potential for utilising neural networks in embedded implementations where area, energy and latency constraints are paramount. With BNNs, multiply-accumulate (MAC) operations can be simplified to XnorPopcount operations, leading to massive reductions in both memory and computation resources. Furthermore, multiple efficient implementations of BNNs have been reported on field-programmable gate arrays (FPGAs). This paper proposes a smaller, faster, more energy-efficient approximate replacement for the XnorPopcount operation, called XNorMaj, inspired by state-of-the-art FPGA look-up table schemes which benefit FPGA implementations. We show that XNorMaj is up to 2x more resource-efficient than the XnorPopcount operation. While the XNorMaj operation has a minor detrimental impact on accuracy, the resource savings enable us to use larger networks to recover the loss. (4 pages)
- Published
- 2020
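The XnorPopcount operation that entry 26 starts from replaces a ±1 dot product with bit operations: encoding +1 as bit 1 and −1 as bit 0, XNOR marks positions where the signs agree, and the dot product equals 2·popcount − n. A minimal sketch of that exact operation follows (function names are illustrative); the paper's approximate XNorMaj replacement is not reproduced here.

```python
def pack_bits(vec):
    """Pack a +1/-1 vector into an int: +1 -> bit 1, -1 -> bit 0 (element i -> bit i)."""
    word = 0
    for i, v in enumerate(vec):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors packed as bit masks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, 1]
print(xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a)))  # 0
print(sum(x * y for x, y in zip(a, b)))                        # 0 (reference MAC result)
```

On an FPGA the XNOR maps to LUTs and the popcount to an adder tree; the adder tree is the part XNorMaj approximates.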
27. Random number generators for large-scale parallel Monte Carlo simulations on FPGA
- Author
-
Lin Yarong, B. Liu, and Fuming Wang
- Subjects
Numerical Analysis ,Physics and Astronomy (miscellaneous) ,Scale (ratio) ,Random number generation ,Computer science ,Applied Mathematics ,Monte Carlo method ,010103 numerical & computational mathematics ,Parallel computing ,Simulation system ,01 natural sciences ,Computer Science Applications ,Computational Mathematics ,Lagged Fibonacci generator ,Modeling and Simulation ,0103 physical sciences ,Key (cryptography) ,Fpga implementations ,0101 mathematics ,010306 general physics ,Field-programmable gate array - Abstract
Through parallelization, field-programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, as well as newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
- Published
- 2018
- Full Text
- View/download PDF
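For reference, a single (sequential) additive lagged Fibonacci generator, x[n] = (x[n−j] + x[n−k]) mod 2^m, can be sketched with a circular buffer of the last k values. The lags (24, 55) are the classic pair given by Knuth; the lags, word size and seeding method here are assumptions, and the parallel ALFG variant evaluated in entry 27 is not reproduced.

```python
import random


class AdditiveLaggedFibonacci:
    """Additive lagged Fibonacci generator: x[n] = (x[n-j] + x[n-k]) mod 2**m.

    The lag table is seeded from Python's random module (an assumption); at
    least one seed word must be odd for the generator to reach its full period.
    """

    def __init__(self, seed, j=24, k=55, m=32):
        if not 0 < j < k:
            raise ValueError("lags must satisfy 0 < j < k")
        self.j, self.k, self.mask = j, k, (1 << m) - 1
        init = random.Random(seed)  # only used to fill the initial lag table
        self.state = [init.getrandbits(m) for _ in range(k)]
        self.state[0] |= 1          # guarantee an odd word in the table
        self.pos = 0                # index of x[n-k] in the circular buffer

    def next_word(self):
        k, j = self.k, self.j
        x_nk = self.state[self.pos]                 # x[n-k] (oldest value)
        x_nj = self.state[(self.pos + k - j) % k]   # x[n-j] sits k-j slots later
        x_n = (x_nj + x_nk) & self.mask             # addition modulo 2**m
        self.state[self.pos] = x_n                  # overwrite the oldest value
        self.pos = (self.pos + 1) % k
        return x_n


gen = AdditiveLaggedFibonacci(seed=2024)
print([gen.next_word() for _ in range(4)])
```

The recurrence needs only one adder and a k-word memory per generator, which is one reason ALFGs are attractive on FPGAs, where block RAM can hold the lag table.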
28. Design and Multiplierless Implementations of 9th-Order Linear-Phase Bireciprocal Lattice Wave Digital Wavelet Filter Banks.
- Author
-
Abdul-Jabbar, Jassim M. and Hamad, Rasha Waleed
- Subjects
- *
INFINITE impulse response filters , *DISCRETE wavelet transforms , *LATTICE dynamics , *WAVELETS (Mathematics) - Abstract
In this paper, a filter bank structure for the implementation of the infinite impulse response (IIR) discrete wavelet transform (DWT) is proposed. Bireciprocal lattice wave digital filters (BLWDFs) are utilized in a linear-phase design of a 9th-order IIR wavelet filter bank (FB). Each of the two branches in the structure of the BLWDF bank realizes an all-pass filter. The filters of this bank belong to the intermediate design group, maintaining the linear-phase property with the best roll-off characteristics in their frequency responses. The design is first simulated in MATLAB 7.10 in order to investigate the resulting wavelet filter properties and to find a suitable wordlength for representing the BLWDF coefficients in quantized form, for the best selection of some prescribed performance measures. All adopted measures show excellent closeness to some typical cases. Each coefficient in the resulting structure is realized in a multiplierless manner after representing it as a sum of powers of two (SPT). Multiplications are then performed using only shifts and additions. Multiplierless FPGA implementations of the proposed IIR wavelet filter banks are also achieved with reduced complexity and a high operating frequency. [ABSTRACT FROM AUTHOR]
- Published
- 2013
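Entry 28 realizes each coefficient as a sum of powers of two (SPT) so that multiplication needs only shifts and adds. The sketch below uses canonical signed-digit (CSD) recoding, one common SPT form, as an assumption: the abstract does not say which SPT representation or wordlength the authors used, and the coefficient 0.46875 is hypothetical.

```python
def to_csd(coeff, frac_bits):
    """Quantize coeff to frac_bits fractional bits and return its canonical
    signed-digit (CSD) terms as (sign, bit_position) pairs, so that
    round(coeff * 2**frac_bits) == sum(sign * 2**bit_position).
    """
    n = round(coeff * (1 << frac_bits))  # integer mantissa of the quantized coefficient
    terms, pos = [], 0
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= digit           # leaves n divisible by 4, so no adjacent nonzero digits
            terms.append((digit, pos))
        n >>= 1
        pos += 1
    return terms


def shift_add_multiply(x, terms):
    """Multiply an integer sample x by a CSD coefficient using only shifts and
    additions/subtractions. The result is scaled by 2**frac_bits."""
    acc = 0
    for sign, pos in terms:
        if sign > 0:
            acc += x << pos
        else:
            acc -= x << pos
    return acc


terms = to_csd(0.46875, frac_bits=8)  # 0.46875 = 120/256
print(terms)  # [(-1, 3), (1, 7)], i.e. 120 = 2**7 - 2**3
print(shift_add_multiply(37, terms) >> 8, 37 * 0.46875)
```

Here the plain binary form of 120 (0b1111000) has four nonzero bits while CSD needs only two terms, so the multiplier shrinks to a single subtractor.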
29. Allpass-Based Design, Multiplierless Realization and Implementation of IIR Wavelet Filter Banks with Approximate Linear Phase.
- Author
-
Abdul-Jabbar, Jassim M. and Hmad, Rasha Waleed
- Subjects
- *
DIGITAL filters (Mathematics) , *IMPULSE response , *MULTIPLIERS (Mathematical analysis) , *COMPUTER engineering , *FIELD programmable gate arrays - Abstract
In this paper, Bireciprocal Lattice Wave Digital Filters (BLWDFs) are utilized in an approximate linear-phase design of 9th-order IIR wavelet filter banks (FBs). Each of the two branches in the structure of the BLWDF realizes an allpass filter. The low coefficient sensitivity, excellent dynamic range and good stability properties of such filters allow their realization with short coefficient wordlengths. Suitable coefficient wordlength representations are estimated for the best selection of some prescribed performance measures. The quantized coefficients are then realized in a multiplierless manner and implemented on a Xilinx FPGA device. As a result, less complex infinite impulse response (IIR) wavelet filter bank structures with linear-phase processing are obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2012
30. Utilizing hard cores of modern FPGA devices for high-performance cryptography.
- Author
-
Güneysu, Tim
- Abstract
This article presents a unique design approach for implementing standardized symmetric and asymmetric cryptosystems on modern FPGA devices. In contrast to many other FPGA implementations that algorithmically optimize the cryptosystems for optimal placement in the generic array logic, our primary implementation goal is to shift as many cryptographic operations as possible into the dedicated hard cores that have become available on many reconfigurable devices. For example, some of these dedicated functions are designed to provide large blocks of memory or fast arithmetic functions for Digital Signal Processing applications, and they can also be adopted for efficient cryptographic implementations. Based on these dedicated functions, we present specific design approaches that achieve a throughput of up to 55 Gbit/s for the symmetric AES block cipher (FIPS 197) and more than 30,000 scalar multiplications per second for asymmetric Elliptic Curve Cryptography over NIST's P-224 prime (FIPS 186-3). [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
31. A Review of Synthetic-Aperture Radar Image Formation Algorithms and Implementations: A Computational Perspective.
- Author
-
Cruz, Helena, Véstias, Mário, Monteiro, José, Neto, Horácio, and Duarte, Rui Policarpo
- Subjects
- *
SYNTHETIC aperture radar , *RADAR , *ALGORITHMS , *IMAGING systems , *MATCHED filters , *SYNTHETIC apertures , *MICROCONTROLLERS - Abstract
Designing synthetic-aperture radar (SAR) image formation systems can be challenging due to the numerous choices of algorithms and devices available. There are many SAR image formation algorithms, such as backprojection, matched-filter, polar format, Range–Doppler and chirp scaling algorithms. Each algorithm presents its own advantages and disadvantages in terms of efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them based on these two aspects. Depending on the requirements of each individual system and implementation, there are many device options to choose from, for instance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging system implementations. We also compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Low-footprint CLEFIA FPGA Implementations with Full-key Expansion
- Author
-
João Carlos Nunes Bittencourt, Ricardo Chaves, and Wagner Luiz Alves de Oliveira
- Subjects
060201 languages & linguistics ,Computer science ,business.industry ,06 humanities and the arts ,02 engineering and technology ,Footprint (electronics) ,Embedded system ,CLEFIA ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,020201 artificial intelligence & image processing ,Fpga implementations ,business - Published
- 2017
- Full Text
- View/download PDF
33. HUB Floating Point for Improving FPGA Implementations of DSP Applications
- Author
-
Julio Villalba and Javier Hormigo
- Subjects
Adder ,Forcing (recursion theory) ,Floating point ,Computer science ,business.industry ,Rounding ,020208 electrical & electronic engineering ,02 engineering and technology ,020202 computer hardware & architecture ,Computer Science::Hardware Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Fpga implementations ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,Electrical and Electronic Engineering ,business ,Field-programmable gate array ,Implementation ,Digital signal processing ,Computer hardware - Abstract
The increasing complexity of new digital signal processing (DSP) applications is forcing the use of floating-point (FP) numbers in their hardware implementations. In this brief, we investigate the advantages of using half-unit biased (HUB) formats to implement these FP applications on field-programmable gate arrays (FPGAs). These new FP formats allow for the effective elimination of the rounding logic in FP arithmetic units. First, we experimentally show that HUB and standard formats provide equivalent signal-to-noise ratio in DSP application implementations. We then present a detailed study of the improvement achieved when implementing FP adders and multipliers on FPGAs using HUB numbers. In most of the cases studied, the HUB approach reduces resource use and increases the speed of these FP units, while always providing accuracy statistically equivalent to that of conventional formats. However, for some specific sizes, HUB multipliers require far more resources than the corresponding conventional approach.
- Published
- 2017
- Full Text
- View/download PDF
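A half-unit-biased (HUB) number appends an implicit least-significant bit equal to 1, so the value it represents lies halfway between two conventional grid points; truncating the stored bits then gives round-to-nearest accuracy without a rounding adder, which is the rounding logic entry 33 says can be eliminated. The sketch below shows that property for fixed-point values only, as a simplification: the floating-point HUB adders and multipliers (exponent alignment, normalization, special cases) are not modeled, and the function names are hypothetical.

```python
import math
import random


def hub_quantize(x, frac_bits):
    """Quantize x to a fixed-point HUB number with frac_bits explicit fractional bits.

    The stored bits come from plain truncation (floor); the represented value
    adds the implicit least-significant 1 (weight 2**-(frac_bits + 1)).
    """
    scale = 2 ** frac_bits
    stored = math.floor(x * scale)  # truncation only: no rounding adder needed
    return (stored + 0.5) / scale   # stored value plus the implicit half-ulp bit


def truncate_conventional(x, frac_bits):
    """Conventional fixed-point quantization by truncation (floor)."""
    scale = 2 ** frac_bits
    return math.floor(x * scale) / scale


rng = random.Random(0)
samples = [rng.uniform(-4.0, 4.0) for _ in range(10_000)]
hub_err = max(abs(hub_quantize(v, 10) - v) for v in samples)
trunc_err = max(abs(truncate_conventional(v, 10) - v) for v in samples)
print(hub_err, trunc_err, 2 ** -11)  # HUB error stays within half an ulp
```

With 10 fractional bits, the HUB error bound is 2^-11 (the same bound as round-to-nearest), whereas conventional truncation can err by almost 2^-10.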
34. FPGA Implementation and Comparison of Protections against SCAs for RLWE
- Author
-
Arnaud Tisserand, Karim Bigou, and Timo Zijlstra (Lab-STICC, teams CACS/MOCS at UBS and UBO: Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance; École Nationale d'Ingénieurs de Brest (ENIB), Université de Bretagne Sud (UBS), Université de Brest (UBO), École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne), Institut Mines-Télécom [Paris] (IMT), Centre National de la Recherche Scientifique (CNRS), Université Bretagne Loire (UBL), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique); Institut Brestois du Numérique et des Mathématiques (IBNM)). Funding: Bourse de thèse PEC/DGA/Région Bretagne
- Subjects
Shuffling ,Computer science ,[INFO.INFO-AO]Computer Science [cs]/Computer Arithmetic ,02 engineering and technology ,Parallel computing ,masking ,randomization ,shuffling ,ring learning with errors ,[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR] ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Fpga implementations ,blinding ,Side channel attack ,[SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics ,Field-programmable gate array ,side channel attack - Abstract
International audience. We present various FPGA implementations of protections against side-channel attacks (SCAs) for ring-learning-with-errors (RLWE) based public-key encryption (PKE). We implemented the main solutions from the state of the art, with improved variants. We also propose a new protection based on a redundant representation of the ring elements to randomize computations. We compare the implementation results of all these solutions.
- Published
- 2019
35. FPGA-based accelerator for convolution operations
- Author
-
He Chen, Xin Wei, Yunfei Cao, and Tingting Qiao
- Subjects
business.industry ,Computer science ,Deep learning ,Systolic array ,Convolutional neural network ,Convolution ,Computer Science::Hardware Architecture ,Fpga implementations ,Artificial intelligence ,Field-programmable gate array ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Digital signal processing ,Computer hardware ,Efficient energy use - Abstract
Convolutional neural networks have been widely used in many deep learning applications. They involve a large number of convolution operations, which poses a huge challenge to real-time performance. In recent years, FPGA implementations of convolution accelerators have received much attention due to their high performance and energy efficiency. In this paper, we implement an accelerator for convolution operations using a systolic array architecture on a Xilinx ZedBoard device. The experimental results show that the designed accelerator achieves a performance density of up to 0.032 Gop/s/DSP.
- Published
- 2019
- Full Text
- View/download PDF
36. Simulation of Gene Regulatory Networks with Boolean and Threshold Logic on High-Performance Platforms
- Author
-
Jose Augusto M. Nacif, Marcelo M. Menezes, Michael Canesche, Salles V. G. Magalhães, Lucas Braganca, Ricardo Ferreira, Wallace Rosa, and Hector P. Baranda
- Subjects
Computer science ,Gene regulatory network ,Graph (abstract data type) ,Fpga implementations ,Central processing unit ,Overlay ,Compiler ,Parallel computing ,Field-programmable gate array ,computer.software_genre ,computer - Abstract
Gene regulatory networks are graph-based models widely used to study cell behavior, cell differentiation processes, and disease treatment and progression. A network can be implemented as a graph with Boolean equations. The algorithms used in network simulations evaluate these equations many times during execution. This article proposes a study of CPU, GPU and FPGA implementations of the basic operation, which is the computation of the next state. We explore vectorization and parallelization techniques using AVX and OpenMP for the processors, and a new dynamic architecture is proposed to simplify the use of FPGA-based solutions. In addition to the Boolean model, we show how the networks can be transformed into equations with weighted sums and thresholds. Finally, 16 biological networks used in the literature were evaluated: the CPU implementations with OpenMP achieved a 3x speedup compared with the CPU, the GPU implementations were on average 57.3x faster than the CPU, and the FPGA implementations were on average 86.7x faster than the CPU.
- Published
- 2019
- Full Text
- View/download PDF
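The "next-state computation" that entry 36 identifies as the basic operation, together with the weighted-sum/threshold reformulation it mentions, might look like the following synchronous-update sketch. The 3-gene network, its weights and thresholds, and the strict ">" convention are hypothetical; the paper's 16 biological networks, AVX/OpenMP code and FPGA architecture are not reproduced.

```python
from itertools import product


def boolean_next_state(state, rules):
    """Synchronous update: every gene's next value is computed from the current state."""
    return tuple(int(rule(state)) for rule in rules)


def threshold_next_state(state, weights, thresholds):
    """Weighted-sum/threshold update: gene i is on in the next state when
    sum_j weights[i][j] * state[j] > thresholds[i] (strict '>' is an assumption)."""
    return tuple(
        int(sum(w * s for w, s in zip(row, state)) > theta)
        for row, theta in zip(weights, thresholds)
    )


# Hypothetical 3-gene network (A, B, C): A' = NOT C, B' = A, C' = A AND B
rules = [lambda s: not s[2], lambda s: s[0], lambda s: s[0] and s[1]]
weights = [[0, 0, -1], [1, 0, 0], [1, 1, 0]]
thresholds = [-0.5, 0.5, 1.5]

# Both formulations give the same next state for every possible current state.
for state in product((0, 1), repeat=3):
    assert boolean_next_state(state, rules) == threshold_next_state(state, weights, thresholds)

state = (1, 0, 0)
for _ in range(5):
    print(state)
    state = boolean_next_state(state, rules)
```

The threshold form turns each gene update into a multiply-accumulate and a comparison, a regular operation that maps more uniformly onto SIMD lanes or FPGA logic than arbitrary Boolean expressions.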
37. A Microcontroller Implementation Of Hindmarsh-Rose Neuron Model-Based Biological Central Pattern Generator
- Author
-
Reşat Mutlu and Suayb Cagri Yener
- Subjects
Microcontroller ,Application areas ,business.industry ,Computer science ,Central pattern generator ,Fpga implementations ,Biological neuron model ,Rose (topology) ,business ,Field-programmable gate array ,Computer hardware - Abstract
Central pattern generators (CPGs) play an important role in controlling animal locomotion. Bio-inspired CPGs can find application in robotics and control. FPGA implementations of CPGs have already been studied in the literature, but not all institutes have FPGA systems. In this paper, it is shown that a Hindmarsh-Rose (HR) neuron model can be implemented on a cheap microcontroller board, such as those using the Cortex-M3-based STM32F103 devices. The experimental results verify that the system successfully reproduces the original CPG patterns.
- Published
- 2019
- Full Text
- View/download PDF
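For readers unfamiliar with the model in entry 37, here is a minimal forward-Euler simulation of the Hindmarsh-Rose equations. The parameter values (a = 1, b = 3, c = 1, d = 5, s = 4, x_R = −1.6, r = 0.006, I = 3.25), step size and initial condition are commonly used textbook choices assumed here, not values taken from the paper, and only a single uncoupled neuron is simulated rather than a full CPG.

```python
def hindmarsh_rose(t_end=1000.0, dt=0.01, I=3.25,
                   a=1.0, b=3.0, c=1.0, d=5.0, r=0.006, s=4.0, x_rest=-1.6):
    """Forward-Euler integration of the Hindmarsh-Rose neuron model.

    dx/dt = y - a*x**3 + b*x**2 - z + I
    dy/dt = c - d*x**2 - y
    dz/dt = r*(s*(x - x_rest) - z)
    Returns the sampled membrane-potential trace x.
    """
    x, y, z = -1.6, -10.0, 2.0  # arbitrary initial condition (assumption)
    trace = []
    for _ in range(int(t_end / dt)):
        dx = y - a * x ** 3 + b * x ** 2 - z + I
        dy = c - d * x ** 2 - y
        dz = r * (s * (x - x_rest) - z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        trace.append(x)
    return trace


trace = hindmarsh_rose()
spikes = sum(1 for prev, cur in zip(trace, trace[1:]) if prev < 0.0 <= cur)
print(f"{spikes} spikes, x range [{min(trace):.2f}, {max(trace):.2f}]")
```

The Cortex-M3 has no hardware floating-point unit, so a port to such a board would rely on software floating point or a fixed-point rewrite of this loop; the abstract does not say which approach the authors chose.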
38. High voltage insulator monitoring using infrared camera and FPGA
- Author
-
Stephen C. Robson, Timothy Mathias, and M. Albano
- Subjects
010302 applied physics ,business.industry ,Infrared ,Computer science ,Insulator (electricity) ,High voltage ,02 engineering and technology ,01 natural sciences ,Stream processing ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Verilog ,020201 artificial intelligence & image processing ,Fpga implementations ,Field-programmable gate array ,business ,MATLAB ,computer ,Computer hardware ,computer.programming_language - Abstract
This research investigates the use of FPGAs to create a fully autonomous system that detects hotspots on high-voltage insulators by processing infrared videos. Initial algorithm development was conducted in MATLAB. The algorithm was automated by adding automatic temperature tracking and arc detection, giving a sixtyfold speed increase (85 Hz). Two FPGA implementations of the algorithm were explored: a Verilog-only design, and an embedded processor running compiled C++ code. Applications include post-processing and live stream processing, where alerts can be fed to the maintenance teams of transmission system operators.
- Published
- 2019
- Full Text
- View/download PDF
39. Synthesis of Hardware Sandboxes for Trojan Mitigation in Systems on Chip
- Author
-
Joel Mandebi Mbongue, Christophe Bobda, Sujan Kumar Saha, and Taylor J L Whitaker
- Subjects
010302 applied physics ,business.industry ,Computer science ,02 engineering and technology ,Specification language ,01 natural sciences ,020202 computer hardware & architecture ,Formalism (philosophy of mathematics) ,Trojan ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Fpga implementations ,System on a chip ,business ,Computer hardware - Abstract
In this work, we propose a high-level synthesis approach for hardware sandboxes in systems-on-chip. Using an interface formalism to capture interactions between non-trusted IPs and the trusted parts of a system-on-chip, along with a property specification language to specify the non-authorized actions of non-trusted IPs, sandboxes are generated and made ready for inclusion as IP in a system-on-chip design. The concepts of composition, compatibility, and refinement are used to capture illegal actions and optimize resources across the boundaries of single IPs. We have designed a tool that automatically generates the sandboxes and facilitates their integration into a system-on-chip. Our approach was validated with benchmarks from trust-hub.com and FPGA implementations. All our results showed 100% Trojan detection and mitigation, with only a minimal increase in resource overhead and no performance decrease.
- Published
- 2019
- Full Text
- View/download PDF
40. Dynamic Selection and Update of Digital Predistorter Coefficients for Power Amplifier Linearization
- Author
-
Quynh Anh Pham, Pere L. Gilabert, David Lopez-Bueno, Gabriel Montoro, Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. CSC - Components and Systems for Communications Research Group
- Subjects
Least squares estimation ,FPGA implementations ,principal component analysis ,Computer science ,power amplifier ,Linearization ,linearization ,Least squares approximations ,Partial least square (PLS) ,Least squares ,Predistortion ,Matrix decomposition ,Matrix (mathematics) ,Canonical correlation analysis ,partial least squares ,Partial least squares regression ,Power amplifiers ,Estimation techniques ,partial least squares regression ,Detectors ,Telecommunication engineering [UPC subject areas] ,power amplifier linearization ,Iterative methods ,QR decomposition ,Digital predistortion ,Digital predistorter ,digital predistortion ,Linearizer ,model order reduction ,Audio Amplifiers ,Low-frequency amplifiers ,Power amplifier linearization ,Algorithm - Abstract
This paper presents a new technique that dynamically estimates and updates the coefficients of a digital predistorter (DPD) for power amplifier (PA) linearization. The proposed technique is dynamic in the sense that, at every iteration of the coefficient update, it estimates only the minimum necessary parameters according to a criterion based on the residual estimation error. In the first step, the original basis functions defining the DPD in the forward path are orthonormalized for DPD adaptation in the feedback path by means of a precalculated principal component analysis (PCA) transformation. The robustness and reliability of the precalculated PCA transformation (i.e., a PCA transformation matrix obtained offline and only once) are tested and verified. Then, in the second step, a properly modified partial least squares (PLS) method, named dynamic partial least squares (DPLS), is applied to obtain the minimum and most relevant transformed components required for updating the coefficients of the DPD linearizer. The combination of the PCA transformation with the DPLS extraction of components is equivalent to a canonical correlation analysis (CCA) updating solution, which is optimum in the sense of generating components with maximum correlation (instead of maximum covariance, as in the case of the DPLS extraction alone). The proposed dynamic extraction technique is evaluated and compared, in terms of computational cost and performance, with the commonly used QR decomposition approach for solving the least squares (LS) problem. Experimental results show that the proposed method (i.e., combining PCA with DPLS) drastically reduces the number of DPD coefficients to be estimated while maintaining the same linearization performance.
- Published
- 2019
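The abstract benchmarks DPLS against the commonly used QR-decomposition solution of the least-squares problem. As a hedged illustration of that baseline only, the sketch below builds a memory-polynomial regression matrix (an assumed DPD basis; the paper's basis functions are not given here) and solves for the coefficients with a modified Gram-Schmidt QR in pure Python. It does not implement the PCA transformation or DPLS, and the signal, the orders `K` and `M`, and the coefficients are hypothetical.

```python
import math


def memory_polynomial_basis(x, K, M):
    """Regression matrix with entries x[n-m] * |x[n-m]|**(k-1), for n >= M.

    Assumption: a memory-polynomial DPD basis, used here only for illustration.
    """
    rows = []
    for n in range(M, len(x)):
        row = []
        for m in range(M + 1):
            xm = x[n - m]
            for k in range(1, K + 1):
                row.append(xm * abs(xm) ** (k - 1))
        rows.append(row)
    return rows


def qr_least_squares(A, y):
    """Solve min ||A c - y||_2 with a modified Gram-Schmidt QR (complex-safe)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # working columns
    Q, R = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(abs(v) ** 2 for v in cols[j]))
        R[j][j] = norm
        q = [v / norm for v in cols[j]]
        Q.append(q)
        # Orthogonalize the remaining columns against q.
        for k in range(j + 1, n):
            r = sum(qi.conjugate() * ck for qi, ck in zip(q, cols[k]))
            R[j][k] = r
            cols[k] = [ck - r * qi for ck, qi in zip(cols[k], q)]
    # Form b = Q^H y, then back-substitute R c = b.
    b = [sum(qi.conjugate() * yi for qi, yi in zip(q, y)) for q in Q]
    c = [0j] * n
    for j in range(n - 1, -1, -1):
        c[j] = (b[j] - sum(R[j][k] * c[k] for k in range(j + 1, n))) / R[j][j]
    return c


# Hypothetical example: recover known coefficients from noiseless data.
x = [0.5 * math.cos(0.3 * n) + 0.4j * math.sin(0.7 * n) for n in range(200)]
true_c = [1.0, -0.1 + 0.05j, 0.02j, 0.2, 0.01, -0.03j]
A = memory_polynomial_basis(x, K=3, M=1)
y = [sum(a * c for a, c in zip(row, true_c)) for row in A]
c_hat = qr_least_squares(A, y)
```

Because the synthetic data are noiseless, `c_hat` should match `true_c` to within floating-point error. Note that every coefficient is re-estimated at each update, which is the cost that the dynamic PCA + DPLS scheme aims to reduce.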
41. Effects of aging and compensation mechanisms in ordering based RO-PUFs
- Author
-
Ali Emre Pusane, Giray Komurcu, and Gunhan Dundar
- Subjects
Engineering ,Circuit performance ,business.industry ,02 engineering and technology ,Ring oscillator ,Challenge response ,Accelerated aging ,020202 computer hardware & architecture ,Hardware and Architecture ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,020201 artificial intelligence & image processing ,Fpga implementations ,Electrical and Electronic Engineering ,business ,Software - Abstract
With the increasing need for highly secure systems, Physical Unclonable Functions (PUFs) have emerged within the last decade. Ordering-based Ring Oscillator (RO) PUFs are among the best-performing structures, owing to their robustness and suitability for FPGA implementation. Although the performance of ordering-based RO-PUFs has been analyzed in detail, the effects of aging have not been studied before. In this work, we present the results of an accelerated aging test applied to analyze the effects of aging on ROs. Then, the effects of aging on ordering-based RO-PUFs are examined. Finally, a compensation method to preserve the claimed 100% robustness of the PUF structure is proposed, and its influence on circuit performance is presented. Highlights: Results of an accelerated aging test applied to ROs are presented. The effects of aging on ordering-based RO-PUFs are examined. A compensation method to protect the 100% robustness of the PUF structure is proposed.
- Published
- 2016
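To make the aging concern concrete, here is a generic sketch (not the authors' design or their compensation method) of how an ordering-based RO-PUF can turn the rank order of a group of RO frequencies into response bits, encoding the permutation with a Lehmer code into ceil(log2(n!)) bits. The frequencies and per-RO slowdown factors are hypothetical; they show how uneven aging that closes a frequency gap flips the ordering and therefore the response.

```python
import math


def ordering_response(freqs):
    """Indices of the ring oscillators sorted fastest first."""
    return sorted(range(len(freqs)), key=lambda i: freqs[i], reverse=True)


def permutation_rank(perm):
    """Lehmer-code rank of a permutation: an integer in [0, n!)."""
    remaining = sorted(perm)
    rank = 0
    for i, p in enumerate(perm):
        idx = remaining.index(p)
        rank += idx * math.factorial(len(perm) - 1 - i)
        remaining.pop(idx)
    return rank


def response_bits(freqs):
    """Encode the frequency ordering as a ceil(log2(n!))-bit string."""
    width = math.ceil(math.log2(math.factorial(len(freqs))))
    return format(permutation_rank(ordering_response(freqs)), f"0{width}b")


# Hypothetical frequencies (MHz) of four ROs, before and after aging.
fresh = [201.3, 199.8, 202.7, 200.5]
slowdown = [0.004, 0.001, 0.012, 0.002]  # assumed fractional slowdown per RO
aged = [f * (1 - d) for f, d in zip(fresh, slowdown)]
```

Here the fresh group yields the ordering `[2, 0, 3, 1]` (response `01101`), while the aged group yields `[0, 2, 3, 1]` (response `00011`), because RO 2 degraded more than its neighbours.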
42. Optimised CORDIC‐based atan2 computation for FPGA implementations
- Author
-
Torres Carot, Vicente, Valls Coquillat, Javier, and Canet Subiela, Mª José
- Subjects
Z-path computation ,Optimised CORDIC-based atan2 computation ,Computer science ,Computation ,02 engineering and technology ,Parallel computing ,01 natural sciences ,Table lookup ,TECNOLOGIA ELECTRONICA ,Operator (computer programming) ,0203 mechanical engineering ,0103 physical sciences ,Fpga implementations ,Electrical and Electronic Engineering ,atan2 ,CORDIC ,Field-programmable gate array ,010301 acoustics ,Coordinate rotation digital computer algorithm ,Field programmable gate arrays ,020303 mechanical engineering & transports ,Table (database) ,Look-up table-based FPGA resources ,Digital arithmetic ,Atan2 operator - Abstract
A method for implementing the atan2 operator based on the coordinate rotation digital computer (CORDIC) algorithm is described. In the proposal, the computation of the z-path takes advantage of the look-up table-based FPGA resources to reduce the overall area of the unrolled architecture by between 17 and 25%, without performance deterioration. This work was funded by the Spanish Ministerio de Economia y Competitividad and FEDER under grant TEC2015-70858-C2-2-R.
- Published
- 2017
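For reference, the sketch below is a floating-point model of standard vectoring-mode CORDIC for atan2. It does not model the paper's look-up-table-based z-path optimization; `ATAN_TABLE` simply holds the z-path constants atan(2^-i) that an unrolled hardware design would add or subtract at each stage, and `N_ITER` is an assumed iteration count.

```python
import math

N_ITER = 24  # assumed number of CORDIC stages
# z-path constants atan(2^-i); hardware would store these in a table.
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]


def cordic_atan2(y, x, n_iter=N_ITER):
    """atan2(y, x) via vectoring-mode CORDIC (shift-and-add micro-rotations)."""
    if x == 0 and y == 0:
        return 0.0
    offset = 0.0
    if x < 0:
        # Rotate by 180 degrees into the right half-plane, where
        # vectoring mode converges, and correct the angle afterwards.
        offset = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    z = 0.0
    for i in range(n_iter):
        # Rotate towards the x-axis; the z-path accumulates the angle.
        if y > 0:
            x, y, z = x + y * 2.0 ** -i, y - x * 2.0 ** -i, z + ATAN_TABLE[i]
        else:
            x, y, z = x - y * 2.0 ** -i, y + x * 2.0 ** -i, z - ATAN_TABLE[i]
    return z + offset
```

After `N_ITER` stages the residual angle error is bounded by roughly atan(2^-(N_ITER-1)), which is why each extra stage (one more adder pair plus one table constant in an unrolled design) buys about one more bit of precision.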
43. FPGA implementations of Grain v1, Mickey 2.0, Trivium, Lizard and Plantlet
- Author
-
Meicheng Liu, Bohan Li, and Dongdai Lin
- Subjects
Computer Networks and Communications ,Computer science ,020208 electrical & electronic engineering ,02 engineering and technology ,Parallel computing ,020202 computer hardware & architecture ,Hardware Application ,Artificial Intelligence ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Verilog ,Fpga implementations ,Trivium (cipher) ,Hardware design languages ,eSTREAM ,Throughput (business) ,Stream cipher ,computer ,Software ,computer.programming_language - Abstract
In this paper, three eSTREAM Portfolio 2 ciphers (Grain v1, Mickey 2.0, and Trivium) and two Grain-like stream ciphers (Lizard and Plantlet) are implemented in three versions aimed at different hardware application purposes. The hardware platform is Xilinx's Spartan-7 series, and the simulations, syntheses, and implementations are conducted in Vivado using the Verilog hardware description language. These implementations are compared with each other and with those presented in the existing literature in terms of performance metrics including throughput, area consumption, and throughput-area ratio. The basic version of Trivium achieves the highest speed, reaching a maximum of 416 Mbps, and the serial version consumes the smallest area, 13 slices. In the parallel version, the maximum throughput-area ratio, 165.5 Mbps/slice, is achieved by Trivium. Meanwhile, the basic version of Mickey 2.0 achieves the second-highest speed, 384 Mbps, and the serial version of Grain v1 achieves the second-smallest area, 26 slices.
- Published
- 2020
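As a software reference point for the Trivium entries above, here is a bit-serial model of Trivium's keystream generator written from the public eSTREAM specification (288-bit state, 4 x 288 initialization clocks). It produces one keystream bit per clock, the simplest possible datapath; Trivium's design permits computing up to 64 of these clocks in parallel, which is the usual route to wider, faster hardware. It has not been checked against the official test vectors, whose bit-ordering conventions it does not handle, so treat it as illustrative; the all-zero key and IVs are hypothetical.

```python
def trivium_keystream(key_bits, iv_bits, n_bits):
    """Return n_bits keystream bits for an 80-bit key and IV (lists of 0/1)."""
    if len(key_bits) != 80 or len(iv_bits) != 80:
        raise ValueError("key and IV must each be 80 bits")
    # s[0] holds s1 of the specification, ..., s[287] holds s288.
    s = [0] * 288
    s[0:80] = key_bits      # (s1..s93)    <- (K1..K80, 0, ..., 0)
    s[93:173] = iv_bits     # (s94..s177)  <- (IV1..IV80, 0, ..., 0)
    s[285:288] = [1, 1, 1]  # (s178..s288) <- (0, ..., 0, 1, 1, 1)

    def clock():
        t1 = s[65] ^ s[92]
        t2 = s[161] ^ s[176]
        t3 = s[242] ^ s[287]
        z = t1 ^ t2 ^ t3
        t1 ^= (s[90] & s[91]) ^ s[170]
        t2 ^= (s[174] & s[175]) ^ s[263]
        t3 ^= (s[285] & s[286]) ^ s[68]
        # Shift the three registers by one, feeding in t3, t1 and t2.
        s[0:93] = [t3] + s[0:92]
        s[93:177] = [t1] + s[93:176]
        s[177:288] = [t2] + s[177:287]
        return z

    for _ in range(4 * 288):  # initialization: output is discarded
        clock()
    return [clock() for _ in range(n_bits)]


key = [0] * 80
ks = trivium_keystream(key, [0] * 80, 1000)
ks_other_iv = trivium_keystream(key, [1] + [0] * 79, 64)
```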
44. Face-off Between the CAESAR Lightweight Finalists: ACORN vs. Ascon
- Author
-
William Diehl, Farnoud Farahmand, Abubakr Abdulgadir, Jens-Peter Kaps, and Kris Gaj
- Subjects
060201 languages & linguistics ,Computer science ,06 humanities and the arts ,02 engineering and technology ,computer.software_genre ,Acorn ,Set (abstract data type) ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,Operating system ,020201 artificial intelligence & image processing ,Fpga implementations ,Message authentication code ,Side channel attack ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,computer ,Throughput (business) ,De facto standard - Abstract
Authenticated ciphers potentially provide resource savings and security improvements over the joint use of secret-key ciphers and message authentication codes. The CAESAR competition aims to choose the most suitable authenticated ciphers for several categories of applications, including a lightweight use case, for which the primary criteria are performance in resource-constrained devices and ease of protection against side-channel attacks (SCA). In March 2018, two of the candidates from this category, ACORN and Ascon, were selected as CAESAR contest finalists. In this research, we compare two sets of SCA-resistant FPGA implementations of ACORN and Ascon: one set has area consumption nearly equivalent to that of the de facto standard AES-GCM, and the other has throughput (TP) close to that of AES-GCM. The results show that protected implementations of ACORN and Ascon with area consumption slightly below that of AES-GCM achieve 23.3 and 2.5 times, respectively, the TP of AES-GCM. Likewise, implementations of ACORN and Ascon with TP slightly above that of AES-GCM consume 18% and 74%, respectively, of the area of AES-GCM.
- Published
- 2018
45. A system for attacks on Trivium ciphers implemented on FPGAs (Sistema para ataques a cifradores Triviums implementados en FPGAs)
- Author
-
Potestad-Ordóñez, Francisco Eugenio, Jiménez Fernández, Carlos Jesús, and Valencia Barrero, Manuel
- Subjects
FPGA implementations ,Active side-channel attack ,Stream cipher ,Cryptography ,Trivium ,Side channel attack - Abstract
Projects: CESAR TEC2013-45523-R, INTERVALO TEC2016-80549-R, LACRE CSIC 201550E039. The amount of information exchanged daily over communication networks is growing exponentially, and much of it is sensitive, so it must be protected to prevent fraudulent use. Correctly applied cryptography prevents unwanted access, so that this information can be protected. Studying the fragility or robustness of security is therefore an essential task for guaranteeing that protection. On the other hand, the field of application is so broad (from government security, such as Pentagon military secrets, to home use) that it must be narrowed down; in this case, to low-resource environments (lightweight cryptography), focusing on implementations in high-density programmable electronic devices. The main objective of this thesis is to analyze the behavior of FPGA implementations of stream ciphers under active side-channel attacks, specifically attacks that attempt to inject faults into the operation of Trivium by increasing the clock frequency. Based on an analysis of active side-channel attack techniques, a system is presented whose aim is to analyze the vulnerability and behavior of the Trivium cipher, implemented on an FPGA, against such active fault attacks. Universidad de Sevilla. Degree in Industrial Electronic Engineering.
- Published
- 2018
46. FPGA Implementations of Low Latency Centroiding Algorithms for Adaptive Optics
- Author
-
Manuel Cegarra Polo, Fanpeng Kong, and Andrew Lambert
- Subjects
Computer science ,Digital image processing ,Fpga implementations ,Latency (engineering) ,Image sensor ,Adaptive optics ,Adaptive optics systems ,Field-programmable gate array ,Algorithm - Abstract
We describe two innovative low-latency centroiding algorithms implemented in an FPGA. They exploit the parallel processing features of these devices and achieve low latency and low resource usage (real estate), which eases their integration into complete adaptive optics systems.
- Published
- 2018
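The abstract does not describe the two algorithms, so the following is only the standard thresholded centre-of-gravity (CoG) centroid commonly used for Shack-Hartmann wavefront-sensor spots, not the authors' method. The window contents and threshold values are hypothetical.

```python
def centroid(window, threshold=0.0):
    """Thresholded centre-of-gravity centroid of a 2-D pixel window.

    Returns (x, y) in pixel coordinates, or None if no pixel exceeds
    the threshold.
    """
    total = sx = sy = 0.0
    for y, row in enumerate(window):
        for x, value in enumerate(row):
            w = value - threshold  # subtract background / noise floor
            if w > 0:
                total += w
                sx += w * x
                sy += w * y
    if total == 0:
        return None
    return sx / total, sy / total


symmetric = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
shifted = [[0, 0, 0], [0, 2, 2], [0, 0, 0]]
noisy = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]
```

The three running sums can be updated as each pixel arrives from the sensor, with a single division per window at the end, which is why CoG-style centroiding lends itself to streaming, low-latency FPGA pipelines.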
47. Selection of an error-correcting code for FPGA-based physical unclonable functions
- Author
-
Brian Jarvis and Kris Gaj
- Subjects
060201 languages & linguistics ,Block code ,Computer science ,Data_CODINGANDINFORMATIONTHEORY ,06 humanities and the arts ,02 engineering and technology ,Convolutional code ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,Entropy (information theory) ,020201 artificial intelligence & image processing ,Fpga implementations ,Arithmetic ,Field-programmable gate array ,Error detection and correction ,Decoding methods ,BCH code - Abstract
This paper explores error-correcting codes for fuzzy extractor applications with Physical Unclonable Functions (PUFs). We investigate BCH codes and compare them to convolutional codes using the criteria of remaining entropy, probability of decoder failure, and hardware requirements. Parallel BCH coding is analyzed, with a comprehensive search performed to find the smallest BCH code that satisfies the criteria in a parallel design producing 128-, 192-, and 256-bit keys. A convolutional code is selected for comparison against the BCH codes found in this analysis. The application of the selected codes to a fuzzy extractor design is analyzed. The hardware requirements of FPGA implementations of each code are compared, with a BCH decoder design implemented for the Artix-7 and Spartan-6 FPGA families. We find that a (127, 22, 47) parallel BCH code or a (2, 1, 12) convolutional code can perform as well as a single large BCH code while requiring fewer FPGA resources when block RAMs can be leveraged. The convolutional code additionally requires the fewest PUF ID bits.
- Published
- 2017
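To show where the error-correcting code sits in a fuzzy extractor, here is a minimal sketch of the code-offset construction. For brevity it swaps in a repetition code with majority decoding instead of the BCH or convolutional codes the paper evaluates; the PUF response, the repetition length `R_LEN`, and the error pattern are all hypothetical.

```python
import secrets


def rep_encode(bits, r):
    """Repetition code: repeat every bit r times."""
    return [b for b in bits for _ in range(r)]


def rep_decode(bits, r):
    """Majority vote per block of r bits (r odd); fixes up to r // 2 errors per block."""
    return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]


def enroll(response, r):
    """Code-offset enrollment: returns (secret key, public helper data)."""
    key = [secrets.randbelow(2) for _ in range(len(response) // r)]
    codeword = rep_encode(key, r)
    helper = [w ^ c for w, c in zip(response, codeword)]
    return key, helper


def reproduce(noisy_response, helper, r):
    """Recover the key from a noisy re-measurement of the PUF response."""
    return rep_decode([w ^ h for w, h in zip(noisy_response, helper)], r)


# Hypothetical 35-bit PUF response; the re-measurement flips one bit per block.
R_LEN = 5
response = [(i * 37 + 11) % 5 % 2 for i in range(35)]
noisy = list(response)
for block in range(7):
    noisy[block * R_LEN + block % R_LEN] ^= 1

key, helper = enroll(response, R_LEN)
recovered = reproduce(noisy, helper, R_LEN)
```

XORing the noisy response with the helper data leaves the codeword plus the measurement errors, so decoding returns the enrolled key whenever the errors stay within the code's correction capability. Because the helper data are public, the code-offset construction can leak up to n - k bits of response entropy; a repetition code's low rate makes that leakage large, which is one reason longer BCH or convolutional codes are compared on remaining entropy.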
48. Customizable FPGA OpenCL matrix multiply design template for deep neural networks
- Author
-
Srivatsan Krishnan, Eriko Nurvitadhi, Suchit Subhaschandra, Yinger Jack Z, Duncan J. M. Moss, Andrew Ling, Debbie Marr, and Davor Capalija
- Subjects
020203 distributed computing ,Computer science ,Design space exploration ,business.industry ,020206 networking & telecommunications ,Usability ,02 engineering and technology ,Data type ,Matrix multiplication ,Matrix (mathematics) ,Computer architecture ,0202 electrical engineering, electronic engineering, information engineering ,Deep neural networks ,Fpga implementations ,Field-programmable gate array ,business - Abstract
Deep neural networks (DNNs) have gained popularity for their state-of-the-art accuracy and relative ease of use. DNNs rely on a growing variety of matrix multiply operations (e.g., dense to sparse, FP32 to N-bit). We propose an OpenCL-based matrix multiply design template, which enables automated design exploration to generate optimized FPGA matrix accelerators for DNN applications. Given the desired matrix operations (e.g., sparsity, data types), our template rapidly produces performance and area estimates for a variety of design variants and/or FPGA platforms. Upon identifying compelling design points and target platforms, FPGA implementations can then be generated automatically using the Intel® OpenCL™ FPGA SDK. We show the effectiveness of the template with a comparison to hand-tuned RTL, a design space exploration, and a DNN case study.
- Published
- 2017
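Matrix accelerators of this kind are typically organized around tiling, and the tile size is one of the knobs a design space exploration sweeps. The following pure-Python sketch of a blocked matrix multiply only illustrates that loop structure; it is not the Intel OpenCL template, and the `tile` parameter and example matrices are hypothetical. The zero-skip is a simple nod to the sparse variants mentioned in the abstract.

```python
def tiled_matmul(A, B, tile=4):
    """C = A @ B computed tile by tile (A is n x k, B is k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    # Outer loops walk over tiles; an accelerator would map one tile
    # to an on-chip buffer and a processing-element array.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops compute one tile's partial products.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]
                        if a == 0:
                            continue  # skip zeros, as a sparse variant might
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C


A = [[(3 * i + j) % 7 - 3 for j in range(5)] for i in range(6)]
B = [[(2 * i - j) % 5 for j in range(3)] for i in range(5)]
C = tiled_matmul(A, B, tile=2)
```

The result is independent of `tile`; what changes in hardware is on-chip buffer size and reuse, which is the kind of area/performance trade-off the template estimates.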
49. Model-based hardware design based on compatible sets of isomorphic subgraphs
- Author
-
Martin Kumm, Peter Zipf, Bogdan Pasca, Mark Jervis, Konrad Moller, and Patrick Sittel
- Subjects
Computer science ,Design space exploration ,business.industry ,02 engineering and technology ,020202 computer hardware & architecture ,Scheduling (computing) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Design process ,Algorithm design ,Fpga implementations ,Latency (engineering) ,business ,Computer hardware ,Electronic circuit - Abstract
Hardware applications in an industrial context often have tight area, latency and throughput requirements or a specific combination thereof. This paper presents a method to improve area and throughput figures for folded circuits generated during a model-based hardware design process. The method targets FPGA implementations and is based on the automatic combination of isomorphic subgraphs and the detailed consideration of pipelined primitive operations for folding core scheduling. In the course of a design space exploration, the user is provided with fine-grain control over the area/throughput trade-off.
- Published
- 2017
50. Implementation of a performance optimized database join operation on FPGA-GPU platforms using OpenCL
- Author
-
Mehdi Roozmeh and Luciano Lavagno
- Subjects
Source code ,Speedup ,database management systems ,parallel processing ,FPGA implementations ,Computer science ,media_common.quotation_subject ,heterogeneous HPC platforms ,GPU ,high-performance computing applications ,02 engineering and technology ,computer.software_genre ,performance optimized database join operation ,power consumption constraints ,Database ,time constraints ,Bandwidth ,Parallel Computing ,Software ,graphics processing units ,energy consumption ,High-level synthesis ,0202 electrical engineering, electronic engineering, information engineering ,Field-programmable gate array ,FPGA ,field programmable gate arrays ,Data Center ,media_common ,020203 distributed computing ,Bitonic sorter ,OpenCL ,business.industry ,Sorting ,Energy consumption ,Low-power low-energy computations ,FPGA-GPU platforms ,Kernel ,software developers ,bitonic sort network ,Parallel programming model ,sorting ,direct O(n²) algorithm ,heterogeneous platforms ,Field programmable gate arrays ,Graphics processing units ,Parallel processing ,020201 artificial intelligence & image processing ,business ,computer - Abstract
The growing trend toward heterogeneous platforms is crucial to meeting time and power consumption constraints for high-performance computing applications. The OpenCL parallel programming language and framework enable programming CPUs, GPUs and, more recently, FPGAs using the same source code. This makes it easier for software developers to implement applications on the various devices supported by heterogeneous HPC platforms. This work presents two very different FPGA implementations of a database join operation, one using a direct O(n²) algorithm and the other using a bitonic sort network to speed up the join operation. A comparison of performance and energy consumption on both FPGAs and GPUs is provided, which suggests a 40% performance-per-watt improvement from using an FPGA instead of a GPU.
- Published
- 2017
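The two join strategies named in the abstract can be contrasted in a short software sketch: a direct O(n²) nested-loop join, and a sort-merge join whose sorting step uses a bitonic compare-exchange network (the structure that maps well to parallel hardware because its comparisons are data-independent). This is an illustrative model under stated assumptions, not the authors' OpenCL kernels; the relations `R` and `S`, the join on element 0, and the infinity padding to a power-of-two length are hypothetical choices.

```python
import math


def bitonic_sort(items, key=lambda v: v):
    """Sort via a bitonic compare-exchange network; len(items) must be a power of two."""
    a = list(items)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare-exchange distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (key(a[i]) > key(a[partner])) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def nested_loop_join(R, S):
    """Direct O(n^2) equi-join on element 0 of each tuple."""
    return [(kr, vr, vs) for kr, vr in R for ks, vs in S if kr == ks]


def sort_merge_join(R, S):
    """Equi-join on element 0: bitonic-sort both inputs, then merge."""
    def sorted_rel(rel):
        size = 1
        while size < len(rel):
            size *= 2
        pad = [(math.inf, None)] * (size - len(rel))  # sentinels sort last
        return bitonic_sort(list(rel) + pad, key=lambda t: t[0])[:len(rel)]

    r, s = sorted_rel(R), sorted_rel(S)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            k = r[i][0]
            j_end = j
            while j_end < len(s) and s[j_end][0] == k:
                j_end += 1
            # Emit every pairing of the matching runs (handles duplicate keys).
            while i < len(r) and r[i][0] == k:
                out.extend((k, r[i][1], s[jj][1]) for jj in range(j, j_end))
                i += 1
            j = j_end
    return out


R = [(3, "a"), (1, "b"), (3, "c"), (7, "d"), (2, "e")]
S = [(3, "x"), (2, "y"), (5, "z"), (3, "w")]
```

Both joins return the same set of matches (possibly in a different order); the sort-based version replaces the n x m comparisons with O(n log² n) fixed compare-exchange steps plus a linear merge.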