Author: "Langlois J" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Langlois J"' showing total 26 results

1. Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration

Author: Vakili, Shervin, Vaziri, Mobin, Zarei, Amirhossein, and Langlois, J. M. Pierre
Abstract: Multipliers are widely used arithmetic operators in digital signal processing and machine learning circuits. Due to their relatively high complexity, they can have high latency and be a significant source of power consumption. One strategy to alleviate these limitations is to use approximate computing. This paper thus introduces an original FPGA-based approximate multiplier specifically optimized for machine learning computations. It utilizes dynamically reconfigurable lookup table (LUT) primitives in AMD-Xilinx technology to realize the core part of the computations. The paper provides an in-depth analysis of the hardware architecture, implementation outcomes, and accuracy evaluations of the proposed multiplier in INT8 precision. Implementation results on an AMD-Xilinx Kintex Ultrascale+ FPGA demonstrate remarkable savings of 64% and 67% in LUT utilization for signed multiplication and multiply-and-accumulation configurations, respectively, when compared to the standard Xilinx multiplier core. Accuracy measurements on four popular deep learning (DL) benchmarks indicate a minimal average accuracy decrease of less than 0.29% during post-training deployment, with the maximum reduction staying below 0.33%. The source code of this work is available on GitHub.
Comment: 5 figures, 3 tables
Published: 2023

2. Examining the impacts of Great Lakes temperature perturbations on simulated precipitation in the Northeastern United States

Author: Hanrahan, J, Langlois, J, Cornell, L, Huang, H, Winter, JM, Clemins, PJ, Beckage, B, and Bruyère, C
Abstract: Most inland water bodies are not resolved by general circulation models, requiring that lake surface temperatures be estimated. Given the large spatial and temporal variability of the surface temperatures of the North American Great Lakes, such estimations can introduce errors when used as lower boundary conditions for dynamical downscaling. Lake surface temperatures (LSTs) influence moisture and heat fluxes, thus impacting precipitation within the immediate region and potentially in regions downwind of the lakes. For this study, the Advanced Research version of the Weather Research and Forecasting Model (WRF-ARW) was used to simulate precipitation over the six New England states during a 5-yr historical period. The model simulation was repeated with perturbed LSTs, ranging from 10°C below to 10°C above baseline values obtained from reanalysis data, to determine whether the inclusion of erroneous LST values has an impact on simulated precipitation and synoptic-scale features. Results show that simulated precipitation in New England is statistically correlated with LST perturbations, but this region falls on a wet-dry line of a larger bimodal distribution. Wetter conditions occur to the north and drier conditions occur to the south with increasing LSTs, particularly during the warm season. The precipitation differences coincide with large-scale anomalous temperature, pressure, and moisture patterns. Care must therefore be taken to ensure reasonably accurate Great Lakes surface temperatures when simulating precipitation, especially in southeastern Canada, Maine, and the mid-Atlantic region.
Published: 2021

3. Examining the impacts of Great Lakes temperature perturbations on simulated precipitation in the Northeastern United States

Author: Hanrahan, J, Langlois, J, Cornell, L, Huang, H, Winter, JM, Clemins, PJ, Beckage, B, and Bruyère, C
Abstract: Most inland water bodies are not resolved by general circulation models, requiring that lake surface temperatures be estimated. Given the large spatial and temporal variability of the surface temperatures of the North American Great Lakes, such estimations can introduce errors when used as lower boundary conditions for dynamical downscaling. Lake surface temperatures (LSTs) influence moisture and heat fluxes, thus impacting precipitation within the immediate region and potentially in regions downwind of the lakes. For this study, the Advanced Research version of the Weather Research and Forecasting Model (WRF-ARW) was used to simulate precipitation over the six New England states during a 5-yr historical period. The model simulation was repeated with perturbed LSTs, ranging from 10°C below to 10°C above baseline values obtained from reanalysis data, to determine whether the inclusion of erroneous LST values has an impact on simulated precipitation and synoptic-scale features. Results show that simulated precipitation in New England is statistically correlated with LST perturbations, but this region falls on a wet-dry line of a larger bimodal distribution. Wetter conditions occur to the north and drier conditions occur to the south with increasing LSTs, particularly during the warm season. The precipitation differences coincide with large-scale anomalous temperature, pressure, and moisture patterns. Care must therefore be taken to ensure reasonably accurate Great Lakes surface temperatures when simulating precipitation, especially in southeastern Canada, Maine, and the mid-Atlantic region.
Published: 2021

4. Design Principles for Packet Deparsers on FPGAs

Author: Luinaud, Thomas, da Silva, Jeferson Santiago, Langlois, J. M. Pierre, and Savaria, Yvon
Abstract: The P4 language has drastically changed the networking field, as it allows new networking applications to be described and implemented quickly. Although a large variety of applications can be described with the P4 language, current programmable switch architectures impose significant constraints on P4 programs. To address this shortcoming, FPGAs have been explored as potential targets for P4 applications. P4 applications are described using three abstractions: a packet parser, match-action tables, and a packet deparser, which reassembles the output packet with the result of the match-action tables. While implementations of packet parsers and match-action tables on FPGAs have been widely covered in the literature, no general design principles have been presented for the packet deparser. Indeed, implementing a high-speed and efficient deparser on FPGAs remains an open issue because it requires a large number of interconnections and the architecture must be tailored to a P4 program. As a result, in several works where a P4 application is implemented on FPGAs, the deparser consumes a significant proportion of chip resources. Hence, in this paper, we address this issue by presenting design principles for efficient and high-speed deparsers on FPGAs. As an artifact, we introduce a tool that generates an efficient vendor-agnostic deparser architecture from a P4 program. Our design has been validated and simulated with a cocotb-based framework. The resulting architecture is implemented on Xilinx Ultrascale+ FPGAs and supports a throughput of more than 200 Gbps while reducing resource usage by almost 10× compared to other solutions.
Comment: Presented at ISFPGA'21, 2021. Source code available at: https://github.com/luinaudt/deparser/tree/FPGA_paper
Published: 2021

5. Design Principles for Packet Deparsers on FPGAs

Author: Luinaud, Thomas, Santiago da Silva, Jeferson, Langlois, J. M. Pierre, and Savaria, Yvon
Abstract: The P4 language has drastically changed the networking field, as it allows new networking applications to be described and implemented quickly. Although a large variety of applications can be described with the P4 language, current programmable switch architectures impose significant constraints on P4 programs. To address this shortcoming, FPGAs have been explored as potential targets for P4 applications. P4 applications are described using three abstractions: a packet parser, match-action tables, and a packet deparser, which reassembles the output packet with the result of the match-action tables. While implementations of packet parsers and match-action tables on FPGAs have been widely covered in the literature, no general design principles have been presented for the packet deparser. Indeed, implementing a high-speed and efficient deparser on FPGAs remains an open issue because it requires a large number of interconnections and the architecture must be tailored to a P4 program. As a result, in several works where a P4 application is implemented on FPGAs, the deparser consumes a significant proportion of chip resources. Hence, in this paper, we address this issue by presenting design principles for efficient and high-speed deparsers on FPGAs. As an artifact, we introduce a tool that generates an efficient vendor-agnostic deparser architecture from a P4 program. Our design has been validated and simulated with a cocotb-based framework. The resulting architecture is implemented on Xilinx Ultrascale+ FPGAs and supports a throughput of more than 200 Gbps while reducing resource usage by almost 10x compared to other solutions.
Published: 2021

6. Bridging the Gap: FPGAs as Programmable Switches

Author: Luinaud, Thomas, Stimpfling, Thibaut, da Silva, Jeferson Santiago, Savaria, Yvon, and Langlois, J. M. Pierre
Abstract: The emergence of P4, a domain-specific language, coupled with PISA, a domain-specific architecture, is revolutionizing the networking field. P4 makes it possible to describe how packets are processed by a programmable data plane, spanning ASICs and CPUs, that implements PISA. Because the processing flexibility can be limited on ASICs, while CPU performance for networking tasks lags behind, recent works have proposed implementing PISA on FPGAs. However, little effort has been dedicated to analyzing whether FPGAs are good candidates to implement PISA. In this work, we take a step back and evaluate the micro-architecture efficiency of various PISA blocks. We demonstrate, supported by a theoretical and experimental analysis, that the performance of a few PISA blocks is severely limited by the current FPGA architectures. Specifically, we show that match tables and programmable packet schedulers represent the main performance bottlenecks for FPGA-based programmable switches. Thus, we explore two avenues to alleviate these shortcomings. First, we identify network applications well tailored to current FPGAs. Second, to support a wider range of networking applications, we propose modifications to the FPGA architectures which can also be of interest outside the networking field.
Comment: To be published in: IEEE International Conference on High Performance Switching and Routing 2020
Published: 2020

7. PoET-BiN: Power Efficient Tiny Binary Neurons

Author: Chidambaram, Sivakumar, Langlois, J. M. Pierre, and David, Jean Pierre
Abstract: The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field Programmable Gate Arrays, embedded processors and Graphical Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by the Multiply Accumulate operations and the memory accesses for weight fetching. Quantization and pruning have been proposed to address this issue. Though effective, these techniques do not take into account the underlying architecture of the embedded hardware. In this work, we propose PoET-BiN, a Look-Up Table based power efficient implementation on resource constrained embedded devices. A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain. A LUT access consumes far less power than the equivalent Multiply Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses. We applied the PoET-BiN architecture to implement the classification layers of networks trained on MNIST, SVHN and CIFAR-10 datasets, with near state-of-the-art results. The energy reduction for the classifier portion reaches up to six orders of magnitude compared to floating-point implementations and up to three orders of magnitude when compared to recent binary quantized neural networks.
Comment: Accepted in MLSys 2020 conference
Published: 2020

8. An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs

Author: Ahmadi, Mehdi, Vakili, Shervin, and Langlois, J. M. Pierre
Abstract: Convolutional Neural Networks (CNNs) have shown outstanding accuracy for many vision tasks during recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations results in long processing times and short battery life. An important factor in designing CNN hardware accelerators is to efficiently map the convolution computation onto hardware resources. In addition, to save battery life and reduce energy consumption, it is essential to reduce the number of DRAM accesses, since DRAM consumes orders of magnitude more energy than other operations in hardware. In this paper, we propose an energy-efficient architecture which maximally utilizes its computational units for convolution operations while requiring a low number of DRAM accesses. The implementation results show that the proposed architecture performs one image recognition task using the VGGNet model with a latency of 393 ms and only 251.5 MB of DRAM accesses.
Comment: 4 pages
Published: 2020

9. CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture

Author: Ahmadi, Mehdi, Vakili, Shervin, and Langlois, J. M. Pierre
Abstract: Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolution layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the utilization factor of processing resources to perform convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated by implementing convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3x3 and 1x1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms when performing convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture benefits from the structured sparsity in ResNet-50 to reduce the latency to 42.5 ms when half of the channels are pruned.
Comment: 12 pages
Published: 2020

10. Cerebellar Cortex 4-12 Hz Oscillations and Unit Phase Relation in the Awake Rat

Author: Lévesque, Maxime, Gao, HongYing, Southward, Carla, Langlois, J. M. Pierre, Léna, Clément, and Courtemanche, Richard
Abstract: Oscillations in the granule cell layer (GCL) of the cerebellar cortex have been related to behavior and could facilitate communication with the cerebral cortex. These local field potential (LFP) oscillations, strong at 4–12 Hz in the rodent cerebellar cortex during awake immobility, should also be an indicator of an underlying influence on the patterns of the cerebellar cortex neuronal firing during rest. To address this hypothesis, cerebellar cortex LFPs and simultaneous single-neuron activity were collected during LFP oscillatory periods in the GCL of awake resting rats. During these oscillatory episodes, different types of units across the GCL and Purkinje cell layers showed variable phase relations with the oscillatory cycles. Overall, 74% of the Golgi cell firing and 54% of the Purkinje cell simple spike (SS) firing were phase-locked with the oscillations, displaying a clear phase relationship. Despite this tendency, fewer Golgi cells (50%) and Purkinje cell SSs (25%) showed an oscillatory firing pattern. Oscillatory phase-locked spikes for the Golgi and Purkinje cells occurred towards the peak of the LFP cycle. GCL LFP oscillations had a strong capacity to predict the timing of Golgi cell spiking activity, indicating a strong influence of this oscillatory phenomenon over the GCL. Phase-locking was not as prominent for the Purkinje cell SS firing, indicating a weaker influence over the Purkinje cell layer, yet a similar phase relation. Overall, synaptic activity underlying GCL LFP oscillations likely exerts an influence on neuronal population firing patterns in the cerebellar cortex in the awake resting state and could have a preparatory neural network shaping capacity serving as a neural baseline for upcoming cerebellar operations.
Published: 2020

11. Module-per-Object: a Human-Driven Methodology for C++-based High-Level Synthesis Design

Author: da Silva, Jeferson Santiago, Boyer, François-Raymond, and Langlois, J. M. Pierre
Abstract: High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar with hardware design. However, achieving the highest Quality-of-Results (QoR) with HLS is still unattainable for most programmers. This requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly code. Moreover, such code is normally in conflict with best coding practices, which favor code reuse, modularity, and conciseness. To overcome these limitations, we propose Module-per-Object (MpO), a human-driven HLS design methodology intended for both hardware designers and software developers with limited FPGA expertise. MpO exploits modern C++ to raise the abstraction level while improving QoR, code readability and modularity. To guide HLS designers, we present the five characteristics of MpO classes. Each characteristic exploits the power of HLS-supported modern C++ features to build C++-based hardware modules. These characteristics lead to high-quality software descriptions and efficient hardware generation. We also present a use case of MpO, where we use C++ as the intermediate language for FPGA-targeted code generation from P4, a packet processing domain-specific language. The MpO methodology is evaluated using three design experiments: a packet parser, a flow-based traffic manager, and a digital up-converter. Based on experiments, we show that MpO can be comparable to hand-written VHDL code while keeping a high abstraction level, human-readable coding style and modularity. Compared to traditional C-based HLS design, MpO leads to more efficient circuit generation, both in terms of performance and resource utilization. Also, the MpO approach notably improves software quality, augmenting parametrization while eliminating the incidence of code duplication.
Comment: 9 pages. Paper accepted for publication at The 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, San Diego CA, April 28 - May 1, 2019
Published: 2019

12. P4-compatible High-level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs

Author: da Silva, Jeferson Santiago, Boyer, François-Raymond, and Langlois, J. M. Pierre
Abstract: Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN networks need to be both reconfigurable and fast, to support the evolving network protocols and the increasing multi-gigabit data rates. The combination of packet processing languages with FPGAs seems to be the perfect match for these requirements. In this work, we develop an open-source FPGA-based configurable architecture for arbitrary packet parsing to be used in SDN networks. We generate low latency and high-speed streaming packet parsers directly from a packet processing program. Our architecture is pipelined and entirely modeled using templated C++ classes. The pipeline layout is derived from a parser graph that corresponds to P4 code after a series of graph transformation rounds. The RTL code is generated from the C++ description using Xilinx Vivado HLS and synthesized with Xilinx Vivado. Our architecture achieves a 100 Gb/s data rate in a Xilinx Virtex-7 FPGA while reducing the latency by 45% and the LUT usage by 40% compared to the state-of-the-art.
Comment: Accepted for publication at the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 25 - 27, 2018, Monterey Marriott Hotel, Monterey, California. 7 pages, 7 figures, 1 table
Published: 2017

13. SHIP: A Scalable High-performance IPv6 Lookup Algorithm that Exploits Prefix Characteristics

Author: Stimpfling, Thibaut, Bélanger, Normand, Langlois, J. M. Pierre, and Savaria, Yvon
Abstract: Due to the emergence of new network applications, current IP lookup engines must support high bandwidth, low lookup latency and the ongoing growth of IPv6 networks. However, existing solutions are not designed to jointly address those three requirements. This paper introduces SHIP, an IPv6 lookup algorithm that exploits prefix characteristics to build a two-level data structure designed to meet future application requirements. Using both prefix length distribution and prefix density, SHIP first clusters prefixes into groups sharing similar characteristics, then it builds a hybrid trie-tree for each prefix group. The compact and scalable data structure built can be stored in on-chip low-latency memories, and allows the traversal process to be parallelized and pipelined at each level in order to support high packet bandwidth. Evaluated on real and synthetic prefix tables holding up to 580 k IPv6 prefixes, SHIP has a logarithmic scaling factor in terms of the number of memory accesses, and a linear memory consumption scaling. Using the largest synthetic prefix table, simulations show that compared to other well-known approaches, SHIP uses at least 44% less memory per prefix, while reducing the memory latency by 61%.
Comment: Submitted to IEEE/ACM Transactions on Networking
Published: 2017

14. Efficient realization of BCD multipliers using FPGAs

Author: Gao, Shuli, Al-Khalili, Dhamin, Langlois, J. M. Pierre, and Chabini, Noureddine
Abstract: In this paper, a novel BCD multiplier approach is proposed. The main highlight of the proposed architecture is the generation of the partial products and parallel binary operations based on 2-digit columns. 1 × 1-digit multipliers used for the partial product generation are implemented directly by 4-bit binary multipliers without any code conversion. The binary results of the 1 × 1-digit multiplications are organized according to their two-digit positions to generate the 2-digit column-based partial products. A binary-decimal compressor structure is developed and used for partial product reduction. These reduced partial products are added in optimized 6-LUT BCD adders. The parallel binary operations and the improved BCD addition result in improved performance and reduced resource usage. The proposed approach was implemented on Xilinx Virtex-5 and Virtex-6 FPGAs with emphasis on the critical path delay reduction. Pipelined BCD multipliers were implemented for 4 × 4, 8 × 8, and 16 × 16-digit multipliers. Our realizations achieve an increase in speed by up to 22% and a reduction of LUT count by up to 14% over previously reported results.
Published: 2017

15. Node configuration for the Aho-Corasick algorithm in intrusion detection systems

Author: Lacroix, Alexsandre B., Langlois, J. M. Pierre, Boyer, François-Raymond, Gosselin, Antoine, and Bois, Guy
Abstract: In this paper, we analyze the performance and cost trade-off of choosing between two node representations when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rule set, which contains almost 26k patterns. Our methodology consists of code profiling and analysis, followed by the selection of a parameter to maximize a metric that combines clock cycle count and memory usage. The parameter determines which of two types of nodes is selected for each trie node. We show that it is possible to select the parameter to optimize the metric, which results in an improvement by up to 12× compared with the single node-type case.
Published: 2017

16. Memory Efficient Multi-Scale Line Detector Architecture for Retinal Blood Vessel Segmentation

Author: Bendaoudi, Hamza, Cheriet, Farida, and Langlois, J. M. Pierre
Abstract: This paper presents a memory-efficient architecture that implements the Multi-Scale Line Detector (MSLD) algorithm for real-time retinal blood vessel detection in fundus images on a Zynq FPGA. This implementation benefits from the FPGA parallelism to drastically reduce the memory requirements of the MSLD from two images to a few values. The architecture is optimized in terms of resource utilization by reusing the computations and optimizing the bit-width. The throughput is increased by designing fully pipelined functional units. The architecture achieves accuracy comparable to that of its software implementation while running 70x faster for low-resolution images. For high-resolution images, it achieves an acceleration by a factor of 323x.
Comment: This paper was accepted and presented at the Conference on Design and Architectures for Signal and Image Processing - DASIP 2016
Published: 2016

17. Extern Objects in P4: an ROHC Header Compression Scheme Case Study

Author: da Silva, Jeferson Santiago, Boyer, François-Raymond, Chiquette, Laurent-Olivier, and Langlois, J. M. Pierre
Abstract: P4 is an emergent packet-processing language with which the user can describe how the packets are to be processed in a switching element. This paper presents a way to implement complex operations that are not natively supported in P4. In this work, we explored two different methods to add extensions to P4: i) using new native primitives and ii) using extern instances. As a case study, an ROHC entity was implemented and invoked in a P4 program. The tests showed similar relative performance in both methods in terms of normalized packet latency. However, extern instances appear to be more suitable for target-specific switching applications, where the manufacturer/vendor can specify its own specific operations without changes in the P4 syntax and semantics. Extern instances only require changes in the target-specific backend compiler while keeping the P4 frontend compiler unchanged. The use of externs also results in a more elegant code solution since they are implemented outside the switch core, thus reducing the risk of side effects that can be caused by a modification in a switch pipeline implementation.
Comment: 6 pages, 4 figures, 3 listings
Published: 2016

18. Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT Interpolation Filter

Author: Abdelsalam, Ahmed M., Langlois, J. M. Pierre, and Cheriet, F.
Abstract: Implementing an accurate and fast activation function at low cost is crucial to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high-accuracy approximation approach for the hyperbolic tangent activation function of artificial neurons in DNNs. It is based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed architecture combines simple arithmetic operations on stored samples of the hyperbolic tangent function and on input data. The proposed DCTIF implementation achieves two orders of magnitude greater precision than previous work while using the same or fewer computational resources. Various combinations of DCTIF parameters can be chosen to trade off the accuracy and complexity of the hyperbolic tangent function. In one case, the proposed architecture approximates the hyperbolic tangent activation function with 10E-5 maximum error while requiring only 1.52 Kbits memory and 57 LUTs of a Virtex-7 FPGA. We also discuss how the activation function accuracy affects the performance of DNNs in terms of their training and testing accuracies. We show that a high accuracy approximation can be necessary in order to maintain the same DNN training and testing performances realized by the exact function.
Comment: 8 pages, 6 figures, 5 tables, submitted for the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (ISFPGA), 22-24 February 2017, California, USA
Published: 2016

19. The Mission of the United States Air Force and its Support to the United States Army Compared to the Marine Air Ground Task Force (MAGTF) Concept

Author: MARINE CORPS COMMAND AND STAFF COLL QUANTICO VA and Langlois, J. E.
Abstract: When Executive Order 9877 was signed in 1947, it granted the United States Air Force its autonomy from the U.S. Army and delineated its new roles and responsibilities. One responsibility, support to ground forces, has remained a point of contention between the Army and the Air Force. Should the Air Force adopt more of a combined arms approach in the employment of combat forces, it would become part of a much more formidable combat force. This force would be similar to what is known as a Marine Air Ground Task Force or MAGTF.
Published: 2009

20. Véloce Club Havrais : [membres du Véloce Club Havrais photographiés à Yvetot] / J. Langlois, photographe

Author: Langlois, J. Photographer
Abstract: Part of the documentary collection: BmLHav000
Published: 1889

21. Contre le projet de loi Ferry et l'article 7, vers dédiés au Sénat, par J.-M. Langlois,...

Author: Langlois, J.-M. Author of the text
Abstract: Text mode available

22. Fables de Phèdre, affranchi d'Auguste. T. 1 / ... traduites en français, avec le texte à côté et ornées de gravures

Author: Le Maistre de Sacy, Isaac-Louis (1613-1684). Translator, Camus, C.-S. (17..-18..?). Translator, Langlois, J. Scholarly editor, and Phèdre (0015? BC-0054). Author of the text
Abstract: [Fables]

23. Fables de Phèdre, affranchi d'Auguste. T. 2 / ... traduites en français, avec le texte à côté et ornées de gravures

Author: Le Maistre de Sacy, Isaac-Louis (1613-1684). Translator, Camus, C.-S. (17..-18..?). Translator, Langlois, J. Scientific editor, and Phèdre (0015? BC-0054). Author of the text
Abstract: [Fables]

24. Le maistre d'armes, ou l'Exercice de l'épée seule, dans sa perfection dédié à Mgr le duc de Bourgogne / Par le sieur de Liancour

Author: Perelle, Adam (1638-1695). Engraver, Langlois, J. Engraver, and Liancour, André Wernesson, sieur de. Author of the text

25. [Retrato de Jerónimo Benete] [Material gráfico]

Author: Langlois, J[ean] 1649-1712 and Gantrel, Viuda Et A
Abstract: Inscription: "Ve. H.º Gerónimo Benete, Pintor de Oficio, Varón de Insigne Caridad: murió en la Compañía de Jesús en Valladolid a 7 de enero de 1707". Illustration from Noticia de la vida, virtudes, muerte y fama póstuma del V. H. Jerónimo Benete (anonymous?). Valladolid. By Antonio Figueroa. Iconografía Hispana. Langlois, J[ean] (1649-1712), engraver, French school. Gantrel, Viuda et A. (Enfant). Paris

26. Contre le projet de loi Ferry et l'article 7, vers dédiés au Sénat, par J.-M. Langlois,...

Author: Langlois, J.-M. Author of the text
Abstract: Available in text mode
