Author: "Hofmann, Jaco" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hofmann, Jaco"' showing total 21 results

Start Over Author "Hofmann, Jaco"

21 results on '"Hofmann, Jaco"'

1. DISCO: Distributed Inference with Sparse Communications

Author: Qin, Minghai, Sun, Chao, Hofmann, Jaco, and Vucinic, Dejan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require an extensive amount of computation and memory. It is of great difficulty to deploy a large DNN model to a single resource-limited device with small memory capacity. Distributed computing is a common approach to reduce single-node memory consumption and to accelerate the inference of DNN models. In this paper, we explore the "within-layer model parallelism", which distributes the inference of each layer into multiple nodes. In this way, the memory requirement can be distributed to many nodes, making it possible to use several edge devices to infer a large DNN model. Due to the dependency within each layer, data communications between nodes during this parallel inference can be a bottleneck when the communication bandwidth is limited. We propose a framework to train DNN models for Distributed Inference with Sparse Communications (DISCO). We convert the problem of selecting which subset of data to transmit between nodes into a model optimization problem, and derive models with both computation and communication reduction when each layer is inferred on multiple nodes. We show the benefit of the DISCO framework on a variety of CV tasks such as image classification, object detection, semantic segmentation, and image super resolution. The corresponding models include important DNN building blocks such as convolutions and transformers. For example, each layer of a ResNet-50 model can be distributively inferred across two nodes with five times less data communications, almost half overall computations and half memory requirement for a single node, and achieve comparable accuracy to the original ResNet-50 model. This also results in 4.7 times overall inference speedup.
Published: 2023

2. Multi-Chip Dataflow Architecture for Massive Scale Biophyscially Accurate Neuron Simulation

Author: Hofmann, Jaco, primary
Published: 2022
Full Text: View/download PDF

3. Exploiting 3D Memory for Accelerated In-Network Processing of Hash Joins in Distributed Databases

Author: Wirth, Johannes, Hofmann, Jaco A., Thostrup, Lasse, Koch, Andreas, Binnig, Carsten, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Derrien, Steven, editor, Hannig, Frank, editor, Diniz, Pedro C., editor, and Chillet, Daniel, editor
Published: 2021
Full Text: View/download PDF

4. The TaPaSCo Open-Source Toolflow: for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems

Author: Heinz, Carsten, Hofmann, Jaco, Korinth, Jens, Sommer, Lukas, Weber, Lukas, and Koch, Andreas
Published: 2021
Full Text: View/download PDF

5. The TaPaSCo Open-Source Toolflow for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems

Author: Korinth, Jens, Hofmann, Jaco, Heinz, Carsten, Koch, Andreas, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Hochberger, Christian, editor, Nelson, Brent, editor, Koch, Andreas, editor, Woods, Roger, editor, and Diniz, Pedro, editor
Published: 2019
Full Text: View/download PDF

6. DISCO: Distributed Inference with Sparse Communications

Author: Qin, Minghai, primary, Sun, Chao, additional, Hofmann, Jaco, additional, and Vucinic, Dejan, additional
Published: 2024
Full Text: View/download PDF

7. Exploiting 3D Memory for Accelerated In-Network Processing of Hash Joins in Distributed Databases

Author: Wirth, Johannes, primary, Hofmann, Jaco A., additional, Thostrup, Lasse, additional, Koch, Andreas, additional, and Binnig, Carsten, additional
Published: 2021
Full Text: View/download PDF

8. Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases

Author: Wirth, Johannes, primary, Hofmann, Jaco A., additional, Thostrup, Lasse, additional, Binnig, Carsten, additional, and Koch, Andreas, additional
Published: 2021
Full Text: View/download PDF

9. An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks

Author: Hofmann, Jaco and Hofmann, Jaco
Abstract: Field Programmable Gate Arrays (FPGAs) are a new addition to the world of data center acceleration. While the underlying technology has been around for decades, their application in data centers slowly starts gaining traction. However, there are myriad problems that hinder the widespread application of FPGAs in the data center. The closed source tool chains result in vendor lock-in and unstable tool flows. The languages used to program FPGAs require different design processes which are not easily learned by software developers. Compared to commodity solutions using CPUs and GPUs, FPGAs are more expensive and more time consuming to develop for. All of this and more make FPGAs a tough sell to people in need of task acceleration. Nonetheless, FPGAs also offer an opportunity to develop faster accelerators with a smaller energy envelop for rapidly changing applications. This work presents a solution to FPGA abstraction using the TaPaSCo framework. TaPaSCo simplifies moving between different FPGA architectures and automates scaling of accelerators across a multitude of devices. In addition, the framework provides a homogenized way of interacting with the accelerators. This thesis presents applications where FPGAs offer many benefits in the data center. Applications such as Semi-Global Block Matching which are difficult to compute on CPUs and GPUs due to the specific data transfer patterns, can be implemented highly efficiently an FPGAs. The presented work achieves over 35x of speedup on FPGAs compared to implementations of GPUs. FPGAs can also be used to improve network efficiency in the data center by replacing central network components with smart switches. The work presented here achieves up to 7x speedup over a classical distributed software implementation in a hash join scenario. Furthermore, FPGA can be used to bring new storage technologies into the data center by providing highly efficient consensus services right inside the network. The presented work shows that
Published: 2020

10. Improving Job Launch Rates in the TaPaSCo FPGA Middleware by Hardware/Software-Co-Design

Author: Heinz, Carsten, primary, Hofmann, Jaco A., additional, Sommer, Lukas, additional, and Koch, Andreas, additional
Published: 2020
Full Text: View/download PDF

11. Online Reprogrammable Multi Tenant Switches

Author: Krude, Johannes, primary, Hofmann, Jaco, additional, Eichholz, Matthias, additional, Wehrle, Klaus, additional, Koch, Andreas, additional, and Mezini, Mira, additional
Published: 2019
Full Text: View/download PDF

12. A Catalog and In-Hardware Evaluation of Open-Source Drop-In Compatible RISC-V Softcore Processors

Author: Heinz, Carsten, primary, Lavan, Yannick, additional, Hofmann, Jaco, additional, and Koch, Andreas, additional
Published: 2019
Full Text: View/download PDF

13. An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks

Author: Hofmann, Jaco
Abstract: Field Programmable Gate Arrays (FPGAs) are a new addition to the world of data center acceleration. While the underlying technology has been around for decades, their application in data centers slowly starts gaining traction. However, there are myriad problems that hinder the widespread application of FPGAs in the data center. The closed source tool chains result in vendor lock-in and unstable tool flows. The languages used to program FPGAs require different design processes which are not easily learned by software developers. Compared to commodity solutions using CPUs and GPUs, FPGAs are more expensive and more time consuming to develop for. All of this and more make FPGAs a tough sell to people in need of task acceleration. Nonetheless, FPGAs also offer an opportunity to develop faster accelerators with a smaller energy envelop for rapidly changing applications. This work presents a solution to FPGA abstraction using the TaPaSCo framework. TaPaSCo simplifies moving between different FPGA architectures and automates scaling of accelerators across a multitude of devices. In addition, the framework provides a homogenized way of interacting with the accelerators. This thesis presents applications where FPGAs offer many benefits in the data center. Applications such as Semi-Global Block Matching which are difficult to compute on CPUs and GPUs due to the specific data transfer patterns, can be implemented highly efficiently an FPGAs. The presented work achieves over 35x of speedup on FPGAs compared to implementations of GPUs. FPGAs can also be used to improve network efficiency in the data center by replacing central network components with smart switches. The work presented here achieves up to 7x speedup over a classical distributed software implementation in a hash join scenario. Furthermore, FPGA can be used to bring new storage technologies into the data center by providing highly efficient consensus services right inside the network. The presented work shows that fetching pages remotely using a FPGA accelerated consensus system can be done as fast as 10us over the network which is only 55% of a conventional solution. These results make non-volatile network storage solutions as replacement for main memory viable. Lastly, this thesis presents a way of simulating parts of a brain with a very high level accuracy using FPGA. The spiking neural networks employed in the accelerator can benefit the research of brain functionality. The accelerator is capable of handling tens of thousands of neurons with a strict real time requirement of 50us per simulation step.
Published: 2019
Full Text: View/download PDF

14. High-Throughput Multi-Threaded Sum-Product Network Inference in the Reconfigurable Cloud

Author: Ober, Micha, primary, Hofmann, Jaco, additional, Sommer, Lukas, additional, Weber, Lukas, additional, and Koch, Andreas, additional
Published: 2019
Full Text: View/download PDF

15. Consensus for Non-volatile Main Memory

Author: Dang, Huynh Tu, primary, Hofmann, Jaco, additional, Liu, Yang, additional, Radi, Marjan, additional, Vucinic, Dejan, additional, Soule, Robert, additional, and Pedone, Fernando, additional
Published: 2018
Full Text: View/download PDF

16. A Real-Time Reconfigurable Multichip Architecture for Large-Scale Biophysically Accurate Neuron Simulation

Author: Zjajo, Amir, primary, Hofmann, Jaco, additional, Christiaanse, Gerrit Jan, additional, van Eijk, Martijn, additional, Smaragdos, Georgios, additional, Strydis, Christos, additional, de Graaf, Alexander, additional, Galuzzi, Carlo, additional, and van Leuken, Rene, additional
Published: 2018
Full Text: View/download PDF

17. Synthesis of interleaved multithreaded accelerators from OpenMP loops

Author: Sommer, Lukas, primary, Oppermann, Julian, additional, Hofmann, Jaco, additional, and Koch, Andreas, additional
Published: 2017
Full Text: View/download PDF

18. A scalable latency-insensitive architecture for FPGA-accelerated semi-global matching in stereo vision applications

Author: Hofmann, Jaco, primary, Korinth, Jens, additional, and Koch, Andreas, additional
Published: 2016
Full Text: View/download PDF

19. Multi-chip dataflow architecture for massive scale biophysically accurate neuron simulation

Author: Hofmann, Jaco, primary, Zjajo, Amir, additional, Galuzzi, Carlo, additional, and van Leuken, Rene, additional
Published: 2016
Full Text: View/download PDF

20. A Scalable High-Performance Hardware Architecture for Real-Time Stereo Vision by Semi-Global Matching

Author: Hofmann, Jaco, primary, Korinth, Jens, additional, and Koch, Andreas, additional
Published: 2016
Full Text: View/download PDF

21. Multi-chip dataflow architecture for massive scale biophysically accurate neuron simulation.

Author: Hofmann J, Zjajo A, Galuzzi C, and van Leuken R
Subjects: Animals, Biophysics, Humans, Computer Simulation, Models, Biological, Nerve Net physiology, Neural Networks, Computer, Neurons physiology
Abstract: State-of-the-art neuron simulators are capable of simulating at most few tens/hundreds of neurons in real-time due to the exponential growth in the communication costs with the number of simulated neurons. In this paper, we present a novel, reconfigurable, multi-chip system architecture based on localized communication, which effectively reduces the communication cost to a linear growth. The system is very flexible and it allows to tune, at run-time, various parameters, e.g. the intracellular concentration of chemical compounds, the interconnection scheme between the neurons. Experimental results indicate that the proposed system architecture allows the simulation of up to few thousands biophysically accurate neurons over multiple chips.
Published: 2016
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

21 results on '"Hofmann, Jaco"'

1. DISCO: Distributed Inference with Sparse Communications

2. Multi-Chip Dataflow Architecture for Massive Scale Biophyscially Accurate Neuron Simulation

3. Exploiting 3D Memory for Accelerated In-Network Processing of Hash Joins in Distributed Databases

4. The TaPaSCo Open-Source Toolflow: for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems

5. The TaPaSCo Open-Source Toolflow for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems

6. DISCO: Distributed Inference with Sparse Communications

7. Exploiting 3D Memory for Accelerated In-Network Processing of Hash Joins in Distributed Databases

8. Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases

9. An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks

10. Improving Job Launch Rates in the TaPaSCo FPGA Middleware by Hardware/Software-Co-Design

11. Online Reprogrammable Multi Tenant Switches

12. A Catalog and In-Hardware Evaluation of Open-Source Drop-In Compatible RISC-V Softcore Processors

13. An Improved Framework for and Case Studies in FPGA-Based Application Acceleration - Computer Vision, In-Network Processing and Spiking Neural Networks

14. High-Throughput Multi-Threaded Sum-Product Network Inference in the Reconfigurable Cloud

15. Consensus for Non-volatile Main Memory

16. A Real-Time Reconfigurable Multichip Architecture for Large-Scale Biophysically Accurate Neuron Simulation

17. Synthesis of interleaved multithreaded accelerators from OpenMP loops

18. A scalable latency-insensitive architecture for FPGA-accelerated semi-global matching in stereo vision applications

19. Multi-chip dataflow architecture for massive scale biophysically accurate neuron simulation

20. A Scalable High-Performance Hardware Architecture for Real-Time Stereo Vision by Semi-Global Matching

21. Multi-chip dataflow architecture for massive scale biophysically accurate neuron simulation.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

21 results on '"Hofmann, Jaco"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources