Search Results: 7 results for "Aach, Marcel"
2. Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks
- Author
- Aach, Marcel, Inanc, Eray, Sarma, Rakesh, Riedel, Morris, and Lintermann, Andreas
- Subjects
- ddc:004
- Abstract
Continuously increasing data volumes from multiple sources, such as simulation and experimental measurements, demand efficient algorithms for an analysis within a realistic timeframe. Deep learning models have proven to be capable of understanding and analyzing large quantities of data with high accuracy. However, training them on massive datasets remains a challenge and requires distributed learning exploiting High-Performance Computing systems. This study presents a comprehensive analysis and comparison of three well-established distributed deep learning frameworks - Horovod, DeepSpeed, and Distributed Data Parallel by PyTorch - with a focus on their runtime performance and scalability. Additionally, the performance of two data loaders, the native PyTorch data loader and the DALI data loader by NVIDIA, is investigated. To evaluate these frameworks and data loaders, three standard ResNet architectures with 50, 101, and 152 layers are tested using the ImageNet dataset. The impact of different learning rate schedulers on validation accuracy is also assessed. The novel contribution lies in the detailed analysis and comparison of these frameworks and data loaders on the state-of-the-art Jülich Wizard for European Leadership Science (JUWELS) Booster system at the Jülich Supercomputing Centre, using up to 1024 A100 NVIDIA GPUs in parallel. Findings show that the DALI data loader significantly reduces the overall runtime of ResNet50 from more than 12 h on 4 GPUs to less than 200 s on 1024 GPUs. The outcomes of this work highlight the potential impact of distributed deep learning using efficient tools on accelerating scientific discoveries and data-driven applications.
- Published
- 2023
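The synchronous data-parallel training that Horovod, DeepSpeed, and PyTorch's Distributed Data Parallel all implement can be illustrated with a minimal, framework-free sketch: every worker computes gradients on its own data shard, the gradients are averaged across workers (the all-reduce step), and each replica applies the identical update. The one-parameter toy model and all names below are hypothetical illustrations, not code from the study.

```python
def grad(w, shard):
    # gradient of the mean-squared loss 0.5 * mean((w - x)^2) on one shard
    return sum(w - x for x in shard) / len(shard)

def data_parallel_sgd(data, workers=4, lr=0.5, steps=50):
    shards = [data[i::workers] for i in range(workers)]  # static, equal-size sharding
    params = [0.0] * workers                             # one model replica per worker
    for _ in range(steps):
        grads = [grad(params[r], shards[r]) for r in range(workers)]
        g = sum(grads) / workers                         # all-reduce: average the gradients
        params = [w - lr * g for w in params]            # identical update on every replica
    return params

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
final = data_parallel_sgd(data)   # every replica converges to the data mean, 4.5
```

Because every replica sees the same averaged gradient, the replicas stay identical, which is why adding workers changes throughput rather than the optimization trajectory (for a fixed global batch size).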
3. A proposed hybrid two-stage DL-HPC method for wind speed forecasting: using the first average forecast output for long-term forecasting
- Author
- Hassanian, Reza, Helgadottir, Asdis, Aach, Marcel, Lintermann, Andreas, and Riedel, Morris
- Abstract
Energy consumption is growing extensively, driven by demanding new technological applications and continuously changing lifestyles, and also in connection with climate change. Climate change is a significant issue, and scientific reports show that environmental temperatures are continuously increasing, particularly in the summer. To alleviate the heat, people in many countries tend to use air-conditioning systems in residential and business buildings. This puts additional pressure on the electricity network, and energy producers must be able to predict such events. It is agreed worldwide that harvesting renewable energy is the best option for fighting climate change. For example, the number of electric cars has recently increased, and it becomes more and more attractive to power them with green energy, e.g., produced by wind turbines. The advantages of wind energy have been studied intensively, and a wide range of methods to create very short-term, short-term, medium-term, and long-term predictions using wind energy models or wind speed profiles are in use [1,2]. However, some of the forecasting methods are highly complex and computationally costly [3,4]. This study uses a gated recurrent unit (GRU) model, a deep learning model, to efficiently perform medium-term predictions of wind energy production. An effort is made to apply these medium-term predictions to create long-term forecasting models. The literature reports that GRUs are faster than the long short-term memory (LSTM) models used in recent studies, can cope with comparatively less data, and are computationally cheaper. The study applies 5 years of empirical wind speed data, measured by the Icelandic Meteorological Office at a height of 10 m at the Búrfell site. The log law is used to scale the speed up to 55 m, the hub height of an Enercon E44 wind turbine. The predictions are performed on the DAM module of the DEEP cluster at the Jülich Supercomputing Centre. The parallel machine allows the model scaling to be sped up. The results show that the proposed model can predict medium- and long-term wind speeds as a function of the ratio of training data. This method performs the forecasting at a lower computational cost than LSTM but with equal performance.
- Published
- 2023
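The GRU recurrence the abstract above favors over LSTM can be reduced to a small update rule. The following stdlib-only sketch of a single-unit GRU cell (toy placeholder weights, not the study's trained model) shows the two gates it uses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h, p):
    """One step of a single-unit GRU: update gate z, reset gate r,
    candidate state h_tilde, then a convex mix of old and candidate state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])         # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])         # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

# toy weights (illustrative placeholders, not trained values)
params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.5, "ur": 0.1, "br": 0.0,
          "wh": 1.0, "uh": 0.5, "bh": 0.0}

h = 0.0
for x in [0.2, 0.4, 0.6]:   # e.g. a few normalized wind-speed samples
    h = gru_cell(x, h, params)
```

A GRU layer applies this update element-wise over a hidden vector; with only update and reset gates it has roughly three quarters of the parameters of an equally sized LSTM, which is the computational saving the abstract refers to.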
4. Parallel and Scalable Deep Learning to Reconstruct Actuated Turbulent Boundary Layer Flows. Part II: Autoencoder Training on HPC Systems
- Author
- Inanc, Eray, Albers, Marian, Sarma, Rakesh, Aach, Marcel, Schröder, Wolfgang, and Lintermann, Andreas
- Abstract
Convolutional autoencoders are trained on exceptionally large actuated turbulent boundary layer simulation data (8.3 TB) on the high-performance computer JUWELS at the Jülich Supercomputing Centre. The parallelization of the training is based on a distributed data-parallelism approach. This method relies on distributing the training dataset to multiple workers, where the trainable parameters of the convolutional autoencoder network are occasionally exchanged between the workers. This allows the training times to be drastically reduced - almost linear scaling performance can be achieved by increasing the number of workers (up to 2,048 GPUs). As a consequence of this increase, the total batch size also increases. This directly affects the training accuracy and hence the quality of the trained network. The training error, computed between the reference and the reconstructed turbulent boundary layer fields, becomes larger when the number of workers is increased. This behavior especially needs to be accounted for when going to a large number of workers, i.e., a compromise between parallel speedup and accuracy needs to be found.
- Published
- 2022
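The "occasional" parameter exchange described above differs from per-step gradient averaging: workers train independently on their own shards and only periodically average their parameters. A minimal stdlib-only sketch on a toy one-parameter model (all names hypothetical, not the study's code):

```python
def local_grad(w, shard):
    # gradient of the mean-squared loss 0.5 * mean((w - x)^2) on this worker's shard
    return sum(w - x for x in shard) / len(shard)

def train_with_periodic_sync(data, workers=4, lr=0.2, steps=40, sync_every=5):
    shards = [data[i::workers] for i in range(workers)]
    params = [0.0] * workers                  # independent replica per worker
    for step in range(1, steps + 1):
        params = [w - lr * local_grad(w, shard)
                  for w, shard in zip(params, shards)]
        if step % sync_every == 0:            # occasional parameter exchange
            avg = sum(params) / workers       # all workers adopt the average
            params = [avg] * workers
    return params

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
final = train_with_periodic_sync(data)
```

Less frequent exchange reduces communication at the cost of the replicas drifting toward their shard optima between syncs; the abstract separately notes that the growing total batch size also degrades accuracy at large worker counts.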
5. Parallel and Scalable Deep Learning to Reconstruct Actuated Turbulent Boundary Layer Flows. Part I: Investigation of Autoencoder-Based Trainings
- Author
- Sarma, Rakesh, Albers, Marian, Inanc, Eray, Aach, Marcel, Schröder, Wolfgang, and Lintermann, Andreas
- Abstract
With the availability of large datasets and increasing high-performance computing resources, machine learning tools offer many opportunities to improve and/or augment numerical methods used in the field of computational fluid dynamics. A low-dimensional representation of a turbulent boundary layer flow field is generated by a plain and a physics-constrained autoencoder. The training makes use of a distributed learning environment. The average test error of the plain autoencoder is ~4.4 times smaller than the error of the physics-constrained autoencoder, although the latter integrates physical laws in the training process. Furthermore, after 1,000 epochs, the training loss of the physics-constrained autoencoder is ~9.1 times higher than that of the plain autoencoder after 300 epochs. The neural network corresponding to the plain autoencoder is able to provide accurate reconstructions of a turbulent boundary layer flow.
- Published
- 2022
6. Deep Learning for Prediction and Control of Cellular Automata in Unreal Environments
- Author
- Aach, Marcel
- Subjects
- Nonlinear Sciences::Cellular Automata and Lattice Gases
- Abstract
In this thesis, we show the ability of a deep convolutional neural network to understand the underlying transition rules of two-dimensional cellular automata by pure observation. To do so, we evaluate the network on a prediction task, where it has to predict the next state of some cellular automata, and a control task, where it has to intervene in the evolution of a cellular automaton to achieve a state of standstill. The cellular automata we use in this case are based on the classical Game of Life by John Conway and are implemented in the Unreal Engine. By using the Unreal Engine for data generation, a technical pipeline for processing output images with neural networks is established.
Cellular automata in general are chaotic dynamical systems, making any sort of prediction or control very challenging, but using convolutional neural networks to exploit the locality of their interactions is a promising approach to solving these problems. The network we present in this thesis follows the encoder-decoder structure and features residual skip connections that serve as shortcuts between the different layers. Recent advancements in the field of image recognition and segmentation have shown that both of these aspects are key to success.
The evaluation of the prediction task is split into several levels of generalization: we train the developed network on trajectories of several hundred different cellular automata, varying in their transition rules and neighborhood sizes. Results on a test set show that the network is able to learn the rules of even more complex cellular automata (with an accuracy of ≈ 93%). To some extent, it is even able to interpolate and generalize to completely unseen rules (with an accuracy of ≈ 77%). A qualitative investigation shows that static rules (not forcing many changes between time steps) are among the easiest to predict.
For the control task, we combine the encoder part of the developed neural network with a reinforcement learning agent and train it to stop all movement on the grid of the cellular automata as quickly as possible. To do so, the agent can change the state of a single cell per time step. A comparison between giving agents rewards continuously and giving them only in the case of success or failure shows that Proximal Policy Optimization agents do better with sparse rewards, while Deep Q-Network agents fare better with continuous ones. Both algorithms beat random agents on training data, but their generalization ability remains limited.
- Published
- 2021
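The transition rules the thesis' network has to learn by observation can be stated compactly. Below is a pure-Python sketch of an outer-totalistic update in the Game-of-Life family on a small square toroidal grid; the `birth`/`survive` sets parameterize varied rules, with Conway's original rule being B3/S23. This is an illustration only, not the Unreal Engine implementation used in the thesis.

```python
def step(grid, birth={3}, survive={2, 3}):
    """Advance a square, toroidal 2-D cellular automaton by one time step.
    A dead cell becomes alive if its live-neighbor count is in `birth`;
    a live cell stays alive if its count is in `survive`."""
    n = len(grid)
    def neighbors(i, j):
        # count live cells in the Moore neighborhood, wrapping at the edges
        return sum(grid[(i + di) % n][(j + dj) % n]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0))
    return [[1 if (grid[i][j] and neighbors(i, j) in survive)
             or (not grid[i][j] and neighbors(i, j) in birth) else 0
             for j in range(n)] for i in range(n)]

# blinker: a period-2 oscillator under Conway's B3/S23 rule
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
```

Predicting the next state means learning exactly this grid-to-grid mapping from examples, which is why a convolutional network that matches the locality of the neighborhood is a natural fit.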
7. Generalization over different cellular automata rules learned by a deep feed-forward neural network
- Author
- Aach, Marcel, Goebbert, Jens Henrik, and Jitsev, Jenia
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, FOS: Physical sciences, Neural and Evolutionary Computing (cs.NE), Nonlinear Sciences::Cellular Automata and Lattice Gases, Adaptation and Self-Organizing Systems (nlin.AO), Nonlinear Sciences - Adaptation and Self-Organizing Systems, Machine Learning (cs.LG)
- Abstract
To test the generalization ability of a class of deep neural networks, we randomly generate a large number of different rule sets for 2-D cellular automata (CA), based on John Conway's Game of Life. Using these rules, we compute several trajectories for each CA instance. A deep convolutional encoder-decoder network with short- and long-range skip connections is trained on various generated CA trajectories to predict the next CA state given its previous states. Results show that the network is able to learn the rules of various complex cellular automata and generalize to unseen configurations. To some extent, the network even generalizes to rule sets and neighborhood sizes that were not seen during training at all. Code to reproduce the experiments is publicly available at: https://github.com/SLAMPAI/generalization-cellular-automata
- Comment
- Accepted at the 23rd International Conference on Artificial Intelligence (July 2021, Las Vegas, USA). To appear in: Springer Transactions on Computational Science & Computational Intelligence
- Published
- 2021