Author: "Wojciech, Samek" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wojciech, Samek"' showing total 142 results


1. Dataset Similarity to Assess Semisupervised Learning Under Distribution Mismatch Between the Labeled and Unlabeled Datasets

Author: Saul Calderon-Ramirez, Luis Oala, Jordina Torrents-Barrena, Shengxiang Yang, David Elizondo, Armaghan Moemeni, Simon Colreavy-Donnelly, Wojciech Samek, Miguel A. Molina-Cabello, and Ezequiel López-Rubio
Subjects: ComputingMethodologies_PATTERNRECOGNITION, Semi-supervised deep learning, Artificial Intelligence, Dataset similarity, MixMatch, Deep learning, Distribution mismatch, Out of distribution data, Computer Science Applications
Abstract: The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link. Semi-supervised deep learning (SSDL) is a popular strategy to leverage unlabelled data for machine learning when labelled data is not readily available. In real-world scenarios, different unlabelled data sources are usually available, with varying degrees of distribution mismatch regarding the labelled datasets. This raises the question of which unlabelled dataset to choose for good SSDL outcomes. Oftentimes, semantic heuristics are used to match unlabelled data with labelled data. However, a quantitative and systematic approach to this selection problem would be preferable. In this work, we first test the SSDL MixMatch algorithm under various distribution mismatch configurations to study the impact on SSDL accuracy. Then, we propose a quantitative unlabelled dataset selection heuristic based on dataset dissimilarity measures. These are designed to systematically assess how distribution mismatch between the labelled and unlabelled datasets affects MixMatch performance. We refer to our proposed method as deep dataset dissimilarity measures (DeDiMs), designed to compare labelled and unlabelled datasets. They use the feature space of a generic Wide-ResNet, can be applied prior to learning, are quick to evaluate and model agnostic. The strong correlation in our tests between MixMatch accuracy and the proposed DeDiMs suggests that this approach can be a good fit for quantitatively ranking different unlabelled datasets prior to SSDL training.
Published: 2023

2. Beyond explaining: Opportunities and challenges of XAI-based model improvement

Author: Leander Weber, Sebastian Lapuschkin, Alexander Binder, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Hardware and Architecture, Signal Processing, Software, Machine Learning (cs.LG), Information Systems
Abstract: Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex and opaque machine learning (ML) models. Despite the development of a multitude of methods to explain the decisions of black-box classifiers in recent years, these tools are seldom used beyond visualization purposes. Only recently have researchers started to employ explanations in practice to actually improve models. This paper offers a comprehensive overview of techniques that apply XAI practically for improving various properties of ML models, and systematically categorizes these approaches, comparing their respective strengths and weaknesses. We provide a theoretical perspective on these methods, and show empirically through experiments on toy and realistic settings how explanations can help improve properties such as model generalization ability or reasoning, among others. We further discuss potential caveats and drawbacks of these methods. We conclude that while model improvement based on XAI can have significant beneficial effects even on complex and not easily quantifiable model properties, these methods need to be applied carefully, since their success can vary depending on a multitude of factors, such as the model and dataset used, or the employed explanation method.
Published: 2023

3. Decentral and Incentivized Federated Learning Frameworks: A Systematic Literature Review

Author: Leon Witt, Mathis Heyer, Kentaroh Toyoda, Wojciech Samek, and Dan Li
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG), Computer Science Applications, Information Systems
Abstract: The advent of Federated Learning (FL) has ignited a new paradigm for parallel and confidential decentralized Machine Learning (ML) with the potential of utilizing the computational power of a vast number of IoT, mobile and edge devices without data leaving the respective device, ensuring privacy by design. Yet, in order to scale this new paradigm beyond small groups of already entrusted entities towards mass adoption, the Federated Learning Framework (FLF) has to (i) become truly decentralized and (ii) incentivize its participants. This is the first systematic literature review analyzing holistic FLFs in the domain of both decentralized and incentivized federated learning. 422 publications were retrieved by querying 12 major scientific databases. After a systematic review and filtering process, 40 articles remained for in-depth examination. Although they have massive potential to direct the future of a more distributed and secure AI, none of the analyzed FLFs is production-ready. The approaches vary heavily in terms of use-cases, system design, solved issues and thoroughness. We are the first to provide a systematic approach to classify and quantify differences between FLFs, exposing limitations of current works and deriving future directions for research in this novel domain. Comment: submitted to IEEE IOTJ
Published: 2023

4. CFD: Communication-Efficient Federated Distillation via Soft-Label Quantization and Delta Coding

Author: Wojciech Samek, Roman Rischke, Arturo Marban, and Felix Sattler
Subjects: Contextual image classification, Computer Networks and Communications, Computer science, Distributed computing, Perspective (graphical), Computer Science Applications, law.invention, Data set, Control and Systems Engineering, law, Leverage (statistics), Language model, Quantization (image processing), Distillation, Coding (social sciences)
Abstract: Communication constraints are one of the major challenges preventing the wide-spread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. In this work, we investigate FD from the perspective of communication efficiency by analyzing the effects of active distillation-data curation, soft-label quantization, and delta-coding techniques. Based on the insights gathered from this analysis, we present Compressed Federated Distillation (CFD), an efficient Federated Distillation method. Extensive experiments, on federated image classification and language modeling problems, at different levels of data heterogeneity, demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude when compared to FD, and by more than four orders of magnitude when compared to parameter averaging based techniques like Federated Averaging.
Published: 2022

5. Toward Explainable Artificial Intelligence for Regression Models: A methodological perspective

Author: Simon Letzgus, Patrick Wagner, Jonas Lederer, Wojciech Samek, Klaus-Robert Müller, and Grégoire Montavon
Subjects: Applied Mathematics, Signal Processing, Electrical and Electronic Engineering
Published: 2022

6. Overview of the Neural Network Compression and Representation (NNR) Standard

Author: Hamed Rezazadegan-Tavakoli, Wojciech Samek, Werner Bailer, Paul Haase, Karsten Müller, Swayambhoo Jain, Francesco Cricri, Miska Hannuksela, Shan Liu, Emre Aksu, Wei Jiang, Shahab Hamidi-Rad, Fabien Racape, Heiner Kirchhoffer, and Wei Wang
Subjects: Artificial neural network, Computer science, Quantization (signal processing), Encoding (memory), Media Technology, Data_CODINGANDINFORMATIONTHEORY, Pruning (decision trees), Electrical and Electronic Engineering, Representation (mathematics), Bitstream format, Algorithm, Decoding methods, Coding (social sciences)
Abstract: Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods, which can be used to create coding pipelines. It can be either used as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. For providing the highest degree of flexibility, the network compression methods operate per parameter tensor in order to always ensure proper decoding, even if no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as core encoding and decoding technologies, as well as neural network parameter pre-processing methods like sparsification, pruning, low-rank decomposition, unification, local scaling and batch norm folding. NNR achieves a compression efficiency of more than 97% for transparent coding cases, i.e. without degrading classification quality, such as top-1 or top-5 accuracies. This paper provides an overview of the technical features and characteristics of NNR.
Published: 2022

7. Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence

Author: Rita Cucchiara, Javier Del Ser, Wojciech Samek, Matthias Dehmer, Igor Jurisica, Isabelle Augenstein, Natalia Díaz-Rodríguez, Frank Emmert-Streib, and Andreas Holzinger
Subjects: Artificial intelligence, Computer science, Process (engineering), Inference, Context (language use), Trust, 03 medical and health sciences, Neural-symbolic learning and reasoning, 0302 clinical medicine, Robustness, 030304 developmental biology, Causal model, 0303 health sciences, business.industry, 213 Electronic, automation and communications engineering, electronics, Explainability, Explainable AI, Graph-based machine learning, Information fusion, Medical AI, Complex network, 3. Good health, Transformative learning, Workflow, Hardware and Architecture, 030220 oncology & carcinogenesis, Enabling, Signal Processing, business, Software, Information Systems
Abstract: Medical artificial intelligence (AI) systems have been remarkably successful, even outperforming human performance at certain tasks. There is no doubt that AI is important to improve human health in many ways and will disrupt various medical workflows in the future. To use AI to solve problems in medicine beyond the lab, in routine environments, we need to do more than just improve the performance of existing AI methods. Robust AI solutions must be able to cope with imprecision, missing and incorrect information, and explain both the result and the process of how it was obtained to a medical expert. Using conceptual knowledge as a guiding model of reality can help to develop more robust, explainable, and less biased machine learning models that can ideally learn from less data. Achieving these goals will require an orchestrated effort that combines three complementary Frontier Research Areas: (1) Complex Networks and their Inference, (2) Graph causal models and counterfactuals, and (3) Verification and Explainability methods. The goal of this paper is to describe these three areas from a unified view and to motivate how information fusion in a comprehensive and integrative manner can not only help bring these three areas together, but also have a transformative role by bridging the gap between research and practical applications in the context of future trustworthy medical AI. This makes it imperative to include ethical and legal aspects as a cross-cutting discipline, because all future solutions must not only be ethically responsible, but also legally compliant. Andreas Holzinger acknowledges funding support from the Austrian Science Fund (FWF), Project: P-32554 explainable Artificial Intelligence and from the European Union's Horizon 2020 research and innovation program under grant agreement 826078 (Feature Cloud). This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains; Natalia Diaz-Rodriguez is supported by the Spanish Government Juan de la Cierva Incorporacion contract (IJC2019-039152-I); Isabelle Augenstein's research is partially funded by a DFF Sapere Aude research leader grant; Javier Del Ser acknowledges funding support from the Basque Government through the ELKARTEK program (3KIA project, KK-2020/00049) and the consolidated research group MATHMODE (ref. T1294-19); Wojciech Samek acknowledges funding support from the European Union's Horizon 2020 research and innovation program under grant agreement No. 965221 (iToBoS), and the German Federal Ministry of Education and Research (ref. 01IS18025 A, ref. 01IS18037I and ref. 0310L0207C); Igor Jurisica acknowledges funding support from Ontario Research Fund (RDI 34876), Natural Sciences Research Council (NSERC 203475), CIHR Research Grant (93579), Canada Foundation for Innovation (CFI 29272, 225404, 33536), IBM, Ian Lawson van Toch Fund, the Schroeder Arthritis Institute via the Toronto General and Western Hospital Foundation.
Published: 2022

8. From Clustering to Cluster Explanations via Neural Networks

Author: Jacob Kauffmann, Malte Esders, Lukas Ruff, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, ComputingMethodologies_PATTERNRECOGNITION, Statistics - Machine Learning, Artificial Intelligence, Computer Networks and Communications, Machine Learning (stat.ML), Software, Machine Learning (cs.LG), Computer Science Applications
Abstract: A recent trend in machine learning has been to enrich learned models with the ability to explain their own predictions. The emerging field of Explainable AI (XAI) has so far mainly focused on supervised learning, in particular, deep neural network classifiers. In many practical problems, however, label information is not given and the goal is instead to discover the underlying structure of the data, for example, its clusters. While powerful methods exist for extracting the cluster structure in data, they typically do not answer the question why a certain data point has been assigned to a given cluster. We propose a new framework that can, for the first time, explain cluster assignments in terms of input features in an efficient and reliable manner. It is based on the novel insight that clustering models can be rewritten as neural networks - or 'neuralized'. Cluster predictions of the obtained networks can then be quickly and accurately attributed to the input features. Several showcases demonstrate the ability of our method to assess the quality of learned clusters and to extract novel insights from the analyzed data and representations. Comment: 15 pages + supplement
Published: 2022

9. Explain and improve: LRP-inference fine-tuning for image captioning models

Author: Jiamei Sun, Wojciech Samek, Alexander Binder, and Sebastian Lapuschkin
Subjects: FOS: Computer and information sciences, Closed captioning, Computer Science - Machine Learning, Fine-tuning, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Inference, computer.software_genre, Machine Learning (cs.LG), Relevance (information retrieval), Interpretability, Computer Science - Computation and Language, business.industry, Object (computer science), Hardware and Architecture, Hallucinating, Signal Processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Software, Natural language processing, Sentence, Information Systems
Abstract: This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself. We develop variants of Layer-wise Relevance Propagation (LRP) and gradient-based explanation methods, tailored to image captioning models with attention mechanisms. We compare the interpretability of attention heatmaps systematically against the explanations provided by explanation methods such as LRP, Grad-CAM, and Guided Grad-CAM. We show that explanation methods provide simultaneously pixel-wise image explanations (supporting and opposing pixels of the input image) and linguistic explanations (supporting and opposing words of the preceding sequence) for each word in the predicted captions. We demonstrate with extensive experiments that explanation methods (1) can reveal additional evidence used by the model to make decisions compared to attention; (2) correlate to object locations with high precision; (3) are helpful to “debug” the model, e.g. by analyzing the reasons for hallucinated object words. With the observed properties of explanations, we further design an LRP-inference fine-tuning strategy that reduces the issue of object hallucination in image captioning models, and meanwhile, maintains the sentence fluency. We conduct experiments with two widely used attention mechanisms: the adaptive attention mechanism calculated with the additive attention and the multi-head attention mechanism calculated with the scaled dot product.
Published: 2022

10. Langevin Cooling for Unsupervised Domain Translation

Author: Vignesh Srinivasan, Klaus-Robert Müller, Wojciech Samek, and Shinichi Nakajima
Subjects: language translation, generative models, Computer Networks and Communications, Perturbation methods, image-to-image translation, Superresolution, Langevin dynamics, Computer Science Applications, Artificial Intelligence, Transformers, Task analysis, Training, Domain translation (DT), Cooling, Manifolds, Software
Abstract: Domain translation is the task of finding correspondence between two domains. Several deep neural network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting--the mappings between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods typically do not perform well on a significant proportion of test samples. In this article, we hypothesize that many of such unsuccessful samples lie at the fringe--relatively low-density areas--of data distribution, where the DNN was not trained very well, and propose to perform the Langevin dynamics to bring such fringe samples toward high-density areas. We demonstrate qualitatively and quantitatively that our strategy, called Langevin cooling (L-Cool), enhances state-of-the-art methods in image translation and language translation tasks.
Published: 2022

11. FedAUXfdp: Differentially Private One-Shot Federated Distillation

Author: Haley Hoech, Roman Rischke, Karsten Müller, and Wojciech Samek
Published: 2023

12. List of contributors

Author: Kamila Abdiyeva, Narendra Ahuja, Mathias Anneken, David Auber, Meghna P. Ayyar, Romaissa Beddiar, Jenny Benois-Pineau, Jesús Bescós, Ilaria Boscolo Galazzo, Romain Bourqui, Lorenza Brusini, Nadia Burkart, Massimiliano Calabrese, Federica Cruciani, Eoin Delaney, Rachid Deriche, Marcos Escudero-Viñolo, Andrija Gajić, Damien Garreau, Giorgio Giacinto, Romain Giot, Oleksii Gorokhovatskyi, Volodymyr Gorokhovatskyi, Derek Greene, Adrien Halnaut, Alexandre Hardouin, Marco F. Huber, Gaëlle Jouis, Mark T. Keane, Eoin M. Kenny, Alejandro López-Cifuentes, Martin Lukac, Gloria Menegaz, Harold Mouchère, Mourad Oussalah, Olena Peredrii, Dragutin Petkovic, Fabien Picarougne, Georges Quénot, Gustavo Retuci Pinheiro, Konrad Rieck, Leticia Rittner, Wojciech Samek, Michele Scalas, Francesco Setti, Manjunatha Veerappa, Nataliia Vlasenko, Akka Zemmari, and Mauro Zucchelli
Published: 2023

13. Explainable deep learning: concepts, methods, and new developments

Author: Wojciech Samek
Published: 2023

14. Explaining the Decisions of Convolutional and Recurrent Neural Networks

Author: Wojciech Samek, Leila Arras, Ahmed Osman, Grégoire Montavon, and Klaus-Robert Müller
Published: 2022

15. Detecting failure modes in image reconstructions with interval neural network uncertainty

Author: Jan Macdonald, Wojciech Samek, Maximilian März, Gitta Kutyniok, Luis Oala, and Cosmas Heiß
Subjects: uncertainty quantification, Computer science, 500 Naturwissenschaften und Mathematik::510 Mathematik::510 Mathematik, Biomedical Engineering, Health Informatics, Interval (mathematics), Iterative reconstruction, Machine learning, computer.software_genre, failure modes, Image Processing, Computer-Assisted, Humans, Radiology, Nuclear Medicine and imaging, Uncertainty quantification, Artificial neural network, business.industry, Deep learning, Uncertainty, deep learning, General Medicine, Modular design, Inverse problem, Computer Graphics and Computer-Aided Design, Computer Science Applications, Image reconstruction, Original Article, Surgery, Neural Networks, Computer, Computer Vision and Pattern Recognition, Artificial intelligence, Noise (video), Tomography, X-Ray Computed, business, computer
Abstract: Purpose: The quantitative detection of failure modes is important for making deep neural networks reliable and usable at scale. We consider three examples for common failure modes in image reconstruction and demonstrate the potential of uncertainty quantification as a fine-grained alarm system. Methods: We propose a deterministic, modular and lightweight approach called Interval Neural Network (INN) that produces fast and easy to interpret uncertainty scores for deep neural networks. Importantly, INNs can be constructed post hoc for already trained prediction networks. We compare it against state-of-the-art baseline methods (MCDrop, ProbOut). Results: We demonstrate on controlled, synthetic inverse problems the capacity of INNs to capture uncertainty due to noise as well as directional error information. On a real-world inverse problem with human CT scans, we can show that INNs produce uncertainty scores which improve the detection of all considered failure modes compared to the baseline methods. Conclusion: Interval Neural Networks offer a promising tool to expose weaknesses of deep image reconstruction models and ultimately make them more reliable. The fact that they can be applied post hoc to equip already trained deep neural network models with uncertainty scores makes them particularly interesting for deployment.
Published: 2021

16. Towards the interpretability of deep learning models for multi-modal neuroimaging: Finding structural changes of the ageing brain

Author: Simon M. Hofmann, Frauke Beyer, Sebastian Lapuschkin, Ole Goltermann, Markus Loeffler, Klaus-Robert Müller, Arno Villringer, Wojciech Samek, and A. Veronica Witte
Subjects: Adult, Aging, Cognitive Neuroscience, Population, Neuroimaging, Biology, Convolutional neural network, Structural mri, medicine, Humans, education, Cardiovascular risk factors, Interpretability, education.field_of_study, medicine.diagnostic_test, business.industry, Deep learning, Brain, deep learning, Magnetic resonance imaging, Explainable a.i, Magnetic Resonance Imaging, Ageing, Neurology, Frontal lobe, Brain-age, Child, Preschool, Biomarker (medicine), Artificial intelligence, business, Neuroscience
Abstract: Brain-age (BA) estimates based on deep learning are increasingly used as a neuroimaging biomarker for brain health; however, the underlying neural features have remained unclear. We combined ensembles of convolutional neural networks with Layer-wise Relevance Propagation (LRP) to detect which brain features contribute to BA. Trained on magnetic resonance imaging (MRI) data of a population-based study (n=2637, 18-82 years), our models estimated age accurately based on single and multiple modalities, regionally restricted and whole-brain images (mean absolute errors 3.37-3.86 years). We find that BA estimates capture aging at both small and large-scale changes, revealing gross enlargements of ventricles and subarachnoid spaces, as well as white matter lesions, and atrophies that appear throughout the brain. Divergence from expected aging reflected cardiovascular risk factors and accelerated aging was more pronounced in the frontal lobe. Applying LRP, our study demonstrates how superior deep learning models detect brain-aging in healthy and at-risk individuals throughout adulthood.
Published: 2022

17. Explaining automated gender classification of human gait

Author: Wolfgang I. Schöllhorn, Sebastian Lapuschkin, Brian Horsak, Djordje Slijepcevic, Wojciech Samek, Matthias Zeppelzauer, Fabian Horst, Anna-Maria Raberger, and Christian Breiteneder
Subjects: FOS: Computer and information sciences, medicine.medical_specialty, Computer Science - Machine Learning, Physical medicine and rehabilitation, Rehabilitation, Biophysics, medicine, Orthopedics and Sports Medicine, Psychology, Machine Learning (cs.LG)
Abstract: State-of-the-art machine learning (ML) models are highly effective in classifying gait analysis data; however, they do not provide explanations for their predictions. This "black-box" characteristic makes it impossible to understand on which input patterns ML models base their predictions. The present study investigates whether Explainable Artificial Intelligence methods, i.e., Layer-wise Relevance Propagation (LRP), can be useful to enhance the explainability of ML predictions in gait classification. The research question was: Which input patterns are most relevant for an automated gender classification model and do they correspond to characteristics identified in the literature? We utilized a subset of the GAITREC dataset containing five bilateral ground reaction force (GRF) recordings per person during barefoot walking of 62 healthy participants: 34 females and 28 males. Each input signal (right and left side) was min-max normalized before concatenation and fed into a multi-layer Convolutional Neural Network (CNN). The classification accuracy was obtained over a stratified ten-fold cross-validation. To identify gender-specific patterns, the input relevance scores were derived using LRP. The mean classification accuracy of the CNN, 83.3%, was clearly superior to the zero-rule baseline of 54.8%. Comment: 3 pages, 1 figure
Published: 2022

18. History Dependent Significance Coding for Incremental Neural Network Compression

Author: Gerhard Tech, Paul Haase, Daniel Becking, Heiner Kirchhoffer, Karsten Müller, Jonathan Pfaff, Heiko Schwarz, Wojciech Samek, Detlev Marpe, and Thomas Wiegand
Published: 2022

19. A Unifying Review of Deep and Shallow Anomaly Detection

Author: Jacob R. Kauffmann, Wojciech Samek, Marius Kloft, Thomas G. Dietterich, Lukas Ruff, Klaus-Robert Müller, Grégoire Montavon, and Robert A. Vandermeulen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Relation (database), Artificial neural network, Computer Science - Artificial Intelligence, Computer science, business.industry, Deep learning, Machine Learning (stat.ML), Data science, Field (computer science), Machine Learning (cs.LG), Variety (cybernetics), Data modeling, Artificial Intelligence (cs.AI), Statistics - Machine Learning, Anomaly detection, Artificial intelligence, Electrical and Electronic Engineering, business, Generative grammar
Abstract: Deep learning approaches to anomaly detection have recently improved the state of the art in detection performance on complex datasets such as large collections of images or text. These results have sparked a renewed interest in the anomaly detection problem and led to the introduction of a great variety of new methods. With the emergence of numerous such methods, including approaches based on generative models, one-class classification, and reconstruction, there is a growing need to bring methods of this field into a systematic and unified perspective. In this review we aim to identify the common underlying principles as well as the assumptions that are often made implicitly by various methods. In particular, we draw connections between classic 'shallow' and novel deep approaches and show how this relation might cross-fertilize or extend both directions. We further provide an empirical assessment of major existing methods that is enriched by the use of recent explainability techniques, and present specific worked-through examples together with practical advice. Finally, we outline critical open challenges and identify specific paths for future research in anomaly detection. Comment: 40 pages; accepted for publication in the Proceedings of the IEEE
Published: 2021

20. Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL

Author: Tobias Schaeffter, Nils Strodthoff, Patrick Wagner, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 030204 cardiovascular system & hematology, Machine learning, computer.software_genre, Convolutional neural network, Field (computer science), Machine Learning (cs.LG), Electrocardiography, 03 medical and health sciences, Deep Learning, 0302 clinical medicine, Resource (project management), Health Information Management, Statistics - Machine Learning, Humans, Electrical and Electronic Engineering, Interpretability, business.industry, Deep learning, Benchmarking, Computer Science Applications, Statistical classification, Neural Networks, Computer, Artificial intelligence, Transfer of learning, business, computer, Algorithms, 030217 neurology & neurosurgery, Biotechnology
Abstract: Electrocardiography is a very common, non-invasive diagnostic procedure and its interpretation is increasingly supported by automatic interpretation algorithms. The progress in the field of automatic ECG interpretation has up to now been hampered by a lack of appropriate datasets for training as well as a lack of well-defined evaluation procedures to ensure comparability of different algorithms. To alleviate these issues, we put forward first benchmarking results for the recently published, freely accessible PTB-XL dataset, covering a variety of tasks from different ECG statement prediction tasks over age and gender prediction to signal quality assessment. We find that convolutional neural networks, in particular resnet- and inception-based architectures, show the strongest performance across all tasks outperforming feature-based algorithms by a large margin. These results are complemented by deeper insights into the classification algorithm in terms of hidden stratification, model uncertainty and an exploratory interpretability analysis. We also put forward benchmarking results for the ICBEB2018 challenge ECG dataset and discuss prospects of transfer learning using classifiers pretrained on PTB-XL. With this resource, we aim to establish the PTB-XL dataset as a resource for structured benchmarking of ECG analysis algorithms and encourage other researchers in the field to join these efforts. Comment: 12 pages, 8 figures
Published: 2021

21. Robustifying models against adversarial attacks by Langevin dynamics

Author: Vignesh Srinivasan, Wojciech Samek, Csaba Rohrer, Shinichi Nakajima, Arturo Marban, and Klaus-Robert Müller
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Cognitive Neuroscience, Deep learning, 02 engineering and technology, Conditional probability distribution, Machine learning, computer.software_genre, Adversarial system, Generative model, Deep Learning, 020901 industrial engineering & automation, Discriminative model, Artificial Intelligence, Robustness (computer science), Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Langevin dynamics, business, computer, Computer Security
Abstract: Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains a challenging task. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. To achieve this task, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: projection is distributed broadly. This prevents white box attacks from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
Published: 2021

22. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

Author: Grégoire Montavon, Wojciech Samek, Sebastian Lapuschkin, Klaus-Robert Müller, and Christopher J. Anders
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, Computer Science - Artificial Intelligence, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Best practice, Deep learning, Perspective (graphical), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Data science, Field (computer science), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Statistics - Machine Learning, Selection (linguistics), Unsupervised learning, Neural and Evolutionary Computing (cs.NE), Artificial intelligence, Electrical and Electronic Engineering, business, Interpretability
Abstract: With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for Explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem solving abilities and strategies of nonlinear Machine Learning, in particular, deep neural networks, are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field, with a focus on 'post-hoc' explanations, and explain its theoretical foundations, (2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, (3) outline best practice aspects, i.e., how to best include interpretation methods into the standard usage of machine learning and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning. Comment: 30 pages, 20 figures
Published: 2021

23. Adaptive Differential Filters for Fast and Communication-Efficient Federated Learning

Author: Daniel Becking, Heiner Kirchhoffer, Gerhard Tech, Paul Haase, Karsten Müller, Heiko Schwarz, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG)
Abstract: Federated learning (FL) scenarios inherently generate a large communication overhead by frequently transmitting neural network updates between clients and server. To minimize the communication cost, introducing sparsity in conjunction with differential updates is a commonly used technique. However, sparse model updates can slow down convergence speed or unintentionally skip certain update aspects, e.g., learned features, if error accumulation is not properly addressed. In this work, we propose a new scaling method operating at the granularity of convolutional filters which 1) compensates for highly sparse updates in FL processes, 2) adapts the local models to new data domains by enhancing some features in the filter space while diminishing others and 3) motivates extra sparsity in updates and thus achieves higher compression ratios, i.e., savings in the overall data transfer. Compared to unscaled updates and previous work, experimental results on different computer vision tasks (Pascal VOC, CIFAR10, Chest X-Ray) and neural networks (ResNets, MobileNets, VGGs) in uni-, bidirectional and partial update FL settings show that the proposed method improves the performance of the central server model while converging faster and reducing the total amount of transmitted data by up to 377 times. Comment: CVPR 2022 FedVision Workshop (CVPRW), 12 pages, 5 figures, 2 tables, supplementary material
Published: 2022

24. Benign Examples: Imperceptible Changes Can Enhance Image Translation Performance

Author: Wojciech Samek, Klaus-Robert Müller, Vignesh Srinivasan, and Shinichi Nakajima
Subjects: Artificial neural network, Computer science, business.industry, Pattern recognition, General Medicine, Translation (geometry), Manifold, Domain (software engineering), Image (mathematics), Constraint (information theory), Consistency (database systems), Image translation, Artificial intelligence, business
Abstract: Unpaired image-to-image domain translation involves the task of transferring an image in one domain to another domain without having pairs of data for supervision. Several methods have been proposed to address this task using Generative Adversarial Networks (GANs) and a cycle consistency constraint enforcing the translated image to be mapped back to the original domain. This way, a Deep Neural Network (DNN) learns a mapping such that the input training distribution transferred to the target domain matches the target training distribution. However, not all test images are expected to fall inside the data manifold in the input space where the DNN has learned to perform the mapping very well. Such images can have a poor mapping to the target domain. In this paper, we propose to perform Langevin dynamics, which makes a subtle change in the input space bringing them close to the data manifold, producing benign examples. The effect is a significant improvement of the mapped image on the target domain. We also show that the score function estimation by a denoising autoencoder (DAE) can practically be replaced with any autoencoding structure, which most image-to-image translation methods contain intrinsically due to the cycle consistency constraint. Thus, no additional training is required. We show advantages of our approach for several state-of-the-art image-to-image domain translation models. Quantitative evaluation shows that our proposed method leads to a substantial increase in the accuracy to the target label on multiple state-of-the-art image classifiers, while a qualitative user study proves that our method better represents the target domain, achieving better human preference scores.
Published: 2020

25. New definitions of human lymphoid and follicular cell entities in lymphatic tissue by machine learning

Author: Patrick Wagner, Nils Strodthoff, Patrick Wurzel, Arturo Marban, Sonja Scharf, Hendrik Schäfer, Philipp Seegerer, Andreas Loth, Sylvia Hartmann, Frederick Klauschen, Klaus-Robert Müller, Wojciech Samek, and Martin-Leo Hansmann
Subjects: Machine Learning, Lymphoid Tissue, Humans, T-Lymphocytes, Helper-Inducer, Lymphocytes, Dendritic Cells, Follicular
Abstract: Histological sections of the lymphatic system are usually the basis of static (2D) morphological investigations. Here, we performed a dynamic (4D) analysis of human reactive lymphoid tissue using confocal fluorescent laser microscopy in combination with machine learning. Based on tracks for T-cells (CD3), B-cells (CD20), follicular T-helper cells (PD1) and optical flow of follicular dendritic cells (CD35), we put forward the first quantitative analysis of movement-related and morphological parameters within human lymphoid tissue. We identified correlations of follicular dendritic cell movement and the behavior of lymphocytes in the microenvironment. In addition, we investigated the value of movement and/or morphological parameters for a precise definition of cell types (CD clusters). CD clusters could be determined based on movement and/or morphology. Differentiating between CD3- and CD20-positive cells is most challenging and long-term movement characteristics are indispensable. We propose morphological and movement-related prototypes of cell entities applying machine learning models. Finally, we define, beyond CD clusters, new subgroups within lymphocyte entities based on long-term movement characteristics. In conclusion, we showed that the combination of 4D imaging and machine learning is able to define characteristics of lymphocytes not visible in 2D histology.
Published: 2022

26. ECQx: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

Author: Daniel Becking, Maximilian Dreyer, Wojciech Samek, Karsten Müller, and Sebastian Lapuschkin
Published: 2022

27. xxAI - Beyond Explainable Artificial Intelligence

Author: Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Müller, and Wojciech Samek
Abstract: The success of statistical machine learning from big data, especially of deep learning, has made artificial intelligence (AI) very popular. Unfortunately, especially with the most successful methods, the results are very difficult to comprehend by human experts. The application of AI in areas that impact human life (e.g., agriculture, climate, forestry, health, etc.) has therefore led to a demand for trust, which can be fostered if the methods can be interpreted and thus explained to humans. The research field of explainable artificial intelligence (XAI) provides the necessary foundations and methods. Historically, XAI has focused on the development of methods to explain the decisions and internal mechanisms of complex AI systems, with much initial research concentrating on explaining how convolutional neural networks produce image classification predictions by producing visualizations which highlight what input patterns are most influential in activating hidden units, or are most responsible for a model’s decision. In this volume, we summarize research that outlines and takes next steps towards a broader vision for explainable AI in moving beyond explaining classifiers via such methods, to include explaining other kinds of models (e.g., unsupervised and reinforcement learning models) via a diverse array of XAI techniques (e.g., question-and-answering systems, structured explanations). In addition, we also intend to move beyond simply providing model explanations to directly improving the transparency, efficiency and generalization ability of models. We hope this volume presents not only exciting research developments in explainable AI but also a guide for what next areas to focus on within this fascinating and highly relevant research field as we enter the second decade of the deep learning revolution. This volume is an outcome of the ICML 2020 workshop on “XXAI: Extending Explainable AI Beyond Deep Models and Classifiers.”
Published: 2022

28. CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations

Author: Ahmed Osman, Wojciech Samek, and Leila Arras
Subjects: Ground truth, Artificial neural network, Computer science, business.industry, Deep learning, media_common.quotation_subject, Benchmarking, Machine learning, computer.software_genre, Field (computer science), Hardware and Architecture, Signal Processing, Question answering, Benchmark (computing), Quality (business), Artificial intelligence, business, computer, Software, Information Systems, media_common
Abstract: The rise of deep learning in today’s applications entailed an increasing need for explaining the model’s decisions beyond prediction performances in order to foster trust and accountability. Recently, the field of explainable AI (XAI) has developed methods that provide such explanations for already trained neural networks. In computer vision tasks such explanations, termed heatmaps, visualize the contributions of individual pixels to the prediction. So far XAI methods along with their heatmaps were mainly validated qualitatively via human-based assessment, or evaluated through auxiliary proxy tasks such as pixel perturbation, weak object localization or randomization tests. Due to the lack of an objective and commonly accepted quality measure for heatmaps, it was debatable which XAI method performs best and whether explanations can be trusted at all. In the present work, we tackle the problem by proposing a ground truth based evaluation framework for XAI methods based on the CLEVR visual question answering task. Our framework provides a (1) selective, (2) controlled and (3) realistic testbed for the evaluation of neural network explanations. We compare ten different explanation methods, resulting in new insights about the quality and properties of XAI methods, sometimes contradicting conclusions from previous comparative studies. The CLEVR-XAI dataset and the benchmarking code can be found at https://github.com/ahmedmagdiosman/clevr-xai .
Published: 2022

29. Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI

Author: Sami Ede, Serop Baghdadlian, Leander Weber, An Nguyen, Dario Zanca, Wojciech Samek, and Sebastian Lapuschkin
Published: 2022

30. Active multitask learning with uncertainty-weighted loss for coronary calcium scoring

Author: Bernhard Föllmer, Federico Biavati, Christian Wald, Sebastian Stober, Jackie Ma, Marc Dewey, and Wojciech Samek
Subjects: Humans, Calcium, General Medicine
Abstract: The coronary artery calcification (CAC) score is an independent marker for the risk of cardiovascular events. Automatic methods for quantifying CAC could reduce workload and assist radiologists in clinical decision-making. However, large annotated datasets are needed for training to achieve very good model performance, which is an expensive process and requires expert knowledge. The amount of training data required can be reduced in an active learning scenario, which requires only the most informative samples to be labeled. Multitask learning techniques can improve model performance by joint learning of multiple related tasks and extraction of shared informative features. We propose an uncertainty-weighted multitask learning model for coronary calcium scoring in electrocardiogram-gated (ECG-gated), noncontrast-enhanced cardiac calcium scoring CT. The model was trained to solve the two tasks of coronary artery region segmentation (weak labels) and coronary artery calcification segmentation (strong labels) simultaneously in an active learning scenario to improve model performance and reduce the number of samples needed for training. We compared our model with a single-task U-Net and a sequential-task model as well as other state-of-the-art methods. The model was evaluated on 1275 individual patients in three different datasets (DISCHARGE, CADMAN, orCaScore), and the relationship between model performance and various influencing factors (image noise, metal artifacts, motion artifacts, image quality) was analyzed. Joint learning of multiclass coronary artery region segmentation and binary coronary calcium segmentation improved calcium scoring performance. Since shared information can be learned from both tasks for complementary purposes, the model reached optimal performance with only 12% of the training data and one-third of the labeling time in an active learning scenario. We identified image noise as one of the most important factors influencing model performance along with anatomical abnormalities and metal artifacts. Our multitask learning approach with uncertainty-weighted loss improves calcium scoring performance by joint learning of shared features and reduces labeling costs when trained in an active learning scenario.
Published: 2021

31. Estimation of distortion sensitivity for visual quality prediction using a convolutional neural network

Author: Wojciech Samek, Klaus-Robert Müller, Thomas Wiegand, Sebastian Bosse, and Sören Becker
Subjects: Computational complexity theory, Image quality, Computer science, business.industry, Applied Mathematics, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Convolutional neural network, Weighting, Computational Theory and Mathematics, Artificial Intelligence, Distortion, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Sensitivity (control systems), Artificial intelligence, Electrical and Electronic Engineering, Statistics, Probability and Uncertainty, business
Abstract: The PSNR and MSE are the computationally simplest and thus most widely used measures for image quality, although they correlate only poorly with perceived visual quality. More accurate quality models that rely on processing on both the reference and distorted image are potentially difficult to integrate in time-critical communication systems where computational complexity is disadvantageous. This paper derives the concept of distortion sensitivity as a property of the reference image that compensates for a given computational quality model a potential lack of perceptual relevance. This compensation method is applied to the PSNR and leads to a local weighting scheme for the MSE. Local weights are estimated by a deep convolutional neural network and used to improve the PSNR in a computationally graceful distribution of computationally complex processing to the reference image only. The performance of the proposed estimation approach is evaluated on LIVE, TID2013 and CSIQ databases and shows comparable or superior performance compared to benchmark image quality measures.
Published: 2019

32. Unmasking Clever Hans predictors and assessing what machines really learn

Author: Klaus-Robert Müller, Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Stephan Wäldchen
Subjects: 0301 basic medicine, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer science, Computer Vision and Pattern Recognition (cs.CV), Science, Computer Science - Computer Vision and Pattern Recognition, General Physics and Astronomy, Machine Learning (stat.ML), 02 engineering and technology, Article, General Biochemistry, Genetics and Molecular Biology, Machine Learning (cs.LG), 03 medical and health sciences, Statistics - Machine Learning, Human–computer interaction, Neural and Evolutionary Computing (cs.NE), lcsh:Science, Multidisciplinary, Computer Science - Neural and Evolutionary Computing, General Chemistry, 021001 nanoscience & nanotechnology, 030104 developmental biology, Artificial Intelligence (cs.AI), Work (electrical), lcsh:Q, 0210 nano-technology
Abstract: Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner. Comment: Accepted for publication in Nature Communications
Published: 2019

33. Pruning by explaining: A novel criterion for deep neural network pruning

Author: Alexander Binder, Wojciech Samek, Sebastian Lapuschkin, Seul-Ki Yeom, Simon Wiedemann, Klaus-Robert Müller, and Philipp Seegerer
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computation, Machine Learning (stat.ML), Machine learning, computer.software_genre, Convolutional neural network, Machine Learning (cs.LG), Statistics - Machine Learning, Artificial Intelligence, Relevance (information retrieval), Pruning (decision trees), Neural and Evolutionary Computing (cs.NE), Interpretability, Hyperparameter, Artificial neural network, business.industry, Computer Science - Neural and Evolutionary Computing, Range (mathematics), Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software
Abstract: The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming to not sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: The most relevant units, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI). By exploring this idea, we connect the lines of interpretability and model compression research. We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks. The method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive or better compared to state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the resource-constrained application scenario in which the data of the task to be transferred to is very scarce and one chooses to refrain from fine-tuning. Our method is able to compress the model iteratively while maintaining or even improving accuracy. At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning. Comment: 25 pages + 5 supplementary pages, 13 figures, 6 tables
Published: 2021

34. Interval Neural Networks as Instability Detectors for Image Reconstructions

Author: Luis Oala, Wojciech Samek, Maximilian März, and Jan Macdonald
Subjects: Artificial neural network, business.industry, Computer science, Deep learning, Detector, Iterative reconstruction, Interval (mathematics), Machine learning, computer.software_genre, Image (mathematics), Artificial intelligence, Noise (video), Uncertainty quantification, business, computer
Abstract: This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure a safe use of deep learning-based methods for medical image reconstruction.
Published: 2021

35. Causes of Outcome Learning: A causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome

Author: Andreas Rieckmann, Piotr Dworzynski, Leila Arras, Sebastian Lapuschkin, Wojciech Samek, Onyebuchi Aniweta Arah, Naja Hulvej Rod, and Claus Thorn Ekstrøm
Subjects: Outcome Assessment, Epidemiology, Computer science, Population, sufficient component cause model, complex epidemiology, Machine learning, computer.software_genre, Outcome (game theory), Machine Learning, 2.5 Research design and methodologies (aetiology), Outcome Assessment, Health Care, Methods, supervised clustering, Humans, Relevance (information retrieval), Aetiology, education, Additive model, Cluster analysis, Causal model, education.field_of_study, Artificial neural network, business.industry, Prevention, Statistics, General Medicine, interactions, neural networks, Causality, Health Care, precision public health, Good Health and Well Being, inductive-deductive, explanations, Causal inference, Public Health and Health Services, inductive–deductive, Public Health, Generic health relevance, Artificial intelligence, Causes of effects, business, computer
Abstract: Nearly all diseases can be caused by different combinations of exposures. Yet, most epidemiological studies focus on the causal effect of a single exposure on an outcome. We present the Causes of Outcome Learning (CoOL) approach, which seeks to identify combinations of exposures (which can be interpreted causally if all causal assumptions are met) that could be responsible for an increased risk of a health outcome in population sub-groups. The approach allows for exposures acting alone and in synergy with others. It involves (a) a pre-computational phase that proposes a causal model; (b) a computational phase with three steps, namely (i) analytically fitting a non-negative additive model, (ii) decomposing risk contributions, and (iii) clustering individuals based on the risk contributions into sub-groups based on the predefined causal model; and (c) a post-computational phase on hypothesis development and validation by triangulation on new data before eventually updating the causal model. The computational phase uses a tailored neural network for the non-negative additive model and Layer-wise Relevance Propagation for the risk decomposition through this model. We demonstrate the approach on simulated and real-life data using the R package ‘CoOL’. The presentation is focused on binary exposures and outcomes but can be extended to other measurement types. This approach encourages and enables epidemiologists to identify combinations of pre-outcome exposures as potential causes of the health outcome of interest. Expanding our ability to discover complex causes could eventually result in more effective, targeted, and informed interventions prioritized for their public health impact.
Published: 2020

36. Dependent Scalar Quantization For Neural Network Compression

Author: Thomas Wiegand, Heiner Kirchhoffer, Paul Haase, Simon Wiedemann, Wojciech Samek, Heiko Schwarz, Arturo Marban, Talmaj Marinc, Detlev Marpe, and Karsten Müller
Subjects: Artificial neural network, Computer science, Quantization (signal processing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, Iterative reconstruction, Reduction (complexity), Compression (functional analysis), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Network performance, Entropy encoding, Algorithm
Abstract: Recent approaches to compression of deep neural networks, like the emerging standard on compression of neural networks for multimedia content description and analysis (MPEG-7 part 17), apply scalar quantization and entropy coding of the quantization indexes. In this paper we present an advanced method for quantization of neural network parameters, which applies dependent scalar quantization (DQ) or trellis-coded quantization (TCQ), and an improved context modeling for the entropy coding of the quantization indexes. We show that the proposed method achieves 5.778% bitrate reduction and virtually no loss (0.37%) of network performance on average, compared to the baseline methods of the second test model (NCTM) of MPEG-7 part 17 for relevant working points.
Published: 2020

37. Deepcabac: Plug & Play Compression of Neural Network Weights and Weight Updates

Author: Detlev Marpe, David Neumann, Heiner Kirchhoffer, Wojciech Samek, Heiko Schwarz, Simon Wiedemann, Felix Sattler, Karsten Müller, and Thomas Wiegand
Subjects: Plug play, Artificial neural network, Computer science, Compression (functional analysis), Server, Distributed computing, 0202 electrical engineering, electronic engineering, information engineering, 020206 networking & telecommunications, 020201 artificial intelligence & image processing, 02 engineering and technology, Differential (infinitesimal), Data compression, Variety (cybernetics)
Abstract: An increasing number of distributed machine learning applications require efficient communication of neural network parameterizations. DeepCABAC, an algorithm in the current working draft of the emerging MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis, has demonstrated high compression gains for a variety of neural network models. In this paper we propose a method for employing DeepCABAC in a Federated Learning scenario for the exchange of intermediate differential parameterizations. Furthermore, we discuss the efficiency of DeepCABAC when compressing trained neural networks. Our experiments on large neural networks show that in both scenarios, DeepCABAC achieves competitive compression rates, without degrading the network accuracy.
Published: 2020

38. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints

Author: Wojciech Samek, Klaus-Robert Müller, and Felix Sattler
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, education.field_of_study, Theoretical computer science, Computer Networks and Communications, Population, Multi-task learning, Machine Learning (stat.ML), 02 engineering and technology, Computer Science Applications, Data modeling, Machine Learning (cs.LG), Recurrent neural network, Computer Science - Distributed, Parallel, and Cluster Computing, Statistics - Machine Learning, Artificial Intelligence, Server, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Distributed, Parallel, and Cluster Computing (cs.DC), education, Cluster analysis, Software
Abstract: Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that Federated Learning yields suboptimal results if the local clients' data distributions diverge. To address this issue, we present Clustered Federated Learning (CFL), a novel Federated Multi-Task Learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol, is applicable to general non-convex objectives (in particular deep neural networks) and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy preserving way. As clustering is only performed after Federated Learning has converged to a stationary point, CFL can be viewed as a post-processing method that will always achieve greater or equal performance than conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used Federated Learning datasets.
Published: 2020

39. Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)

Author: Daniel Becking, Arturo Marban, Simon Wiedemann, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Contextual image classification, Artificial neural network, Computer science, business.industry, Information Theory (cs.IT), Computer Science - Information Theory, Computation, Machine Learning (stat.ML), Machine Learning (cs.LG), Statistics - Machine Learning, Deep neural networks, Entropy (information theory), Artificial intelligence, Ternary operation, business, Mobile device, Algorithm, Scaling
Abstract: Deep neural networks (DNN) have shown remarkable success in a variety of machine learning applications. The capacity of these models (i.e., number of parameters) endows them with expressive power and allows them to reach the desired performance. In recent years, there has been increasing interest in deploying DNNs to resource-constrained devices (e.g., mobile devices) with limited energy, memory, and computational budget. To address this problem, we propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks which are efficient in terms of storage (e.g., at most two binary-masks and two full-precision values are required to save a weight matrix) and computation (e.g., MAC operations are reduced to a few accumulations plus two multiplications). This approach consists of two steps. First, a super-network is created by scaling the dimensions of a pre-trained model (i.e., its width and depth). Subsequently, this super-network is simultaneously pruned (using an entropy constraint) and quantized (that is, ternary values are assigned layer-wise) in a training process, resulting in a sparse and ternary network representation. We validate the proposed approach on the CIFAR-10, CIFAR-100, and ImageNet datasets, showing its effectiveness in image classification tasks. Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning in Computer Vision. Code is available at https://github.com/d-becking/efficientCNNs
Published: 2020

40. PTB-XL, a large publicly available electrocardiography dataset

Author: Tobias Schaeffter, Wojciech Samek, Ralf-Dieter Bousseljot, Patrick Wagner, Dieter Kreiseler, Nils Strodthoff, and Fatima I Lunze
Subjects: Statistics and Probability, Data Descriptor, Demographics, Computer science, MEDLINE, Electrocardiography - EKG, 030204 cardiovascular system & hematology, Library and Information Sciences, Data publication and archiving, Education, Machine Learning, 03 medical and health sciences, Electrocardiography, 0302 clinical medicine, Resource (project management), Humans, Fraction (mathematics), lcsh:Science, Metadata, Information retrieval, Benchmarking, Computer Science Applications, Cardiovascular diseases, ComputingMethodologies_PATTERNRECOGNITION, Key (cryptography), lcsh:Q, Statistics, Probability and Uncertainty, 030217 neurology & neurosurgery, Algorithms, Information Systems
Abstract: Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases which is increasingly supported by algorithms based on machine learning. Major obstacles for the development of automatic ECG interpretation algorithms are both the lack of public datasets and well-defined benchmarking procedures to allow comparisons of different algorithms. To address these issues, we put forward PTB-XL, the to-date largest freely accessible clinical 12-lead ECG-waveform dataset comprising 21837 records from 18885 patients of 10 seconds length. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, where diagnostic labels were further aggregated into super and subclasses. The dataset covers a broad range of diagnostic classes including, in particular, a large fraction of healthy records. The combination with additional metadata on demographics, additional diagnostic statements, diagnosis likelihoods, manually annotated signal properties as well as suggested folds for splitting training and test sets turns the dataset into a rich resource for the development and the evaluation of automatic ECG interpretation algorithms. Measurement(s): electrocardiography, cardiovascular system. Technology Type(s): 12 lead electrocardiography. Factor Type(s): presence of co-occurring diseases. Sample Characteristic - Organism: Homo sapiens. Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12098055
Published: 2020
Full Text: View/download PDF

41. On the Byzantine Robustness of Clustered Federated Learning

Author: Wojciech Samek, Thomas Wiegand, Felix Sattler, and Klaus-Robert Müller
Subjects: education.field_of_study, Computer science, business.industry, Population, Multi-task learning, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Federated learning, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, education, Cluster analysis, business, computer, Byzantine architecture
Abstract: Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that Federated Learning yields suboptimal results if the local clients’ data distributions diverge. The recently proposed Clustered Federated Learning Framework addresses this issue by separating the client population into different groups based on the pairwise cosine similarities between their parameter updates. In this work we investigate the application of CFL to byzantine settings, where a subset of clients behaves unpredictably or tries to disturb the joint training effort in a directed or undirected way. We perform experiments with deep neural networks on common Federated Learning datasets which demonstrate that CFL (without modifications) is able to reliably detect byzantine clients and remove them from training.
Published: 2020
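
Illustrative note (not part of the catalog record): the abstract above groups clients by the pairwise cosine similarities between their parameter updates. The following is a minimal sketch of that similarity computation only, assuming each update is a flat list of floats. The function names are hypothetical, and the clustering and byzantine-detection steps of the paper are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero update is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_cosine(updates):
    """Symmetric matrix of cosine similarities between all client updates."""
    n = len(updates)
    return [[cosine_similarity(updates[i], updates[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical updates: clients 0 and 1 agree, client 2 points the opposite way.
similarities = pairwise_cosine([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
```

In this toy input, a client whose update opposes the others shows up as a row of negative similarities, which is the kind of signal a clustering step could use to separate it.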

42. Rotation Invariant Clustering of 3D Cell Nuclei Shapes

Author: Klaus-Robert Müller, Patrick Wagner, Arturo Zychlinsky, Wojciech Samek, and Jakob Paul Morath
Subjects: 0301 basic medicine, Computer science, 02 engineering and technology, 03 medical and health sciences, symbols.namesake, Imaging, Three-Dimensional, 0202 electrical engineering, electronic engineering, information engineering, Animals, Cluster Analysis, Invariant (mathematics), Cluster analysis, Image resolution, Cell Nucleus, Microscopy, Confocal, Fourier Analysis, business.industry, Dimensionality reduction, Spherical coordinate system, 020206 networking & telecommunications, Pattern recognition, Mixture model, 030104 developmental biology, Fourier transform, Principal component analysis, symbols, Artificial intelligence, business
Abstract: Cellular imaging with confocal fluorescence laser microscopy gave rise to many new insights into the cellular machinery. One interesting observation suggests that the morphology of the cell nucleus plays a key role for neutrophilic function, which is an essential part of the innate immune system of most mammals. Due to the increasing availability of high resolution 3D images coming from the microscope, machine learning becomes a promising tool for automatically discovering underlying hidden structures. Here, the major difficulty consists of selecting an appropriate representation for characterizing the morphology of the cell nucleus. In this work we tackle this problem and propose a fully unsupervised mechanism for finding structure in high-throughput 3D image data. The key component of our approach is based on Generic Fourier Transform (GFT) for 2D images, which for 3D involves spherical coordinate transformation prior to fast Discrete Fourier Transformation. On top of GFT we apply dimensionality reduction with Principal Component Analysis, followed by generative cluster analysis with a Gaussian Mixture Model. We validate our new approach first on a synthetic 3D-MNIST dataset with random rotations, where quantitative and qualitative results confirm the applicability of the proposed pipeline for exploring shape space in a purely unsupervised manner. Then we apply our proposed technique to a newly collected dataset of high resolution 3D images of neutrophil nuclei, suggesting a clustering model with six significant clusters of morphological cell nuclei prototypes. We visualize differences in the cell shape clusters by providing prototypical examples of neutrophilic cell nuclei.
Published: 2020

43. FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons

Author: Thomas Wiegand, Simon Wiedemann, Pablo Wiedemann, Suhas Shivapakash, Wojciech Samek, Daniel Becking, and Friedel Gerfers
Subjects: Hardware architecture, efficient processing of DNNs, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, business.industry, Deep learning, efficient representation, Process (computing), Electric apparatus and materials. Electric circuits. Electric networks, Perceptron, neural network compression, Machine Learning (cs.LG), DNN accelerator, Computer engineering, Application-specific integrated circuit, Hardware Architecture (cs.AR), Artificial intelligence, Quantization (image processing), business, Field-programmable gate array, Computer Science - Hardware Architecture, TK452-454.4, Throughput (business)
Abstract: With the growing demand for deploying Deep Learning models to the “edge”, it is paramount to develop techniques that allow executing state-of-the-art models within very tight and limited resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine of deep neural networks (DNNs) that are based on fully-connected layers. The work’s approach is centred around compression as a means for reducing the area as well as power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performances. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (thus the name). Moreover, in order to make the models amenable for efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them robust to 4bit quantization and highly compressible in size simultaneously. The experimental results show that we can achieve throughputs of 2.45 TOPS with a total power consumption of 3.6W on a Virtual Ultrascale FPGA XCVU440 device implementation, and achieve a total power efficiency of 20.17 TOPS/W on a 22nm process ASIC version. When compared to other state-of-the-art accelerators designed for the Google Speech Command (GSC) dataset, FantastIC4 is better by $51\times$ in terms of throughput and $145\times$ in terms of area efficiency (GOPS/mm²).
Published: 2020
Full Text: View/download PDF

44. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods

Author: Wojciech Samek, Michael Bockmayr, Alexander Binder, Miriam Hägele, Philipp Seegerer, Klaus-Robert Müller, Sebastian Lapuschkin, and Frederick Klauschen
Subjects: 0301 basic medicine, FOS: Computer and information sciences, medicine.medical_specialty, Computer science, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, lcsh:Medicine, 02 engineering and technology, Machine learning, computer.software_genre, Quantitative Biology - Quantitative Methods, Article, Task (project management), 03 medical and health sciences, Deep Learning, Medical research, Neoplasms, Image Interpretation, Computer-Assisted, 0202 electrical engineering, electronic engineering, information engineering, medicine, FOS: Electrical engineering, electronic engineering, information engineering, Humans, Quality (business), lcsh:Science, Quantitative Methods (q-bio.QM), Cancer, media_common, Class (computer programming), Multidisciplinary, business.industry, Deep learning, lcsh:R, Image and Video Processing (eess.IV), Digital pathology, Electrical Engineering and Systems Science - Image and Video Processing, 030104 developmental biology, ROC Curve, Binary classification, Area Under Curve, FOS: Biological sciences, Cancer imaging, 020201 artificial intelligence & image processing, Histopathology, lcsh:Q, Neural Networks, Computer, Artificial intelligence, business, computer
Abstract: Deep learning has recently gained popularity in digital pathology due to its high prediction quality. However, the medical domain requires explanation and insight for a better understanding beyond standard quantitative performance evaluation. Recently, explanation methods have emerged, which are so far still rarely used in medicine. This work shows their application to generate heatmaps that help resolve common challenges encountered in deep learning-based digital histopathology analyses. These challenges comprise biases typically inherent to histopathology data. We study binary classification tasks of tumor tissue discrimination in publicly available haematoxylin and eosin slides of various tumor entities and investigate three types of biases: (1) biases which affect the entire dataset, (2) biases which are by chance correlated with class labels and (3) sampling biases. While standard analyses focus on patch-level evaluation, we advocate pixel-wise heatmaps, which offer a more precise and versatile diagnostic instrument and furthermore help to reveal biases in the data. This insight is shown to help not only detect but also remove the effects of common hidden biases, which improves generalization within and across datasets. For example, we could see a trend of improved area under the receiver operating characteristic curve by 5% when reducing a labeling bias. Explanation techniques are thus demonstrated to be a helpful and highly relevant tool for the development and the deployment phases within the life cycle of real-world applications in digital pathology.
Published: 2020

45. Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution

Author: Alexander Binder, Sebastian Lapuschkin, Gary S. W. Goh, Wojciech Samek, and Leander Weber
Subjects: Hyperparameter, FOS: Computer and information sciences, Noise measurement, Artificial neural network, Contextual image classification, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Cognitive neuroscience of visual object recognition, Pattern recognition, Machine Learning (stat.ML), 02 engineering and technology, 01 natural sciences, Noise, Statistics - Machine Learning, 0103 physical sciences, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, 010306 general physics, business, Interpretability
Abstract: Integrated Gradients as an attribution method for deep neural network models offers simple implementability. However, it suffers from noisiness of explanations, which affects the ease of interpretability. The SmoothGrad technique is proposed to solve the noisiness issue and smoothen the attribution maps of any gradient-based attribution method. In this paper, we present SmoothTaylor as a novel theoretical concept bridging Integrated Gradients and SmoothGrad, from the perspective of Taylor's theorem. We apply the methods to the image classification problem, using the ILSVRC2012 ImageNet object recognition dataset, and a couple of pretrained image models to generate attribution maps. These attribution maps are empirically evaluated using quantitative measures for sensitivity and noise level. We further propose adaptive noising to optimize for the noise scale hyperparameter value. From our experiments, we find that the SmoothTaylor approach together with adaptive noising is able to generate better quality saliency maps with less noise and higher sensitivity to the relevant points in the input space as compared to Integrated Gradients. Comment: 8 pages, 3 figures. Accepted in 25th International Conference on Pattern Recognition (ICPR) 2020. In Proceedings: pp. 4949-4956
Published: 2020
Full Text: View/download PDF
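
Illustrative note (not part of the catalog record): the abstract above takes Integrated Gradients as its starting point. As a rough sketch of that baseline method only (not SmoothTaylor or adaptive noising), the attribution of each input is the input-minus-baseline difference times the average gradient along the straight path between the two. The toy model `toy_f`, its gradient `toy_grad`, and the midpoint Riemann sum are assumptions made for this example.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn maps an input point (list of floats) to the gradient of the
    model output with respect to that point.
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        grad_sum = [acc + g for acc, g in zip(grad_sum, grad_fn(point))]
    # Scale the averaged gradient by the distance travelled in each coordinate.
    return [di * acc / steps for di, acc in zip(diff, grad_sum)]


def toy_f(x):
    """Hypothetical differentiable model: f(x) = x0^2 + 3 * x1."""
    return x[0] ** 2 + 3.0 * x[1]


def toy_grad(x):
    """Analytic gradient of toy_f."""
    return [2.0 * x[0], 3.0]


attributions = integrated_gradients(toy_grad, [2.0, 1.0], [0.0, 0.0])
```

For this toy model the attributions sum to `toy_f(x) - toy_f(baseline)`, the completeness property that Integrated Gradients is known for.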

46. Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data

Author: Wojciech Samek, Klaus-Robert Müller, Felix Sattler, and Simon Wiedemann
Subjects: Training set, Distributed database, Computer Networks and Communications, Computer science, business.industry, Deep learning, Distributed computing, Collaborative learning, 02 engineering and technology, Computer Science Applications, Data modeling, Uncompressed video, Artificial Intelligence, Server, Golomb coding, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), 020201 artificial intelligence & image processing, Upstream (networking), Artificial intelligence, business, Software
Abstract: Federated learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are only of limited utility in the federated learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions, such as i.i.d. distribution of the client data, which typically cannot be found in federated learning. In this article, we propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms federated averaging in common federated learning scenarios. These results advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments.
Published: 2020
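
Illustrative note (not part of the catalog record): the abstract above combines top-$k$ sparsification with ternarization of the weight updates. The sketch below shows one way those two steps can fit together, assuming the kept entries share a single magnitude equal to their mean absolute value. That choice, and the function name `sparse_ternarize`, are assumptions for illustration; the downstream compression and Golomb encoding mentioned in the abstract are not shown.

```python
def sparse_ternarize(update, k):
    """Keep the k largest-magnitude entries and map them to {-mu, 0, +mu}."""
    if not 0 < k <= len(update):
        raise ValueError("k must be between 1 and len(update)")
    # Indices of the k entries with the largest absolute value (top-k step).
    top = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    # Shared magnitude for all kept entries (assumed: their mean absolute value).
    mu = sum(abs(update[i]) for i in top) / k
    result = [0.0] * len(update)
    for i in top:
        if update[i] > 0:
            result[i] = mu
        elif update[i] < 0:
            result[i] = -mu
    return result


# Hypothetical weight update: only the two largest entries survive.
compressed = sparse_ternarize([0.1, -2.0, 0.5, 4.0], k=2)
```

After this step each update is described by the positions of its nonzero entries, one sign per kept entry, and a single magnitude, which is what makes a compact encoding possible.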

47. Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training

Author: Kevin Kepp, Temesgen Mehari, Simon Wiedemann, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, Contextual image classification, Computational complexity theory, Computer science, business.industry, Machine Learning (stat.ML), Backpropagation, Machine Learning (cs.LG), Set (abstract data type), Statistics - Machine Learning, Artificial intelligence, Dither, business, Algorithm, Energy (signal processing), Sparse matrix
Abstract: Deep Neural Networks are successful but highly computationally expensive learning systems. One of the main sources of time and energy drains is the well known backpropagation (backprop) algorithm, which roughly accounts for 2/3 of the computational complexity of training. In this work we propose a method for reducing the computational cost of backprop, which we named dithered backprop. It consists in applying a stochastic quantization scheme to intermediate results of the method. The particular quantization scheme, called non-subtractive dither (NSD), induces sparsity which can be exploited by computing efficient sparse matrix multiplications. Experiments on popular image classification tasks show that it induces 92% sparsity on average across a wide set of models at no or negligible accuracy drop in comparison to state-of-the-art approaches, thus significantly reducing the computational complexity of the backward pass. Moreover, we show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits, as such being able to further reduce the computational requirements. Finally, we discuss and show potential benefits of applying dithered backprop in a distributed training setting, where both communication as well as compute efficiency may increase simultaneously with the number of participant nodes.
Published: 2020
Full Text: View/download PDF
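
Illustrative note (not part of the catalog record): the abstract above relies on non-subtractive dither (NSD) quantization. In the usual definition, random noise is added before rounding to a uniform grid and is not subtracted afterwards. The sketch below assumes uniform dither spanning one quantization step; the function name and the step size are hypothetical, and how the paper selects its step size is not shown.

```python
import random


def nsd_quantize(values, step, rng):
    """Quantize values to multiples of `step` using non-subtractive dither."""
    quantized = []
    for x in values:
        # Dither drawn uniformly over one quantization step (assumed distribution).
        u = rng.uniform(-step / 2, step / 2)
        # Round the dithered value to the grid; the dither is NOT subtracted back.
        quantized.append(step * round((x + u) / step))
    return quantized


rng = random.Random(0)  # fixed seed so the sketch is reproducible
grid_values = nsd_quantize([0.03, -0.4, 1.7, 0.0, 0.26], 0.5, rng)
```

Entries that are small relative to the step are frequently rounded to exactly zero, which is the source of the sparsity the abstract refers to.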

48. Explanation-Guided Training for Cross-Domain Few-Shot Classification

Author: Wojciech Samek, Jiamei Sun, Yunqing Zhao, Alexander Binder, Ngai-Man Cheung, and Sebastian Lapuschkin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Source code, Generalization, Computer science, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Domain (software engineering), Machine Learning (cs.LG), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Relevance (information retrieval), 0105 earth and related environmental sciences, media_common, business.industry, Class (biology), Visualization, Pattern recognition (psychology), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Cross-domain few-shot classification task (CD-FSC) combines few-shot classification with the requirement to generalize across domains represented by datasets. This setup faces challenges originating from the limited labeled data in each class and, additionally, from the domain shift between training and test sets. In this paper, we introduce a novel training approach for existing FSC models. It leverages on the explanation scores, obtained from existing explanation methods when applied to the predictions of FSC models, computed for intermediate feature maps of the models. Firstly, we tailor the layer-wise relevance propagation (LRP) method to explain the predictions of FSC models. Secondly, we develop a model-agnostic explanation-guided training strategy that dynamically finds and emphasizes the features which are important for the predictions. Our contribution does not target a novel explanation method but lies in a novel application of explanations for the training phase. We show that explanation-guided training effectively improves the model generalization. We observe improved accuracy for three different FSC models: RelationNet, cross attention network, and a graph neural network-based formulation, on five few-shot learning datasets: miniImagenet, CUB, Cars, Places, and Plantae. The source code is available at https://github.com/SunJiamei/few-shot-lrp-guided
Published: 2020
Full Text: View/download PDF

49. Compact and Computationally Efficient Representation of Deep Neural Networks

Author: Wojciech Samek, Klaus-Robert Müller, and Simon Wiedemann
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Networks and Communications, Computer science, Entropy, Machine Learning (stat.ML), 02 engineering and technology, Machine Learning (cs.LG), Entropy (classical thermodynamics), Matrix (mathematics), Deep Learning, Statistics - Machine Learning, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Entropy (information theory), Neural and Evolutionary Computing (cs.NE), Entropy (energy dispersal), Entropy (arrow of time), Sparse matrix, Lossless compression, Artificial neural network, Entropy (statistical thermodynamics), Computer Science - Neural and Evolutionary Computing, Dot product, Computer Science Applications, 020201 artificial intelligence & image processing, Neural Networks, Computer, Algorithm, Software, Entropy (order and disorder)
Abstract: At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the highest computational resources. A common approach to reduce the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented either by a dense or sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can be in many cases suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, consequently implying that they are guaranteed to become more efficient as the entropy of the matrix is being reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we are able to attain up to x42 compression ratios, x5 speedups and x90 energy savings when we convert in a lossless manner the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operation. 17 pages, 14 figures
Published: 2020

50. UDSMProt: Universal deep sequence models for protein classification

Author: Wojciech Samek, Patrick Wagner, Markus Wenzel, and Nils Strodthoff
Subjects: Statistics and Probability, Source code, Computer science, media_common.quotation_subject, Homology (mathematics), Machine learning, computer.software_genre, Biochemistry, Field (computer science), 03 medical and health sciences, 0302 clinical medicine, Software, Simple (abstract algebra), Amino Acid Sequence, Representation (mathematics), Databases, Protein, Molecular Biology, Peptide sequence, 030304 developmental biology, media_common, chemistry.chemical_classification, 0303 health sciences, Sequence, business.industry, A protein, Proteins, Original Papers, Computer Science Applications, Computational Mathematics, Enzyme, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, chemistry, Key (cryptography), Language model, Artificial intelligence, business, computer, Sequence Analysis, 030217 neurology & neurosurgery, Algorithms
Abstract: Motivation: Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific-scoring matrices from expensive database searches. We argue that this level of performance can be reached or even be surpassed by learning a task-agnostic representation once, using self-supervised language modeling, and transferring it to specific tasks by a simple fine-tuning step. Results: We put forward a universal deep sequence model that is pre-trained on unlabeled protein sequences from Swiss-Prot and fine-tuned on protein classification tasks. We apply it to three prototypical tasks, namely enzyme class prediction, gene ontology prediction and remote homology and fold detection. The proposed method performs on par with state-of-the-art algorithms that were tailored to these specific tasks or, for two out of three tasks, even outperforms them. These results stress the possibility of inferring protein properties from the sequence alone and, on more general grounds, the prospects of modern natural language processing methods in omics. Moreover, we illustrate the prospects for explainable machine learning methods in this field by selected case studies. Availability and implementation: Source code is available under https://github.com/nstrodt/UDSMProt. Supplementary information: Supplementary data are available at Bioinformatics online.
Published: 2020
