Author: "Wojciech, Samek" / Topic: 02 engineering and technology - Searchworks@Jio Institute Digital Library Search Results

63 results on '"Wojciech, Samek"'

1. Robustifying models against adversarial attacks by Langevin dynamics

Author: Vignesh Srinivasan, Wojciech Samek, Csaba Rohrer, Shinichi Nakajima, Arturo Marban, and Klaus-Robert Müller
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Cognitive Neuroscience, Deep learning, 02 engineering and technology, Conditional probability distribution, Machine learning, computer.software_genre, Adversarial system, Generative model, Deep Learning, 020901 industrial engineering & automation, Discriminative model, Artificial Intelligence, Robustness (computer science), Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Langevin dynamics, business, computer, Computer Security
Abstract: Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains a challenging task. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. To achieve this task, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion—projection is distributed broadly. This prevents white box attacks from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
Published: 2021

2. Estimation of distortion sensitivity for visual quality prediction using a convolutional neural network

Author: Wojciech Samek, Klaus-Robert Müller, Thomas Wiegand, Sebastian Bosse, and Sören Becker
Subjects: Computational complexity theory, Image quality, Computer science, business.industry, Applied Mathematics, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Convolutional neural network, Weighting, Computational Theory and Mathematics, Artificial Intelligence, Distortion, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Sensitivity (control systems), Artificial intelligence, Electrical and Electronic Engineering, Statistics, Probability and Uncertainty, business
Abstract: The PSNR and MSE are the computationally simplest and thus most widely used measures for image quality, although they correlate only poorly with perceived visual quality. More accurate quality models that rely on processing on both the reference and distorted image are potentially difficult to integrate in time-critical communication systems where computational complexity is disadvantageous. This paper derives the concept of distortion sensitivity as a property of the reference image that compensates for a given computational quality model a potential lack of perceptual relevance. This compensation method is applied to the PSNR and leads to a local weighting scheme for the MSE. Local weights are estimated by a deep convolutional neural network and used to improve the PSNR in a computationally graceful distribution of computationally complex processing to the reference image only. The performance of the proposed estimation approach is evaluated on LIVE, TID2013 and CSIQ databases and shows comparable or superior performance compared to benchmark image quality measures.
Published: 2019
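
The relationship between MSE, a locally weighted MSE, and PSNR described in this abstract can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists, a peak value of 255, and hand-supplied per-pixel weights normalized by their sum. In the paper the weights are estimated by a convolutional neural network, which is not reproduced here, and the function names are hypothetical.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equally sized images (lists of rows)."""
    n = sum(len(row) for row in ref)
    total = sum((r - d) ** 2
                for ref_row, dist_row in zip(ref, dist)
                for r, d in zip(ref_row, dist_row))
    return total / n


def weighted_mse(ref, dist, weights):
    """MSE with one non-negative weight per pixel.

    Assumption: the weighted error is normalized by the sum of the weights,
    so uniform weights reproduce the plain MSE.
    """
    num = sum(w * (r - d) ** 2
              for ref_row, dist_row, w_row in zip(ref, dist, weights)
              for r, d, w in zip(ref_row, dist_row, w_row))
    den = sum(w for w_row in weights for w in w_row)
    return num / den


def psnr(err, peak=255.0):
    """PSNR in dB for a given mean squared error; infinite when err is 0."""
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)
```

With uniform weights the weighted variant reduces to the ordinary PSNR; lowering the weight of a region makes distortions there count less toward the score.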

3. Unmasking Clever Hans predictors and assessing what machines really learn

Author: Klaus-Robert Müller, Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, Stephan Wäldchen, and Publica
Subjects: 0301 basic medicine, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer science, Computer Vision and Pattern Recognition (cs.CV), Science, Computer Science - Computer Vision and Pattern Recognition, General Physics and Astronomy, Machine Learning (stat.ML), 02 engineering and technology, Article, General Biochemistry, Genetics and Molecular Biology, Machine Learning (cs.LG), 03 medical and health sciences, Statistics - Machine Learning, Human–computer interaction, Neural and Evolutionary Computing (cs.NE), lcsh:Science, Multidisciplinary, Computer Science - Neural and Evolutionary Computing, General Chemistry, 021001 nanoscience & nanotechnology, 030104 developmental biology, Artificial Intelligence (cs.AI), Work (electrical), lcsh:Q, 0210 nano-technology
Abstract: Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner., Comment: Accepted for publication in Nature Communications
Published: 2019

4. Dependent Scalar Quantization For Neural Network Compression

Author: Thomas Wiegand, Heiner Kirchhoffer, Paul Haase, Simon Wiedemann, Wojciech Samek, Heiko Schwarz, Arturo Marban, Talmaj Marinc, Detlev Marpe, and Karsten Muller
Subjects: Artificial neural network, Computer science, Quantization (signal processing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, Iterative reconstruction, Reduction (complexity), Compression (functional analysis), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Network performance, Entropy encoding, Algorithm
Abstract: Recent approaches to compression of deep neural networks, like the emerging standard on compression of neural networks for multimedia content description and analysis (MPEG-7 part 17), apply scalar quantization and entropy coding of the quantization indexes. In this paper we present an advanced method for quantization of neural network parameters, which applies dependent scalar quantization (DQ) or trellis-coded quantization (TCQ), and an improved context modeling for the entropy coding of the quantization indexes. We show that the proposed method achieves 5.778% bitrate reduction and virtually no loss (0.37%) of network performance in average, compared to the baseline methods of the second test model (NCTM) of MPEG-7 part 17 for relevant working points.
Published: 2020

5. Deepcabac: Plug & Play Compression of Neural Network Weights and Weight Updates

Author: Detlev Marpe, David Neumann, Heiner Kirchhoffer, Wojciech Samek, Heiko Schwarz, Simon Wiedemann, Felix Sattler, Karsten Muller, and Thomas Wiegand
Subjects: Plug play, Artificial neural network, Computer science, Compression (functional analysis), Server, Distributed computing, 0202 electrical engineering, electronic engineering, information engineering, 020206 networking & telecommunications, 020201 artificial intelligence & image processing, 02 engineering and technology, Differential (infinitesimal), Data compression, Variety (cybernetics)
Abstract: An increasing number of distributed machine learning applications require efficient communication of neural network parameterizations. DeepCABAC, an algorithm in the current working draft of the emerging MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis, has demonstrated high compression gains for a variety of neural network models. In this paper we propose a method for employing DeepCABAC in a Federated Learning scenario for the exchange of intermediate differential parameterizations. Furthermore, we discuss the efficiency of DeepCABAC when compressing trained neural networks. Our experiments on large neural networks show that in both scenarios, DeepCABAC achieves competitive compression rates, without degrading the network accuracy.
Published: 2020

6. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints

Author: Wojciech Samek, Klaus-Robert Müller, Felix Sattler, and Publica
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, education.field_of_study, Theoretical computer science, Computer Networks and Communications, Population, Multi-task learning, Machine Learning (stat.ML), 02 engineering and technology, Computer Science Applications, Data modeling, Machine Learning (cs.LG), Recurrent neural network, Computer Science - Distributed, Parallel, and Cluster Computing, Statistics - Machine Learning, Artificial Intelligence, Server, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Distributed, Parallel, and Cluster Computing (cs.DC), education, Cluster analysis, Software
Abstract: Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that Federated Learning yields suboptimal results if the local clients' data distributions diverge. To address this issue, we present Clustered Federated Learning (CFL), a novel Federated Multi-Task Learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol, is applicable to general non-convex objectives (in particular deep neural networks) and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy-preserving way. As clustering is only performed after Federated Learning has converged to a stationary point, CFL can be viewed as a post-processing method that will always achieve performance greater than or equal to that of conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used Federated Learning datasets.
Published: 2020

7. On the Byzantine Robustness of Clustered Federated Learning

Author: Wojciech Samek, Thomas Wiegand, Felix Sattler, and Klaus-Robert Müller
Subjects: education.field_of_study, Computer science, business.industry, Population, Multi-task learning, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Federated learning, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, education, Cluster analysis, business, computer, Byzantine architecture
Abstract: Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that Federated Learning yields suboptimal results if the local clients' data distributions diverge. The recently proposed Clustered Federated Learning (CFL) framework addresses this issue by separating the client population into different groups based on the pairwise cosine similarities between their parameter updates. In this work we investigate the application of CFL to byzantine settings, where a subset of clients behaves unpredictably or tries to disturb the joint training effort in a directed or undirected way. We perform experiments with deep neural networks on common Federated Learning datasets which demonstrate that CFL (without modifications) is able to reliably detect byzantine clients and remove them from training.
Published: 2020
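
The pairwise cosine similarities between client parameter updates mentioned in this abstract can be computed as in the sketch below. It assumes each update is a flat list of floats and treats a zero-norm update as having similarity 0, which is a choice made here for illustration. The function names are hypothetical, and the step that groups clients from the similarity matrix is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two flat update vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero update is treated as unrelated to any other.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(updates):
    """Symmetric matrix of pairwise cosine similarities between client updates."""
    return [[cosine_similarity(u, v) for v in updates] for u in updates]
```

Clients whose updates point in similar directions score close to 1, while a client pushing in the opposite direction scores close to -1, which is the kind of signal a clustering step could use to separate it from the rest.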

8. Rotation Invariant Clustering of 3D Cell Nuclei Shapes

Author: Klaus-Robert Müller, Patrick Wagner, Arturo Zychlinsky, Wojciech Samek, and Jakob Paul Morath
Subjects: 0301 basic medicine, Computer science, 02 engineering and technology, 03 medical and health sciences, symbols.namesake, Imaging, Three-Dimensional, 0202 electrical engineering, electronic engineering, information engineering, Animals, Cluster Analysis, Invariant (mathematics), Cluster analysis, Image resolution, Cell Nucleus, Microscopy, Confocal, Fourier Analysis, business.industry, Dimensionality reduction, Spherical coordinate system, 020206 networking & telecommunications, Pattern recognition, Mixture model, 030104 developmental biology, Fourier transform, Principal component analysis, symbols, Artificial intelligence, business
Abstract: Cellular imaging with confocal fluorescence laser microscopy gave rise to many new insights into the cellular machinery. One interesting observation suggests that the morphology of the cell nucleus plays a key role for neutrophilic function, which is an essential part of the innate immune system of most mammals. Due to the increasing availability of high resolution 3D images coming from the microscope, machine learning becomes a promising tool for automatically discovering underlying hidden structures. Here, the major difficulty consists of selecting an appropriate representation for characterizing the morphology of the cell nucleus. In this work we tackle this problem and propose a fully unsupervised mechanism for finding structure in high-throughput 3D image data. The key component of our approach is based on Generic Fourier Transform (GFT) for 2D images, which for 3D involves spherical coordinate transformation prior to fast Discrete Fourier Transformation. On top of GFT we apply dimensionality reduction with Principal Component Analysis, followed by generative cluster analysis with a Gaussian Mixture Model. We validate our new approach first on a synthetic 3D-MNIST dataset with random rotations, where quantitative and qualitative results confirm the applicability of the proposed pipeline for exploring shape space in a purely unsupervised manner. Then we apply our proposed technique to a newly collected dataset of high resolution 3D images of neutrophil nuclei, suggesting a clustering model with six significant clusters of morphological cell nuclei prototypes. We visualize differences in the cell shape clusters by providing prototypical examples of neutrophilic cell nuclei.
Published: 2020

9. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods

Author: Wojciech Samek, Michael Bockmayr, Alexander Binder, Miriam Hägele, Philipp Seegerer, Klaus-Robert Müller, Sebastian Lapuschkin, Frederick Klauschen, and Publica
Subjects: 0301 basic medicine, FOS: Computer and information sciences, medicine.medical_specialty, Computer science, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, lcsh:Medicine, 02 engineering and technology, Machine learning, computer.software_genre, Quantitative Biology - Quantitative Methods, Article, Task (project management), 03 medical and health sciences, Deep Learning, Medical research, Neoplasms, Image Interpretation, Computer-Assisted, 0202 electrical engineering, electronic engineering, information engineering, medicine, FOS: Electrical engineering, electronic engineering, information engineering, Humans, Quality (business), lcsh:Science, Quantitative Methods (q-bio.QM), Cancer, media_common, Class (computer programming), Multidisciplinary, business.industry, Deep learning, lcsh:R, Image and Video Processing (eess.IV), Digital pathology, Electrical Engineering and Systems Science - Image and Video Processing, 030104 developmental biology, ROC Curve, Binary classification, Area Under Curve, FOS: Biological sciences, Cancer imaging, 020201 artificial intelligence & image processing, Histopathology, lcsh:Q, Neural Networks, Computer, Artificial intelligence, business, computer
Abstract: Deep learning has recently gained popularity in digital pathology due to its high prediction quality. However, the medical domain requires explanation and insight for a better understanding beyond standard quantitative performance evaluation. Recently, explanation methods have emerged, which are so far still rarely used in medicine. This work shows their application to generate heatmaps that allow to resolve common challenges encountered in deep learning-based digital histopathology analyses. These challenges comprise biases typically inherent to histopathology data. We study binary classification tasks of tumor tissue discrimination in publicly available haematoxylin and eosin slides of various tumor entities and investigate three types of biases: (1) biases which affect the entire dataset, (2) biases which are by chance correlated with class labels and (3) sampling biases. While standard analyses focus on patch-level evaluation, we advocate pixel-wise heatmaps, which offer a more precise and versatile diagnostic instrument and furthermore help to reveal biases in the data. This insight is shown to not only detect but also to be helpful to remove the effects of common hidden biases, which improves generalization within and across datasets. For example, we could see a trend of improved area under the receiver operating characteristic curve by 5% when reducing a labeling bias. Explanation techniques are thus demonstrated to be a helpful and highly relevant tool for the development and the deployment phases within the life cycle of real-world applications in digital pathology.
Published: 2020

10. Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution

Author: Alexander Binder, Sebastian Lapuschkin, Gary S. W. Goh, Wojciech Samek, and Leander Weber
Subjects: Hyperparameter, FOS: Computer and information sciences, Noise measurement, Artificial neural network, Contextual image classification, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Cognitive neuroscience of visual object recognition, Pattern recognition, Machine Learning (stat.ML), 02 engineering and technology, 01 natural sciences, Noise, Statistics - Machine Learning, 0103 physical sciences, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, 010306 general physics, business, Interpretability
Abstract: Integrated Gradients as an attribution method for deep neural network models offers simple implementability. However, it suffers from noisiness of explanations which affects the ease of interpretability. The SmoothGrad technique is proposed to solve the noisiness issue and smoothen the attribution maps of any gradient-based attribution method. In this paper, we present SmoothTaylor as a novel theoretical concept bridging Integrated Gradients and SmoothGrad, from the Taylor's theorem perspective. We apply the methods to the image classification problem, using the ILSVRC2012 ImageNet object recognition dataset, and a couple of pretrained image models to generate attribution maps. These attribution maps are empirically evaluated using quantitative measures for sensitivity and noise level. We further propose adaptive noising to optimize for the noise scale hyperparameter value. From our experiments, we find that the SmoothTaylor approach together with adaptive noising is able to generate better quality saliency maps with lesser noise and higher sensitivity to the relevant points in the input space as compared to Integrated Gradients., Comment: 8 pages, 3 figures. Accepted in 25th International Conference on Pattern Recognition, (ICPR) 2020. In Proceedings: pp. 4949-4956
Published: 2020

11. Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data

Author: Wojciech Samek, Klaus-Robert Müller, Felix Sattler, Simon Wiedemann, and Publica
Subjects: Training set, Distributed database, Computer Networks and Communications, Computer science, business.industry, Deep learning, Distributed computing, Collaborative learning, 02 engineering and technology, Computer Science Applications, Data modeling, Uncompressed video, Artificial Intelligence, Server, Golomb coding, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), 020201 artificial intelligence & image processing, Upstream (networking), Artificial intelligence, business, Software
Abstract: Federated learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are only of limited utility in the federated learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions, such as i.i.d. distribution of the client data, which typically cannot be found in federated learning. In this article, we propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms federated averaging in common federated learning scenarios. These results advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments.
Published: 2020
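
Top-$k$ sparsification followed by ternarization, as named in this abstract, can be sketched as follows. The sketch assumes the update is a flat list of floats and that the surviving entries are replaced by a single shared magnitude equal to their mean absolute value; that choice of magnitude is an assumption made here, since the abstract does not specify it. The function name is hypothetical, and the downstream compression mechanism and Golomb encoding are not shown.

```python
def sparse_ternarize(update, k):
    """Keep the k largest-magnitude entries and map them to {-mu, 0, +mu}.

    Assumption: mu is the mean absolute value of the k kept entries.
    """
    if not 0 < k <= len(update):
        raise ValueError("k must be between 1 and len(update)")
    # Indices of the k entries with the largest absolute value.
    top = sorted(range(len(update)),
                 key=lambda i: abs(update[i]),
                 reverse=True)[:k]
    mu = sum(abs(update[i]) for i in top) / k
    out = [0.0] * len(update)
    for i in top:
        if update[i] > 0:
            out[i] = mu
        elif update[i] < 0:
            out[i] = -mu
    return out
```

After this step only the positions of the non-zero entries, one sign per entry, and the single value mu need to be communicated, which is what makes the representation cheap to encode.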

12. Explanation-Guided Training for Cross-Domain Few-Shot Classification

Author: Wojciech Samek, Jiamei Sun, Yunqing Zhao, Alexander Binder, Ngai-Man Cheung, and Sebastian Lapuschkin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Source code, Generalization, Computer science, media_common.quotation_subject, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Domain (software engineering), Machine Learning (cs.LG), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Relevance (information retrieval), 0105 earth and related environmental sciences, media_common, business.industry, Class (biology), Visualization, Pattern recognition (psychology), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Cross-domain few-shot classification task (CD-FSC) combines few-shot classification with the requirement to generalize across domains represented by datasets. This setup faces challenges originating from the limited labeled data in each class and, additionally, from the domain shift between training and test sets. In this paper, we introduce a novel training approach for existing FSC models. It leverages on the explanation scores, obtained from existing explanation methods when applied to the predictions of FSC models, computed for intermediate feature maps of the models. Firstly, we tailor the layer-wise relevance propagation (LRP) method to explain the predictions of FSC models. Secondly, we develop a model-agnostic explanation-guided training strategy that dynamically finds and emphasizes the features which are important for the predictions. Our contribution does not target a novel explanation method but lies in a novel application of explanations for the training phase. We show that explanation-guided training effectively improves the model generalization. We observe improved accuracy for three different FSC models: RelationNet, cross attention network, and a graph neural network-based formulation, on five few-shot learning datasets: miniImagenet, CUB, Cars, Places, and Plantae. The source code is available at https://github.com/SunJiamei/few-shot-lrp-guided
Published: 2020

13. Compact and Computationally Efficient Representation of Deep Neural Networks

Author: Wojciech Samek, Klaus-Robert Müller, Simon Wiedemann, and Publica
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Networks and Communications, Computer science, Entropy, Machine Learning (stat.ML), 02 engineering and technology, Machine Learning (cs.LG), Entropy (classical thermodynamics), Matrix (mathematics), Deep Learning, Statistics - Machine Learning, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Entropy (information theory), Neural and Evolutionary Computing (cs.NE), Entropy (energy dispersal), Entropy (arrow of time), Sparse matrix, Lossless compression, Artificial neural network, Entropy (statistical thermodynamics), Computer Science - Neural and Evolutionary Computing, Dot product, Computer Science Applications, 020201 artificial intelligence & image processing, Neural Networks, Computer, Algorithm, Software, Entropy (order and disorder)
Abstract: At the core of any inference procedure in deep neural networks are dot product operations, which are the component that require the highest computational resources. A common approach to reduce the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented either by a dense or sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can be in many cases suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, consequently implying that they are guaranteed to become more efficient as the entropy of the matrix is being reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we are able to attain up to x42 compression ratios, x5 speed ups and x90 energy savings when we convert in a lossless manner the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operation., 17 pages, 14 figures
Published: 2020

14. Viewport Forecasting in 360° Virtual Reality Videos with Machine Learning

Author: Wojciech Samek, Huseyin Camalan, Markus Wenzel, and Johanna Vielhaben
Subjects: 0209 industrial biotechnology, Panorama, Computer science, media_common.quotation_subject, Cloud gaming, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Illusion, Optical head-mounted display, 02 engineering and technology, Virtual reality, Machine learning, computer.software_genre, Rendering (computer graphics), 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, ComputingMethodologies_COMPUTERGRAPHICS, media_common, Viewport, business.industry, Gaze, Simulator sickness, Eye tracking, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Objective. Virtual reality (VR) cloud gaming and 360° video streaming are on the rise. With a VR headset, viewers can individually choose the perspective they see on the head-mounted display by turning their head, which creates the illusion of being in a virtual room. In this experimental study, we applied machine learning methods to anticipate future head rotations (a) from preceding head and eye motions, and (b) from the statistics of other spherical video viewers. Approach. Ten study participants each watched 3 1/3 hours of spherical video clips, while head and eye gaze motions were tracked, using a VR headset with a built-in eye tracker. Machine learning models were trained on the recorded head and gaze trajectories to predict (a) changes of head orientation and (b) the viewport from population statistics. Results. We assembled a dataset of head and gaze trajectories of spherical video viewers with great stimulus variability. We extracted statistical features from these time series and showed that a Support Vector Machine can classify the range of future head movements with a time horizon of up to one second with good accuracy. Even population statistics among only ten subjects show prediction success above chance level. Even basic machine learning models can successfully predict head movement and aspects thereof, while being naive to visual content. Significance. Viewport forecasting opens up various avenues to optimize VR rendering and transmission. While the viewer can see only a section of the surrounding 360° sphere, the entire panorama typically has to be rendered and/or broadcast. The reason is rooted in the transmission delay, which has to be taken into account in order to avoid simulator sickness due to motion-to-photon latencies. Knowing in advance where the viewer is going to look may help to make cloud rendering and video streaming of VR content more efficient and, ultimately, the VR experience more appealing.
Published: 2019

15. Evaluating the Visualization of What a Deep Neural Network Has Learned

Author: Alexander Binder, Sebastian Lapuschkin, Wojciech Samek, Grégoire Montavon, and Klaus-Robert Müller
Subjects: FOS: Computer and information sciences, Computer Networks and Communications, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Machine learning, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Contextual image classification, Artificial neural network, business.industry, Deep learning, Computer Science Applications, Visualization, 020201 artificial intelligence & image processing, Algorithm design, Artificial intelligence, Deconvolution, business, computer, 030217 neurology & neurosurgery, Software
Abstract: Deep Neural Networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multi-layer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the "importance" of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012 and MIT Places data sets. Our main result is that the recently proposed Layer-wise Relevance Propagation (LRP) algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of neural network performance. Comment: 13 pages, 8 figures
Published: 2017

16. Interpretable deep neural networks for single-trial EEG classification

Author: Wojciech Samek, Klaus-Robert Müller, Irene Sturm, and Sebastian Lapuschkin
Subjects: FOS: Computer and information sciences, Computer science, media_common.quotation_subject, Machine Learning (stat.ML), 02 engineering and technology, Cognitive neuroscience, Machine learning, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Statistics - Machine Learning, Perception, 0202 electrical engineering, electronic engineering, information engineering, Animals, Humans, Relevance (information retrieval), Neural and Evolutionary Computing (cs.NE), Brain–computer interface, media_common, Interpretability, Neurons, Brain Mapping, Artificial neural network, business.industry, General Neuroscience, Computer Science - Neural and Evolutionary Computing, Brain, Electroencephalography, Neurophysiology, Brain Waves, Deep neural networks, 020201 artificial intelligence & image processing, Artificial intelligence, Nerve Net, business, computer, 030217 neurology & neurosurgery
Abstract: Background: In cognitive neuroscience the potential of Deep Neural Networks (DNNs) for solving complex classification tasks is yet to be fully exploited. The most limiting factor is that DNNs as notorious 'black boxes' do not provide insight into neurophysiological phenomena underlying a decision. Layer-wise Relevance Propagation (LRP) has been introduced as a novel method to explain individual network decisions. New Method: We propose the application of DNNs with LRP for the first time for EEG data analysis. Through LRP the single-trial DNN decisions are transformed into heatmaps indicating each data point's relevance for the outcome of the decision. Results: DNN achieves classification accuracies comparable to those of CSP-LDA. In subjects with low performance, subject-to-subject transfer of trained DNNs can improve the results. The single-trial LRP heatmaps reveal neurophysiologically plausible patterns, resembling CSP-derived scalp maps. Critically, while CSP patterns represent class-wise aggregated information, LRP heatmaps pinpoint neural patterns to single time points in single trials. Comparison with Existing Method(s): We compare the classification performance of DNNs to that of linear CSP-LDA on two data sets related to motor-imagery BCI. Conclusion: We have demonstrated that DNN is a powerful non-linear tool for EEG analysis. With LRP a new quality of high-resolution assessment of neural activity can be reached. LRP is a potential remedy for the lack of interpretability of DNNs that has limited their utility in neuroscientific applications. The extreme specificity of the LRP-derived heatmaps opens up new avenues for investigating neural activity underlying complex perception or decision-related processes. Comment: 5 pages, 1 figure
Published: 2016

17. Towards Best Practice in Explaining Neural Network Decisions with LRP

Author: Alexander Bauer, Alexander Binder, Sebastian Lapuschkin, Shinichi Nakajima, Maximilian Kohlbrenner, and Wojciech Samek
Subjects: FOS: Computer and information sciences, 0303 health sciences, Class (computer programming), Computer Science - Machine Learning, Artificial neural network, Computer science, business.industry, Process (engineering), Computer Vision and Pattern Recognition (cs.CV), Best practice, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), 02 engineering and technology, Object (philosophy), Field (computer science), Machine Learning (cs.LG), 03 medical and health sciences, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Feedforward neural network, 020201 artificial intelligence & image processing, Relevance (information retrieval), Artificial intelligence, business, 030304 developmental biology
Abstract: Within the last decade, neural network based predictors have demonstrated impressive - and at times super-human - capabilities. This performance is often paid for with an intransparent prediction process and thus has sparked numerous contributions in the novel field of explainable artificial intelligence (XAI). In this paper, we focus on a popular and widely used method of XAI, the Layer-wise Relevance Propagation (LRP). Since its initial proposition LRP has evolved as a method, and a best practice for applying the method has tacitly emerged, based however on humanly observed evidence alone. In this paper we investigate - and for the first time quantify - the effect of this current best practice on feedforward neural networks in a visual object detection setting. The results verify that the layer-dependent approach to LRP applied in recent literature better represents the model's reasoning, and at the same time increases the object localization and class discriminativity of LRP. Comment: 7 pages, 4 figures, 1 table. Fixed table row compared to v2. Presented virtually at IJCNN 2020
Published: 2019

18. Estimation of interaction forces in robotic surgery using a semi-supervised deep neural network model

Author: Wojciech Samek, Josep Fernández, Alicia Casals, Arturo Marban, Vignesh Srinivasan, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. GRINS - Grup de Recerca en Robòtica Intel·ligent i Sistemes
Subjects: 0209 industrial biotechnology, Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC], Computer science, Robot, 02 engineering and technology, Semi-supervised learning, Machine learning, computer.software_genre, surgery, 020901 industrial engineering & automation, Robot vision, Deep neural networks, Aprenentatge automàtic, 0202 electrical engineering, electronic engineering, information engineering, Representation (mathematics), Artificial neural network, business.industry, Supervised learning, Frame (networking), Robotic surgery, Robòtica en medicina, Visió artificial (Robòtica), Robot-Assisted Minimally Invasive Surgery, Feature (computer vision), Robotics in medicine, Unsupervised learning, 020201 artificial intelligence & image processing, Vision based force sensing, Artificial intelligence, business, Informàtica::Robòtica [Àrees temàtiques de la UPC], Encoder, computer
Abstract: Providing force feedback as a feature in current Robot-Assisted Minimally Invasive Surgery systems still remains a challenge. In recent years, Vision-Based Force Sensing (VBFS) has emerged as a promising approach to address this problem. Existing methods have been developed in a Supervised Learning (SL) setting. Nonetheless, most of the video sequences related to robotic surgery are not provided with ground-truth force data, which can be easily acquired in a controlled environment. A powerful approach to process unlabeled video sequences and find a compact representation for each video frame relies on using an Unsupervised Learning (UL) method. Afterward, a model trained in an SL setting can take advantage of the available ground-truth force data. In the present work, UL and SL techniques are used to investigate a model in a Semi-Supervised Learning (SSL) framework, consisting of an encoder network and a Long-Short Term Memory (LSTM) network. First, a Convolutional Auto-Encoder (CAE) is trained to learn a compact representation for each RGB frame in a video sequence. To facilitate the reconstruction of high and low frequencies found in images, this CAE is optimized using an adversarial framework and a L1-loss, respectively. Thereafter, the encoder network of the CAE is serially connected with an LSTM network and trained jointly to minimize the difference between ground-truth and estimated force data. Datasets addressing the force estimation task are scarce. Therefore, the experiments have been validated in a custom dataset. The results suggest that the proposed approach is promising.
Published: 2019

19. Achieving Generalizable Robustness of Deep Neural Networks by Stability Training

Author: Wojciech Samek, Nils Strodthoff, and Jan Laermann
Subjects: Hyperparameter, Contextual image classification, Computer science, business.industry, Training (meteorology), Stability (learning theory), 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Range (mathematics), Robustness (computer science), Distortion, 0202 electrical engineering, electronic engineering, information engineering, Deep neural networks, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 0105 earth and related environmental sciences
Abstract: We study the recently introduced stability training as a general-purpose method to increase the robustness of deep neural networks against input perturbations. In particular, we explore its use as an alternative to data augmentation and validate its performance against a number of distortion types and transformations including adversarial examples. In our image classification experiments using ImageNet data, stability training performs on a par with or even outperforms data augmentation for specific transformations, while consistently offering improved robustness against a broader range of distortion strengths and types unseen during training, a considerably smaller hyperparameter dependence and fewer potentially negative side effects compared to data augmentation.
Published: 2019

20. DRAU: Dual Recurrent Attention Units for Visual Question Answering

Author: Wojciech Samek and Ahmed Osman
Subjects: Single model, Computer science, business.industry, 020207 software engineering, 02 engineering and technology, DUAL (cognitive architecture), Machine learning, computer.software_genre, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Question answering, Visual attention, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Relevant information, computer, Software
Abstract: Visual Question Answering (VQA) requires AI models to comprehend data in two domains, vision and text. Current state-of-the-art models use learned attention mechanisms to extract relevant information from the input domains to answer a certain question. Thus, robust attention mechanisms are essential for powerful VQA models. In this paper, we propose a recurrent attention mechanism and show its benefits compared to the traditional convolutional approach. We perform two ablation studies to evaluate recurrent attention. First, we introduce a baseline VQA model with visual attention and test the performance difference between convolutional and recurrent attention on the VQA 2.0 dataset. Secondly, we design an architecture for VQA which utilizes dual (textual and visual) Recurrent Attention Units (RAUs). Using this model, we show the effect of all possible combinations of recurrent and convolutional dual attention. Our single model outperforms the first place winner on the VQA 2016 challenge and to the best of our knowledge, it is the second best performing single model on the VQA 1.0 dataset. Furthermore, our model noticeably improves upon the winner of the VQA 2017 challenge. Moreover, we experiment replacing attention mechanisms in state-of-the-art models with our RAUs and show increased performance.
Published: 2019

21. Enhanced Machine Learning Techniques for Early HARQ Feedback Prediction in 5G

Author: Wojciech Samek, Cornelius Hellge, Baris Goktepe, Thomas Schierl, and Nils Strodthoff
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Networks and Communications, Computer science, business.industry, Computer Science - Information Theory, Information Theory (cs.IT), Probabilistic logic, Hybrid automatic repeat request, 020206 networking & telecommunications, 02 engineering and technology, Data_CODINGANDINFORMATIONTHEORY, Machine learning, computer.software_genre, Scheduling (computing), Machine Learning (cs.LG), Statistical classification, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Electrical and Electronic Engineering, business, computer, 5G, Decoding methods, Communication channel
Abstract: We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by machine learning techniques as a path towards ultra-reliable and low-latency communication (URLLC). To this end, we propose machine learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms ranging from traditional methods to newly developed supervised autoencoders. These methods are evaluated based on their prospects of complying with the URLLC requirements of effective block error rates below $10^{-5}$ at small latency overheads. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads, and show the benefit over regular HARQ and existing E-HARQ schemes without machine learning. Comment: 14 pages, 15 figures; accepted version
Published: 2019

22. Evaluating Recurrent Neural Network Explanations

Author: Klaus-Robert Müller, Wojciech Samek, Ahmed Osman, and Leila Arras
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Machine Learning (stat.ML), 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Machine Learning (cs.LG), Task (project management), Negation, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Neural and Evolutionary Computing (cs.NE), 0105 earth and related environmental sciences, business.industry, Sentiment analysis, Computer Science - Neural and Evolutionary Computing, Variable (computer science), Recurrent neural network, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, Word (computer architecture)
Abstract: Recently, several methods have been proposed to explain the predictions of recurrent neural networks (RNNs), in particular of LSTMs. The goal of these methods is to understand the network's decisions by assigning to each input variable, e.g., a word, a relevance indicating to which extent it contributed to a particular prediction. In previous works, some of these methods were not yet compared to one another, or were evaluated only qualitatively. We close this gap by systematically and quantitatively comparing these methods in different settings, namely (1) a toy arithmetic task which we use as a sanity check, (2) a five-class sentiment prediction of movie reviews, and (3) an exploration of the usefulness of word relevances for building sentence-level representations. Lastly, using the method that performed best in our experiments, we show how specific linguistic phenomena such as negation in sentiment analysis are reflected in relevance patterns, and how the relevance visualization can help to understand the misclassification of individual samples. Comment: 14 pages, accepted for ACL'19 Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Published: 2019

23. Multi-Kernel Prediction Networks for Denoising of Burst Images

Author: Cornelius Hellge, Wojciech Samek, Serhan Gül, Talmaj Marinc, and Vignesh Srinivasan
Subjects: Well-posed problem, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Noise reduction, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Machine Learning (stat.ML), 02 engineering and technology, Convolution, Machine Learning (cs.LG), Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Image denoising, Noise measurement, Artificial neural network, Pixel, business.industry, Photography, 020207 software engineering, Pattern recognition, Kernel (image processing), Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: In low-light or short-exposure photography the image is often corrupted by noise. While longer exposure helps reduce the noise, it can produce blurry results due to object and camera motion. The reconstruction of a noise-free image is an ill-posed problem. Recent approaches for image denoising aim to predict kernels which are convolved with a set of successively taken images (burst) to obtain a clear image. We propose a deep neural network based approach called Multi-Kernel Prediction Networks (MKPN) for burst image denoising. MKPN predicts kernels of not just one size but of varying sizes and fuses these different kernels, resulting in one kernel per pixel. The advantages of our method are twofold: (a) the different-sized kernels help in extracting different information from the image, which results in better reconstruction, and (b) kernel fusion ensures that the extracted information is retained while maintaining computational efficiency. Experimental results reveal that MKPN outperforms the state of the art on our synthetic datasets with different noise levels. Comment: 5 pages, 4 figures
Published: 2019

24. Black-box decision based adversarial attack with symmetric α-stable distribution

Author: Wojciech Samek, Shinichi Nakajima, Vignesh Srinivasan, Ercan E. Kuruoglu, and Klaus-Robert Müller
Subjects: Black box (phreaking), Theoretical computer science, adversarial attack, Computer science, Gaussian, Boundary (topology), alpha-stable distribution, 020206 networking & telecommunications, 02 engineering and technology, Random walk, Stable distribution, Image (mathematics), symbols.namesake, deep neural networks, 0202 electrical engineering, electronic engineering, information engineering, symbols, 020201 artificial intelligence & image processing, Random variable, MNIST database, Computer Science::Cryptography and Security, image classification
Abstract: Developing techniques for adversarial attack and defense is an important research field for establishing reliable machine learning and its applications. Many existing methods employ Gaussian random variables for exploring the data space to find the most adversarial (for attacking) or least adversarial (for defense) point. However, the Gaussian distribution is not necessarily the optimal choice when the exploration is required to follow the complicated structure that most real-world data distributions exhibit. In this paper, we investigate how statistics of random variables affect such random walk exploration. Specifically, we generalize the Boundary Attack, a state-of-the-art blackbox decision based attacking strategy, and propose the Lévy-Attack, where the random walk is driven by symmetric α-stable random variables. Our experiments on MNIST and CIFAR10 datasets show that the Lévy-Attack explores the image data space more efficiently, and significantly improves the performance. Our results also give an insight into the recently found fact in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of the adversarial patterns is essential.
Published: 2019

25. Entropy-Constrained Training of Deep Neural Networks

Author: Simon Wiedemann, Wojciech Samek, Klaus-Robert Müller, and Arturo Marban
Subjects: FOS: Computer and information sciences, Network architecture, Computer Science - Machine Learning, Artificial neural network, Computer science, Entropy (statistical thermodynamics), Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), 02 engineering and technology, Data_CODINGANDINFORMATIONTHEORY, 010501 environmental sciences, 01 natural sciences, Machine Learning (cs.LG), Entropy (classical thermodynamics), Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Entropy (information theory), Deep neural networks, 020201 artificial intelligence & image processing, Neural and Evolutionary Computing (cs.NE), Entropy (energy dispersal), Algorithm, Entropy (arrow of time), 0105 earth and related environmental sciences, Entropy (order and disorder)
Abstract: We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. For that we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen as special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g. achieving 71x compression gains on a VGG-like architecture. Comment: 8 pages, 6 figures
Published: 2018
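Illustrative note (not part of the catalog record): the abstract's claim that pruning and reducing the number of distinct weight values both lower a network's entropy can be pictured with a small sketch. It assumes the simplest possible measure, the first-order empirical entropy of a list of already-quantized weights; the function names are hypothetical and this is not the expression derived in the paper.

```python
import math
from collections import Counter


def first_order_entropy_bits(values):
    """Empirical entropy (bits per element) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    # Sum of -p * log2(p) over the observed distinct values.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def estimated_size_bits(values):
    """Rough bit-size estimate: number of elements times entropy per element.

    Assumption: an ideal entropy coder; real codecs add overhead.
    """
    return len(values) * first_order_entropy_bits(values)


# Hypothetical quantized weights before and after pruning small values to 0.
dense = [-1, 1, 2, -2, 1, -1, 2, -2]
pruned = [0, 0, 2, -2, 0, 0, 2, -2]

print(estimated_size_bits(dense))
print(estimated_size_bits(pruned))
```

Under these assumptions the pruned list has fewer distinct values and a more peaked distribution, so its estimated size is smaller than that of the dense list.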

26. Accurate and robust neural networks for face morphing attack detection

Author: Anna Hilsmann, Clemens Seibold, Wojciech Samek, and Peter Eisert
Subjects: Generality, Biometrics, Artificial neural network, Exploit, Computer Networks and Communications, Computer science, business.industry, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Facial recognition system, Morphing, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Decision-making, Safety, Risk, Reliability and Quality, business, computer, Software
Abstract: Artificial neural networks tend to use only what they need for a task. For example, to recognize a rooster, a network might only consider the rooster’s red comb and wattle and ignore the rest of the animal. This makes them vulnerable to attacks on their decision making process and can worsen their generality. Thus, this phenomenon has to be considered during the training of networks, especially in safety and security related applications. In this paper, we propose neural network training schemes, which are based on different alterations of the training data, to increase robustness and generality. Precisely, we limit the amount and position of information available to the neural network for the decision making process and study their effects on the accuracy, generality, and robustness against semantic and black box attacks for the particular example of face morphing attacks. In addition, we exploit layer-wise relevance propagation (LRP) to analyze the differences in the decision making process of the differently trained neural networks. A face morphing attack is an attack on a biometric facial recognition system, where the system is fooled into matching two different individuals with the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals that should be matched with this image. We train neural networks for face morphing attack detection using our proposed training schemes and show that they lead to an improvement of robustness against attacks on neural networks. Using LRP, we show that the improved training forces the networks to develop and use reliable models for all regions of the analyzed image. This redundancy in representation is of crucial importance to security related applications.
Published: 2020

27. Neural Network-Based Estimation of Distortion Sensitivity for Image Quality Prediction

Author: Wojciech Samek, Zacharias V. Fisches, Sören Becker, Sebastian Bosse, and Thomas Wiegand
Subjects: Artificial neural network, Computer science, Property (programming), business.industry, Image quality, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Convolutional neural network, Distortion, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Quality (business), Sensitivity (control systems), Artificial intelligence, business, media_common
Abstract: Due to its computational simplicity, the PSNR is a popular and widely used image quality measure, although it correlates poorly with perceived visual quality. Distortion sensitivity, a reference image specific property, can be used to compensate for the lack of perceptual relevance of the PSNR. Based on the functional mapping between perceptual and computational quality a deep convolutional neural network is used to estimate patchwise distortion sensitivity. The local estimates are used for an imagewise perceptual adaptation of the PSNR. The performance of the proposed estimation approach is evaluated on the LIVE and TID2013 databases and shows comparable or superior performance as compared to benchmark image quality measures.
Published: 2018
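Illustrative note (not part of the catalog record): the PSNR that the abstract takes as its starting point is the standard quantity 10 * log10(MAX^2 / MSE). The sketch below computes only this plain PSNR for two flat pixel sequences; the paper's sensitivity-based perceptual adaptation is not reproduced here, and the function name and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between reference and distorted pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel patches (8-bit grayscale).
print(psnr([10, 20, 30, 40], [12, 18, 30, 41]))
```

Because the MSE weights every pixel error equally regardless of image content, two distortions with the same PSNR can look very different, which is the gap a per-patch sensitivity estimate is meant to address.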

28. Neural network based intra prediction for video coding

Author: Wojciech Samek, Dominique Maniry, Philipp Helle, Thomas Wiegand, H. Schwarz, S. Kaltenstadler, Detlev Marpe, and Jonathan Pfaff
Subjects: Image pattern, Artificial neural network, business.industry, Computer science, Coding systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Pattern recognition, 02 engineering and technology, Artificial intelligence, business, Coding (social sciences)
Abstract: Today’s hybrid video coding systems typically perform an intra-picture prediction whereby blocks of samples are predicted from previously decoded samples of the same picture. For example, HEVC uses a set of angular prediction patterns to exploit directional sample correlations. In this paper, we propose new intra-picture prediction modes whose construction consists of two steps: First, a set of features is extracted from the decoded samples. Second, these features are used to select a predefined image pattern as the prediction signal. Since several intra prediction modes are proposed for each block-shape, a specific signalization scheme is also proposed. Our intra prediction modes lead to significant coding gains over state of the art video coding technologies.
Published: 2018

29. A Recurrent Convolutional Neural Network Approach for Sensorless Force Estimation in Robotic Surgery

Author: Josep Fernández, Alicia Casals, Vignesh Srinivasan, Wojciech Samek, Arturo Marban, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. GRINS - Grup de Recerca en Robòtica Intel·ligent i Sistemes
Subjects: FOS: Computer and information sciences, Computer science, Process (engineering), Computer Vision and Pattern Recognition (cs.CV), 0206 medical engineering, Computer Science - Computer Vision and Pattern Recognition, Health Informatics, Context (language use), 02 engineering and technology, LSTM networks, Convolutional neural network, Neural networks (Computer science), 03 medical and health sciences, 0302 clinical medicine, FOS: Electrical engineering, electronic engineering, information engineering, Xarxes neuronals (Informàtica), Neural and Evolutionary Computing (cs.NE), Haptic technology, Artificial neural network, Work (physics), Image and Video Processing (eess.IV), Computer Science - Neural and Evolutionary Computing, Control engineering, Robotic surgery, Robòtica en medicina, Electrical Engineering and Systems Science - Image and Video Processing, 020601 biomedical engineering, Task (computing), Robotics in medicine, Signal Processing, Trajectory, Convolutional neural networks, Force estimation, Informàtica::Robòtica [Àrees temàtiques de la UPC], 030217 neurology & neurosurgery
Abstract: Providing force feedback as relevant information in current Robot-Assisted Minimally Invasive Surgery systems constitutes a technological challenge due to the constraints imposed by the surgical environment. In this context, force estimation techniques represent a potential solution, making it possible to sense the interaction forces between the surgical instruments and soft tissues. Specifically, if visual feedback is available for observing soft-tissue deformation, this feedback can be used to estimate the forces applied to these tissues. To this end, a force estimation model, based on Convolutional Neural Networks and Long Short-Term Memory networks, is proposed in this work. This model is designed to process both the spatiotemporal information present in video sequences and the temporal structure of tool data (the surgical tool-tip trajectory and its grasping status). A series of analyses are carried out to reveal the advantages of the proposal and the challenges that remain for real applications. This research work focuses on two surgical task scenarios, referred to as pushing and pulling tissue. For these two scenarios, different input data modalities and their effect on the force estimation quality are investigated. These input data modalities are tool data, video sequences and a combination of both. The results suggest that the force estimation quality is better when both the tool data and video sequences are processed by the neural network model. Moreover, this study reveals the need for a loss function designed to promote the modeling of smooth and sharp details found in force signals. Finally, the results show that the modeling of forces due to pulling tasks is more challenging than for the simpler pushing actions.
Published: 2018

30. Transferring Information Between Neural Networks

Author: Christopher Ehmann and Wojciech Samek
Subjects: 0301 basic medicine, Training set, Mean squared error, Artificial neural network, Computer science, 02 engineering and technology, Regularization (mathematics), Backpropagation, Term (time), 03 medical and health sciences, symbols.namesake, 030104 developmental biology, Jacobian matrix and determinant, 0202 electrical engineering, electronic engineering, information engineering, symbols, 020201 artificial intelligence & image processing, Sensitivity (control systems), Algorithm
Abstract: This paper investigates techniques to transfer information between deep neural networks. We demonstrate that a student network, which has access to information computed by a teacher network on the training data, learns faster, can be less deep and requires fewer labeled examples to achieve a given performance level. To this end, we force the student to mimic the teacher by adding a penalty term to the student's objective. We evaluate different penalty terms: (1) mean squared error between the cost gradients, (2) the Jacobian of the pre-softmax layer, (3) its row-summed version, (4) the cost gradient differences to standard double backpropagation and (5) a targeted double backpropagation via gradient derived masks. The Jacobian method improves the accuracy in proportion to the difference in training examples, in contrast to the cost gradient. If the difference in accuracy between teacher and student is large enough, we find an improvement from the Jacobian information, even if both had seen the same training data. This indicates that information transfer has a regularization effect.
Published: 2018

31. Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

Author: Wojciech Samek, Simon Wiedemann, Felix Sattler, and Klaus-Robert Müller
Subjects: FOS: Computer and information sciences, Computer science, Computer Science - Artificial Intelligence, Binary number, Machine Learning (stat.ML), 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Machine Learning (cs.LG), Reduction (complexity), Statistics - Machine Learning, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Upstream (networking), 0105 earth and related environmental sciences, business.industry, Deep learning, 020206 networking & telecommunications, Computer Science - Learning, Recurrent neural network, Artificial Intelligence (cs.AI), Computer engineering, Computer Science - Distributed, Parallel, and Cluster Computing, Artificial intelligence, Distributed, Parallel, and Cluster Computing (cs.DC), business
Abstract: Currently, progressively larger deep neural networks are trained on ever-growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet in the same number of iterations to the baseline accuracy, using $\times 3531$ fewer bits, or train it to a $1\%$ lower accuracy using $\times 37208$ fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
Published: 2018

32. Methods for Interpreting and Understanding Deep Neural Networks

Author: Wojciech Samek, Klaus-Robert Müller, and Grégoire Montavon
Subjects: FOS: Computer and information sciences, 0301 basic medicine, Computer science, Machine Learning (stat.ML), 02 engineering and technology, Machine learning, computer.software_genre, Machine Learning (cs.LG), Set (abstract data type), 03 medical and health sciences, Artificial Intelligence, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Electrical and Electronic Engineering, Interpretability, Artificial neural network, business.industry, Applied Mathematics, Entry point, Computer Science - Learning, 030104 developmental biology, Computational Theory and Mathematics, Signal Processing, Deep neural networks, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Statistics, Probability and Uncertainty, business, computer
Abstract: This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make the most efficient use of these techniques on real data. It also discusses a number of practical applications. Comment: 14 pages, 10 figures
Published: 2018

33. Wasserstein stationary subspace analysis

Author: Wojciech Samek, Shinichi Nakajima, Klaus-Robert Müller, and Stephan Kaltenstadler
Subjects: Spatial filter, Computer science, 02 engineering and technology, 03 medical and health sciences, 0302 clinical medicine, Interfacing, Robustness (computer science), Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Segmentation, Algorithm design, Electrical and Electronic Engineering, Time series, Algorithm, 030217 neurology & neurosurgery, Subspace topology, Decoding methods
Abstract: Learning under non-stationarity can be achieved by decomposing the data into a subspace that is stationary and a non-stationary one (stationary subspace analysis (SSA)). While SSA has been used in various applications, its robustness and computational efficiency have limits due to the difficulty in optimizing the Kullback-Leibler divergence based objective. In this paper we extend SSA in two ways: we propose SSA with (a) higher numerical efficiency by defining analytical SSA variants and (b) higher robustness by utilizing the Wasserstein-2 distance (Wasserstein SSA). We show the usefulness of our novel algorithms for toy data demonstrating their mathematical properties and for real-world data (1) allowing better segmentation of time series and (2) brain-computer interfacing, where the Wasserstein-based measure of non-stationarity is used for spatial filter regularization and gives rise to higher decoding performance.
Published: 2018

34. Assessing Perceived Image Quality Using Steady-State Visual Evoked Potentials and Spatio-Spectral Decomposition

Author: Sebastian Bosse, Klaus-Robert Müller, Benjamin Blankertz, Wojciech Samek, Thomas Wiegand, Laura Acqualagna, Gabriel Curio, and Anne K. Porbadnigk
Subjects: Visual perception, Steady state (electronics), genetic structures, Channel (digital image), Sensory processing, Image quality, Computer science, medicine.medical_treatment, 02 engineering and technology, Visual evoked potentials, Electroencephalography, Matrix decomposition, 03 medical and health sciences, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, medicine, Electrical and Electronic Engineering, medicine.diagnostic_test, business.industry, Dimensionality reduction, Pattern recognition, 020201 artificial intelligence & image processing, Artificial intelligence, business, 030217 neurology & neurosurgery
Abstract: Steady-state visual evoked potentials (SSVEPs) are neural responses, measurable using electroencephalography (EEG), that are directly linked to sensory processing of visual stimuli. In this paper, SSVEP is used to assess the perceived quality of texture images. The EEG-based assessment method is compared with conventional methods, and recorded EEG data are correlated to obtained mean opinion scores (MOSs). A dimensionality reduction technique for EEG data called spatio-spectral decomposition (SSD) is adapted for the SSVEP framework and used to extract physiologically meaningful and plausible neural components from the EEG recordings. It is shown that the use of SSD not only increases the correlation between neural features and MOS to $r=-0.93$, but also solves the problem of channel selection in an EEG-based image-quality assessment.
Published: 2018

35. Quality assessment of 3D visualizations with vertical disparity: An ERP approach

Author: Forooz Shahbazi, Sebastian Bosse, Guido Nolte, Thomas Wiegand, and Wojciech Samek
Subjects: Vision Disparity, media_common.quotation_subject, 0206 medical engineering, Stereoscopy, 02 engineering and technology, Stimulus (physiology), law.invention, 03 medical and health sciences, 0302 clinical medicine, Imaging, Three-Dimensional, law, Perception, parasitic diseases, Computer vision, media_common, Mathematics, Neural correlates of consciousness, Depth Perception, business.industry, Quality assessment, Pattern recognition, 020601 biomedical engineering, Amplitude, Artificial intelligence, Depth perception, business, 030217 neurology & neurosurgery
Abstract: In an objective approach for the assessment of quality of experience, the neural correlates of EEG data are studied when stereoscopic images are presented in three different conditions containing vertical disparity. These conditions are compared to a similar image in 2D both on the channel level, by studying the ERP components, and on the source level, by the localization of the corresponding ERP component. Our findings indicate that the P1 component in the occipital cortex has significantly increased in amplitude for the 3D condition without vertical disparity compared to the 2D condition. According to previous studies, this component increases when depth information is added to the stimulus, which is in line with our findings. However, the amplitude of this component has significantly decreased for the 3D condition with maximum vertical disparity compared to the 3D condition without vertical disparity. We conclude that the perception of stereoscopic depth by subjects has decreased in this case due to the distortion introduced by vertical disparity. The underlying sources corresponding to the P1 component are localized. Except for the power differences, the source locations do not differ for different conditions.
Published: 2017

36. Estimating Position & Velocity in 3D Space from Monocular Video Sequences Using a Deep Neural Network

Author: Wojciech Samek, Josep Fernández, Arturo Marban, Alicia Casals, Vignesh Srinivasan, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. GRINS - Grup de Recerca en Robòtica Intel·ligent i Sistemes
Subjects: Mean squared error, Computer science, Feature extraction, Context (language use), 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Residual neural network, Position (vector), Machine learning, Aprenentatge automàtic, 0202 electrical engineering, electronic engineering, information engineering, 0105 earth and related environmental sciences, Imatges tridimensionals en medicina, Training set, Artificial neural network, business.industry, Deep learning, Visió per ordinador, Regression analysis, Pattern recognition, Enginyeria de la telecomunicació::Processament del senyal::Processament de la imatge i del senyal vídeo [Àrees temàtiques de la UPC], Three dimensional imaging in medicine, Robòtica en medicina, Robotics in medicine, Computer vision, Vision based sensor substitution, 020201 artificial intelligence & image processing, Artificial intelligence, Informàtica::Robòtica [Àrees temàtiques de la UPC], business
Abstract: This work describes a regression model based on Convolutional Neural Networks (CNN) and Long-Short Term Memory (LSTM) networks for tracking objects from monocular video sequences. The target application being pursued is Vision-Based Sensor Substitution (VBSS). In particular, the tool-tip position and velocity in 3D space of a pair of surgical robotic instruments (SRI) are estimated for three surgical tasks, namely suturing, needle-passing and knot-tying. The CNN extracts features from individual video frames and the LSTM network processes these features over time and continuously outputs a 12-dimensional vector with the estimated position and velocity values. A series of analyses and experiments are carried out in the regression model to reveal the benefits and drawbacks of different design choices. First, the impact of the loss function is investigated by adequately weighing the Root Mean Squared Error (RMSE) and Gradient Difference Loss (GDL), using the VGG16 neural network for feature extraction. Second, this analysis is extended to a Residual Neural Network designed for feature extraction, which has fewer parameters than the VGG16 model, resulting in a reduction of ~96.44 % in the neural network size. Third, the impact of the number of time steps used to model the temporal information processed by the LSTM network is investigated. Finally, the capability of the regression model to generalize to the data related to "unseen" surgical tasks (unavailable in the training set) is evaluated. The aforesaid analyses are experimentally validated on the public dataset JIGSAWS. These analyses provide some guidelines for the design of a regression model in the context of VBSS, specifically when the objective is to estimate a set of 1D time series signals from video sequences.
Published: 2017

37. A perceptually relevant shearlet-based adaptation of the PSNR

Author: Thomas Wiegand, Sebastian Bosse, Mischa Siekmann, and Wojciech Samek
Subjects: Computer science, Image quality, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Adaptation (eye), 030229 sport sciences, 02 engineering and technology, Visualization, 03 medical and health sciences, 0302 clinical medicine, Shearlet, Nonlinear distortion, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Sensitivity (control systems), Artificial intelligence, business
Abstract: Although it is one of the simplest and most widely used image quality metrics (IQMs), the peak signal-to-noise ratio (PSNR) correlates only poorly with visual quality as perceived by humans. Based on an analysis of the non-linear mapping from PSNR to mean opinion scores (MOS), we identify a functional mapping parameter to adapt the PSNR in a perceptually meaningful way. Neurophysiologically motivated, a shearlet-based correction is proposed for controlling this perceptual PSNR adaptation. The performance of the proposed perceptually adapted PSNR is evaluated on the LIVE and TID2013 databases and is shown to be superior or comparable to benchmark IQMs.
Published: 2017

38. Explaining Recurrent Neural Network Predictions in Sentiment Analysis

Author: Grégoire Montavon, Leila Arras, Klaus-Robert Müller, and Wojciech Samek
Subjects: FOS: Computer and information sciences, Computer science, Computer Science - Artificial Intelligence, Machine Learning (stat.ML), 02 engineering and technology, Machine learning, computer.software_genre, Task (project management), Statistics - Machine Learning, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Neural and Evolutionary Computing (cs.NE), Network architecture, Computer Science - Computation and Language, business.industry, Sentiment analysis, Computer Science - Neural and Evolutionary Computing, Artificial Intelligence (cs.AI), Recurrent neural network, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), computer, Word (computer architecture)
Abstract: Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work. Comment: 9 pages, 4 figures, accepted for EMNLP'17 Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA)
Published: 2017

39. Object Boundary Detection and Classification with Image-Level Labels

Author: Alexander Binder, Wojciech Samek, Klaus-Robert Müller, and Jing Yu Koh
Subjects: Pixel, Artificial neural network, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Boundary (topology), Pattern recognition, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Edge detection, Task (project management), Object-class detection, Bounding overwatch, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: Semantic boundary and edge detection aims at simultaneously detecting object edge pixels in images and assigning class labels to them. Systematic training of predictors for this task requires the labeling of edges in images, which is a particularly tedious task. We propose a novel strategy for solving this task when pixel-level annotations are not available, performing it in an almost zero-shot manner by relying on conventional whole image neural net classifiers that were trained using large bounding boxes. Our method performs the following two steps at test time. Firstly, it predicts the class labels by applying the trained whole image network to the test images. Secondly, it computes pixel-wise scores from the obtained predictions by applying backprop gradients as well as recent visualization algorithms such as deconvolution and layer-wise relevance propagation. We show that high pixel-wise scores are indicative of the location of semantic boundaries, which suggests that the semantic boundary problem can be approached without using edge labels during the training phase.
Published: 2017

40. Detection of Face Morphing Attacks by Deep Learning

Author: Wojciech Samek, Peter Eisert, Anna Hilsmann, and Clemens Seibold
Subjects: 021110 strategic, defence & security studies, Artificial neural network, Biometrics, Computer science, business.industry, Deep learning, Data_MISCELLANEOUS, Fingerprint (computing), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Pattern recognition, 02 engineering and technology, Convolutional neural network, Facial recognition system, Morphing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Mobile device
Abstract: Identification by biometric features has become more popular in the last decade. High quality video and fingerprint sensors have become less expensive and are nowadays standard components in many mobile devices. Thus, many devices can be unlocked via fingerprint or face verification. The state-of-the-art accuracy of biometric facial recognition systems has prompted even systems that need high security standards, like border control at airports, to rely on biometric systems. While most biometric facial recognition systems perform quite accurately under a controlled environment, they can easily be tricked by morphing attacks. The concept of a morphing attack is to create a synthetic face image that contains characteristics of two different individuals and to use this image on a document or as a reference image in a database. Using this image for authentication, a biometric facial recognition system accepts both individuals. In this paper, we propose a morphing attack detection approach based on convolutional neural networks. We present an automatic morphing pipeline to generate morphing attacks, train neural networks based on this data and analyze their accuracy. The accuracy of different well-known network architectures is compared and the advantage of using pretrained networks compared to networks learned from scratch is studied.
Published: 2017

41. 'What is relevant in a text document?': An interpretable machine learning approach

Author: Franziska Horn, Wojciech Samek, Klaus-Robert Müller, Leila Arras, and Grégoire Montavon
Subjects: FOS: Computer and information sciences, Support Vector Machine, Word embedding, Computer science, Vector Spaces, Social Sciences, lcsh:Medicine, 02 engineering and technology, computer.software_genre, Vocabulary, Convolutional neural network, Machine Learning (cs.LG), Machine Learning, Animal Cells, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, lcsh:Science, Neurons, Principal Component Analysis, Computer Science - Computation and Language, Multidisciplinary, Artificial neural network, Software Engineering, Semantics, Categorization, Physical Sciences, Engineering and Technology, 020201 artificial intelligence & image processing, Cellular Types, Computation and Language (cs.CL), Information Retrieval (cs.IR), Research Article, Computer and Information Sciences, Neural Networks, Imaging Techniques, Machine Learning (stat.ML), Documentation, Research and Analysis Methods, Machine learning, Computer Science - Information Retrieval, Artificial Intelligence, Support Vector Machines, 020204 information systems, Relevance (information retrieval), Preprocessing, business.industry, lcsh:R, Biology and Life Sciences, Linguistics, Cell Biology, Support vector machine, Computer Science - Learning, Algebra, Recurrent neural network, Linear Algebra, Cellular Neuroscience, lcsh:Q, Neural Networks, Computer, Artificial intelligence, business, computer, Mathematics, Neuroscience
Abstract: Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications. Comment: 19 pages, 7 figures
Published: 2017

42. Understanding and Comparing Deep Neural Networks for Age and Gender Classification

Author: Alexander Binder, Wojciech Samek, Sebastian Lapuschkin, and Klaus-Robert Müller
Subjects: FOS: Computer and information sciences, Computer Science - Artificial Intelligence, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Initialization, Machine Learning (stat.ML), 02 engineering and technology, Machine learning, computer.software_genre, 01 natural sciences, Computer Science - Information Retrieval, Machine Learning (cs.LG), Statistics - Machine Learning, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Artificial neural network, business.industry, 010401 analytical chemistry, 0104 chemical sciences, Computer Science - Learning, Artificial Intelligence (cs.AI), Test set, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Information Retrieval (cs.IR)
Abstract: Recently, deep neural networks have demonstrated excellent performances in recognizing the age and gender on human face images. However, these models were applied in a black-box manner with no information provided about which facial features are actually used for prediction and how these features depend on image preprocessing, model initialization and architecture choice. We present a study investigating these different effects. In detail, our work compares four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm. Our evaluations on the challenging Adience benchmark show that suitable parameter initialization leads to a holistic perception of the input, compensating artefactual data representations. With a combination of simple preprocessing steps, we reach state of the art performance in gender recognition. Comment: 8 pages, 5 figures, 5 tables. Presented at ICCV 2017 Workshop: 7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures
Published: 2017

43. Alternative CSP approaches for multimodal distributed BCI data

Author: Stephanie Brandl, Klaus-Robert Müller, and Wojciech Samek
Subjects: Class (computer programming), medicine.diagnostic_test, InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI), Computer science, business.industry, Calibration (statistics), 0206 medical engineering, 02 engineering and technology, Electroencephalography, Machine learning, computer.software_genre, 020601 biomedical engineering, 03 medical and health sciences, 0302 clinical medicine, Motor imagery, Distraction, medicine, Artificial intelligence, Noise (video), business, computer, 030217 neurology & neurosurgery, Brain–computer interface
Abstract: Brain-Computer Interfaces (BCIs) are trained to distinguish between two (or more) mental states, e.g., left and right hand motor imagery, from the recorded brain signals. Common Spatial Patterns (CSP) is a popular method to optimally separate data from two motor imagery tasks under the assumption of a unimodal class distribution. In out-of-lab environments where users are distracted by additional noise sources, this assumption may not hold. This paper systematically investigates BCI performance under such distractions and proposes two novel CSP variants, ensemble CSP and 2-step CSP, which can cope with multimodal class distributions. The proposed algorithms are evaluated using simulations and BCI data of 16 healthy participants performing motor imagery under 6 different types of distraction. Both methods are shown to significantly enhance the performance compared to the standard procedure.
Published: 2016

44. Brain-Computer Interfacing for multimedia quality assessment

Author: Thomas Wiegand, Sebastian Bosse, Wojciech Samek, and Klaus-Robert Müller
Subjects: Multimedia, Quality assessment, Computer science, media_common.quotation_subject, 02 engineering and technology, computer.software_genre, Field (computer science), Visualization, 03 medical and health sciences, InformationSystems_MODELSANDPRINCIPLES, 0302 clinical medicine, Brain computer interfacing, Interfacing, Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Quality (business), computer, 030217 neurology & neurosurgery, media_common
Abstract: The assessment of perceived multimedia quality is a central research field in information and media technology. Conventionally, psychophysical techniques are used for determining the quality of multimedia signals. Recently, Brain-Computer Interfacing (BCI)-based methods have been proposed for the assessment of perceived multimedia signal quality. In this paper we give an overview of the shortcomings of conventional approaches, present the state of the art of BCI-based methods and discuss open questions and challenges relevant to the BCI community.
Published: 2016

45. On the robustness of action recognition methods in compressed and pixel domain

Author: Wojciech Samek, Jan Meyer, Sebastian Bosse, Vignesh Srinivasan, Serhan Gül, Cornelius Hellge, and Thomas Schierl
Subjects: Pixel, business.industry, Computer science, 020207 software engineering, Pattern recognition, 02 engineering and technology, Convolutional neural network, Robustness (computer science), Histogram, 0202 electrical engineering, electronic engineering, information engineering, Action recognition, Codec, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Practical implications, Decoding methods
Abstract: This paper investigates the robustness of two state-of-the-art action recognition algorithms: a pixel domain approach based on 3D convolutional neural networks (C3D) and a compressed domain approach requiring only partial decoding of the video, based on feature description using motion vectors and Fisher vector encoding (MV-FV). We study the robustness of the two algorithms against: (i) quality variations, (ii) changes in video encoding scheme, (iii) changes in resolution. Experiments are performed on the HMDB51 dataset. Our main findings are that C3D is robust to variations of these parameters while MV-FV is very sensitive. Hence, we consider C3D as a baseline method for our analysis. We also analyze the reasons behind these different behaviors and discuss their practical implications.
Published: 2016

46. A deep neural network for image quality assessment

Author: Thomas Wiegand, Dominique Maniry, Wojciech Samek, and Sebastian Bosse
Subjects: Artificial neural network, Computer science, business.industry, Image quality, Quantization (signal processing), Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scene statistics, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Convolutional neural network, Pearson product-moment correlation coefficient, symbols.namesake, Distortion, 0202 electrical engineering, electronic engineering, information engineering, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, business, Quantization (image processing), Feature detection (computer vision)
Abstract: This paper presents a no-reference (NR) image quality assessment (IQA) method based on a deep convolutional neural network (CNN). The CNN takes unpreprocessed image patches as input and estimates the quality without employing any domain knowledge. Features and natural scene statistics are thereby learnt in a purely data-driven manner and combined with pooling and regression in one framework. We evaluate the network on the LIVE database and achieve a linear Pearson correlation superior to state-of-the-art NR IQA methods. We also apply the network to the image forensics task of decoder-sided quantization parameter estimation, where it also achieves correlations of r = 0.989.
Published: 2016

47. Brain-computer interfacing under distraction: an evaluation study

Author: Stephanie Brandl, Klaus-Robert Müller, Wojciech Samek, Laura Frølich, and Johannes Höhne
Subjects: Adult, Male, Computer science, Movement, Biomedical Engineering, Poison control, 02 engineering and technology, Artifact (software development), Machine learning, computer.software_genre, Field (computer science), Functional Laterality, 03 medical and health sciences, Cellular and Molecular Neuroscience, Young Adult, 0302 clinical medicine, Robustness (computer science), Distraction, Evoked Potentials, Somatosensory, 0202 electrical engineering, electronic engineering, information engineering, Humans, Brain–computer interface, business.industry, Discriminant Analysis, Reproducibility of Results, Electroencephalography, Linear discriminant analysis, Pipeline (software), Brain-Computer Interfaces, Imagination, 020201 artificial intelligence & image processing, Female, Artificial intelligence, business, Artifacts, ddc:006, computer, 030217 neurology & neurosurgery, Algorithms, Psychomotor Performance
Abstract: OBJECTIVE: While motor-imagery based brain-computer interfaces (BCIs) have been studied over many years by now, most of these studies have taken place in controlled lab settings. Bringing BCI technology into everyday life is still one of the main challenges in this field of research. APPROACH: This paper systematically investigates BCI performance under 6 types of distractions that mimic out-of-lab environments. MAIN RESULTS: We report results of 16 participants and show that the performance of the standard common spatial patterns (CSP) + regularized linear discriminant analysis classification pipeline drops significantly in this 'simulated' out-of-lab setting. We then investigate three methods for improving the performance: (1) artifact removal, (2) ensemble classification, and (3) a 2-step classification approach. While artifact removal does not enhance the BCI performance significantly, both ensemble classification and the 2-step classification combined with CSP significantly improve the performance compared to the standard procedure. SIGNIFICANCE: Systematically analyzing out-of-lab scenarios is crucial when bringing BCI into everyday life. Algorithms must be adapted to overcome nonstationary environments in order to tackle real-world challenges. Language: en
Published: 2016

48. Hybrid video object tracking in H.265/HEVC video streams

Author: Wojciech Samek, Thomas Schierl, Cornelius Hellge, Serhan Gül, and Jan Meyer
Subjects: Markov random field, Pixel, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Video sequence, 02 engineering and technology, Video tracking, 0202 electrical engineering, electronic engineering, information engineering, Coherence (signal processing), 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Bitstream, business, Decoding methods
Abstract: In this paper we propose a hybrid tracking method which detects moving objects in videos compressed according to the H.265/HEVC standard. Our framework largely depends on motion vectors (MV) and block types obtained by partially decoding the video bit stream and occasionally uses pixel domain information to distinguish between two objects. The compressed domain method is based on a Markov Random Field (MRF) model that captures spatial and temporal coherence of the moving object and is updated on a frame-to-frame basis. The hybrid nature of our approach stems from the use of a pixel domain method that extracts the color information from the fully decoded I frames and is updated only after completion of each Group-of-Pictures (GOP). We test the tracking accuracy of our method using standard video sequences and show that our hybrid framework provides better tracking accuracy than a state-of-the-art MRF model.
Published: 2016

49. Shearlet-based reduced reference image quality assessment

Author: Mischa Siekmann, Wojciech Samek, Sebastian Bosse, Qiaobo Chen, and Thomas Wiegand
Subjects: Location parameter, business.industry, Image quality, Pooling, Feature extraction, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Shearlet, Feature (computer vision), Distortion, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Transform coding, Mathematics
Abstract: This paper proposes a reduced reference image quality assessment method using only a low number of features. It involves a shearlet decomposition and directional pooling of the obtained coefficients, and extracts the scalewise statistical location parameter as a feature. The proposed method is tested and compared to similar approaches on the LIVE image database. On this database it outperforms the compared methods on five of seven distortion types and on the full test set, with a linear correlation of 0.89.
Published: 2016

50. Quality assessment of image patches distorted by image compression using crowdsourcing

Author: Wojciech Samek, Sebastian Bosse, Jennifer Rasch, Thomas Wiegand, and Mischa Siekmann
Subjects: Compression artifact, Pixel, business.industry, Computer science, Image quality, Mean opinion score, media_common.quotation_subject, 030229 sport sciences, 02 engineering and technology, Crowdsourcing, 03 medical and health sciences, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Quality (business), Artificial intelligence, business, Spatial analysis, Image compression, media_common
Abstract: Three experiments addressing the assessment of perceived image quality in a patch-based manner are compared for HEVC compression artifacts. It is shown that image patches as small as 128×128 pixels are large enough to evaluate the perceived image quality in a Degradation Category Rating (DCR) setting. Ratings obtained with 128×128 pixel image patches and 512×512 pixel images of the same spatial statistics show a correlation of r=0.99. Based on this finding, image quality assessment of 128×128 pixel image patches degraded by HEVC compression is compared for a controlled lab environment and uncontrolled crowdsourcing settings. Although we find high overall correlation between the quality ratings obtained in the two environments, observers tend to give worse ratings in the crowdsourcing setting, and for conditions of higher quality a reduction of correlation is observed. These findings have implications for choosing controlled vs. uncontrolled viewing conditions for image quality assessment for real-life applications.
Published: 2016
