273 results on '"Borth, Damian"'
Search Results
2. Towards Scalable and Versatile Weight Space Learning
- Author
Schürholt, Konstantin, Mahoney, Michael W., and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations by learning task-agnostic representations of neural networks that are scalable to larger models of varying architectures and that show capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed larger neural networks as a set of tokens into the learned representation space. SANE reveals global model information from layer-wise embeddings, and it can sequentially generate unseen neural network models, which was unattainable with previous hyper-representation learning methods. Extensive empirical evaluation demonstrates that SANE matches or exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization for new tasks and larger ResNet architectures., Comment: Accepted at ICML 2024
- Published
- 2024
3. Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation
- Author
Xiong, Zhitong, Wang, Yi, Zhang, Fahong, Stewart, Adam J., Hanna, Joëlle, Borth, Damian, Papoutsis, Ioannis, Saux, Bertrand Le, Camps-Valls, Gustau, and Zhu, Xiao Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combined strengths of these diverse data sources. Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science to integrate various data modalities into a single framework adaptively. This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks, including sensors never seen during pretraining. DOFA's innovative design offers a promising leap towards more accurate, efficient, and unified Earth observation analysis, showcasing remarkable adaptability and performance in harnessing the potential of multimodal Earth observation data., Comment: 36 pages, 7 figures
- Published
- 2024
4. Sample Weight Estimation Using Meta-Updates for Online Continual Learning
- Author
Hemati, Hamed and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
The loss function plays an important role in optimizing the performance of a learning system. A crucial aspect of the loss function is the assignment of sample weights within a mini-batch during loss computation. In the context of continual learning (CL), most existing strategies uniformly treat samples when calculating the loss value, thereby assigning equal weights to each sample. While this approach can be effective in certain standard benchmarks, its optimal effectiveness, particularly in more complex scenarios, remains underexplored. This is particularly pertinent in training "in the wild," such as with self-training, where labeling is automated using a reference model. This paper introduces the Online Meta-learning for Sample Importance (OMSI) strategy that approximates sample weights for a mini-batch in an online CL stream using an inner- and meta-update mechanism. This is done by first estimating sample weight parameters for each sample in the mini-batch, then, updating the model with the adapted sample weights. We evaluate OMSI in two distinct experimental settings. First, we show that OMSI enhances both learning and retained accuracy in a controlled noisy-labeled data stream. Then, we test the strategy in three standard benchmarks and compare it with other popular replay-based strategies. This research aims to foster the ongoing exploration in the area of self-adaptive CL.
- Published
- 2024
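The role of sample weights in a mini-batch loss, as described in the abstract above, can be illustrated with a minimal sketch. This is not the OMSI method: it only shows how per-sample weights enter the loss and that uniform weights recover the plain mean. The function name `weighted_batch_loss` and the choice to normalise by the total weight are assumptions made for illustration.

```python
from typing import List


def weighted_batch_loss(per_sample_losses: List[float],
                        sample_weights: List[float]) -> float:
    """Weighted mean of per-sample losses (hypothetical helper, not from the paper)."""
    if len(per_sample_losses) != len(sample_weights):
        raise ValueError("need exactly one weight per sample")
    total_weight = sum(sample_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Assumption: normalise by the total weight so the scale stays comparable
    weighted_sum = sum(l * w for l, w in zip(per_sample_losses, sample_weights))
    return weighted_sum / total_weight


losses = [0.2, 1.0, 0.6]
# Uniform weights: every sample counts equally (the plain mean)
uniform = weighted_batch_loss(losses, [1.0, 1.0, 1.0])
# Zero weight on the second sample, e.g. if it is suspected to be mislabeled
down_weighted = weighted_batch_loss(losses, [1.0, 0.0, 1.0])
```

In a noisy-label stream, lowering the weight of a suspicious sample reduces its influence on the update, which is the intuition behind learning the weights rather than fixing them.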
5. FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation
- Author
Sattarov, Timur, Schreyer, Marco, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
Realistic synthetic tabular data generation encounters significant challenges in preserving privacy, especially when dealing with sensitive information in domains like finance and healthcare. In this paper, we introduce Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity mixed-type tabular data without centralized access to the original tabular datasets. Leveraging the strengths of Denoising Diffusion Probabilistic Models (DDPMs), our approach addresses the inherent complexities in tabular data, such as mixed attribute types and implicit relationships. More critically, FedTabDiff realizes a decentralized learning scheme that permits multiple entities to collaboratively train a generative model while respecting data privacy and locality. We extend DDPMs into the federated setting for tabular data generation, which includes a synchronous update scheme and weighted averaging for effective model aggregation. Experimental evaluations on real-world financial and medical datasets attest to the framework's capability to produce synthetic data that maintains high fidelity, utility, privacy, and coverage., Comment: 9 pages, 2 figures, 2 tables, preprint version, currently under review
- Published
- 2024
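The "weighted averaging" aggregation step mentioned in the abstract above can be sketched as follows. This is a generic federated-averaging illustration, not the FedTabDiff implementation: weighting each client by its number of local samples is an assumption, and the function name `weighted_average` and the toy parameter dictionaries are hypothetical.

```python
from typing import Dict, List


def weighted_average(client_params: List[Dict[str, List[float]]],
                     client_sizes: List[int]) -> Dict[str, List[float]]:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Dict[str, List[float]] = {}
    # Assumption: all clients share the same parameter names and shapes
    for name, reference in client_params[0].items():
        averaged[name] = [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two hypothetical clients; the second holds three times as much data
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
sizes = [1, 3]
global_params = weighted_average(clients, sizes)
```

In a synchronous scheme, the server would wait for all participating clients to report before running this aggregation and broadcasting the result for the next round.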
6. Transformer-based Entity Legal Form Classification
- Author
Arimond, Alexander, Molteni, Mauro, Jany, Dominik, Manolova, Zornitsa, Borth, Damian, and Hoepner, Andreas G. F.
- Subjects
Computer Science - Computation and Language; Computer Science - Machine Learning
- Abstract
We propose the application of Transformer-based language models for classifying entity legal forms from raw legal entity names. Specifically, we employ various BERT variants and compare their performance against multiple traditional baselines. Our evaluation encompasses a substantial subset of freely available Legal Entity Identifier (LEI) data, comprising over 1.1 million legal entities from 30 different legal jurisdictions. The ground truth labels for classification per jurisdiction are taken from the Entity Legal Form (ELF) code standard (ISO 20275). Our findings demonstrate that pre-trained BERT variants outperform traditional text classification approaches in terms of F1 score, while also performing comparably well in the Macro F1 Score. Moreover, the validity of our proposal is supported by the outcome of third-party expert reviews conducted in ten selected jurisdictions. This study highlights the significant potential of Transformer-based models in advancing data standardization and data integration. The presented approaches can greatly benefit financial institutions, corporations, governments and other organizations in assessing business relationships, understanding risk exposure, and promoting effective governance.
- Published
- 2023
7. FinDiff: Diffusion Models for Financial Tabular Data Generation
- Author
Sattarov, Timur, Schreyer, Marco, and Borth, Damian
- Subjects
Computer Science - Machine Learning; Quantitative Finance - Statistical Finance
- Abstract
The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate real-world financial tabular data for a variety of regulatory downstream tasks, for example economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed modality financial data, comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (including two publicly available datasets and one proprietary dataset). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility., Comment: 9 pages, 5 figures, 3 tables, preprint version, currently under review
- Published
- 2023
8. Ben-ge: Extending BigEarthNet with Geographical and Environmental Data
- Author
Mommert, Michael, Kesseli, Nicolas, Hanna, Joëlle, Scheibenreif, Linus, Borth, Damian, and Demir, Begüm
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep learning methods have proven to be a powerful tool in the analysis of large amounts of complex Earth observation data. However, while Earth observation data are multi-modal in most cases, only single or few modalities are typically considered. In this work, we present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation. ben-ge is freely available and expected to serve as a test bed for fully supervised and self-supervised Earth observation applications., Comment: Accepted for presentation at the IEEE International Geoscience and Remote Sensing Symposium 2023
- Published
- 2023
9. Partial Hypernetworks for Continual Learning
- Author
Hemati, Hamed, Lomonaco, Vincenzo, Bacciu, Davide, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
Hypernetworks mitigate forgetting in continual learning (CL) by generating task-dependent weights and penalizing weight changes at a meta-model level. Unfortunately, generating all weights is not only computationally expensive for larger architectures, but also, it is not well understood whether generating all model weights is necessary. Inspired by latent replay methods in CL, we propose partial weight generation for the final layers of a model using hypernetworks while freezing the initial layers. With this objective, we first answer the question of how many layers can be frozen without compromising the final performance. Through several experiments, we empirically show that the number of layers that can be frozen is proportional to the distributional similarity in the CL stream. Then, to demonstrate the effectiveness of hypernetworks, we show that noisy streams can significantly impact the performance of latent replay methods, leading to increased forgetting when features from noisy experiences are replayed with old samples. In contrast, partial hypernetworks are more robust to noise by maintaining accuracy on previous experiences. Finally, we conduct experiments on the split CIFAR-100 and TinyImagenet benchmarks and compare different versions of partial hypernetworks to latent replay methods. We conclude that partial weight generation using hypernetworks is a promising solution to the problem of forgetting in neural networks. It can provide an effective balance between computation and final test accuracy in CL streams., Comment: Accepted to the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023
- Published
- 2023
10. Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech
- Author
Wang, Shijun, Guðnason, Jón, and Borth, Damian
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Computation and Language; Computer Science - Sound
- Abstract
Effective speech emotional representations play a key role in Speech Emotion Recognition (SER) and Emotional Text-To-Speech (TTS) tasks. However, emotional speech samples are more difficult and expensive to acquire compared with Neutral style speech, which causes one issue that most related works unfortunately neglect: imbalanced datasets. Models might overfit to the majority Neutral class and fail to produce robust and effective emotional representations. In this paper, we propose an Emotion Extractor to address this issue. We use augmentation approaches to train the model and enable it to extract effective and generalizable emotional representations from imbalanced datasets. Our empirical results show that (1) for the SER task, the proposed Emotion Extractor surpasses the state-of-the-art baseline on three imbalanced datasets; (2) the produced representations from our Emotion Extractor benefit the TTS model, and enable it to synthesize more expressive speech., Comment: Accepted by INTERSPEECH2023
- Published
- 2023
11. Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models
- Author
Honegger, Dominik, Schürholt, Konstantin, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
With the growing size of Neural Networks (NNs), model sparsification to reduce the computational cost and memory demand for model inference has become of vital interest for both research and production. While many sparsification methods have been proposed and successfully applied on individual models, to the best of our knowledge their behavior and robustness have not yet been studied on large populations of models. With this paper, we address that gap by applying two popular sparsification methods on populations of models (so-called model zoos) to create sparsified versions of the original zoos. We investigate the performance of these two methods for each zoo, compare sparsification layer-wise, and analyse agreement between original and sparsified populations. We find both methods to be very robust, with magnitude pruning able to outperform variational dropout except at high sparsification ratios above 80%. Further, we find that sparsified models agree to a high degree with their original non-sparsified counterparts, and that the performance of original and sparsified models is highly correlated. Finally, all models of the model zoos and their sparsified model twins are publicly available: modelzoos.cc., Comment: Accepted at ICLR 2023 Workshop on Sparsity in Neural Networks
- Published
- 2023
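Magnitude pruning, one of the two sparsification methods named in the abstract above, can be illustrated with a minimal sketch. It zeroes the weights with the smallest absolute values until a target sparsity ratio is reached. This is a generic single-layer illustration, not the authors' pipeline; the function name `magnitude_prune` and the toy weights are hypothetical.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending absolute value; the first n_prune are zeroed
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


layer = [0.5, -0.05, 1.2, 0.01]
# A sparsity ratio of 0.5 removes the two smallest-magnitude weights
sparse_layer = magnitude_prune(layer, 0.5)
```

Applying such a function to every model in a zoo would yield a "sparsified twin" population in the sense used by the abstract, although practical pipelines typically prune iteratively and fine-tune between steps.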
12. Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
- Author
Wang, Shijun, Guðnason, Jón, and Borth, Damian
- Subjects
Computer Science - Sound; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas very often one would want fine-grained emotional control of words or phonemes. Although this remains challenging, the first TTS models that can control voice by manually assigning emotion intensity have recently been proposed. Unfortunately, due to the neglect of intra-class distance, the intensity differences are often unrecognizable. In this paper, we propose a fine-grained controllable emotional TTS model that considers both inter- and intra-class distances and is able to synthesize speech with recognizable intensity differences. Our subjective and objective experiments demonstrate that our model exceeds two state-of-the-art controllable TTS models in controllability, emotion expressiveness and naturalness., Comment: Accepted by ICASSP2023
- Published
- 2023
13. Class-Incremental Learning with Repetition
- Author
Hemati, Hamed, Cossu, Andrea, Carta, Antonio, Hurtado, Julio, Pellegrini, Lorenzo, Bacciu, Davide, Lomonaco, Vincenzo, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
Real-world data streams naturally include the repetition of previous concepts. From a Continual Learning (CL) perspective, repetition is a property of the environment and, unlike replay, cannot be controlled by the agent. Nowadays, the Class-Incremental (CI) scenario represents the leading test-bed for assessing and comparing CL strategies. This scenario type is very easy to use, but it never allows revisiting previously seen classes, thus completely neglecting the role of repetition. We focus on the family of Class-Incremental with Repetition (CIR) scenarios, where repetition is embedded in the definition of the stream. We propose two stochastic stream generators that produce a wide range of CIR streams starting from a single dataset and a few interpretable control parameters. We conduct the first comprehensive evaluation of repetition in CL by studying the behavior of existing CL strategies under different CIR streams. We then present a novel replay strategy that exploits repetition and counteracts the natural imbalance present in the stream. On both CIFAR100 and TinyImageNet, our strategy outperforms other replay approaches, which are not designed for environments with repetition., Comment: Accepted to the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023, 19 pages
- Published
- 2023
14. Federated Continual Learning to Detect Accounting Anomalies in Financial Auditing
- Author
Schreyer, Marco, Hemati, Hamed, Borth, Damian, and Vasarhelyi, Miklos A.
- Subjects
Computer Science - Machine Learning
- Abstract
The International Standards on Auditing require auditors to collect reasonable assurance that financial statements are free of material misstatement. At the same time, a central objective of Continuous Assurance is the real-time assessment of digital accounting journal entries. Recently, driven by the advances in artificial intelligence, Deep Learning techniques have emerged in financial auditing to examine vast quantities of accounting data. However, learning highly adaptive audit models in decentralised and dynamic settings remains challenging. It requires the study of data distribution shifts over multiple clients and time periods. In this work, we propose a Federated Continual Learning framework enabling auditors to learn audit models from decentral clients continuously. We evaluate the framework's ability to detect accounting anomalies in common scenarios of organizational activity. Our empirical results, using real-world datasets and combined federated continual learning strategies, demonstrate the learned model's ability to detect anomalies in audit settings of data distribution shifts., Comment: 6 pages (excl. appendix), 5 figures, 1 table, preprint version, currently under review
- Published
- 2022
15. Model Zoos: A Dataset of Diverse Populations of Neural Network Models
- Author
Schürholt, Konstantin, Taskiran, Diyar, Knyazev, Boris, Giró-i-Nieto, Xavier, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
In recent years, neural networks (NN) have evolved from laboratory environments to the state-of-the-art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a model zoo) would form structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such model zoos, one could investigate novel approaches to (i) analyse models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of NN weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of NNs. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of NN models for further research. In total, the proposed model zoo dataset is based on eight image datasets, consists of 27 model zoos trained with varying hyperparameter combinations and includes 50'360 unique NN models as well as their sparsified twins, resulting in over 3'844'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks. The dataset can be found at www.modelzoos.cc., Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks
- Published
- 2022
16. Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights
- Author
Schürholt, Konstantin, Knyazev, Boris, Giró-i-Nieto, Xavier, and Borth, Damian
- Subjects
Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and several sampling methods based on the topology of hyper-representations. The models generated using our methods are diverse, performant and capable of outperforming strong baselines as evaluated on several downstream tasks: initialization, ensemble sampling and transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the avenue for novel research directions., Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv admin note: text overlap with arXiv:2207.10951
- Published
- 2022
17. RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations
- Author
Müller, Ricardo, Schreyer, Marco, Sattarov, Timur, and Borth, Damian
- Subjects
Computer Science - Machine Learning; Computer Science - Computational Engineering, Finance, and Science; Quantitative Finance - Statistical Finance
- Abstract
Detecting accounting anomalies is a recurrent challenge in financial statement audits. Recently, novel methods derived from Deep-Learning (DL) have been proposed to audit the large volumes of a statement's underlying accounting records. However, due to their vast number of parameters, such models exhibit the drawback of being inherently opaque. At the same time, the concealing of a model's inner workings often hinders its real-world application. This observation holds particularly true in financial audits since auditors must reasonably explain and justify their audit decisions. Nowadays, various Explainable AI (XAI) techniques have been proposed to address this challenge, e.g., SHapley Additive exPlanations (SHAP). However, in unsupervised DL as often applied in financial audits, these methods explain the model output at the level of encoded variables. As a result, the explanations of Autoencoder Neural Networks (AENNs) are often hard for human auditors to comprehend. To mitigate this drawback, we propose RESHAPE, which explains the model output on an aggregated attribute-level. In addition, we introduce an evaluation framework to compare the versatility of XAI methods in auditing. Our experimental results show empirical evidence that RESHAPE results in versatile explanations compared to state-of-the-art baselines. We envision such attribute-level explanations as a necessary next step in the adoption of unsupervised DL techniques in financial auditing., Comment: 9 pages, 4 figures, 5 tables, preprint version, currently under review
- Published
- 2022
18. Federated and Privacy-Preserving Learning of Accounting Data in Financial Statement Audits
- Author
Schreyer, Marco, Sattarov, Timur, and Borth, Damian
- Subjects
Computer Science - Machine Learning; Computer Science - Computational Engineering, Finance, and Science; Computer Science - Cryptography and Security
- Abstract
The ongoing 'digital transformation' fundamentally changes audit evidence's nature, recording, and volume. Nowadays, the International Standards on Auditing (ISA) requires auditors to examine vast volumes of a financial statement's underlying digital accounting records. As a result, audit firms also 'digitize' their analytical capabilities and invest in Deep Learning (DL), a successful sub-discipline of Machine Learning. The application of DL offers the ability to learn specialized audit models from data of multiple clients, e.g., organizations operating in the same industry or jurisdiction. In general, regulations require auditors to adhere to strict data confidentiality measures. At the same time, recent intriguing discoveries showed that large-scale DL models are vulnerable to leaking sensitive training data information. Today, it often remains unclear how audit firms can apply DL models while complying with data protection regulations. In this work, we propose a Federated Learning framework to train DL models on auditing relevant accounting data of multiple clients. The framework encompasses Differential Privacy and Split Learning capabilities to mitigate data confidentiality risks at model inference. We evaluate our approach to detect accounting anomalies in three real-world datasets of city payments. Our results provide empirical evidence that auditors can benefit from DL models that accumulate knowledge from multiple sources of proprietary client data., Comment: 8 pages, 5 figures, 3 tables, preprint version, currently under review
- Published
- 2022
19. Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition
- Author
Wang, Shijun, Hemati, Hamed, Guðnason, Jón, and Borth, Damian
- Subjects
Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Speech Emotion Recognition (SER) is crucial for human-computer interaction but still remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many datasets for SER are substantially imbalanced, where data utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many existing spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network, to improve SER performance given imbalanced and insufficient training data. We conduct experiments and demonstrate: 1) With a highly imbalanced dataset, our augmentation strategy significantly improves the SER performance (+8% recall score compared with the baseline). 2) Moreover, in a cross-lingual benchmark, where we train a model with enough source language utterances but very few target language utterances (around 50 in our experiments), our augmentation strategy brings benefits for the SER performance of all three target languages., Comment: Published in INTERSPEECH 2022
- Published
- 2022
20. Hyper-Representations for Pre-Training and Transfer Learning
- Author
Schürholt, Konstantin, Knyazev, Boris, Giró-i-Nieto, Xavier, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights as pre-training. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and a sampling method based on the empirical density of hyper-representations. The models generated using our methods are diverse, performant and capable of outperforming conventional baselines for transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the avenue for novel research directions.
- Published
- 2022
21. Continual Learning for Unsupervised Anomaly Detection in Continuous Auditing of Financial Accounting Data
- Author
Hemati, Hamed, Schreyer, Marco, and Borth, Damian
- Subjects
Computer Science - Machine Learning
- Abstract
International audit standards require the direct assessment of a financial statement's underlying accounting journal entries. Driven by advances in artificial intelligence, deep-learning-inspired audit techniques emerged to examine vast quantities of journal entry data. However, in regular audits, most of the proposed methods are applied to learn from a comparably stationary journal entry population, e.g., of a financial quarter or year, ignoring situations where audit-relevant distribution changes are not evident in the training data or become incrementally available over time. In contrast, in continuous auditing, deep-learning models are continually trained on a stream of recorded journal entries, e.g., of the last hour, resulting in situations where previous knowledge interferes with new information and will be entirely overwritten. This work proposes a continual anomaly detection framework that overcomes both challenges and is designed to learn from a stream of journal entry data experiences. The framework is evaluated based on deliberately designed audit scenarios and two real-world datasets. Our experimental results provide initial evidence that such a learning scheme offers the ability to reduce false-positive alerts and false-negative decisions., Comment: AAAI 2022 Workshop on AI in Financial Services: Adaptiveness, Resilience & Governance
- Published
- 2021
22. Saliency Diversified Deep Ensemble for Robustness to Adversaries
- Author
Bogun, Alex, Kostadinov, Dimche, and Borth, Damian
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; I.2.0
- Abstract
Deep learning models have shown incredible performance on numerous image recognition, classification, and reconstruction tasks. Although very appealing and valuable due to their predictive capabilities, one common threat remains challenging to resolve. A specifically trained attacker can introduce malicious input perturbations to fool the network, thus causing potentially harmful mispredictions. Moreover, these attacks can succeed when the adversary has full access to the target model (white-box) and even when such access is limited (black-box setting). The ensemble of models can protect against such attacks but might be brittle under shared vulnerabilities in its members (attack transferability). To that end, this work proposes a novel diversity-promoting learning approach for the deep ensembles. The idea is to promote saliency map diversity (SMD) on ensemble members to prevent the attacker from targeting all ensemble members at once by introducing an additional term in our learning objective. During training, this helps us minimize the alignment between model saliencies to reduce shared member vulnerabilities and, thus, increase ensemble robustness to adversaries. We empirically show a reduced transferability between ensemble members and improved performance compared to the state-of-the-art ensemble defense against medium and high strength white-box attacks. In addition, we demonstrate that our approach combined with existing methods outperforms state-of-the-art ensemble algorithms for defense under white-box and black-box attacks., Comment: Accepted to AAAI Workshop on Adversarial Machine Learning and Beyond 2022
- Published
- 2021
23. Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
- Author
-
Schürholt, Konstantin, Kostadinov, Dimche, and Borth, Damian
- Subjects
Computer Science - Machine Learning - Abstract
Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn hyper-representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings., Comment: Published at 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia. 31 Pages, 14 figures
- Published
- 2021
24. Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
- Author
-
Wang, Shijun, Kostadinov, Dimche, and Borth, Damian
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research topic as it enables a range of applications like voice customizing, animation production, and others. Recent work in this area made progress with disentanglement methods that separate utterance content and speaker characteristics from speech audio recordings. However, many of these methods are subject to the leakage of prosody (e.g., pitch, volume), causing the speaker voice in the synthesized speech to be different from the desired target speakers. To prevent this issue, we propose a novel self-supervised approach that effectively learns disentangled pitch and volume representations that can represent the prosody styles of different speakers. We then use the learned prosodic representations as conditional information to train and enhance our VC model for zero-shot conversion. In our experiments, we show that our prosody representations are disentangled and rich in prosody information. Moreover, we demonstrate that the addition of our prosody representations improves our VC performance and surpasses state-of-the-art zero-shot VC performances., Comment: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)
- Published
- 2021
25. Multi-view Contrastive Self-Supervised Learning of Accounting Data Representations for Downstream Audit Tasks
- Author
-
Schreyer, Marco, Sattarov, Timur, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Computer Science - Computational Engineering, Finance, and Science - Abstract
International audit standards require the direct assessment of a financial statement's underlying accounting transactions, referred to as journal entries. Recently, driven by the advances in artificial intelligence, deep learning inspired audit techniques have emerged in the field of auditing vast quantities of journal entry data. Nowadays, the majority of such methods rely on a set of specialized models, each trained for a particular audit task. At the same time, when conducting a financial statement audit, audit teams are confronted with (i) challenging time-budget constraints, (ii) extensive documentation obligations, and (iii) strict model interpretability requirements. As a result, auditors prefer to harness only a single preferably `multi-purpose' model throughout an audit engagement. We propose a contrastive self-supervised learning framework designed to learn audit task invariant accounting data representations to meet this requirement. The framework encompasses deliberate interacting data augmentation policies that utilize the attribute characteristics of journal entry data. We evaluate the framework on two real-world datasets of city payments and transfer the learned representations to three downstream audit tasks: anomaly detection, audit sampling, and audit documentation. Our experimental results provide empirical evidence that the proposed framework offers the ability to increase the efficiency of audits by learning rich and interpretable `multi-task' representations., Comment: 8 pages (excl. appendix), 4 Figures, 3 Tables
- Published
- 2021
26. Heterogeneous Ensemble for ESG Ratings Prediction
- Author
-
Krappel, Tim, Bogun, Alex, and Borth, Damian
- Subjects
Computer Science - Artificial Intelligence ,J.4 - Abstract
Over the past years, topics ranging from climate change to human rights have seen increasing importance for investment decisions. Hence, investors (asset managers and asset owners) who wanted to incorporate these issues started to assess companies based on how they handle such topics. For this assessment, investors rely on specialized rating agencies that issue ratings along the environmental, social and governance (ESG) dimensions. Such ratings allow them to make investment decisions in favor of sustainability. However, rating agencies base their analysis on subjective assessment of sustainability reports, which are not provided by every company. Furthermore, due to the human labor involved, rating agencies are currently facing the challenge of scaling up their coverage in a timely manner. In order to alleviate these challenges and contribute to the overall goal of supporting sustainability, we propose a heterogeneous ensemble model to predict ESG ratings using fundamental data. This model is based on feedforward neural network, CatBoost and XGBoost ensemble members. Given the public availability of fundamental data, the proposed method would allow cost-efficient and scalable creation of initial ESG ratings (also for companies without sustainability reporting). Using our approach we are able to explain 54% of the variation in ratings ($R^2$) using fundamental data and outperform prior work in this area., Comment: Accepted to KDD Workshop on Machine Learning in Finance 2021
- Published
- 2021
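Note on entry 26: the abstract reports that the model explains 54% of the variation in ratings, i.e. an $R^2$ of 0.54. As a minimal sketch of how such a coefficient of determination is typically computed (the function name and the toy numbers below are illustrative assumptions, not data or code from the paper):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings and predictions, for illustration only.
ratings = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
score = r_squared(ratings, predicted)
```

A value of 1 means the predictions reproduce the ratings exactly, while 0 means they do no better than always predicting the mean rating.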
27. Learning Interpretable Concept Groups in CNNs
- Author
-
Varshneya, Saurabh, Ledent, Antoine, Vandermeulen, Robert A., Lei, Yunwen, Enders, Matthias, Borth, Damian, and Kloft, Marius
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
We propose a novel training methodology -- Concept Group Learning (CGL) -- that encourages training of interpretable CNN filters by partitioning filters in each layer into concept groups, each of which is trained to learn a single visual concept. We achieve this through a novel regularization strategy that forces filters in the same group to be active in similar image regions for a given layer. We additionally use a regularizer to encourage a sparse weighting of the concept groups in each layer so that a few concept groups can have greater importance than others. We quantitatively evaluate CGL's model interpretability using standard interpretability evaluation techniques and find that our method increases interpretability scores in most cases. Qualitatively we compare the image regions that are most active under filters learned using CGL versus filters learned without CGL and find that CGL activation regions more strongly concentrate around semantically relevant features.
- Published
- 2021
28. Estimation of Air Pollution with Remote Sensing Data: Revealing Greenhouse Gas Emissions from Space
- Author
-
Scheibenreif, Linus, Mommert, Michael, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,I.4 - Abstract
Air pollution is a major driver of climate change. Anthropogenic emissions from the burning of fossil fuels for transportation and power generation emit large amounts of problematic air pollutants, including Greenhouse Gases (GHGs). Despite the importance of limiting GHG emissions to mitigate climate change, detailed information about the spatial and temporal distribution of GHG and other air pollutants is difficult to obtain. Existing models for surface-level air pollution rely on extensive land-use datasets which are often locally restricted and temporally static. This work proposes a deep learning approach for the prediction of ambient air pollution that only relies on remote sensing data that is globally available and frequently updated. Combining optical satellite imagery with satellite-based atmospheric column density air pollution measurements enables the scaling of air pollution estimates (in this case NO$_2$) to high spatial resolution (up to $\sim$10m) at arbitrary locations and adds a temporal component to these estimates. The proposed model performs with high accuracy when evaluated against air quality measurements from ground stations (mean absolute error $<$6$~\mu g/m^3$). Our results enable the identification and temporal monitoring of major sources of air pollution and GHGs., Comment: for associated codebase, see https://www.github.com/HSG-AIML/RemoteSensingNO2Estimation
- Published
- 2021
29. Power Plant Classification from Remote Imaging with Deep Learning
- Author
-
Mommert, Michael, Scheibenreif, Linus, Hanna, Joëlle, and Borth, Damian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Satellite remote imaging enables the detailed study of land use patterns on a global scale. We investigate the possibility to improve the information content of traditional land use classification by identifying the nature of industrial sites from medium-resolution remote sensing images. In this work, we focus on classifying different types of power plants from Sentinel-2 imaging data. Using a ResNet-50 deep learning model, we are able to achieve a mean accuracy of 90.0% in distinguishing 10 different power plant types and a background class. Furthermore, we are able to identify the cooling mechanisms utilized in thermal power plants with a mean accuracy of 87.5%. Our results enable us to qualitatively investigate the energy mix from Sentinel-2 imaging data, and prove the feasibility to classify industrial sites on a global scale from freely available satellite imagery., Comment: Presented at the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
- Published
- 2021
30. NoiseVC: Towards High Quality Zero-Shot Voice Conversion
- Author
-
Wang, Shijun and Borth, Damian
- Subjects
Computer Science - Sound ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Voice conversion (VC) is a task that transforms voice from target audio to source without losing linguistic content; it is especially challenging when source and target speakers are unseen during training (zero-shot VC). Previous approaches require a pre-trained model or linguistic data to do the zero-shot conversion. Meanwhile, VC models with Vector Quantization (VQ) or Instance Normalization (IN) are able to disentangle content from audio and achieve successful conversions. However, disentanglement in these models relies heavily on strongly constrained bottleneck layers, so the sound quality is drastically sacrificed. In this paper, we propose NoiseVC, an approach that can disentangle content based on VQ and Contrastive Predictive Coding (CPC). Additionally, Noise Augmentation is performed to further enhance disentanglement capability. We conduct several experiments and demonstrate that NoiseVC has a strong disentanglement ability with a small sacrifice of quality.
- Published
- 2021
31. Continual Speaker Adaptation for Text-to-Speech Synthesis
- Author
-
Hemati, Hamed and Borth, Damian
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Training a multi-speaker Text-to-Speech (TTS) model from scratch is computationally expensive and adding new speakers to the dataset requires the model to be re-trained. The naive solution of sequential fine-tuning of a model for new speakers can lead to poor performance of older speakers. This phenomenon is known as catastrophic forgetting. In this paper, we look at TTS modeling from a continual learning perspective, where the goal is to add new speakers without forgetting previous speakers. Therefore, we first propose an experimental setup and show that serial fine-tuning for new speakers can cause the forgetting of the earlier speakers. Then we exploit two well-known techniques for continual learning, namely experience replay and weight regularization. We reveal how one can mitigate the effect of degradation in speech synthesis diversity in sequential training of new speakers using these methods. Finally, we present a simple extension to experience replay to improve the results in extreme setups where we have access to very small buffers., Comment: Preprint
- Published
- 2021
32. Leaking Sensitive Financial Accounting Data in Plain Sight using Deep Autoencoder Neural Networks
- Author
-
Schreyer, Marco, Schulze, Christian, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Nowadays, organizations collect vast quantities of sensitive information in `Enterprise Resource Planning' (ERP) systems, such as accounting relevant transactions, customer master data, or strategic sales price information. The leakage of such information poses a severe threat for companies as the number of incidents and the reputational damage to those experiencing them continue to increase. At the same time, discoveries in deep learning research revealed that machine learning models could be maliciously misused to create new attack vectors. Understanding the nature of such attacks becomes increasingly important for the (internal) audit and fraud examination practice. The creation of such an awareness holds in particular for the fraudulent data leakage using deep learning-based steganographic techniques that might remain undetected by state-of-the-art `Computer Assisted Audit Techniques' (CAATs). In this work, we introduce a real-world `threat model' designed to leak sensitive accounting data. In addition, we show that a deep steganographic process, constituted by three neural networks, can be trained to hide such data in unobtrusive `day-to-day' images. Finally, we provide qualitative and quantitative evaluations on two publicly available real-world payment datasets., Comment: 8 pages (excl. appendix), 4 Figures, 2 Tables, AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services, this paper is the initial accepted version
- Published
- 2020
33. Characterization of Industrial Smoke Plumes from Remote Sensing Data
- Author
-
Mommert, Michael, Sigel, Mario, Neuhausler, Marcel, Scheibenreif, Linus, and Borth, Damian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The major driver of global warming has been identified as the anthropogenic release of greenhouse gas (GHG) emissions from industrial activities. The quantitative monitoring of these emissions is mandatory to fully understand their effect on the Earth's climate and to enforce emission regulations on a large scale. In this work, we investigate the possibility to detect and quantify industrial smoke plumes from globally and freely available multi-band image data from ESA's Sentinel-2 satellites. Using a modified ResNet-50, we can detect smoke plumes of different sizes with an accuracy of 94.3%. The model correctly ignores natural clouds and focuses on those imaging channels that are related to the spectral absorption from aerosols and water vapor, enabling the localization of smoke. We exploit this localization ability and train a U-Net segmentation model on a labeled sub-sample of our data, resulting in an Intersection-over-Union (IoU) metric of 0.608 and an overall accuracy for the detection of any smoke plume of 94.0%; on average, our model can reproduce the area covered by smoke in an image to within 5.6%. The performance of our model is mostly limited by occasional confusion with surface objects, the inability to identify semi-transparent smoke, and human limitations to properly identify smoke based on RGB-only images. Nevertheless, our results enable us to reliably detect and qualitatively estimate the level of smoke activity in order to monitor activity in industrial plants across the globe. Our data set and code base are publicly available., Comment: To be presented at the "Tackling Climate Change with Machine Learning" workshop at NeurIPS 2020
- Published
- 2020
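Note on entry 33: the abstract reports an Intersection-over-Union (IoU) of 0.608 for the segmentation model. The sketch below shows the standard definition of IoU for two binary masks; the function name, the mask representation as nested lists, and the toy masks are assumptions made for illustration and are not taken from the authors' code base.

```python
def intersection_over_union(pred_mask, target_mask):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked in both masks
            if p or t:
                union += 1         # pixel marked in at least one mask
    # Convention chosen here (an assumption): two empty masks match perfectly.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are marked in at least one mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
labeled = [[1, 0, 0],
           [0, 1, 1]]
iou = intersection_over_union(predicted, labeled)
```

Here `iou` evaluates to 0.5, since two of the four marked pixels are shared by both masks.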
34. Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
- Author
-
Hemati, Hamed and Borth, Damian
- Subjects
Computer Science - Sound ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model for new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways., Comment: Preprint
- Published
- 2020
35. Maschinelles Lernen
- Author
-
Borth, Damian, Hüllermeier, Eyke, Kauermann, Göran, Gillhuber, Andreas, editor, Kauermann, Göran, editor, and Hauner, Wolfgang, editor
- Published
- 2023
36. Learning Sampling in Financial Statement Audits using Vector Quantised Autoencoder Neural Networks
- Author
-
Schreyer, Marco, Sattarov, Timur, Gierbl, Anita, Reimer, Bernd, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
The audit of financial statements is designed to collect reasonable assurance that an issued statement is free from material misstatement (a 'true and fair presentation'). International audit standards require the assessment of a statement's underlying accounting relevant transactions, referred to as 'journal entries', to detect potential misstatements. To efficiently audit the increasing quantities of such entries, auditors regularly conduct a sample-based assessment referred to as 'audit sampling'. However, the task of audit sampling is often conducted early in the overall audit process, at a stage in which an auditor might be unaware of all generative factors and their dynamics that resulted in the journal entries in-scope of the audit. To overcome this challenge, we propose the application of Vector Quantised-Variational Autoencoder (VQ-VAE) neural networks. We demonstrate, based on two real-world city payment datasets, that such artificial neural networks are capable of learning a quantised representation of accounting data. We show that the learned quantisation (i) uncovers the latent factors of variation and (ii) can be utilised as a highly representative audit sample in financial statement audits., Comment: 8 pages, 5 figures, 3 tables, to appear in Proceedings of the ACM's International Conference on AI in Finance (ICAIF'20), this paper is the initial accepted version
- Published
- 2020
37. Facial Recognition: A cross-national Survey on Public Acceptance, Privacy, and Discrimination
- Author
-
Steinacker, Léa, Meckel, Miriam, Kostka, Genia, and Borth, Damian
- Subjects
Computer Science - Computers and Society ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
With rapid advances in machine learning (ML), more of this technology is being deployed into the real world, interacting with us and our environment. One of the most widely applied applications of ML is facial recognition, as it is running on millions of devices. While it is useful for some people, others perceive it as a threat when used by public authorities. This discrepancy and the lack of policy increase the uncertainty in the ML community about the future direction of facial recognition research and development. In this paper we present results from a cross-national survey about public acceptance, privacy, and discrimination of the use of facial recognition technology (FRT) in the public. This study provides insights about the opinion towards FRT from China, Germany, the United Kingdom (UK), and the United States (US), which can serve as input for policy makers and legal regulators., Comment: ICML 2020 - Law and Machine Learning Workshop, Vienna, Austria
- Published
- 2020
38. An Investigation of the Weight Space to Monitor the Training Progress of Neural Networks
- Author
-
Schürholt, Konstantin and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
Safe use of Deep Neural Networks (DNNs) requires careful testing. However, deployed models are often trained further to improve in performance. As rigorous testing and evaluation is expensive, triggers are needed to determine the degree of change of a model. In this paper we investigate the weight space of DNN models for structure that can be exploited to that end. Our results show that DNN models evolve on unique, smooth trajectories in weight space which can be used to track DNN training progress. We hypothesize that curvature and smoothness of the trajectories as well as step length along them may contain information on the state of training as well as potential domain shifts. We show that the model trajectories can be separated and the order of checkpoints on the trajectories recovered, which may serve as a first step towards DNN model versioning., Comment: 8 pages, 9 figures
- Published
- 2020
39. Neural Networks and Value at Risk
- Author
-
Arimond, Alexander, Borth, Damian, Hoepner, Andreas, Klawunn, Michael, and Weisheit, Stefan
- Subjects
Quantitative Finance - Risk Management ,Computer Science - Machine Learning ,Economics - Econometrics - Abstract
Utilizing a generative regime switching framework, we perform Monte-Carlo simulations of asset returns for Value at Risk threshold estimation. Using equity markets and long term bonds as test assets in the global, US, Euro area and UK setting over an up to 1,250 weeks sample horizon ending in August 2018, we investigate neural networks along three design steps relating (i) to the initialization of the neural network, (ii) its incentive function according to which it has been trained and (iii) the amount of data we feed. First, we compare neural networks with random seeding with networks that are initialized via estimations from the best-established model (i.e. the Hidden Markov). We find the latter to outperform in terms of the frequency of VaR breaches (i.e. the realized return falling short of the estimated VaR threshold). Second, we balance the incentive structure of the loss function of our networks by adding a second objective to the training instructions so that the neural networks optimize for accuracy while also aiming to stay in empirically realistic regime distributions (i.e. bull vs. bear market frequencies). In particular this design feature enables the balanced incentive recurrent neural network (RNN) to outperform the single incentive RNN as well as any other neural network or established approach by statistically and economically significant levels. Third, we halve our training data set of 2,000 days. We find our networks when fed with substantially less data (i.e. 1,000 days) to perform significantly worse which highlights a crucial weakness of neural networks in their dependence on very large data sets ..., Comment: 2019 Financial Data Science Association Paper, San Francisco
- Published
- 2020
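Note on entry 39: the abstract compares models by the frequency of VaR breaches, defined there as the realized return falling short of the estimated VaR threshold. A minimal sketch of that count follows; the function name, the sign convention (thresholds expressed as negative returns), and the sample numbers are assumptions for illustration, not values from the paper.

```python
def var_breach_frequency(realized_returns, var_thresholds):
    """Share of periods in which the realized return falls below the VaR threshold."""
    if len(realized_returns) != len(var_thresholds):
        raise ValueError("need one VaR threshold per realized return")
    if not realized_returns:
        raise ValueError("need at least one observation")
    # A breach occurs when the realized return is worse than the threshold.
    breaches = sum(
        1 for r, v in zip(realized_returns, var_thresholds) if r < v
    )
    return breaches / len(realized_returns)


# Hypothetical period returns and estimated VaR thresholds.
returns = [0.01, -0.03, 0.002, -0.05]
thresholds = [-0.02, -0.02, -0.02, -0.04]
frequency = var_breach_frequency(returns, thresholds)
```

In this toy series two of four periods breach the threshold, so `frequency` is 0.5; a well-calibrated 99% VaR model would be expected to breach in roughly 1% of periods.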
40. Adversarial Learning of Deepfakes in Accounting
- Author
-
Schreyer, Marco, Sattarov, Timur, Reimer, Bernd, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Nowadays, organizations collect vast quantities of accounting relevant transactions, referred to as 'journal entries', in 'Enterprise Resource Planning' (ERP) systems. The aggregation of those entries ultimately defines an organization's financial statement. To detect potential misstatements and fraud, international audit standards require auditors to directly assess journal entries using 'Computer Assisted Audit Techniques' (CAATs). At the same time, discoveries in deep learning research revealed that machine learning models are vulnerable to 'adversarial attacks'. It also became evident that such attack techniques can be misused to generate 'Deepfakes' designed to directly attack the perception of humans by creating convincingly altered media content. Research into such developments and their potential impact on the finance and accounting domain is still in its early stage. We believe that it is of vital relevance to investigate how such techniques could be maliciously misused in this sphere. In this work, we show an adversarial attack against CAATs using deep neural networks. We first introduce a real-world 'threat model' designed to camouflage accounting anomalies such as fraudulent journal entries. Second, we show that adversarial autoencoder neural networks are capable of learning a human interpretable model of journal entries that disentangles the entries' latent generative factors. Finally, we demonstrate how such a model can be maliciously misused by a perpetrator to generate robust 'adversarial' journal entries that mislead CAATs., Comment: 17 pages, 10 figures, and 5 tables
- Published
- 2019
41. Detection of Accounting Anomalies in the Latent Space using Adversarial Autoencoder Neural Networks
- Author
-
Schreyer, Marco, Sattarov, Timur, Schulze, Christian, Reimer, Bernd, and Borth, Damian
- Subjects
Computer Science - Machine Learning ,Quantitative Finance - Statistical Finance ,Statistics - Machine Learning - Abstract
The detection of fraud in accounting data is a long-standing challenge in financial statement audits. Nowadays, the majority of applied techniques refer to handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios and fraudsters gradually find ways to circumvent them. In contrast, more advanced approaches inspired by the recent success of deep learning often lack seamless interpretability of the detected results. To overcome this challenge, we propose the application of adversarial autoencoder networks. We demonstrate that such artificial neural networks are capable of learning a semantically meaningful representation of real-world journal entries. The learned representation provides a holistic view on a given set of journal entries and significantly improves the interpretability of detected accounting anomalies. We show that such a representation combined with the network's reconstruction error can be utilized as an unsupervised and highly adaptive anomaly assessment. Experiments on two datasets and initial feedback received from forensic accountants underpinned the effectiveness of the approach., Comment: 11 pages, 9 figures, 2nd KDD Workshop on Anomaly Detection in Finance, August 05, 2019, Anchorage, Alaska
- Published
- 2019
42. Overcoming Missing and Incomplete Modalities with Generative Adversarial Networks for Building Footprint Segmentation
- Author
-
Bischke, Benjamin, Helber, Patrick, König, Florian, Borth, Damian, and Dengel, Andreas
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The integration of information acquired with different modalities, spatial resolutions and spectral bands has been shown to improve predictive accuracies. Data fusion is therefore one of the key challenges in remote sensing. Most prior work focusing on multi-modal fusion assumes that modalities are always available during inference. This assumption limits the applications of multi-modal models since in practice the data collection process is likely to generate data with missing, incomplete or corrupted modalities. In this paper, we show that Generative Adversarial Networks can be effectively used to overcome the problems that arise when modalities are missing or incomplete. Focusing on semantic segmentation of building footprints with missing modalities, our approach achieves an improvement of about 2% on the Intersection over Union (IoU) against the same network that relies only on the available modality.
- Published
- 2018
43. What do Deep Networks Like to See?
- Author
-
Palacio, Sebastian, Folz, Joachim, Hees, Jörn, Raue, Federico, Borth, Damian, and Dengel, Andreas
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Learning - Abstract
We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found substantial differences. The AE learns which aspects of the input space to preserve and which ones to ignore, based on the information encoded in the backpropagated gradients. Measuring the changes in accuracy when the signal of one classifier is used by a second one, a relation of total order emerges. This order depends directly on each classifier's input signal but it does not correlate with classification accuracy or network size. Further evidence of this phenomenon is provided by measuring the normalized mutual information between original images and auto-encoded reconstructions from different fine-tuned AEs. These findings break new ground in the area of neural network understanding, opening a new way to reason, debug, and interpret their results. We present four concrete examples in the literature where observations can now be explained in terms of the input signal that a model uses.
- Published
- 2018
44. Adversarial Defense based on Structure-to-Signal Autoencoders
- Author
-
Folz, Joachim, Palacio, Sebastian, Hees, Joern, Borth, Damian, and Dengel, Andreas
- Subjects
Computer Science - Learning ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
Adversarial attack methods have demonstrated the fragility of deep neural networks. Their imperceptible perturbations are frequently able to fool classifiers into potentially dangerous misclassifications. We propose a novel way to interpret adversarial perturbations in terms of the effective input signal that classifiers actually use. Based on this, we apply specially trained autoencoders, referred to as S2SNets, as a defense mechanism. They follow a two-stage training scheme: first unsupervised, followed by a fine-tuning of the decoder, using gradients from an existing classifier. S2SNets induce a shift in the distribution of gradients propagated through them, stripping them of class-dependent signal. We analyze their robustness against several white-box and gray-box scenarios on the large ImageNet dataset. Our approach reaches comparable resilience in white-box attack scenarios as other state-of-the-art defenses in gray-box scenarios. We further analyze the relationships of AlexNet, VGG 16, ResNet 50 and Inception v3 in adversarial space, and find that VGG 16 is the easiest to fool, while perturbations from ResNet 50 are the most transferable.
- Published
- 2018
45. Field Studies with Multimedia Big Data: Opportunities and Challenges (Extended Version)
- Author
-
Krell, Mario Michael, Bernd, Julia, Li, Yifan, Ma, Daniel, Choi, Jaeyoung, Ellsworth, Michael, Borth, Damian, and Friedland, Gerald
- Subjects
Computer Science - Multimedia
- Abstract
Social multimedia users are increasingly sharing all kinds of data about the world. They do this for their own reasons, not to provide data for field studies, but the trend presents a great opportunity for scientists. The Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset comprises 99 million images and nearly 800 thousand videos from Flickr, all shared under Creative Commons licenses. To enable scientists to leverage these media records for field studies, we propose a new framework that extracts targeted subcorpora from the YFCC100M, in a format usable by researchers who are not experts in big data retrieval and processing. This paper discusses a number of examples from the literature, as well as some entirely new ideas, of natural and social science field studies that could be piloted, supplemented, replicated, or conducted using YFCC100M data. These examples illustrate the need for a general new open-source framework for Multimedia Big Data Field Studies. There is currently a gap between the separate aspects of what multimedia researchers have shown to be possible with consumer-produced big data and the follow-through of creating a comprehensive field study framework that supports scientists across other disciplines. To bridge this gap, we must meet several challenges. For example, the framework must handle unlabeled and noisily labeled data to produce a filtered dataset for a scientist, who naturally wants it to be both as large and as clean as possible. This requires an iterative approach that provides access to statistical summaries and refines the search by constructing new classifiers. The first phase of our framework is available as Multimedia Commons Search, an intuitive interface that enables complex search queries at a large scale...
- Published
- 2017
46. Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks
- Author
-
Bischke, Benjamin, Helber, Patrick, Folz, Joachim, Borth, Damian, and Dengel, Andreas
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The increased availability of high resolution satellite imagery makes it possible to sense very detailed structures on the surface of our planet. Access to such information opens up new directions in the analysis of remote sensing imagery. However, at the same time this raises a set of new challenges for existing pixel-based prediction methods, such as semantic segmentation approaches. While deep neural networks have achieved significant advances in the semantic segmentation of high resolution images in the past, most of the existing approaches tend to produce predictions with poor boundaries. In this paper, we address the problem of preserving semantic segmentation boundaries in high resolution satellite imagery by introducing a new cascaded multi-task loss. We evaluate our approach on the Inria Aerial Image Labeling Dataset, which contains large-scale and high resolution images. Our results show that we are able to outperform state-of-the-art methods by 8.3% without any additional post-processing step.
- Published
- 2017
47. Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks
- Author
-
Schreyer, Marco, Sattarov, Timur, Borth, Damian, Dengel, Andreas, and Reimer, Bernd
- Subjects
Computer Science - Machine Learning; Computer Science - Computational Engineering, Finance, and Science; I.2.1
- Abstract
Learning to detect fraud in large-scale accounting data is one of the long-standing challenges in financial statement audits or fraud investigations. Nowadays, the majority of applied techniques refer to handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios and fraudsters gradually find ways to circumvent them. To overcome this disadvantage and inspired by the recent success of deep learning, we propose the application of deep autoencoder neural networks to detect anomalous journal entries. We demonstrate that the trained network's reconstruction error obtainable for a journal entry and regularized by the entry's individual attribute probabilities can be interpreted as a highly adaptive anomaly assessment. Experiments on two real-world datasets of journal entries show the effectiveness of the approach, resulting in high F1-scores of 32.93 (dataset A) and 16.95 (dataset B) and fewer false positive alerts compared to state-of-the-art baseline methods. Initial feedback received from chartered accountants and fraud examiners underpinned the quality of the approach in capturing highly relevant accounting anomalies., Comment: 19 pages, 6 figures, 3 tables
- Published
- 2017
48. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
- Author
-
Helber, Patrick, Bischke, Benjamin, Dengel, Andreas, and Borth, Damian
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- Abstract
In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible, provided through the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.
- Published
- 2017
49. An Evolutionary Algorithm to Learn SPARQL Queries for Source-Target-Pairs: Finding Patterns for Human Associations in DBpedia
- Author
-
Hees, Jörn, Bauer, Rouven, Folz, Joachim, Borth, Damian, and Dengel, Andreas
- Subjects
Computer Science - Artificial Intelligence; Computer Science - Databases; Computer Science - Neural and Evolutionary Computing; Statistics - Machine Learning; 68Txx, 68T05, 68T10, 68T30, 05C85; I.2; I.2.4; I.2.6; I.5; I.5.2; I.5.3; G.2.2
- Abstract
Efficient usage of the knowledge provided by the Linked Data community is often hindered by the need for domain experts to formulate the right SPARQL queries to answer questions. For new questions they have to decide which datasets are suitable and in which terminology and modelling style to phrase the SPARQL query. In this work we present an evolutionary algorithm to help with this challenging task. Given a training list of source-target node-pair examples our algorithm can learn patterns (SPARQL queries) from a SPARQL endpoint. The learned patterns can be visualised to form the basis for further investigation, or they can be used to predict target nodes for new source nodes. Amongst others, we apply our algorithm to a dataset of several hundred human associations (such as "circle - square") to find patterns for them in DBpedia. We show the scalability of the algorithm by running it against a SPARQL endpoint loaded with > 7.9 billion triples. Further, we use the resulting SPARQL queries to mimic human associations with a Mean Average Precision (MAP) of 39.9 % and a Recall@10 of 63.9 %., Comment: 15 pages, 2 figures, as of 2016-09-13 6a19d5d7020770dc0711081ce2c1e52f71bf4b86
- Published
- 2016
50. AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
- Author
-
Sager, Sebastian, Elizalde, Benjamin, Borth, Damian, Schulze, Christian, Raj, Bhiksha, and Lane, Ian
- Subjects
Computer Science - Sound; Computer Science - Computation and Language
- Abstract
Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, hence also providing a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis., Comment: This paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis"
- Published
- 2016