Author: "Filippone, Maurizio" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Filippone, Maurizio"' showing total 267 results

1. Zero-shot Model-based Reinforcement Learning using Large Language Models

Author: Benechehab, Abdelhakim, Hili, Youssef Attia El, Odonnat, Ambroise, Zekri, Oussama, Thomas, Albert, Paolo, Giuseppe, Filippone, Maurizio, Redko, Ievgen, and Kégl, Balázs
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as key challenges that limit the potential of LLMs' deployment in this setup and propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.
Published: 2024

2. Robust Classification by Coupling Data Mollification with Label Smoothing

Author: Heinonen, Markus, Tran, Ba-Hien, Kampffmeyer, Michael, and Filippone, Maurizio
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Introducing training-time augmentations is a key technique to enhance generalization and prepare deep neural networks against test-time corruptions. Inspired by the success of generative diffusion models, we propose a novel approach coupling data augmentation, in the form of image noising and blurring, with label smoothing to align predicted label confidences with image degradation. The method is simple to implement, introduces negligible overheads, and can be combined with existing augmentations. We demonstrate improved robustness and uncertainty quantification on the corrupted image benchmarks of the CIFAR and TinyImageNet datasets.
Comment: Under review
Published: 2024

3. A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning

Author: Benechehab, Abdelhakim, Thomas, Albert, Paolo, Giuseppe, Filippone, Maurizio, and Kégl, Balázs
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step models of the dynamics learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-step objective to train one-step models. Our objective is a weighted sum of the mean squared error (MSE) loss at various future horizons. We find that this new loss is particularly useful when the data is noisy (additive Gaussian noise in the observations), which is often the case in real-life environments. To support the multi-step loss, first we study its properties in two tractable cases: i) uni-dimensional linear system, and ii) two-parameter non-linear system. Second, we show in a variety of tasks (environments or datasets) that the models learned with this loss achieve a significant improvement in terms of the averaged R2-score on future prediction horizons. Finally, in the pure batch reinforcement learning setting, we demonstrate that one-step models serve as strong baselines when dynamics are deterministic, while multi-step models would be more advantageous in the presence of noise, highlighting the potential of our approach in real-world applications.
Published: 2024
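
The weighted multi-horizon objective described in this abstract can be illustrated with a minimal sketch. It assumes scalar states, a one-step model given as a plain Python function, and hand-picked horizon weights; the names `multi_step_loss`, `step`, and `weights` are hypothetical and are not taken from the paper.

```python
def multi_step_loss(step, trajectory, weights):
    """Weighted sum of MSE losses at horizons 1..len(weights).

    step       -- one-step model: maps a scalar state to the predicted next state
    trajectory -- observed scalar states x_0, ..., x_T
    weights    -- one non-negative weight per horizon (an assumed, hand-picked profile)
    """
    total = 0.0
    for h, w in enumerate(weights, start=1):
        errors = []
        for t in range(len(trajectory) - h):
            pred = trajectory[t]
            for _ in range(h):
                # Roll the one-step model forward h times from the observed state.
                pred = step(pred)
            errors.append((pred - trajectory[t + h]) ** 2)
        if not errors:
            raise ValueError("trajectory too short for horizon %d" % h)
        total += w * sum(errors) / len(errors)
    return total


# Usage: a trajectory generated by x_{t+1} = 0.5 * x_t.
observed = [1.0, 0.5, 0.25, 0.125]
perfect_model = lambda x: 0.5 * x
print(multi_step_loss(perfect_model, observed, [0.5, 0.5]))
```

With all weight on horizon 1 the objective reduces to the usual one-step MSE; spreading weight over longer horizons penalizes errors that compound during rollouts.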

4. Variational DAG Estimation via State Augmentation With Stochastic Permutations

Author: Bonilla, Edwin V., Elinas, Pantelis, Zhao, He, Filippone, Maurizio, Kitsios, Vassili, and O'Kane, Terry
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Estimating the structure of a Bayesian network, in the form of a directed acyclic graph (DAG), from observational data is a statistically and computationally hard problem with essential applications in areas such as causal discovery. Bayesian approaches are a promising direction for solving this task, as they allow for uncertainty quantification and deal with well-known identifiability issues. From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying combinatorial space. We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations. We carry out posterior estimation via variational inference, where we exploit continuous relaxations of discrete distributions. We show that our approach performs competitively when compared with a wide range of Bayesian and non-Bayesian benchmarks on a range of synthetic and real datasets.
Published: 2024

5. Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Author: Papamarkou, Theodore, Skoularidou, Maria, Palla, Konstantina, Aitchison, Laurence, Arbel, Julyan, Dunson, David, Filippone, Maurizio, Fortuin, Vincent, Hennig, Philipp, Hernández-Lobato, José Miguel, Hubin, Aliaksandr, Immer, Alexander, Karaletsos, Theofanis, Khan, Mohammad Emtiyaz, Kristiadi, Agustinus, Li, Yingzhen, Mandt, Stephan, Nemeth, Christopher, Osborne, Michael A., Rudner, Tim G. J., Rügamer, David, Teh, Yee Whye, Welling, Max, Wilson, Andrew Gordon, and Zhang, Ruqi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
Comment: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024
Published: 2024

6. Spatial Bayesian Neural Networks

Author: Zammit-Mangion, Andrew, Kaminski, Michael D., Tran, Ba-Hien, Filippone, Maurizio, and Cressie, Noel
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: interpretable, and well understood models that are routinely employed even though, as is revealed through prior and posterior predictive checks, these can poorly characterise the spatial heterogeneity in the underlying process of interest. Here, we propose a new, flexible class of spatial-process models, which we refer to as spatial Bayesian neural networks (SBNNs). An SBNN leverages the representational capacity of a Bayesian neural network; it is tailored to a spatial setting by incorporating a spatial "embedding layer" into the network and, possibly, spatially-varying network parameters. An SBNN is calibrated by matching its finite-dimensional distribution at locations on a fine gridding of space to that of a target process of interest. That process could be easy to simulate from or we may have many realisations from it. We propose several variants of SBNNs, most of which are able to match the finite-dimensional distribution of the target process at the selected grid better than conventional BNNs of similar complexity. We also show that an SBNN can be used to represent a variety of spatial processes often used in practice, such as Gaussian processes, lognormal processes, and max-stable processes. We briefly discuss the tools that could be used to make inference with SBNNs, and we conclude with a discussion of their advantages and limitations.
Comment: 35 pages, 21 figures
Published: 2023

7. Multi-timestep models for Model-based Reinforcement Learning

Author: Benechehab, Abdelhakim, Paolo, Giuseppe, Thomas, Albert, Filippone, Maurizio, and Kégl, Balázs
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weight profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models are evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we find that our multi-timestep models outperform or match standard one-step models. This is especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
Published: 2023

8. One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models

Author: Tran, Ba-Hien, Franzese, Giulio, Michiardi, Pietro, and Filippone, Maurizio
Subjects: Computer Science - Machine Learning
Abstract: Generative Models (GMs) have attracted considerable attention due to their tremendous success in various domains, such as computer vision, where they are capable of generating impressive realistic-looking images. Likelihood-based GMs are attractive due to the possibility of generating new data by a single model evaluation. However, they typically achieve lower sample quality compared to state-of-the-art score-based diffusion models (DMs). This paper provides a significant step in the direction of addressing this limitation. The idea is to borrow one of the strengths of score-based DMs, which is the ability to perform accurate density estimation in low-density regions and to address manifold overfitting by means of data mollification. We connect data mollification through the addition of Gaussian noise to Gaussian homotopy, which is a well-known technique to improve optimization. Data mollification can be implemented by adding one line of code in the optimization loop, and we demonstrate that this provides a boost in generation quality of likelihood-based GMs, without computational overheads. We report results on image data sets with popular likelihood-based GMs, including variants of variational autoencoders and normalizing flows, showing large improvements in FID score.
Comment: NeurIPS 2023
Published: 2023
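
The idea of mollifying data by adding Gaussian noise inside the optimization loop can be sketched as follows. This is only an illustration under stated assumptions: the linear annealing schedule, the `sigma_max` parameter, and the function names `noise_scale` and `mollify` are hypothetical choices, not the schedule or code used by the authors.

```python
import random


def noise_scale(step, total_steps, sigma_max=1.0):
    """Noise level at a given step, annealed linearly from sigma_max to 0.

    The linear schedule is an assumption made for this sketch.
    """
    frac = min(step / total_steps, 1.0)
    return sigma_max * (1.0 - frac)


def mollify(batch, step, total_steps, rng=random):
    """Add zero-mean Gaussian noise with the current scale to every value."""
    sigma = noise_scale(step, total_steps)
    return [x + sigma * rng.gauss(0.0, 1.0) for x in batch]


# Usage inside a (hypothetical) training loop: the only change is the mollify call.
data = [0.2, -1.3, 0.7]
for step in range(5):
    noisy_batch = mollify(data, step, total_steps=5)
    # ... compute the likelihood-based loss on noisy_batch and update the model ...
```

Early iterations see heavily smoothed data, and the noise vanishes as training proceeds, so the final steps optimize the objective on the original data.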

9. When is Importance Weighting Correction Needed for Covariate Shift Adaptation?

Author: Gogolashvili, Davit, Zecchin, Matteo, Kanagawa, Motonobu, Kountouris, Marios, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: This paper investigates when the importance weighting (IW) correction is needed to address covariate shift, a common situation in supervised learning where the input distributions of training and test data differ. Classic results show that the IW correction is needed when the model is parametric and misspecified. In contrast, recent results indicate that the IW correction may not be necessary when the model is nonparametric and well-specified. We examine the missing case in the literature where the model is nonparametric and misspecified, and show that the IW correction is needed for obtaining the best approximation of the true unknown function for the test distribution. We do this by analyzing IW-corrected kernel ridge regression, covering a variety of settings, including parametric and nonparametric models, well-specified and misspecified settings, and arbitrary weighting functions.
Published: 2023

10. Continuous-Time Functional Diffusion Processes

Author: Franzese, Giulio, Corallo, Giulio, Rossi, Simone, Heinonen, Markus, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of the Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures and can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.
Comment: Under review
Published: 2023

11. Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes

Author: Tran, Ba-Hien, Shahbaba, Babak, Mandt, Stephan, and Filippone, Maurizio
Abstract: We present a fully Bayesian autoencoder model that treats both local latent variables and global decoder parameters in a Bayesian fashion. This approach allows for flexible priors and posterior approximations while keeping the inference costs low. To achieve this, we introduce an amortized MCMC approach by utilizing an implicit stochastic network to learn sampling from the posterior over local latent variables. Furthermore, we extend the model by incorporating a Sparse Gaussian Process prior over the latent space, allowing for a fully Bayesian treatment of inducing points and kernel hyperparameters and leading to improved scalability. Additionally, we enable Deep Gaussian Process priors on the latent space and the handling of missing data. We evaluate our model on a range of experiments focusing on dynamic representation learning and generative modeling, demonstrating the strong performance of our approach in comparison to existing methods that combine Gaussian Processes and autoencoders.
Published: 2023

12. Locally Smoothed Gaussian Process Regression

Author: Gogolashvili, Davit, Kozyrskiy, Bogdan, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We develop a novel framework to accelerate Gaussian process regression (GPR). In particular, we consider localization kernels at each data point to down-weigh the contributions from other data points that are far away, and we derive the GPR model stemming from the application of such localization operation. Through a set of experiments, we demonstrate the competitive performance of the proposed approach compared to full GPR, other localized models, and deep Gaussian processes. Crucially, these performances are obtained with considerable speedups compared to standard global GPR due to the sparsification effect of the Gram matrix induced by the localization operation.
Published: 2022

13. How Much is Enough? A Study on Diffusion Times in Score-based Generative Models

Author: Franzese, Giulio, Rossi, Simone, Yang, Lixuan, Finamore, Alessandro, Rossi, Dario, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Score-based diffusion models are a class of generative models whose dynamics is described by stochastic differential equations that map noise into data. While recent works have started to lay down a theoretical foundation for these models, an analytical understanding of the role of the diffusion time T is still lacking. Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution; however, a smaller value of T should be preferred for a better approximation of the score-matching objective and higher computational efficiency. Starting from a variational interpretation of diffusion models, in this work we quantify this trade-off, and suggest a new method to improve quality and efficiency of both training and sampling, by adopting smaller diffusion times. Indeed, we show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process. Empirical results support our analysis; for image data, our method is competitive w.r.t. the state-of-the-art, according to standard sample quality metrics and log-likelihood.
Published: 2022

14. Local Random Feature Approximations of the Gaussian Kernel

Author: Wacker, Jonas and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Computation
Abstract: A fundamental drawback of kernel-based statistical models is their limited scalability to large data sets, which requires resorting to approximations. In this work, we focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations. In particular, we do so by studying a less explored random feature approximation based on Maclaurin expansions and polynomial sketches. We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly in this regime. We demonstrate these gains on a number of experiments involving the application of Gaussian process regression to synthetic and real-world data of different data sizes and dimensions.
Comment: 11 pages
Published: 2022

15. Complex-to-Real Sketches for Tensor Products with Applications to the Polynomial Kernel

Author: Wacker, Jonas, Ohana, Ruben, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Computation
Abstract: Randomized sketches of a tensor product of $p$ vectors follow a tradeoff between statistical efficiency and computational acceleration. Commonly used approaches avoid computing the high-dimensional tensor product explicitly, resulting in a suboptimal dependence of $\mathcal{O}(3^p)$ in the embedding dimension. We propose a simple Complex-to-Real (CtR) modification of well-known sketches that replaces real random projections by complex ones, incurring a lower $\mathcal{O}(2^p)$ factor in the embedding dimension. The output of our sketches is real-valued, which renders their downstream use straightforward. In particular, we apply our sketches to $p$-fold self-tensored inputs corresponding to the feature maps of the polynomial kernel. We show that our method achieves state-of-the-art performance in terms of accuracy and speed compared to other randomized approximations from the literature.
Comment: 32 pages
Published: 2022

16. Improved Random Features for Dot Product Kernels

Author: Wacker, Jonas, Kanagawa, Motonobu, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Computation
Abstract: Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis for understanding the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketches), and conditions under which the use of complex features leads to lower variances than real features. Third, by using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to improve random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We describe the improvements brought by these contributions with extensive experiments on a variety of tasks and datasets.
Comment: To appear in Journal of Machine Learning Research (JMLR)
Published: 2022
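
For orientation, the real-valued Rademacher sketch that the abstract mentions as an existing baseline can be written in a few lines. This is a minimal sketch of that baseline for the homogeneous polynomial kernel $(x^\top y)^p$, not the complex-valued or optimized features proposed in the paper; the function names and the seeding scheme are assumptions made for illustration.

```python
import random


def rademacher_features(x, p, num_features, seed=0):
    """Random features whose inner products estimate (x . y) ** p.

    Each feature is a product of p independent Rademacher projections of x.
    Using the same seed for every input makes all inputs share the same
    random projections, which the estimator requires.
    """
    rng = random.Random(seed)
    feats = []
    for _ in range(num_features):
        z = 1.0
        for _ in range(p):
            w = [rng.choice((-1.0, 1.0)) for _ in x]
            z *= sum(wi * xi for wi, xi in zip(w, x))
        feats.append(z / num_features ** 0.5)
    return feats


def approx_poly_kernel(x, y, p, num_features, seed=0):
    """Estimate the polynomial kernel (x . y) ** p (x and y of equal length)."""
    zx = rademacher_features(x, p, num_features, seed)
    zy = rademacher_features(y, p, num_features, seed)
    return sum(a * b for a, b in zip(zx, zy))


# Usage: compare the estimate with the exact value of (x . y) ** 2.
x, y = [1.0, 2.0, -1.0], [0.5, 1.0, 2.0]
exact = sum(a * b for a, b in zip(x, y)) ** 2
print(exact, approx_poly_kernel(x, y, p=2, num_features=2000))
```

The estimate is unbiased because the Rademacher weights have identity covariance; its variance shrinks as `num_features` grows, which is the quantity the variance formulas in the abstract characterize.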

17. Revisiting the Effects of Stochasticity for Hamiltonian Samplers

Author: Franzese, Giulio, Milios, Dimitrios, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Statistics - Computation
Abstract: We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
Published: 2021

18. Model Selection for Bayesian Autoencoders

Author: Tran, Ba-Hien, Rossi, Simone, Milios, Dimitrios, Michiardi, Pietro, Bonilla, Edwin V., and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We develop a novel method for carrying out model selection for Bayesian autoencoders (BAEs) by means of prior hyper-parameter optimization. Inspired by the common practice of type-II maximum likelihood optimization and its equivalence to Kullback-Leibler divergence minimization, we propose to optimize the distributional sliced-Wasserstein distance (DSWD) between the output of the autoencoder and the empirical data distribution. The advantages of this formulation are that we can estimate the DSWD based on samples and handle high-dimensional problems. We carry out posterior estimation of the BAE parameters via stochastic gradient Hamiltonian Monte Carlo and turn our BAE into a generative model by fitting a flexible Dirichlet mixture model in the latent space. Consequently, we obtain a powerful alternative to variational autoencoders, which are the preferred choice in modern applications of autoencoders for representation learning with uncertainty. We evaluate our approach qualitatively and quantitatively using a vast experimental campaign on a number of unsupervised learning tasks and show that, in small-data regimes where priors matter, our approach provides state-of-the-art results, outperforming multiple competitive baselines.
Published: 2021

19. Spatial Bayesian neural networks

Author: Zammit-Mangion, Andrew, Kaminski, Michael D., Tran, Ba-Hien, Filippone, Maurizio, and Cressie, Noel
Published: 2024

20. All You Need is a Good Functional Prior for Bayesian Deep Learning

Author: Tran, Ba-Hien, Rossi, Simone, Milios, Dimitrios, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles this limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters in a way that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance. We provide vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step in the direction of making the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility.
Published: 2020

21. Sparse within Sparse Gaussian Processes using Neighbor Information

Author: Tran, Gia-Lac, Milios, Dimitrios, Michiardi, Pietro, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Approximations to Gaussian processes based on inducing variables, combined with variational inference techniques, enable state-of-the-art sparse approaches to infer GPs at scale through mini-batch-based learning. In this work, we address one limitation of sparse GPs, which is due to the challenge in dealing with a large number of inducing variables without imposing a special structure on the inducing inputs. In particular, we introduce a novel hierarchical prior, which imposes sparsity on the set of inducing variables. We treat our model variationally, and we experimentally show considerable computational gains compared to standard sparse GPs when sparsity on the inducing variables is realized considering the nearest inducing inputs of a random mini-batch of the data. We perform an extensive experimental validation that demonstrates the effectiveness of our approach compared to the state-of-the-art. Our approach makes it possible to use sparse GPs with a large number of inducing points without incurring a prohibitive computational cost.
Comment: 10 pages
Published: 2020

22. An Identifiable Double VAE For Disentangled Representations

Author: Mita, Graziano, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: A large part of the literature on learning disentangled representations focuses on variational autoencoders (VAE). Recent developments demonstrate that disentanglement cannot be obtained in a fully unsupervised setting without inductive biases on models and data. However, Khemakhem et al., AISTATS, 2020 suggest that employing a particular form of factorized prior, conditionally dependent on auxiliary variables complementing input observations, can be one such bias, resulting in an identifiable model with guarantees on disentanglement. Working along this line, we propose a novel VAE-based generative model with theoretical guarantees on identifiability. We obtain our conditional prior over the latents by learning an optimal representation, which imposes an additional strength on their regularization. We also extend our method to semi-supervised settings. Experimental results indicate superior performance with respect to state-of-the-art approaches, according to several established metrics proposed in the literature on disentanglement.
Published: 2020

23. Isotropic SGD: a Practical Approach to Bayesian Posterior Sampling

Author: Franzese, Giulio, Candela, Rosa, Milios, Dimitrios, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning, 65C05, G.3
Abstract: In this work we define a unified mathematical framework to deepen our understanding of the role of stochastic gradient (SG) noise on the behavior of stochastic gradient Markov chain Monte Carlo (SGMCMC) sampling algorithms. Our formulation unlocks the design of a novel, practical approach to posterior sampling, which makes the SG noise isotropic using a fixed learning rate that we determine analytically, and that requires weaker assumptions than existing algorithms. In contrast, the common trait of existing SGMCMC algorithms is to approximate the isotropy condition either by drowning the gradients in additive noise (annealing the learning rate) or by making restrictive assumptions on the SG noise covariance and the geometry of the loss landscape. Extensive experimental validations indicate that our proposal is competitive with the state-of-the-art on SGMCMC, while being much more practical to use.
Published: 2020

24. A Variational View on Bootstrap Ensembles as Bayesian Inference

Author: Milios, Dimitrios, Michiardi, Pietro, and Filippone, Maurizio
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, we employ variational arguments to establish a connection between ensemble methods for Neural Networks and Bayesian inference. We consider an ensemble-based scheme where each model/particle corresponds to a perturbation of the data by means of parametric bootstrap and a perturbation of the prior. We derive conditions under which any optimization step of the particles makes the associated distribution reduce its divergence to the posterior over model parameters. Such conditions do not require any particular form for the approximation and they are purely geometrical, giving insights on the behavior of the ensemble on a number of interesting models such as Neural Networks with ReLU activations. Experiments confirm that ensemble methods can be a valid alternative to approximate Bayesian inference; the theoretical developments in the paper seek to explain this behavior.
Published: 2020

25. Model Monitoring and Dynamic Model Selection in Travel Time-series Forecasting

Author: Candela, Rosa, Michiardi, Pietro, Filippone, Maurizio, and Zuluaga, Maria A.
Subjects: Statistics - Applications
Abstract: Accurate travel products price forecasting is a highly desired feature that allows customers to take informed decisions about purchases, and companies to build and offer attractive tour packages. Thanks to machine learning (ML), it is now relatively cheap to develop highly accurate statistical models for price time-series forecasting. However, once models are deployed in production, it is their monitoring, maintenance and improvement which carry most of the costs and difficulties over time. We introduce a data-driven framework to continuously monitor and maintain deployed time-series forecasting models' performance, to guarantee stable performance of travel products price forecasting models. Under a supervised learning approach, we predict the errors of time-series forecasting models over time, and use this predicted performance measure to achieve both model monitoring and maintenance. We validate the proposed method on a dataset of 18K time-series from flight and hotel prices collected over two years and on two public benchmarks.
Published: 2020
Full Text: View/download PDF

26. Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations

Author: Rossi, Simone, Heinonen, Markus, Bonilla, Edwin V., Shen, Zheyang, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Variational inference techniques based on inducing variables provide an elegant framework for scalable posterior estimation in Gaussian process (GP) models. Besides enabling scalability, one of their main advantages over sparse approximations using direct marginal likelihood maximization is that they provide a robust alternative for point estimation of the inducing inputs, i.e. the location of the inducing variables. In this work we challenge the common wisdom that optimizing the inducing inputs in the variational framework yields optimal performance. We show that, by revisiting old model approximations such as the fully-independent training conditionals endowed with powerful sampling-based inference methods, treating both inducing locations and GP hyper-parameters in a Bayesian way can improve performance significantly. Based on stochastic gradient Hamiltonian Monte Carlo, we develop a fully Bayesian approach to scalable GP and deep GP models, and demonstrate its state-of-the-art performance through an extensive experimental campaign across several regression and classification problems.
Published: 2020

27. Efficient Approximate Inference with Walsh-Hadamard Variational Inference

Author: Rossi, Simone, Marmin, Sebastien, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Variational inference offers scalable and flexible tools to tackle intractable Bayesian inference of modern statistical models like Bayesian neural networks and Gaussian processes. For largely over-parameterized models, however, the over-regularization property of the variational objective makes the application of variational inference challenging. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference, which uses Walsh-Hadamard-based factorization strategies to reduce model parameterization, accelerate computations, and increase the expressiveness of the approximate posterior beyond fully factorized ones.
Comment: Paper accepted at the 4th Workshop on Bayesian Deep Learning (NeurIPS 2019), Vancouver, Canada. arXiv admin note: substantial text overlap with arXiv:1905.11248
Published: 2019

28. LIBRE: Learning Interpretable Boolean Rule Ensembles

Author: Mita, Graziano, Papotti, Paolo, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We present a novel method - LIBRE - to learn an interpretable classifier, which materializes as a set of Boolean rules. LIBRE uses an ensemble of bottom-up weak learners operating on a random subset of features, which allows for the learning of rules that generalize well on unseen data even in imbalanced settings. Weak learners are combined with a simple union so that the final ensemble is also interpretable. Experimental results indicate that LIBRE efficiently strikes the right balance between prediction accuracy, which is competitive with black box methods, and interpretability, which is often superior to alternative methods from the literature.
Published: 2019

29. Kernel computations from large-scale random features obtained by Optical Processing Units

Author: Ohana, Ruben, Wacker, Jonas, Dong, Jonathan, Marmin, Sébastien, Krzakala, Florent, Filippone, Maurizio, and Daudet, Laurent
Subjects: Computer Science - Emerging Technologies, Computer Science - Machine Learning
Abstract: Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain. More specifically, the OPU performs the multiplication of input vectors by a large random matrix with complex-valued i.i.d. Gaussian entries, followed by the application of an element-wise squared absolute value operation - this last nonlinearity being intrinsic to the sensing process. In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. Experiments demonstrate that the OPU kernel and its RF approximation achieve competitive performance in applications using kernel ridge regression and transfer learning for image classification. Crucially, thanks to the use of the OPU, these results are obtained with time and energy savings.
Comment: 5 pages, 3 figures, submitted to ICASSP 2020
Published: 2019
Full Text: View/download PDF

30. Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD

Author: Candela, Rosa, Franzese, Giulio, Filippone, Maurizio, and Michiardi, Pietro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Large scale machine learning is increasingly relying on distributed optimization, whereby several machines contribute to the training process of a statistical model. In this work we study the performance of asynchronous, distributed settings, when applying sparsification, a technique used to reduce communication overheads. In particular, for the first time in an asynchronous, non-convex setting, we theoretically prove that, in presence of staleness, sparsification does not harm SGD performance: the ergodic convergence rate matches the known result of standard SGD, that is $\mathcal{O} \left( 1/\sqrt{T} \right)$. We also carry out an empirical study to complement our theory, and confirm that the effects of sparsification on the convergence rate are negligible, when compared to 'vanilla' SGD, even in the challenging scenario of an asynchronous, distributed system.
Published: 2019

31. Deep Compositional Spatial Models

Author: Zammit-Mangion, Andrew, Ng, Tin Lok James, Vu, Quan, and Filippone, Maurizio
Subjects: Statistics - Methodology, Statistics - Computation, Statistics - Machine Learning
Abstract: Spatial processes with nonstationary and anisotropic covariance structure are often used when modelling, analysing and predicting complex environmental phenomena. Such processes may often be expressed as ones that have stationary and isotropic covariance structure on a warped spatial domain. However, the warping function is generally difficult to fit and not constrained to be injective, often resulting in 'space-folding.' Here, we propose modelling an injective warping function through a composition of multiple elemental injective functions in a deep-learning framework. We consider two cases; first, when these functions are known up to some weights that need to be estimated, and, second, when the weights in each layer are random. Inspired by recent methodological and technological advances in deep learning and deep Gaussian processes, we employ approximate Bayesian methods to make inference with these models using graphics processing units. Through simulation studies in one and two dimensions we show that the deep compositional spatial models are quick to fit, and are able to provide better predictions and uncertainty quantification than other deep stochastic models of similar complexity. We also show their remarkable capacity to model nonstationary, anisotropic spatial data using radiances from the MODIS instrument aboard the Aqua satellite.
Comment: 46 pages, 14 figures
Published: 2019

32. Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Author: Rossi, Simone, Marmin, Sebastien, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Over-parameterized models, such as DeepNets and ConvNets, form a class of models that are routinely adopted in a wide variety of applications, and for which Bayesian inference is desirable but extremely challenging. Variational inference offers the tools to tackle this challenge in a scalable way and with some degree of flexibility on the approximation, but for over-parameterized models this is challenging due to the over-regularization property of the variational objective. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference (WHVI), which uses Walsh-Hadamard-based factorization strategies to reduce the parameterization and accelerate computations, thus avoiding over-regularization issues with the variational objective. Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques to carry out approximate inference for over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference.
Published: 2019

33. A comparative evaluation of novelty detection algorithms for discrete sequences

Author: Domingues, Rémi, Michiardi, Pietro, Barlet, Jérémie, and Filippone, Maurizio
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning, I.2.6
Abstract: The identification of anomalies in temporal data is a core component of numerous research areas such as intrusion detection, fault prevention, genomics and fraud detection. This article provides an experimental comparison of the novelty detection problem applied to discrete sequences. The objective of this study is to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case. These recommendations rely on extensive novelty detection experiments based on a variety of public datasets in addition to novel industrial datasets. We also perform thorough scalability and memory usage tests resulting in new supplementary insights of the methods' performance, key selection criterion to solve problems relying on large volumes of data and to meet the expectations of applications subject to strict response time constraints.
Comment: Submitted to Artificial Intelligence Review journal; 24 pages, 4 tables, 11 figures
Published: 2019
Full Text: View/download PDF

34. Variational Calibration of Computer Models

Author: Marmin, Sébastien and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Applications, Statistics - Methodology
Abstract: Bayesian calibration of black-box computer models offers an established framework to obtain a posterior distribution over model parameters. Traditional Bayesian calibration involves the emulation of the computer model and an additive model discrepancy term using Gaussian processes; inference is then carried out using MCMC. These choices pose computational and statistical challenges and limitations, which we overcome by proposing the use of approximate Deep Gaussian processes and variational inference techniques. The result is a practical and scalable framework for calibration, which obtains competitive performance compared to the state-of-the-art.
Published: 2018

35. Good Initializations of Variational Bayes for Deep Models

Author: Rossi, Simone, Michiardi, Pietro, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Stochastic variational inference is an established way to carry out approximate Bayesian inference for deep models. While there have been effective proposals for good initializations for loss minimization in deep learning, far less attention has been devoted to the issue of initialization of stochastic variational inference. We address this by proposing a novel layer-wise initialization strategy based on Bayesian linear models. The proposed method is extensively validated on regression and classification tasks, including Bayesian DeepNets and ConvNets, showing faster and better convergence compared to alternatives inspired by the literature on initializations for loss minimization.
Comment: 8 pages of main paper (+3 for references and +6 of supplement material)
Published: 2018

36. Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification

Author: Milios, Dimitrios, Camoriano, Raffaello, Michiardi, Pietro, Rosasco, Lorenzo, and Filippone, Maurizio
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: In this paper, we study the problem of deriving fast and accurate classification algorithms with uncertainty quantification. Gaussian process classification provides a principled approach, but the corresponding computational burden is hardly sustainable in large-scale problems and devising efficient alternatives is a challenge. In this work, we investigate if and how Gaussian process regression directly applied to the classification labels can be used to tackle this question. While in this case training time is remarkably faster, predictions need be calibrated for classification and uncertainty estimation. To this aim, we propose a novel approach based on interpreting the labels as the output of a Dirichlet distribution. Extensive experimental results show that the proposed approach provides essentially the same accuracy and uncertainty quantification of Gaussian process classification while requiring only a fraction of computational resources.
Published: 2018

37. Calibrating Deep Convolutional Gaussian Processes

Author: Tran, Gia-Lac, Bonilla, Edwin V., Cunningham, John P., Michiardi, Pietro, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: The wide adoption of Convolutional Neural Networks (CNNs) in applications where decision-making under uncertainty is fundamental, has brought a great deal of attention to the ability of these models to accurately quantify the uncertainty in their predictions. Previous work on combining CNNs with Gaussian processes (GPs) has been developed under the assumption that the predictive probabilities of these models are well-calibrated. In this paper we show that, in fact, current combinations of CNNs and GPs are miscalibrated. We propose a novel combination that considerably outperforms previous approaches on this aspect, while achieving state-of-the-art performance on image classification tasks.
Comment: 12 pages
Published: 2018

38. Constraining the Dynamics of Deep Probabilistic Models

Author: Lorenzi, Marco and Filippone, Maurizio
Subjects: Statistics - Machine Learning
Abstract: We introduce a novel generative formulation of deep probabilistic models implementing "soft" constraints on their function dynamics. In particular, we develop a flexible methodological framework where the modeled functions and derivatives of a given order are subject to inequality or equality constraints. We then characterize the posterior distribution over model and constraint parameters through stochastic variational inference. As a result, the proposed approach allows for accurate and scalable uncertainty quantification on the predictions and on all parameters. We demonstrate the application of equality constraints in the challenging problem of parameter inference in ordinary differential equation models, while we showcase the application of inequality constraints on the problem of monotonic regression of count data. The proposed approach is extensively tested in several experimental settings, leading to highly competitive results in challenging modeling applications, while offering high expressiveness, flexibility and scalability.
Comment: 13 pages
Published: 2018

39. Assessing Bayesian Nonparametric Log-Linear Models: an application to Disclosure Risk estimation

Author: Carota, Cinzia, Filippone, Maurizio, and Polettini, Silvia
Subjects: Statistics - Methodology, Statistics - Applications
Abstract: We present a method for identification of models with good predictive performances in the family of Bayesian log-linear mixed models with Dirichlet process random effects. Such a problem arises in many different applications; here we consider it in the context of disclosure risk estimation, an increasingly relevant issue raised by the increasing demand for data collected under a pledge of confidentiality. Two different criteria are proposed and jointly used via a two-stage selection procedure, in a M-open view. The first stage is devoted to identifying a path of search; then, at the second, a small number of nonparametric models is evaluated through an application-specific score based Bayesian information criterion. We test our method on a variety of contingency tables based on microdata samples from the US Census Bureau and the Italian National Security Administration, treated here as populations, and carefully discuss its features. This leads us to a journey around different forms and sources of bias along which we show that (i) while based on the so called "score+search" paradigm, our method is by construction well protected from the selection-induced bias, and (ii) models with good performances are invariably characterized by an extraordinarily simple structure of fixed effects. The complexity of model selection - a very challenging and difficult task in a strictly parametric context with large and sparse tables - is therefore significantly defused by our approach. An attractive collateral result of our analysis are fruitful new ideas about modeling in small area estimation problems, where interest is in total counts over cells with a small number of observations.
Comment: 32 pages, 7 figures
Published: 2018

40. Decentralized Deep Scheduling for Interference Channels

Author: de Kerret, Paul, Gesbert, David, and Filippone, Maurizio
Subjects: Computer Science - Information Theory
Abstract: In this paper, we study the problem of decentralized scheduling in Interference Channels (IC). In this setting, each Transmitter (TX) receives an arbitrary amount of feedback regarding the global multi-user channel state based on which it decides whether to transmit or to stay silent without any form of communication with the other TXs. While many methods have been proposed to tackle the problem of link scheduling in the presence of reliable Channel State Information (CSI), finding the optimally robust transmission strategy in the presence of arbitrary channel uncertainties at each TX has remained elusive for the past years. In this work, we recast the link scheduling problem as a decentralized classification problem and we propose the use of Collaborative Deep Neural Networks (C-DNNs) to solve this problem. After adequate training, the scheduling obtained using the C-DNNs flexibly adapts to the decentralized CSI configuration to outperform other scheduling algorithms.
Comment: Submitted to the 2018 IEEE International Conference on Communications (ICC)
Published: 2017

41. Pseudo-extended Markov chain Monte Carlo

Author: Nemeth, Christopher, Lindsten, Fredrik, Filippone, Maurizio, and Hensman, James
Subjects: Statistics - Methodology, Statistics - Computation, Statistics - Machine Learning
Abstract: Sampling from posterior distributions using Markov chain Monte Carlo (MCMC) methods can require an exhaustive number of iterations, particularly when the posterior is multi-modal as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions. The pseudo-extended method augments the state-space of the posterior using pseudo-samples as auxiliary variables. On the extended space, the modes of the posterior are connected, which allows the MCMC sampler to easily move between well-separated posterior modes. We demonstrate that the pseudo-extended approach delivers improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on multi-modal posteriors, including Boltzmann machines and models with sparsity-inducing priors.
Comment: Advances in Neural Information Processing Systems 2019
Published: 2017

42. Entropic Trace Estimates for Log Determinants

Author: Fitzsimons, Jack, Granziol, Diego, Cutajar, Kurt, Osborne, Michael, Filippone, Maurizio, and Roberts, Stephen
Subjects: Computer Science - Numerical Analysis, Computer Science - Information Theory, Statistics - Computation, Statistics - Machine Learning
Abstract: The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of UFL sparse matrices. By taking the example of a general Markov random field, we also demonstrate how this approach can significantly accelerate inference in large-scale learning methods involving the log determinant.
Comment: 16 pages, 4 figures, 2 tables, 2 algorithms
Published: 2017

43. Bayesian Inference of Log Determinants

Author: Fitzsimons, Jack, Cutajar, Kurt, Osborne, Michael, Roberts, Stephen, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Numerical Analysis, Statistics - Computation
Abstract: The log-determinant of a kernel matrix appears in a variety of machine learning problems, ranging from determinantal point processes and generalized Markov random fields, through to the training of Gaussian processes. Exact calculation of this term is often intractable when the size of the kernel matrix exceeds a few thousand. In the spirit of probabilistic numerics, we reinterpret the problem of computing the log-determinant as a Bayesian inference problem. In particular, we combine prior knowledge in the form of bounds from matrix theory and evidence derived from stochastic trace estimation to obtain probabilistic estimates for the log-determinant and its associated uncertainty within a given computational budget. Beyond its novelty and theoretic appeal, the performance of our proposal is competitive with state-of-the-art approaches to approximating the log-determinant, while also quantifying the uncertainty due to budget-constrained evidence.
Comment: 12 pages, 3 figures
Published: 2017

44. Disease Progression Modeling and Prediction through Random Effect Gaussian Processes and Time Transformation

Author: Lorenzi, Marco, Filippone, Maurizio, Alexander, Daniel C., and Ourselin, Sebastien
Subjects: Statistics - Applications
Abstract: The development of statistical approaches for the joint modelling of the temporal changes of imaging, biochemical, and clinical biomarkers is of paramount importance for improving the understanding of neurodegenerative disorders, and for providing a reference for the prediction and quantification of the pathology in unseen individuals. Nonetheless, the use of disease progression models for probabilistic predictions still requires investigation, for example for accounting for missing observations in clinical data, and for accurate uncertainty quantification. We tackle this problem by proposing a novel Gaussian process-based method for the joint modeling of imaging and clinical biomarker progressions from time series of individual observations. The model is formulated to account for individual random effects and time reparameterization, allowing non-parametric estimates of the biomarker evolution, as well as high flexibility in specifying correlation structure, and time transformation models. Thanks to the Bayesian formulation, the model naturally accounts for missing data, and allows for uncertainty quantification in the estimate of evolutions, as well as for probabilistic prediction of disease staging in unseen patients. The experimental results show that the proposed model provides a biologically plausible description of the evolution of Alzheimer's pathology across the whole disease time-span as well as remarkable predictive performance when tested on a large clinical cohort with missing observations.
Comment: 13 pages, 2 figures
Published: 2017
Full Text: View/download PDF

45. Model Monitoring and Dynamic Model Selection in Travel Time-Series Forecasting

Author: Candela, Rosa, Michiardi, Pietro, Filippone, Maurizio, and Zuluaga, Maria A.
Editors: Dong, Yuxiao, Mladenić, Dunja, and Saunders, Craig
Editorial board: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti
Published: 2021
Full Text: View/download PDF

46. Variational Bootstrap for Classification

Author: Kozyrskiy, Bogdan, Milios, Dimitrios, and Filippone, Maurizio
Published: 2022
Full Text: View/download PDF

47. AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models

Author: Krauth, Karl, Bonilla, Edwin V., Cutajar, Kurt, and Filippone, Maurizio
Subjects: Statistics - Machine Learning
Abstract: We investigate the capabilities and limitations of Gaussian process models by jointly exploring three complementary directions: (i) scalable and statistically efficient inference; (ii) flexible kernels; and (iii) objective functions for hyperparameter learning alternative to the marginal likelihood. Our approach outperforms all previously reported GP methods on the standard MNIST dataset; performs comparatively to previous kernel-based methods using the RECTANGLES-IMAGE dataset; and breaks the 1% error-rate barrier in GP models using the MNIST8M dataset, showing along the way the scalability of our method at unprecedented scale for GP models (8 million observations) in classification problems. Overall, our approach represents a significant breakthrough in kernel methods and GP models, bridging the gap between deep learning approaches and kernel machines.
Comment: Edited results on RECTANGLES-IMAGE and related comments; minor additional edits
Published: 2016

48. Random Feature Expansions for Deep Gaussian Processes

Author: Cutajar, Kurt, Bonilla, Edwin V., Michiardi, Pietro, and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Statistics - Computation
Abstract: The composition of multiple Gaussian Processes as a Deep Gaussian Process (DGP) enables a deep probabilistic nonparametric approach to flexibly tackle complex machine learning problems with sound quantification of uncertainty. Existing inference approaches for DGP models have limited scalability and are notoriously cumbersome to construct. In this work, we introduce a novel formulation of DGPs based on random feature expansions that we train using stochastic variational inference. This yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty. We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers.
Published: 2016

49. Mini-Batch Spectral Clustering

Author: Han, Yufei and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: The cost of computing the spectrum of Laplacian matrices hinders the application of spectral clustering to large data sets. While approximations recover computational tractability, they can potentially affect clustering performance. This paper proposes a practical approach to learn spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods to learn spectral clustering for a given computational budget.
Published: 2016

50. Preconditioning Kernel Matrices

Author: Cutajar, Kurt, Osborne, Michael A., Cunningham, John P., and Filippone, Maurizio
Subjects: Statistics - Machine Learning, Statistics - Computation, Statistics - Methodology
Abstract: The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern, datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.
Published: 2016
