33 results for "Murray, Iain A."
Search Results
2. Maximum Likelihood Training of Score-Based Diffusion Models
- Author
-
Song, Yang, Durkan, Conor, Murray, Iain, and Ermon, Stefano
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32x32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks., Comment: NeurIPS 2021 (Spotlight)
- Published
- 2021
- Full Text
- View/download PDF
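For orientation, the bound referred to in this abstract has roughly the following form (paraphrased from memory, so treat the exact constants and conditions as assumptions): for a diffusion $dx = f(x,t)\,dt + g(t)\,dw$ run until time $T$, taking the score matching weighting $\lambda(t) = g(t)^2$ gives

$$-\mathbb{E}_{p_0(x)}\big[\log p_\theta(x)\big] \;\le\; \tfrac{1}{2}\int_0^T g(t)^2\, \mathbb{E}_{p_t(x)}\big[\| s_\theta(x,t) - \nabla_x \log p_t(x) \|^2\big]\, dt \;+\; \text{const},$$

where the constant does not depend on $\theta$, so minimizing the $g(t)^2$-weighted score matching loss approximately maximizes likelihood.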
3. CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting
- Author
-
Zhang, Chaoyun, Fiore, Marco, Murray, Iain, and Patras, Paul
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,General Medicine ,Machine Learning (cs.LG) - Abstract
This paper introduces CloudLSTM, a new branch of recurrent neural models tailored to forecasting over data streams generated by geospatial point-cloud sources. We design a Dynamic Point-cloud Convolution (DConv) operator as the core component of CloudLSTMs, which performs convolution directly over point-clouds and extracts local spatial features from sets of neighboring points that surround different elements of the input. This operator maintains the permutation invariance of sequence-to-sequence learning frameworks, while representing neighboring correlations at each time step -- an important aspect in spatiotemporal predictive learning. The DConv operator resolves the grid-structural data requirements of existing spatiotemporal forecasting models and can be easily plugged into traditional LSTM architectures with sequence-to-sequence learning and attention mechanisms. We apply our proposed architecture to two representative, practical use cases that involve point-cloud streams, i.e., mobile service traffic forecasting and air quality indicator forecasting. Our results, obtained with real-world datasets collected in diverse scenarios for each use case, show that CloudLSTM delivers accurate long-term predictions, outperforming a variety of competitor neural network models., This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 101017109 "DAEMON", and from the Cisco University Research Program Fund (grant no. 2019-197006).
- Published
- 2021
4. Density Deconvolution with Normalizing Flows
- Author
-
Dockhorn, Tim, Ritchie, James A., Yu, Yaoliang, and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Density deconvolution is the task of estimating a probability density function given only noise-corrupted samples. We can fit a Gaussian mixture model to the underlying density by maximum likelihood if the noise is normally distributed, but would like to exploit the superior density estimation performance of normalizing flows and allow for arbitrary noise distributions. Since both adjustments lead to an intractable likelihood, we resort to amortized variational inference. We demonstrate some problems involved in this approach, however, experiments on real data demonstrate that flows can already out-perform Gaussian mixtures for density deconvolution., Appearing at the second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020), Virtual Conference. 8 pages, 6 figures, 5 tables
- Published
- 2020
5. On Contrastive Learning for Likelihood-free Inference
- Author
-
Durkan, Conor, Murray, Iain, and Papamakarios, George
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Likelihood-free methods perform parameter inference in stochastic simulator models where evaluating the likelihood is intractable but sampling synthetic data is possible. One class of methods for this likelihood-free problem uses a classifier to distinguish between pairs of parameter-observation samples generated using the simulator and pairs sampled from some reference distribution, which implicitly learns a density ratio proportional to the likelihood. Another popular class of methods fits a conditional distribution to the parameter posterior directly, and a particular recent variant allows for the use of flexible neural density estimators for this task. In this work, we show that both of these approaches can be unified under a general contrastive learning scheme, and clarify how they should be run and compared., Appeared at ICML 2020
- Published
- 2020
6. Diverse Ensembles Improve Calibration
- Author
-
Stickland, Asa Cooper and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Modern deep neural networks can produce badly calibrated predictions, especially when train and test distributions are mismatched. Training an ensemble of models and averaging their predictions can help alleviate these issues. We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member. We additionally use the idea of `mixing' un-augmented and augmented inputs to improve calibration when test and training distributions are the same. These simple techniques improve calibration and accuracy over strong baselines on the CIFAR10 and CIFAR100 benchmarks, and out-of-domain data from their corrupted versions., Comment: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
- Published
- 2020
- Full Text
- View/download PDF
7. Ordering Dimensions with Nested Dropout Normalizing Flows
- Author
-
Bekasov, Artur and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,ComputingMethodologies_PATTERNRECOGNITION ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
The latent space of normalizing flows must be of the same dimensionality as their output space. This constraint presents a problem if we want to learn low-dimensional, semantically meaningful representations. Recent work has provided compact representations by fitting flows constrained to manifolds, but hasn't defined a density off that manifold. In this work we consider flows with full support in data space, but with ordered latent variables. Like in PCA, the leading latent dimensions define a sequence of manifolds that lie close to the data. We note a trade-off between the flow likelihood and the quality of the ordering, depending on the parameterization of the flow.
- Published
- 2020
- Full Text
- View/download PDF
8. Cubic-Spline Flows
- Author
-
Durkan, Conor, Bekasov, Artur, Murray, Iain, and Papamakarios, George
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
A normalizing flow models a complex probability density as an invertible transformation of a simple density. The invertibility means that we can evaluate densities and generate samples from a flow. In practice, autoregressive flow-based models are slow to invert, making either density estimation or sample generation slow. Flows based on coupling transforms are fast for both tasks, but have previously performed less well at density estimation than autoregressive flows. We stack a new coupling transform, based on monotonic cubic splines, with LU-decomposed linear layers. The resulting cubic-spline flow retains an exact one-pass inverse, can be used to generate high-quality images, and closes the gap with autoregressive flows on a suite of density-estimation tasks., Appeared at the 1st Workshop on Invertible Neural Networks and Normalizing Flows at ICML 2019
- Published
- 2019
9. Dynamic Evaluation of Transformer Language Models
- Author
-
Krause, Ben, Kahembwe, Emmanuel, Murray, Iain, and Renals, Steve
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Computer Science - Neural and Evolutionary Computing ,Machine Learning (stat.ML) ,Neural and Evolutionary Computing (cs.NE) ,Machine Learning (cs.LG) - Abstract
This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long range dependencies in sequential data. Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to re-occurring sequential patterns. By applying dynamic evaluation to Transformer-XL models, we improve the state of the art on enwik8 from 0.99 to 0.94 bits/char, text8 from 1.08 to 1.04 bits/char, and WikiText-103 from 18.3 to 16.4 perplexity points.
- Published
- 2019
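To make the idea concrete, here is a rough sketch of dynamic evaluation (not the authors' code: the model interface, segment length, optimiser and learning rate are placeholders, and the paper's more elaborate update rule is replaced by plain SGD):

```python
import torch
import torch.nn.functional as F

def dynamic_evaluation(model, token_ids, seg_len=128, lr=1e-4):
    """Score a long token sequence segment by segment, taking one gradient step
    on each segment after it has been scored (dynamic evaluation)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    total_nll, total_tokens = 0.0, 0
    for start in range(0, token_ids.size(0) - 1, seg_len):
        end = min(start + seg_len, token_ids.size(0) - 1)
        inputs = token_ids[start:end].unsqueeze(0)            # (1, T)
        targets = token_ids[start + 1:end + 1].unsqueeze(0)   # next-token targets
        logits = model(inputs)                                # assumed to return (1, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        total_nll += loss.item() * targets.numel()            # score the segment before adapting
        total_tokens += targets.numel()
        optimizer.zero_grad()
        loss.backward()                                       # adapt to the segment just seen
        optimizer.step()
    return total_nll / total_tokens                           # average NLL per token (nats)
```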
10. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
- Author
-
Stickland, Asa Cooper and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Computation and Language ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computation and Language (cs.CL) ,Machine Learning (cs.LG) - Abstract
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or `projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with roughly 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset., Accepted for publication at ICML 2019
- Published
- 2019
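A minimal sketch of the "projected attention layer" adapter structure described above, written from the abstract rather than the released code (the dimensions, placement, and use of PyTorch's built-in multi-head attention are assumptions):

```python
import torch.nn as nn

class ProjectedAttentionLayer(nn.Module):
    """Task-specific adapter added in parallel to a shared BERT layer: project the
    hidden states to a small dimension, run multi-head self-attention there, and
    project back up. One such module per task, on top of a single shared BERT."""
    def __init__(self, hidden_dim=768, small_dim=204, num_heads=12):
        super().__init__()
        self.down = nn.Linear(hidden_dim, small_dim)   # encoder projection
        self.attn = nn.MultiheadAttention(small_dim, num_heads, batch_first=True)
        self.up = nn.Linear(small_dim, hidden_dim)     # decoder projection

    def forward(self, hidden_states):
        h = self.down(hidden_states)
        h, _ = self.attn(h, h, h)                      # self-attention in the small space
        return self.up(h)                              # added to the shared layer's output

# Illustrative usage: output = shared_bert_layer(x) + pal(x)
```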
11. Scalable Extreme Deconvolution
- Author
-
Ritchie, James A. and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
The Extreme Deconvolution method fits a probability density to a dataset where each observation has Gaussian noise added with a known sample-specific covariance, originally intended for use with astronomical datasets. The existing fitting method is batch EM, which would not normally be applied to large datasets such as the Gaia catalog containing noisy observations of a billion stars. We propose two minibatch variants of extreme deconvolution, based on an online variation of the EM algorithm, and direct gradient-based optimisation of the log-likelihood, both of which can run on GPUs. We demonstrate that these methods provide faster fitting, whilst being able to scale to much larger models for use with larger datasets., Comment: Appearing at the Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada
- Published
- 2019
- Full Text
- View/download PDF
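As a rough illustration of the gradient-based variant (a sketch reconstructed from the abstract, not the authors' code; the parameterization and the absence of minibatching are simplifications), the key change from an ordinary Gaussian mixture is that each observation's known noise covariance is added to every component covariance inside the likelihood:

```python
import torch

def noisy_gmm_loglik(x, S, log_w, mu, L):
    """Extreme-deconvolution-style log-likelihood.
    x: (N, D) noisy observations, S: (N, D, D) known per-sample noise covariances
    (assumed positive definite), log_w: (K,) unnormalized log mixture weights,
    mu: (K, D) component means, L: (K, D, D) factors, component covariance L_k L_k^T."""
    N = x.shape[0]
    Sigma = L @ L.transpose(-1, -2)                     # (K, D, D)
    total_cov = Sigma.unsqueeze(0) + S.unsqueeze(1)     # (N, K, D, D): Sigma_k + S_i
    comp = torch.distributions.MultivariateNormal(
        mu.unsqueeze(0).expand(N, -1, -1), covariance_matrix=total_cov)
    log_comp = comp.log_prob(x.unsqueeze(1))            # (N, K)
    return torch.logsumexp(log_comp + torch.log_softmax(log_w, 0), dim=1).sum()

# Maximize with minibatch gradient ascent (e.g. Adam on log_w, mu, L) instead of batch EM.
```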
12. Neural Spline Flows
- Author
-
Durkan, Conor, Bekasov, Artur, Murray, Iain, and Papamakarios, George
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
A normalizing flow models a complex probability density as an invertible transformation of a simple base density. Flows based on either coupling or autoregressive transforms both offer exact density evaluation and sampling, but rely on the parameterization of an easily invertible elementwise transformation, whose choice determines the flexibility of these models. Building upon recent work, we propose a fully-differentiable module based on monotonic rational-quadratic splines, which enhances the flexibility of both coupling and autoregressive transforms while retaining analytic invertibility. We demonstrate that neural spline flows improve density estimation, variational inference, and generative modeling of images., Comment: Published at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
- Published
- 2019
- Full Text
- View/download PDF
13. Sequential Neural Methods for Likelihood-free Inference
- Author
-
Durkan, Conor, Papamakarios, George, and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,ComputingMethodologies_PATTERNRECOGNITION ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) ,Statistics::Computation - Abstract
Likelihood-free inference refers to inference when a likelihood function cannot be explicitly evaluated, which is often the case for models based on simulators. Most of the literature is based on sample-based `Approximate Bayesian Computation' methods, but recent work suggests that approaches based on deep neural conditional density estimators can obtain state-of-the-art results with fewer simulations. The neural approaches vary in how they choose which simulations to run and what they learn: an approximate posterior or a surrogate likelihood. This work provides some direct controlled comparisons between these choices.
- Published
- 2018
14. Mode Normalization
- Author
-
Deecke, Lucas, Murray, Iain, and Bilen, Hakan
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal distributions, the effectiveness of batch normalization (BN), arguably the most prominent normalization method, is reduced. As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. We demonstrate that our method outperforms BN and other widely used normalization techniques in several experiments, including single and multi-task datasets.
- Published
- 2018
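A rough sketch of the idea, reconstructed from the abstract (so treat the exact form of the gating and statistics as assumptions): a small gating function softly assigns each sample to one of K modes, and samples are normalized with a gate-weighted mixture of per-mode batch statistics.

```python
import numpy as np

def mode_normalize(X, W_gate, eps=1e-5):
    """Mode-normalization sketch for a batch X of shape (N, D).
    W_gate: (D, K) gating weights producing soft assignments to K modes."""
    logits = X @ W_gate
    g = np.exp(logits - logits.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)                       # (N, K) soft assignments
    out = np.zeros_like(X)
    for k in range(g.shape[1]):
        w = g[:, k:k + 1]                                   # (N, 1)
        n_k = w.sum() + eps
        mu_k = (w * X).sum(axis=0, keepdims=True) / n_k     # gate-weighted batch mean
        var_k = (w * (X - mu_k) ** 2).sum(axis=0, keepdims=True) / n_k
        out += w * (X - mu_k) / np.sqrt(var_k + eps)        # gate-weighted normalization
    return out
```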
15. Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows
- Author
-
Papamakarios, George, Sterratt, David C., and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,ComputingMethodologies_PATTERNRECOGNITION ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) ,Statistics::Computation - Abstract
We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. SNL trains an autoregressive flow on simulated data in order to learn a model of the likelihood in the region of high posterior density. A sequential training procedure guides simulations and reduces simulation cost by orders of magnitude. We show that SNL is more robust, more accurate and requires less tuning than related neural-based methods, and we discuss diagnostics for assessing calibration, convergence and goodness-of-fit., Accepted for publication at AISTATS 2019
- Published
- 2018
16. A determinant-free method to simulate the parameters of large Gaussian fields
- Author
-
Ellam, Louis, Strathmann, Heiko, Girolami, Mark, and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Statistics - Computation ,Computation (stat.CO) - Abstract
We propose a determinant-free approach for simulation-based Bayesian inference in high-dimensional Gaussian models. We introduce auxiliary variables with covariance equal to the inverse covariance of the model. The joint probability of the auxiliary model can be computed without evaluating determinants, which are often hard to compute in high dimensions. We develop a Markov chain Monte Carlo sampling scheme for the auxiliary model that requires no more than the application of inverse-matrix-square-roots and the solution of linear systems. These operations can be performed at large scales with rational approximations. We provide an empirical study on both synthetic and real-world data for sparse Gaussian processes and for large-scale Gaussian Markov random fields., 16 pages, 3 figures
- Published
- 2017
- Full Text
- View/download PDF
17. Neural Autoregressive Distribution Estimation
- Author
-
Uria, Benigno, Côté, Marc-Alexandre, Gregor, Karol, Murray, Iain, and Larochelle, Hugo
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,deep learning ,neural networks ,density modeling ,unsupervised learning ,Machine Learning (cs.LG) - Abstract
We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired from restricted Boltzmann machines, to yield an estimator that is both tractable and has good generalization performance. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also present how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we also show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
- Published
- 2016
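A tiny sketch of the binary NADE forward pass described in the abstract (notation and shapes are mine, not the paper's):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nade_log_prob(x, W, V, b, c):
    """Binary NADE log-probability of one vector x (shape D).
    W: (H, D) shared input-to-hidden weights, V: (D, H) hidden-to-output weights,
    b: (D,) output biases, c: (H,) hidden biases. The hidden activation for
    dimension d reuses the running sum a = c + W[:, :d] @ x[:d] (weight sharing)."""
    D = x.shape[0]
    a = c.copy()                        # activation before seeing any inputs
    log_p = 0.0
    for d in range(D):
        h = sigmoid(a)                  # hidden units conditioned on x[:d]
        p_d = sigmoid(b[d] + V[d] @ h)  # p(x_d = 1 | x_<d)
        log_p += x[d] * np.log(p_d) + (1 - x[d]) * np.log(1 - p_d)
        a += W[:, d] * x[d]             # incremental weight-sharing update
    return log_p
```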
18. Differentiation of the Cholesky decomposition
- Author
-
Murray, Iain
- Subjects
FOS: Computer and information sciences ,ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION ,MathematicsofComputing_NUMERICALANALYSIS ,Computer Science::Mathematical Software ,Computer Science - Mathematical Software ,Mathematical Software (cs.MS) ,Statistics - Computation ,Computation (stat.CO) - Abstract
We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new `blocked' algorithms, based on differentiating the Cholesky algorithm DPOTRF in the LAPACK library, which uses `Level 3' matrix-matrix operations from BLAS, and so is cache-friendly and easy to parallelize. For large matrices, the resulting algorithms are the fastest way to compute Cholesky derivatives, and are an order of magnitude faster than the algorithms in common usage. In some computing environments, symbolically-derived updates are faster for small matrices than those based on differentiating Cholesky algorithms. The symbolic and algorithmic approaches can be combined to get the best of both worlds., 18 pages, including 7 pages of code listings
- Published
- 2016
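The paper itself is about efficient blocked routines; as context only, here is a numerical check of the standard forward-mode identity dL = L Φ(L⁻¹ dΣ L⁻ᵀ), where Φ keeps the lower triangle and halves the diagonal (a textbook result, not a reproduction of the paper's blocked algorithms):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def phi(M):
    """Lower triangle of M with the diagonal halved."""
    out = np.tril(M)
    out[np.diag_indices_from(out)] *= 0.5
    return out

def chol_forward_diff(Sigma, dSigma):
    """Forward-mode derivative of L = cholesky(Sigma) for a symmetric perturbation dSigma."""
    L = cholesky(Sigma, lower=True)
    inner = solve_triangular(L, solve_triangular(L, dSigma, lower=True).T, lower=True).T
    return L @ phi(inner)               # dL = L * Phi(L^{-1} dSigma L^{-T})

# Finite-difference check on a random SPD matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); Sigma = A @ A.T + 4 * np.eye(4)
E = rng.standard_normal((4, 4)); dSigma = E + E.T
eps = 1e-6
numerical = (cholesky(Sigma + eps * dSigma, lower=True)
             - cholesky(Sigma - eps * dSigma, lower=True)) / (2 * eps)
print(np.max(np.abs(numerical - chol_forward_diff(Sigma, dSigma))))  # should be tiny
```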
19. Fast $\epsilon$-free Inference of Simulation Models with Bayesian Conditional Density Estimation
- Author
-
Papamakarios, George and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Machine Learning (stat.ML) ,Computation (stat.CO) ,Statistics::Computation ,Machine Learning (cs.LG) - Abstract
Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an $\epsilon$-ball around the observed data, which is only correct in the limit $\epsilon\!\rightarrow\!0$. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as $\epsilon\!\rightarrow\!0$, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior., Appeared at NIPS 2016. Fixed typo in Eq (37)
- Published
- 2016
- Full Text
- View/download PDF
20. Markov Chain Truncation for Doubly-Intractable Inference
- Author
-
Wei, Colin and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Computing partition functions, the normalizing constants of probability distributions, is often hard. Variants of importance sampling give unbiased estimates of a normalizer Z; however, unbiased estimates of the reciprocal 1/Z are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo sampling of "doubly-intractable" distributions, such as the parameter posterior for Markov Random Fields or Exponential Random Graphs. We demonstrate how to construct unbiased estimates for 1/Z given access to black-box importance sampling estimators for Z. We adapt recent work on random series truncation and Markov chain coupling, producing estimators with lower variance and a higher percentage of positive estimates than before. Our debiasing algorithms are simple to implement, and have some theoretical and empirical advantages over existing methods.
- Published
- 2016
- Full Text
- View/download PDF
21. Pseudo-Marginal Slice Sampling
- Author
-
Murray, Iain and Graham, Matthew M.
- Subjects
FOS: Computer and information sciences ,Statistics - Computation ,Computation (stat.CO) - Abstract
Markov chain Monte Carlo (MCMC) methods asymptotically sample from complex probability distributions. The pseudo-marginal MCMC framework only requires an unbiased estimator of the unnormalized probability distribution function to construct a Markov chain. However, the resulting chains are harder to tune to a target distribution than conventional MCMC, and the types of updates available are limited. We describe a general way to clamp and update the random numbers used in a pseudo-marginal method's unbiased estimator. In this framework we can use slice sampling and other adaptive methods. We obtain more robust Markov chains, which often mix more quickly., 9 pages, 6 figures, 1 table. Version 2 includes citations to closely-related work released on arXiv since version 1
- Published
- 2015
22. MADE: Masked Autoencoder for Distribution Estimation
- Author
-
Germain, Mathieu, Gregor, Karol, Murray, Iain, and Larochelle, Hugo
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Computer Science - Neural and Evolutionary Computing ,Machine Learning (stat.ML) ,Neural and Evolutionary Computing (cs.NE) ,Machine Learning (cs.LG) - Abstract
There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder's parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product, the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators., 9 pages and 1 page of supplementary material. Updated to match published version
- Published
- 2015
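A small sketch of the masking construction the abstract describes, for a single hidden layer and the natural input ordering (the degree-sampling scheme is simplified from memory):

```python
import numpy as np

def made_masks(n_inputs, n_hidden, rng=None):
    """Build MADE masks for one hidden layer.
    Input unit d gets degree d+1; hidden-unit degrees are sampled in [1, D-1].
    A hidden unit may see an input only if the input's degree is <= its own, and an
    output may see a hidden unit only if its degree is strictly larger, so output d
    depends only on earlier inputs (an autoregressive factorization)."""
    rng = np.random.default_rng(0) if rng is None else rng
    m_in = np.arange(1, n_inputs + 1)                          # degrees 1..D
    m_hid = rng.integers(1, n_inputs, size=n_hidden)           # degrees in [1, D-1]
    mask_W = (m_hid[:, None] >= m_in[None, :]).astype(float)   # (H, D) input-to-hidden mask
    mask_V = (m_in[:, None] > m_hid[None, :]).astype(float)    # (D, H) hidden-to-output mask
    return mask_W, mask_V

# The masks are applied elementwise to the autoencoder's weight matrices before each forward pass.
```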
23. Parallel MCMC with Generalized Elliptical Slice Sampling
- Author
-
Nishihara, Robert, Murray, Iain, and Adams, Ryan P.
- Subjects
FOS: Computer and information sciences ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Statistics - Computation ,Computation (stat.CO) - Abstract
Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps of a rapidly mixing Markov chain which leaves the target distribution invariant. Of particular interest in this regard is how to take advantage of multi-core computing to speed up MCMC-based inference, both to improve mixing and to distribute the computational load. In this paper, we present a parallelizable Markov chain Monte Carlo algorithm for efficiently sampling from continuous probability distributions that can take advantage of hundreds of cores. This method shares information between parallel Markov chains to build a scale-mixture of Gaussians approximation to the density function of the target distribution. We combine this approximation with a recent method known as elliptical slice sampling to create a Markov chain with no step-size parameters that can mix rapidly without requiring gradient or curvature computations., 19 pages, 8 figures, 3 algorithms
- Published
- 2014
24. A Deep and Tractable Density Estimator
- Author
-
Uria, Benigno, Murray, Iain, and Larochelle, Hugo
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data dimensions. One can easily condition on variables at the beginning of the ordering, and marginalize out variables at the end of the ordering, however other inference tasks require approximate inference. In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. We can thus use the most convenient model for each inference task at hand, and ensembles of such models with different orderings are immediately available. Moreover, unlike the original NADE, our training procedure scales to deep models. Empirically, ensembles of Deep NADE models obtain state of the art density estimation performance., 9 pages, 4 tables, 1 algorithm, 5 figures. To appear ICML 2014, JMLR W&CP volume 32
- Published
- 2013
25. A Framework for Evaluating Approximation Methods for Gaussian Process Regression
- Author
-
Chalupka, Krzysztof, Williams, Christopher K. I., and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Statistics - Computation ,Computation (stat.CO) ,Machine Learning (cs.LG) - Abstract
Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n^2) space and O(n^3) time for a dataset of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons., 19 pages, 4 figures
- Published
- 2013
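As a concrete example of the simplest baseline the abstract mentions, Subset of Data just runs exact GP regression on m randomly chosen training points; the kernel and noise level below are placeholders, not values from the paper:

```python
import numpy as np

def sod_gp_predict(X, y, X_test, m, lengthscale=1.0, signal_var=1.0, noise_var=0.1, seed=0):
    """Subset-of-Data GP regression: exact GP prediction using only m random training
    points, reducing the O(n^3) cost to O(m^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    Xs, ys = X[idx], y[idx]

    def k(A, B):  # squared-exponential kernel
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)

    K = k(Xs, Xs) + noise_var * np.eye(m)
    Ks = k(X_test, Xs)
    mean = Ks @ np.linalg.solve(K, ys)
    var = signal_var + noise_var - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mean, var
```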
26. RNADE: The real-valued neural autoregressive density-estimator
- Author
-
Uria, Benigno, Murray, Iain, and Larochelle, Hugo
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case., Comment: 12 pages, 3 figures, 3 tables, 2 algorithms. Merges the published paper and supplementary material into one document
- Published
- 2013
- Full Text
- View/download PDF
27. Driving Markov chain Monte Carlo with a dependent random stream
- Author
-
Murray, Iain and Elliott, Lloyd T.
- Subjects
FOS: Computer and information sciences ,Statistics - Computation ,Computation (stat.CO) - Abstract
Markov chain Monte Carlo is a widely-used technique for generating a dependent sequence of samples from complex distributions. Conventionally, these methods require a source of independent random variates. Most implementations use pseudo-random numbers instead because generating true independent variates with a physical system is not straightforward. In this paper we show how to modify some commonly used Markov chains to use a dependent stream of random numbers in place of independent uniform variates. The resulting Markov chains have the correct invariant distribution without requiring detailed knowledge of the stream's dependencies or even its marginal distribution. As a side-effect, sometimes far fewer random numbers are required to obtain accurate results., 16 pages, 4 figures
- Published
- 2012
28. MCMC for doubly-intractable distributions
- Author
-
Murray, Iain, Ghahramani, Zoubin, and MacKay, David
- Subjects
Methodology (stat.ME) ,FOS: Computer and information sciences ,Statistics - Computation ,Statistics - Methodology ,Computation (stat.CO) ,Statistics::Computation - Abstract
Markov Chain Monte Carlo (MCMC) algorithms are routinely used to draw samples from distributions with intractable normalization constants. However, standard MCMC algorithms do not apply to doubly-intractable distributions in which there are additional parameter-dependent normalization terms; for example, the posterior over parameters of an undirected graphical model. An ingenious auxiliary-variable scheme (Moeller et al., 2004) offers a solution: exact sampling (Propp and Wilson, 1996) is used to sample from a Metropolis-Hastings proposal for which the acceptance probability is tractable. Unfortunately the acceptance probability of these expensive updates can be low. This paper provides a generalization of Moeller et al. (2004) and a new MCMC algorithm, which obtains better acceptance probabilities for the same amount of exact sampling, and removes the need to estimate model parameters before sampling begins., Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)
- Published
- 2012
- Full Text
- View/download PDF
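For orientation, here is a toy version of the exchange-style auxiliary-variable move this abstract builds on, applied to a deliberately trivial model whose normalizer we pretend not to know. It follows the basic exchange algorithm with a flat prior and a symmetric proposal, not the generalizations developed in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Unnormalized "intractable" model: f(x; theta) = exp(-theta * x), x > 0.
# We pretend Z(theta) is unknown, but exact sampling is possible: x ~ Exponential(theta).
def log_f(x, theta):
    return -theta * x

def exact_sample(theta, n):
    return rng.exponential(1.0 / theta, size=n)

y = rng.exponential(1.0 / 3.0, size=50)        # observed data, true theta = 3
theta, samples = 1.0, []
for _ in range(5000):
    theta_prop = abs(theta + 0.3 * rng.standard_normal())  # reflected random walk (symmetric)
    x_aux = exact_sample(theta_prop, y.size)                # exact draw at the proposed parameter
    # Exchange acceptance ratio: the unknown normalizers Z(theta), Z(theta') cancel.
    log_a = (log_f(y, theta_prop).sum() - log_f(y, theta).sum()
             + log_f(x_aux, theta).sum() - log_f(x_aux, theta_prop).sum())
    if np.log(rng.random()) < log_a:
        theta = theta_prop
    samples.append(theta)
print("posterior mean of theta:", np.mean(samples[1000:]))  # should be near the true value 3
```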
29. Slice sampling covariance hyperparameters of latent Gaussian models
- Author
-
Murray, Iain and Adams, Ryan Prescott
- Subjects
FOS: Computer and information sciences ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Statistics - Computation ,Computation (stat.CO) ,Statistics::Computation - Abstract
The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes., 9 pages, 4 figures, 4 algorithms. Minor corrections to previous version. This version to appear in Advances in Neural Information Processing Systems (NIPS) 23, 2010
- Published
- 2010
30. Incorporating Side Information in Probabilistic Matrix Factorization with Gaussian Processes
- Author
-
Adams, Ryan Prescott, Dahl, George E., and Murray, Iain
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Probabilistic matrix factorization (PMF) is a powerful method for modeling data associated with pairwise relationships, finding use in collaborative filtering, computational biology, and document analysis, among other areas. In many domains, there is additional information that can assist in prediction. For example, when modeling movie ratings, we might know when the rating occurred, where the user lives, or what actors appear in the movie. It is difficult, however, to incorporate this side information into the PMF model. We propose a framework for incorporating side information by coupling together multiple PMF problems via Gaussian process priors. We replace scalar latent features with functions that vary over the space of side information. The GP priors on these functions require them to vary smoothly and share information. We successfully use this new method to predict the scores of professional basketball games, where side information about the venue and date of the game are relevant for the outcome., 18 pages, 4 figures, Submitted to UAI 2010
- Published
- 2010
31. Elliptical slice sampling
- Author
-
Murray, Iain, Adams, Ryan Prescott, and MacKay, David J. C.
- Subjects
FOS: Computer and information sciences ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Statistics - Computation ,Computation (stat.CO) - Abstract
Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms., Comment: 8 pages, 6 figures, appearing in AISTATS 2010 (JMLR: W&CP volume 6). Differences from first submission: some minor edits in response to feedback.
- Published
- 2010
- Full Text
- View/download PDF
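A compact sketch of the update the abstract describes, written from memory of the published pseudo-code (the zero-mean prior and the placeholder log-likelihood interface are assumptions):

```python
import numpy as np

def elliptical_slice_step(f, log_lik, chol_Sigma, rng):
    """One elliptical slice sampling update for f ~ N(0, Sigma) with likelihood log_lik(f).
    chol_Sigma is a lower-triangular Cholesky factor of the prior covariance."""
    nu = chol_Sigma @ rng.standard_normal(f.shape)     # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.random())          # slice height
    theta = rng.uniform(0.0, 2 * np.pi)                # initial angle on the ellipse
    theta_min, theta_max = theta - 2 * np.pi, theta    # full-ellipse bracket
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop) > log_y:
            return f_prop                              # accepted point on the slice
        # shrink the bracket towards theta = 0 (the current state) and retry
        if theta < 0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Repeated application leaves p(f) proportional to N(f; 0, Sigma) * L(f) invariant,
# with no step-size or other free parameters to tune.
```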
32. Nonparametric Bayesian Density Modeling with Gaussian Processes
- Author
-
Adams, Ryan Prescott, Murray, Iain, and MacKay, David J. C.
- Subjects
FOS: Computer and information sciences ,FOS: Mathematics ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Statistics - Computation ,Computation (stat.CO) ,Statistics::Computation - Abstract
We present the Gaussian process density sampler (GPDS), an exchangeable generative model for use in nonparametric Bayesian density estimation. Samples drawn from the GPDS are consistent with exact, independent samples from a distribution defined by a density that is a transformation of a function drawn from a Gaussian process prior. Our formulation allows us to infer an unknown density from data using Markov chain Monte Carlo, which gives samples from the posterior distribution over density functions and from the predictive distribution on data space. We describe two such MCMC methods. Both methods also allow inference of the hyperparameters of the Gaussian process., Comment: 26 pages, 4 figures, submitted to the Annals of Statistics
- Published
- 2009
- Full Text
- View/download PDF
33. Bayesian learning in undirected graphical models: approximate MCMC algorithms
- Author
-
Murray, Iain and Ghahramani, Zoubin
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) ,Statistics::Computation - Abstract
Bayesian learning in undirected graphical models (computing posterior distributions over parameters and predictive quantities) is exceptionally difficult. We conjecture that for general undirected models, there are no tractable MCMC (Markov Chain Monte Carlo) schemes giving the correct equilibrium distribution over parameters. While this intractability, due to the partition function, is familiar to those performing parameter optimisation, Bayesian learning of posterior distributions over undirected model parameters has been unexplored and poses novel challenges. We propose several approximate MCMC schemes and test on fully observed binary models (Boltzmann machines) for a small coronary heart disease data set and larger artificial systems. While approximations must perform well on the model, their interaction with the sampling scheme is also important. Samplers based on variational mean-field approximations generally performed poorly; more advanced methods using loopy propagation, brief sampling and stochastic dynamics lead to acceptable parameter posteriors. Finally, we demonstrate these techniques on a Markov random field with hidden variables., Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)
- Published
- 2004