29 results for "Generative models"
Search Results
2. Conditional Generative Models for Dynamic Trajectory Generation and Urban Driving
- Author
-
Paz, David, Zhang, Hengyuan, Xiang, Hao, Liang, Andrew, and Christensen, Henrik I
- Subjects
Clinical Research ,HD maps ,autonomous driving ,coarse maps ,generative models ,global planning ,perception ,scene understanding ,semantic maps ,Analytical Chemistry ,Environmental Science and Management ,Ecology ,Distributed Computing ,Electrical and Electronic Engineering - Abstract
This work explores methodologies for dynamic trajectory generation for urban driving environments by utilizing coarse global plan representations. In contrast to state-of-the-art architectures for autonomous driving that often leverage lane-level high-definition (HD) maps, we focus on minimizing required map priors that are needed to navigate in dynamic environments that may change over time. To incorporate high-level instructions (i.e., turn right vs. turn left at intersections), we compare various representations provided by lightweight and open-source OpenStreetMaps (OSM) and formulate a conditional generative model strategy to explicitly capture the multimodal characteristics of urban driving. To evaluate the performance of the models introduced, a data collection phase is performed using multiple full-scale vehicles with ground truth labels. Our results show potential use cases in dynamic urban driving scenarios with real-time constraints. The dataset is released publicly as part of this work in combination with code and benchmarks.
- Published
- 2023
3. Med-cDiff: Conditional Medical Image Generation with Diffusion Models
- Author
-
Hung, Alex Ling Yu, Zhao, Kai, Zheng, Haoxin, Yan, Ran, Raman, Steven S, Terzopoulos, Demetri, and Sung, Kyunghyun
- Subjects
Communications Engineering ,Engineering ,Biomedical Imaging ,4.1 Discovery and preclinical testing of markers and technologies ,image generation ,diffusion models ,generative models ,super-resolution ,denoising ,inpainting ,Biomedical engineering - Abstract
Conditional image generation plays a vital role in medical image analysis as it is effective in tasks such as super-resolution, denoising, and inpainting, among others. Diffusion models have been shown to perform at a state-of-the-art level in natural image generation, but they have not been thoroughly studied in medical image generation with specific conditions. Moreover, current medical image generation models have their own problems, limiting their usage in various medical image generation tasks. In this paper, we introduce the use of conditional Denoising Diffusion Probabilistic Models (cDDPMs) for medical image generation, which achieve state-of-the-art performance on several medical image generation tasks.
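The record above describes conditional DDPMs only in prose. For orientation, below is a minimal sketch of the forward noising process and the noise-prediction training loss shared by DDPM variants; this is an illustrative sketch, not the paper's code, and `model(x_t, t, cond)` is a hypothetical noise-prediction network.

```python
import numpy as np

# Standard DDPM schedule: linear betas and cumulative products alpha_bar_t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def ddpm_loss(model, x0, cond):
    """Epsilon-prediction objective: E || eps - eps_theta(x_t, t, cond) ||^2.

    The conditioning input `cond` is what adapts the model to tasks such as
    super-resolution, denoising, or inpainting: it carries the low-resolution,
    noisy, or masked image, respectively.
    """
    t = np.random.randint(T)
    eps = np.random.randn(*x0.shape)
    x_t = q_sample(x0, t, eps)
    return np.mean((eps - model(x_t, t, cond)) ** 2)
```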
- Published
- 2023
4. Deep generative modeling for volume reconstruction in cryo-electron microscopy.
- Author
-
Donnat, Claire, Levy, Axel, Poitevin, Frédéric, Zhong, Ellen, and Miolane, Nina
- Subjects
Deep neural networks ,Generative models ,High-resolution volume reconstruction ,cryoEM ,Cryoelectron Microscopy - Abstract
Advances in cryo-electron microscopy (cryo-EM) for high-resolution imaging of biomolecules in solution have provided new challenges and opportunities for algorithm development for 3D reconstruction. Next-generation volume reconstruction algorithms that combine generative modelling with end-to-end unsupervised deep learning techniques have shown promise, but many technical and theoretical hurdles remain, especially when applied to experimental cryo-EM images. In light of the proliferation of such methods, we propose here a critical review of recent advances in the field of deep generative modelling for cryo-EM reconstruction. The present review aims to (i) provide a unified statistical framework using terminology familiar to machine learning researchers with no specific background in cryo-EM, (ii) review the current methods in this framework, and (iii) outline outstanding bottlenecks and avenues for improvements in the field.
- Published
- 2022
5. A Generative Account of Latent Abstractions
- Author
-
Xie, Sirui
- Subjects
Artificial intelligence ,Decision-making ,Generative models ,Latent abstraction - Abstract
Abstractions are fundamental to human intelligence, extending far beyond pattern recognition. They enable the distillation and organization of complex information into structured knowledge, facilitate the succinct communication of intricate ideas, and empower us to navigate complex decision-making scenarios with consistent value prediction. The ability to abstract is particularly fascinating because abstractions are not inherently present in raw data: they are latent variables underlying our observations. Despite the recent phenomenal advances in modeling data distributions, Generative Artificial Intelligence (GenAI) systems still lack robust principles for the autonomous emergence of latent abstractions. This dissertation studies the problem of unsupervised latent abstraction learning, focusing on developing modeling, learning, and inference methods for latent-variable generative models across diverse high-dimensional data modalities. The core premise is that by incorporating algebraic, geometric, and statistical structures into the latent space and generator, we can cultivate representations of latent variables that explain observed data in alignment with human understanding. The dissertation consists of four parts. The first three explore the generative constructs of latent abstractions for Category, Object, and Decision, respectively. Part I examines the basic structure of categories, emphasizing their symbol-vector duality. We develop a latent-variable text model with a coupling of symbols and vectors in its representations. We investigate another representation that is both discrete and continuous, iconic symbols, in a visual communication game. Part II enriches the abstract structure by shifting focus to object-centric abstractions in visual data. We introduce a generative model that disentangles objects from backgrounds in the latent space. We then rethink the algebraic structures of object abstractions and propose a novel metric that measures compositionality as a more generic form than disentanglement. Part III incorporates situational context by introducing a sequential decision-making aspect with trajectory data. Here, latent abstractions manifest as actions and plans. We bridge the theories of decision-making and generative modeling, proving that the inference of latent decisions enhances consistency with the model's understanding while optimizing intrinsic values. Whereas these three parts adopt the paradigm of directly learning from raw data, Part IV introduces a dialectic discussion with an alternative paradigm, Knowledge Distillation. We demonstrate how to distill from and accelerate state-of-the-art massive-scale data-space models by re-purposing our methods and techniques for latent-variable generative modeling. Together, the contributions of this dissertation enable GenAI systems to overcome the critical bottlenecks of alignment, efficiency, and consistency in representation, inference, and decision-making.
- Published
- 2024
6. Electrocardiogram Synthesis Using Denoising Diffusion Probabilistic Models and Bidirectional State-Space Models
- Author
-
Alsharif, Haya Adnan N
- Subjects
Statistics ,Computer science ,Artificial intelligence ,Diffusion Models ,ECG Synthesis ,Electrocardiography ,Generative Models ,Signal Processing ,Synthetic data - Abstract
This thesis investigates the application of Denoising Diffusion Probabilistic Models (DDPM) for synthesizing 12-lead Electrocardiogram (ECG) signals. Utilizing classifier-free guidance along with the bidirectional Mamba State Space Model (SSM) within a DiffWave framework, we developed a model capable of both unconditional and conditional ECG signal generation. Despite the promising potential of Mamba for enhancing temporal signal encoding, its performance compared to the time-invariant SSM model, S4, was either worse or inconclusive, likely due to the limitations of current metrics. Visual assessments often contradicted the automated metrics, indicating a significant gap in current evaluation methods. We also explored the feasibility of training models on all 12 leads, contrasting previous studies that used fewer leads. Our findings indicate that diffusion models can adequately learn the linear relationships between leads without significantly increasing model size. Overall, our work reduces the need for extensive data pre- or post-processing, streamlines ECG data generation, and highlights the limitations of existing evaluation methodologies, suggesting the need for improved evaluation methods.
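The abstract mentions classifier-free guidance without detail; the following is a minimal sketch of how the guided noise estimate is usually formed at sampling time (names such as `eps_model` and the guidance weight `w` are illustrative assumptions, not the thesis's API):

```python
def guided_eps(eps_model, x_t, t, cond, w):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one, scaled by the guidance weight w:

        eps_hat = (1 + w) * eps(x_t, t, cond) - w * eps(x_t, t, None)
    """
    eps_cond = eps_model(x_t, t, cond)    # conditioned, e.g., on ECG labels
    eps_uncond = eps_model(x_t, t, None)  # None stands in for the null condition
    return (1.0 + w) * eps_cond - w * eps_uncond
```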
- Published
- 2024
7. Multi-Track Music Generation with Latent Diffusion Models
- Author
-
Karchkhadze, Tornike
- Subjects
Music ,diffusion models ,generative models ,machine learning ,multi track - Abstract
In recent years, diffusion models have demonstrated promising results in cross-modal generation tasks within generative media, encompassing image, video, and audio generation. This development has introduced a great deal of novelty to audio and music-related tasks, such as text-to-sound and text-to-music generation. However, these text-controlled music generation models typically focus on capturing global musical attributes, such as genre and mood, and do not allow for the more fine-grained control that composers might desire. Music composition is a complex, multilayered task that frequently involves intricate musical arrangements as an essential part of the creative process. This task requires composers to carefully align each instrument with existing tracks in terms of beat, dynamics, harmony, and melody, demanding a level of precision and control over individual tracks that current text-driven prompts often fail to provide. In this work, we address these challenges by presenting a multi-track music generation model, one of the first of its kind. Our model, by learning the joint probability of tracks sharing a context, is capable of generating music across several tracks that correspond well to each other, either conditionally or unconditionally. We achieve this by extending MusicLDM, a latent diffusion model for music, into a multi-track generative model. Additionally, our model is capable of arrangement generation, where it can generate any subset of tracks given the others (e.g., generating a piano track that complements given bass and drum tracks). We compared our model with existing multi-track generative models and demonstrated that our model achieves considerable improvements across objective metrics, for both total and arrangement generation tasks. Additionally, we demonstrated that our model is capable of meaningful conditioning generation with text and reference musical audio, corresponding well to text meaning and reference audio content/style. Sound examples from this work can be found at https://mtmusicldm.github.io.
- Published
- 2024
8. Robust Modeling through Causal Priors and Data Purification in Machine Learning
- Author
-
Bhat, Sunay Gajanan
- Subjects
Computer science ,Computer engineering ,Electrical engineering ,Artificial Intelligence ,Causality ,Generative Models ,Machine Learning ,Poison Defense ,Robust Classification - Abstract
The continued success and ubiquity of machine learning techniques, particularly Deep Learning, have necessitated research in robust model training to enhance generalization capabilities and security against incomplete data, distributional shifts, and adversarial attacks. This thesis presents two primary sets of contributions to robust modeling in machine learning through the use of causal priors and data purification with generative models such as the Variational Autoencoder (VAE), Energy-Based Model (EBM), and Denoising Diffusion Probabilistic Model (DDPM), focusing on image datasets. In the first set of contributions, we use structural causal priors in the latent spaces of VAEs. Initially, we demonstrate counterfactual synthetic data generation outside the training data distribution. This technique allows for the creation of diverse and novel data points, which is critical to enhancing model robustness and generalization capabilities. We utilize a similar VAE architecture to compare causal structural (graphical) hypotheses, showing that the fit of generated data from various hypotheses on distributionally shifted test data is an effective method for hypothesis comparison. Additionally, we explore using augmentations in the latent space of a VAE as an efficient and effective way to generate realistic augmented data. The second set of contributions focuses on data purification using EBMs and DDPMs. We propose a framework of universal data purification methods to defend against train-time data poisoning attacks. This framework utilizes stochastic transforms realized via iterative Langevin dynamics of EBMs, DDPMs, or both, to purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various poisoning attacks while preserving natural accuracy. Preprocessing data with these techniques pushes poisoned images into the natural, clean image manifold, effectively neutralizing adversarial perturbations. The framework achieves state-of-the-art performance without needing attack or classifier-specific information, even when the generative models are trained on poisoned or distributionally shifted data. Beyond defense against data poisoning, our framework also shows promise in applications such as the degradation and removal of unwanted intellectual property. The flexibility and generality of these data purification techniques represent a significant step forward in the adversarial model training paradigm. All of these methods enable new perspectives and approaches to robust machine learning, advancing an essential field in artificial intelligence research.
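The purification framework is described above only at a high level; as a hedged sketch of the core stochastic transform, iterative Langevin dynamics under a trained energy function might look as follows (the `energy_grad` callable and the step sizes are assumptions for illustration, not the dissertation's implementation):

```python
import numpy as np

def langevin_purify(energy_grad, x, steps=100, step_size=1e-2):
    """Iterative Langevin dynamics: x <- x - (s/2) * dE/dx + sqrt(s) * noise.

    Repeated updates push a (possibly poisoned) image toward low-energy
    regions, i.e., toward the natural image manifold, attenuating
    adversarial perturbations before classifier training.
    """
    for _ in range(steps):
        noise = np.random.randn(*x.shape)
        x = x - 0.5 * step_size * energy_grad(x) + np.sqrt(step_size) * noise
    return x
```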
- Published
- 2024
9. Generating new concepts with hybrid neuro-symbolic models
- Author
-
Feinman, Reuben and Lake, Brenden M.
- Subjects
Categories and concepts ,neural networks ,compositionality ,causality ,generative models - Abstract
Human conceptual knowledge supports the ability to generate novel yet highly structured concepts, and the form of this conceptual knowledge is of great interest to cognitive scientists. One tradition has emphasized structured knowledge, viewing concepts as embedded in intuitive theories or organized in complex symbolic knowledge structures. A second tradition has emphasized statistical knowledge, viewing conceptual knowledge as emerging from the rich correlational structure captured by training neural networks and other statistical models. In this paper, we explore a synthesis of these two traditions through a novel neuro-symbolic model for generating new concepts. Using simple visual concepts as a testbed, we bring together neural networks and symbolic probabilistic programs to learn a generative model of novel handwritten characters. Two alternative models are explored with more generic neural network architectures. We compare each of these three models for their likelihoods on held-out character classes and for the quality of their productions, finding that our hybrid model learns the most convincing representation and generalizes further from the training observations.
- Published
- 2020
10. The fine structure of surprise in intuitive physics: when, why, and how much?
- Author
-
Smith, Kevin A., Mei, Lingjie, Yao, Shunyu, Wu, Jiajun, Spelke, Elizabeth, Tenenbaum, Joshua B., and Ullman, Tomer D.
- Subjects
Intuitive physics ,Surprise ,Violation of expectation ,Generative models - Abstract
We are surprised when events violate our intuitive physical expectations. Even infants look longer when things seem to magically teleport or vanish. This important surprise signal has been used to probe what infants expect, in order to study the most basic representations of objects. But these studies rely on binary measures – an event is surprising, or not. Here, we study surprise in a more precise, quantitative way, using three distinct measures: we ask adults to judge how surprising a scene is, when that scene is surprising, and why it is surprising. We find good consistency in the level of surprise reported across these experiments, but also crucial differences in the implied explanations of those scenes. Beyond this, we show that the timing and degree of surprise can be explained by an object-based model of intuitive physics.
- Published
- 2020
11. Modifying social dimensions of human faces with ModifAE
- Author
-
Atalla, Chad, Song, Amanda, Tam, Bartholomew, Rathis, Asmitha, and Cottrell, Gary
- Subjects
neural networks ,generative models ,face recognition ,social perception ,image modification - Abstract
At first glance, humans extract social judgments from faces, including how trustworthy, attractive, and aggressive they look. These impressions have profound social, economic, and political consequences, as they subconsciously influence decisions like voting and criminal sentencing. Therefore, understanding human perception of these judgments is important for the social sciences. In this work, we present a modifying autoencoder (ModifAE, pronounced "modify") that can model and alter these facial impressions. We assemble a face impression dataset large enough for training a generative model by applying a state-of-the-art (SOTA) impression predictor to faces from CelebA. Then, we apply ModifAE to learn generalizable modifications of these continuous-valued traits in faces (e.g., make a face look slightly more intelligent or much less aggressive). ModifAE can modify face images to create controlled social science experimental datasets, and it can reveal dataset biases by creating direct visualizations of what makes a face salient in social dimensions. The ModifAE architecture is also smaller and faster than SOTA image-to-image translation models, while outperforming SOTA in quantitative evaluations.
- Published
- 2019
12. Mapping and Planning for Autonomous Vehicles in Dynamic Urban Settings
- Author
-
Paz Ruiz, David Fernando
- Subjects
Robotics ,Artificial intelligence ,Computer science ,Autonomous Driving ,Computer Vision ,Generative Models ,Machine Learning ,Planning ,Semantic Scene Understanding - Abstract
In highly dynamic urban environments, software stacks for autonomous driving applications must quickly adapt to fast-changing conditions. Examples of dynamic scenarios include construction sites, road closures, and lane-level updates. Failure to adapt to changes in map definitions can result in catastrophic failures in the system that can lead to accidents or, at best, rule violations on shared public roads. This work focuses on identifying strategies that leverage automatically generated map representations to minimize human-in-the-loop efforts and explores methods for integrating nominal planners in the global planning task. The first part of this dissertation covers multi-class semantic mapping for large-scale urban driving applications. As part of this framework, sensor-fusion-based strategies are applied to provide robust depth and semantic estimates from the scene without making strong assumptions about the road topology. Secondly, rasterized and graphical representations are jointly leveraged to formulate a nominal global planning approach for lane-level navigation. This method utilizes the semantic maps introduced and employs a conditional generative model to explicitly model the multi-modal distribution of trajectories that are feasible when driving in an urban setting. We additionally provide details from real-world testing and the open-source data collected from the UC San Diego campus during 2020-2021. In the last chapter, 2D and 3D centerline prediction methods are introduced to reduce the gap in real-time scene understanding. This contribution outlines an automatic label generation process and additionally leverages an occlusion handling approach to reason about centerline prediction with varying degrees of occlusion. The methods proposed achieve robust performance in diverse driving scenarios with promising directions in autonomous driving architectures.
- Published
- 2023
13. Generative Models of Images and Neural Networks
- Author
-
Peebles, William Smith
- Subjects
Artificial intelligence ,deep learning ,diffusion ,generative adversarial networks ,generative models ,neural networks ,transformers - Abstract
Large-scale generative models have fueled recent progress in artificial intelligence. Armed with scaling laws that accurately predict model performance as invested compute increases, NLP has become the gold standard for all disciplines of AI. Given a new task, pre-trained generative models can either solve it zero-shot or be efficiently fine-tuned on a small amount of task-specific training examples. However, the widespread adoption of generative models has lagged in other domains, such as vision and meta-learning. In this thesis, we study ways to train improved, scalable generative models of two modalities: images and neural network parameters. We also examine how pre-trained generative models can be leveraged to tackle additional downstream tasks. We begin by introducing a new, powerful class of generative models: Diffusion Transformers (DiTs). We show that transformers, with one small yet critically important modification, retain their excellent scaling properties for diffusion-based image generation and outperform convolutional neural networks that have previously dominated the area. DiT outperforms all prior generative models on the class-conditional ImageNet generation benchmark. Next, we introduce a novel framework for learning to learn based on building generative models of a new data source: neural network checkpoints. We create datasets containing hundreds of thousands of deep learning training runs and use them to train generative models of neural network checkpoints. Given a starting parameter vector and a target loss, error, or reward, loss-conditional diffusion models trained on this data can sample parameter updates that achieve a desired metric. We apply our framework to problems in vision and reinforcement learning. Finally, we explore how pre-trained image-level generative models can be used to tackle downstream tasks in vision without requiring task-specific training data. We show that pre-trained GAN generators can be used to create an infinite data stream to train networks for the dense visual correspondence problem, without requiring any human-annotated supervision like keypoints. Networks trained on this completely GAN-generated data generalize zero-shot to real images, and they outperform previous self-supervised and keypoint-supervised approaches that train on real data.
- Published
- 2023
14. Generative Models for Image and Long Video Synthesis
- Author
-
Brooks, Tim
- Subjects
Artificial intelligence ,Artificial Intelligence ,Deep Learning ,Generative AI ,Generative Models ,Image Generation ,Video Generation - Abstract
In this thesis, I present essential ingredients for making image and video generative models useful for general visual content creation through three contributions. First, I will present research on long video generation. This work proposes a network architecture and training paradigm that enables learning long-term temporal patterns from videos, a key challenge to advancing video generation from short clips to longer-form coherent videos. Next, I will present research on generating images of scenes conditioned on human poses. This work showcases the ability of generative models to represent relationships between humans and their environments, and emphasizes the importance of learning from large and complex datasets of daily human activity. Lastly, I will present a method for teaching generative models to follow image editing instructions by combining the abilities of large language models and text-to-image models to create supervised training data. Following instructions is an important step that will allow generative models of visual data to become more helpful to people. Together these works advance the capabilities of generative models for synthesizing images and long videos.
- Published
- 2023
15. Fast Training of Generalizable Deep Neural Networks
- Author
-
POOLADZANDI, OMEAD BRANDON
- Subjects
Computer science ,Computer Vision ,Curvature Aware Optimization ,Generative Models ,Machine Learning ,Speech Processing - Abstract
Effective natural agents excel in learning representations of our world and efficiently generalizing to make decisions. Critically, developing such advanced reasoning capabilities can occur even with limited information-rich samples. In stark contrast, the major success of deep learning-based artificial agents rests primarily on training with massive datasets. This dissertation focuses on curvature-informed learning and generative modeling methods that boost efficiency and close the gap between natural and artificial agents, thus enabling computationally efficient and improved reasoning. This dissertation comprises two parts. First, we formally lay the foundations for learning. The goal is to establish optimization techniques, understand datasets, establish probabilistic generative models, and provide natural learning objectives even in settings with limited supervision. We discuss various first- and second-order optimization methods, show the importance of modeling distributions in Variational Auto-Encoders (VAEs), and discuss which points are essential for generalization in supervised learning. Building on these insights, we develop new algorithms to boost the performance of state-of-the-art models, select subsets to improve data quality, speed up training, mitigate their biases, and generate new augmentations on large labeled and partially labeled datasets. These contributions enable ML systems to better model and generalize to unseen and potentially out-of-distribution samples while drastically reducing training time and computational cost.
- Published
- 2023
16. Authoring and Experiencing Virtual 3D Environments
- Author
-
Sayyad, Ehsan
- Subjects
Computer science ,Artificial intelligence ,Design ,3D Reconstruction ,3D User Interface ,Generative Models ,Human Computer Interaction ,Machine Learning ,Virtual Reality - Abstract
The growing popularity of Virtual Reality and Augmented Reality has led to an increase in demand for 3D content. However, traditional methods for creating 3D environments are often expensive and complicated, making it difficult for non-expert users to produce their own content. This thesis addresses this issue through research focusing on two main areas: (1) the study of 3D user experiences, which helps create engaging and immersive experiences that meet users' needs and preferences, and (2) increasing accessibility for content generation, which divides into creating approachable 3D user interfaces and building intelligent creative tools. Our research aims to understand how users perceive and interact with immersive environments and, subsequently, to develop tools that enable users to create 3D content with ease and accessibility, even with limited experience or resources, with the help of generative models in machine learning and intuitive UI. Our project PanoTrace introduces a 3D modeling platform in VR for creating 3D scenes from 2D panoramas, providing a first example of our idea of novel approachable 3D user interfaces. Our wide-area VR walking study, on the other hand, falls entirely within the domain of studying 3D user experiences. It investigates the impact of natural walking versus teleportation on presence and user preference in VR experiences. We introduce the content-aware semantic editing and inpainting system (CASEIn) as an example of an intelligent creative tool. CASEIn generates high-quality results for guided image inpainting and semantic image synthesis using machine learning. Our projects DeepDive and Faded focus on accessibility for content generation by linking approachable UI design with machine learning. Faded is a memory reconstruction system that relies on our inpainting model, CASEIn, to extrapolate existing sparse information, including images and the user's memory of a space, into a cohesive 3D experience. In summary, this thesis demonstrates the feasibility and importance of approachable 3D UIs, the use of intelligent creative tools, and 3D user experience studies, all in service of enabling users to create 3D content in a more accessible way. Our research also provides insights into factors contributing to immersion, user preference, and viewing comfort in these settings.
- Published
- 2023
17. Amortized Inference in Latent Space Energy-Based Prior Model
- Author
-
Zhai, Xufan
- Subjects
Statistics ,amortization ,energy-based ,Generative models ,MCMC - Abstract
This thesis discusses amortized inference in the latent space energy-based prior model (EBM), where the EBM serves as the prior of a generator neural network. The sampling of the prior and posterior can be done by short-run MCMC; however, MCMC sampling of the posterior distribution can be time-consuming due to the complexity of the posterior distribution. We propose to amortize the MCMC sampling of the posterior distribution with an inference network. Image experiments showed that amortization produces similar results to short-run MCMC sampling and is more time-efficient; the generator also shows better stability under amortization.
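For concreteness, the short-run posterior MCMC that the thesis proposes to amortize can be sketched as below; `grad_log_posterior` (combining the EBM prior and the generator likelihood) and the final inference-network comment are hypothetical placeholders, not the thesis's code:

```python
import numpy as np

def short_run_posterior(grad_log_posterior, x, z_dim, K=20, s=0.1):
    """K-step Langevin chain targeting p(z|x), proportional to p_EBM(z) p(x|z),
    starting from a fixed initial distribution (standard normal)."""
    z = np.random.randn(z_dim)
    for _ in range(K):
        z = z + 0.5 * s**2 * grad_log_posterior(z, x) + s * np.random.randn(z_dim)
    return z

# Amortization replaces the per-example chain with a single network pass:
# z = infer_net(x), trained so its samples approximate the short-run MCMC output.
```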
- Published
- 2022
18. DEEP LEARNING MODELS FOR THE ANALYSIS OF SINGLE CELL GENOMICS
- Author
-
Johansen, Nelson Jamse
- Subjects
Computer science ,Biology ,Computational biology ,Deep learning ,Generative models ,Machine learning ,Neural networks - Abstract
Single cell transcriptomic technologies which capture high dimensional measurements of gene expression in individual cells have been exponentially scaling in the number of cells that can be sequenced and analyzed simultaneously. Capturing a snapshot of the landscape of possible gene expression measurements from a collection of cells enables researchers to observe the space of molecular variation inherent to specific biological systems, termed atlasing. A challenge to building deeply characterized atlases of complex biological systems such as the human brain is the identification and correction of confounding factors which do not relate to the underlying biology but instead arise from technical confounders. In this dissertation I present deep learning models applied to single cell genomics which remove unwanted technical variation and contamination as well as perform novel analysis not previously possible using standard methods. The construction of single cell genomics atlases leverages recent advances in single cell RNA sequencing technologies such as 10X and SmartSeq, which can capture thousands of cells in a single experiment. When the sequencing of individual cells is performed on different technologies, this introduces unwanted technical variation (bias) specific to the technology and confounds attempts to merge scRNA-seq experiments into more complete atlases. To address this challenge, we developed scAlign to remove the effects of unwanted technical variation on gene expression, specifically via scRNA-seq alignment based on advances in computer vision. scAlign, an unsupervised deep learning method, performs data alignment that can incorporate partial, overlapping or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets or conditions to characterize specific expression changes due to conditions such as age or disease. With the recent surge of atlas efforts across complex tissues, conditions, and species, another challenge is how to integrate the deep characterizations of cell state with lower resolution assays of single cell or bulk genomics. Specifically, spatial and multi-omics assays do not collect RNA from a single cell but instead from a spot containing multiple cells, or, in the latter case, with contamination from the unintended collection of additional cells. We developed scProjection to join deeply sequenced atlases with lower resolution genomic assays to address the unwanted heterogeneity in mixed samples and project such samples in a way that recovers the underlying single-cell measurements. scProjection is demonstrated to accurately estimate the abundance of cell types that compose a mixed RNA sample while simultaneously identifying the gene expression measurements consistent with each cell type in the sample, to identify cell type specific changes due to spatial location of cells or disease state.
- Published
- 2022
19. Simple Structures in Deep Networks
- Author
-
Hyder, Rakib
- Subjects
Electrical engineering ,Continual learning ,Deep networks ,Generative models ,Inverse problems ,Low-rank factorization ,Phase retrieval - Abstract
Deep networks have received considerable attention in recent years due to their applications in different problems of science and engineering. This dissertation explores the application of deep networks in continual learning and inverse problems. In this work, we enforce some simple structures on the networks to achieve better solutions in terms of performance, memory, and computational cost. Continual Learning with Low-rank Increment: Continual learning is a process of training a single neural network on multiple tasks one after another, where training data for each task is often available only during the training of that task. Neural networks tend to forget older tasks when they are trained for newer tasks; this property is often known as catastrophic forgetting. To address this issue, continual learning methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. This work proposes a continual learning framework based on low-rank factorization of the network weights. To update the network for a new task, a rank-1 (or low-rank) matrix is learned and added to the weights of every layer. An additional selector vector is also introduced that assigns different weights to the low-rank matrices learned for the previous tasks. Our proposed approach demonstrates superior performance compared to the current state-of-the-art methods with a much lower number of network parameters. Inverse Problems with Deep Networks: Inverse problems form a family of problems where we try to recover the true signal given a modified version of the signal. Since inverse problems are often ill-posed in nature, we often need to impose some constraints on the solution set. This dissertation mainly focuses on deep generative networks as a prior for solving inverse problems. Low-rank matrix and tensor structures have been used in this work as constraints on the input latent vectors of the deep generative networks to improve the quality of the reconstruction with reduced parameters. This dissertation also explores unrolled networks, where classical iterative solution approaches are structured as fixed-layer networks with each iteration forming a layer of the network. We use such unrolled network structures to design sensing parameters for nonlinear inverse problems, achieving good reconstruction quality with a fixed number of layers (or iterations).
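As described above, each new task adds a learned rank-1 (or low-rank) increment to every layer, reweighted by a selector vector. A minimal sketch of the resulting effective weight follows (illustrative only, with hypothetical names, not the dissertation's code):

```python
import numpy as np

def effective_weight(W0, rank1_updates, selector):
    """Compose a layer's weight for the current task:

        W = W0 + sum_k s_k * u_k v_k^T

    where (u_k, v_k) is the rank-1 increment learned for task k and s_k is
    the learned selector entry weighting that task's contribution.
    """
    W = W0.copy()
    for (u, v), s in zip(rank1_updates, selector):
        W += s * np.outer(u, v)
    return W
```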
- Published
- 2022
20. Learning Grid Cells and Remapping in Curved Space: A Gauge Theoretic Perspective
- Author
-
Lutz, Anthony
- Subjects
Statistics ,Generative Models ,Geometric Learning ,Grid Cells ,Machine Learning - Abstract
This paper seeks to generalize a proposed model for grid cell learning in order to accommodate motion in curved space. The original model predicted grid cell motion on flat space as Lie group actions on the high-dimensional grid cell vector. Expanding upon this idea, this thesis considers a general manifold where each covering chart is small enough to approximate flat space. Grid cell excitations are then sections of the vector bundle above the terrain manifold, and moving from one chart to the next promotes a global remapping of the grid cells, represented by a gauge transformation of the bundle space. Paths along this terrain may then be associated with an isotropic collection of connection one-forms, taking values in the Lie algebra of the motion group. The connections may be used to define grid field strength and curvature, which then may be used to analyze error in grid cell mapping due to perturbations in the terrain. Experimental results are displayed using neural-network-learned weights to represent the updated grid cell firing structures.
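As a point of reference for the gauge-theoretic language above, a Lie-algebra-valued connection one-form A transforms under a change of local trivialization g as follows (one common sign convention; this is standard background, not quoted from the thesis):

```latex
A \;\longmapsto\; g\,A\,g^{-1} + g\,\mathrm{d}g^{-1},
\qquad
F \;=\; \mathrm{d}A + A \wedge A
```

Here F is the curvature of the connection, which corresponds to the field strength that the thesis uses to analyze grid cell mapping error.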
- Published
- 2021
21. ATTRACTORS IN SONG
- Author
-
FRISTON, KARL and KIEBEL, STEFAN
- Subjects
Generative models ,Predictive coding ,Hierarchical ,Dynamic ,Non-linear ,Variational ,Birdsong - Abstract
This paper summarizes our recent attempts to integrate action and perception within a single optimization framework. We start with a statistical formulation of Helmholtz's ideas about neural energy to furnish a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. Using constructs from statistical physics it can be shown that the problems of inferring the causes of our sensory inputs and learning regularities in the sensorium can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory information is generated. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of the brain's organization and responses. We will demonstrate the brain-like dynamics that this scheme entails by using models of birdsongs that are based on chaotic attractors with autonomous dynamics. This provides a nice example of how non-linear dynamics can be exploited by the brain to represent and predict dynamics in the environment.
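The "statistical formulation of Helmholtz's ideas" referenced above is usually expressed through the variational free energy, which upper-bounds surprise (negative log evidence); this standard form is supplied here for orientation, not quoted from the paper:

```latex
F \;=\; \mathbb{E}_{q(\vartheta)}\big[\ln q(\vartheta) - \ln p(s,\vartheta)\big]
\;=\; -\ln p(s) \;+\; D_{\mathrm{KL}}\big[q(\vartheta)\,\big\|\,p(\vartheta \mid s)\big]
```

Minimizing F with respect to the recognition density q makes inference and learning answerable to the same objective, which is the sense in which perception and action are integrated in a single optimization framework.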
- Published
- 2009
22. Statistical Modeling and Conceptualization of Visual Patterns
- Author
-
Zhu, Song C
- Subjects
perceptual organization ,descriptive models ,generative models ,causal models ,minimax entropy learning ,natural image statistics - Abstract
The objective of perceptual organization (grouping, segmentation and recognition) is to parse generic natural images into their constituent components, which are respectively instances of a wide variety of visual patterns. These visual patterns are fundamentally stochastic processes governed by probabilistic models which ought to be learned from the statistics of natural images. In this paper, we review research streams from several disciplines, and divide existing models into four categories according to their semantic structures: descriptive models, causal Markov models, generative models, and discriminative models. The objectives, principles, theories, and typical models are reviewed in each category. The central theme of this epistemological paper is to study the relationships between the four types of models and to pursue a unified mathematical framework for the conceptualization (or definition) and modeling of various visual patterns. In representation, we point out that the effective integration of descriptive and generative models is the future direction for statistical modeling. To make visual models computationally tractable, we study the causal Markov models as approximations, and we observe that the discriminative models are computational heuristics for inferring generative models. Under this unified mathematical framework, statistical models for various patterns should form a "continuous" spectrum, in the sense that they belong to a series of probability families in the space of attributed graphs. Visual patterns and their parts are conceptualized as statistical ensembles governed by their models. These statistical models and concepts amount to a visual language with a hierarchy of vocabularies, which is essential for building effective, robust, and generic vision systems.
- Published
- 2002
23. Modeling Visual Patterns by Integrating Descriptive and Generative Methods
- Author
-
Guo, Cheng E, Zhu, Song C, and Wu, Ying N
- Subjects
Descriptive models ,Generative models ,Gibbs point process ,Markov chain Monte Carlo ,Markov random fields ,Minimax entropy learning ,Perceptual organization ,Texton models ,Visual learning - Abstract
This paper presents a class of statistical models that integrate two statistical modeling paradigms in the literature: I) descriptive methods, such as Markov random fields and minimax entropy learning [41], and II) generative methods, such as principal component analysis, independent component analysis [2], transformed component analysis [11], wavelet coding [27, 5], and sparse coding [30, 24]. In this paper, we demonstrate the integrated framework by constructing a class of hierarchical models for texton patterns (the term "texton" was coined by psychologist Julesz in the early 80's). At the bottom level of the model, we assume that an observed texture image is generated by multiple hidden "texton maps", and textons on each map are translated, scaled, stretched, and oriented versions of a window function, like mini-templates or wavelet bases. The texton maps generate the observed image by occlusion or linear superposition. This bottom level of the model is generative in nature. At the top level of the model, the spatial arrangements of the textons in the texton maps are characterized by the minimax entropy principle, which leads to embellished versions of the Gibbs point process [34]. The top level of the model is descriptive in nature. We demonstrate the integrated model by a set of experiments.
- Published
- 2002
24. Deep Generative Models: Imitation Learning, Image Synthesis, and Compression
- Author
-
Ho, Jonathan
- Subjects
Computer science ,artificial intelligence ,compression ,generative models ,machine learning ,reinforcement learning ,unsupervised learning - Abstract
Machine learning has its roots in the design of algorithms that extract actionable structure from real world data. For high dimensional and high entropy data, machine learning techniques must cope with a fundamental tension arising from the curse of dimensionality: they must be computationally and statistically efficient despite the sheer size of the exponentially large spaces where meaningful signal is hidden. This thesis explores and attempts to resolve this tension in deep generative modeling. The first part of this thesis addresses imitation learning, the problem of reproducing the behavior of experts acting in dynamic environments. Supervised learning, to predict expert actions from states, suffers in statistical efficiency because large amounts of expert data are required to prevent action prediction errors from compounding over long behaviors. We propose a resolution in the form of an algorithm that learns a policy by matching the expert's state distribution. During learning, our algorithm continually executes the policy in the task environment and compares its states to the expert's on a gradually learned reward function. Allowing our algorithm to interact with the environment during training in this manner allows it to learn policies that stay on expert states even when expert data is extremely scarce. The second part of this thesis addresses modeling and compressing natural images using likelihood-based generative models, which are generative models trained with maximum likelihood to explicitly represent the probability distribution of data. When these models are scaled to high entropy datasets, they become computationally inefficient to employ for downstream tasks like image synthesis and compression. We present progress on these problems in the form of developments in flow models, a class of likelihood-based generative models that admit fast sampling and inference. We reduce flow model codelengths to be competitive with those of other types of likelihood-based generative models. Then, we develop the first computationally efficient compression algorithms for flow models, making our improved codelengths realizable in practice with fully parallelizable encoding and decoding.
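Flow models, mentioned above as admitting fast sampling and exact inference, rest on the change-of-variables identity for an invertible map f; the standard identity (background, not the thesis's notation) is:

```latex
\log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

Exact log-likelihoods from this identity are what make the codelength comparisons and the compression algorithms described in the abstract possible.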
- Published
- 2020
25. Generative Models for Content Creation
- Author
-
Lee, Hsin-Ying
- Subjects
Computer science ,content creation ,generative models - Abstract
Content creation can be broadly defined as a way of conveying thoughts and expressing ideas through some medium such as speech, text or any of various arts. The general goal of content creation is to generate contents that make the information accessible and understandable to audiences. In recent years, with the rapid progress in artificial intelligence, it has become an inevitable fact that the future of content creation is a powerful blend of machine technology and human creativity. However, it remains extremely challenging for machines to truly emulate what a content creator does. Therefore, instead of attempting to take over the role of professional content creators, we aim at (1) how to shorten the gap between professional content creators and general users with the help of machines, and (2) how to leverage machines to facilitate the creation process. In this work, we propose efficient algorithms based on generative models to tackle several content creation tasks that are originally time- and money-consuming. First, we address the problem of image-to-image translation. We propose a disentangled representation image-to-image translation (DRIT) framework to perform diverse translation without paired training data. Our model disentangles images into a domain-invariant content space and a domain-specific attribute space to enable diverse translation. Furthermore, to improve the diversity of the generated images, we propose a simple yet effective mode-seeking regularization term. The proposed regularization term can serve as a plug-in term for various conditional generation tasks. Second, we address the music-to-dance translation task. Given an input music clip, we aim to generate a corresponding dancing sequence. We propose a synthesis-by-analysis learning framework: we first learn to perform basic movements, then learn how to combine basic movements based on the input music. The generated dancing sequences are consistent with the input music clips in terms of music style and audio beats. Third, we address the design layout generation task. Given a set of desired components as well as user-specified constraints, we aim to generate visually reasonable and appealing layouts. We propose a multi-stage framework: we first learn to predict complete relationships among components given the user-specified constraints, then we predict bounding boxes of all components. Finally, we finetune the prediction to further improve the alignment and visual quality.
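The mode-seeking regularization mentioned above can be sketched as a ratio of output distance to latent distance, encouraging distinct latent codes to produce distinct images (an illustrative sketch with hypothetical names, not the authors' released code):

```python
import numpy as np

def mode_seeking_term(G, c, z1, z2, eps=1e-8):
    """Ratio d_I(G(c, z1), G(c, z2)) / d_z(z1, z2), maximized during training
    (equivalently, its reciprocal is minimized) so that nearby latent codes
    cannot collapse to the same output in conditional generation."""
    d_img = np.mean(np.abs(G(c, z1) - G(c, z2)))
    d_z = np.mean(np.abs(z1 - z2))
    return d_img / (d_z + eps)
```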
- Published
- 2020
26. On the Scalable Construction of Measure Transport Maps and Applications in Health Analytics
- Author
-
Mendoza, Marcela P.
- Subjects
Biostatistics ,Artificial intelligence ,Applied mathematics ,Bayesian Inference ,Generative Models ,Machine Learning ,Optimal Transport - Abstract
Characterizing and sampling from probability distributions is useful for reasoning about uncertainty in large, complex, and multi-modal datasets. One established and increasingly popular method to do so involves finding transformations, or transport maps, from one distribution to another. The computation of these transport maps is the subject of the field of Optimal Transportation, a rich area of mathematical theory that has led to many applications in machine learning, economics, and statistics. Finding these transport maps, however, usually involves computational difficulties, particularly when datasets are large both in dimension and in the number of samples to learn from. Building upon previous work in our group, we introduce a formulation for finding transport maps that is parallelizable and solvable with convex optimization methods. We show applications in the field of health analytics encompassing scalable Bayesian inference, density estimation, and generative models. We show how this formulation scales with the dimension of the data and can be parallelized using a range of architectures such as cloud computing services and Graphics Processing Units. Within the context of Bayesian inference, we present a distributed framework for finding the full posterior distribution associated with LASSO problems and show advantages compared to traditional methods of computing this posterior. Finally, we discuss novel methods to reduce the number of parameters necessary to approximate transport maps.
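For orientation, the transport maps discussed above are solutions of the Monge problem (standard formulation, not the dissertation's specific convex formulation):

```latex
\min_{T \,:\, T_{\#}\mu \,=\, \nu} \; \int c\big(x, T(x)\big)\, \mathrm{d}\mu(x)
```

where the constraint T_{#}\mu = \nu means T pushes the source distribution \mu forward onto the target \nu, and c is a transport cost such as squared Euclidean distance.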
- Published
- 2018
27. Generative Modeling and Unsupervised Learning in Computer Vision
- Author
-
Xie, Jianwen
- Subjects
Statistics ,Computer science ,auto-encoder ,convolutional neural network ,dynamic texture ,generative models ,Markov random fields ,shared sparse coding - Abstract
Developing statistical models and associated learning algorithms for the rich visual patterns in natural images is of fundamental importance for computer vision. More importantly, the endeavor has the potential to enrich our treasured collections of statistical models and expand the already vast reach of machine learning methodologies. Generative models enable us to learn useful features and representations from natural images in an unsupervised manner. The learned features and representations can be more interpretable and explicit than those learned by discriminative models, especially if the learned models are sparse. The objective of this dissertation is to learn probabilistic generative models for representing visual patterns in natural images. In this dissertation, we first develop a sparse FRAME model as a generative model for representing natural image patterns. The model is an inhomogeneous and sparsified version of the original FRAME (Filters, Random field, And Maximum Entropy) model. More specifically, it is a probability distribution defined on the image space which combines wavelet sparse coding and Markov random field modeling. We propose two different algorithms to learn this model. The first is a two-stage algorithm that initially selects the wavelets by a shared sparse coding algorithm and then estimates the weight parameters by maximum likelihood via stochastic gradient ascent. The second approach utilizes a single-stage algorithm that uses a generative boosting method combined with a Gibbs sampler on the reconstruction coefficients of the selected wavelets. Our experimental results show that the proposed sparse FRAME model can not only learn to generate realistic images of a wide variety of image patterns, but can also be used for object detection, clustering, codebook learning, bag-of-word image classification, and domain adaptation. We further propose a hierarchical version of FRAME models that we call the generative ConvNet. The probability distribution of the generative ConvNet model is in the form of an exponential tilting of a reference distribution, and the exponential tilting is defined by a ConvNet that involves multiple layers of linear filtering and non-linear transformation. Assuming ReLU non-linearity and a Gaussian white noise reference distribution, we show that the generative ConvNet model contains a representational structure with multiple layers of binary activation variables. The model is piecewise Gaussian, where each piece is determined by an instantiation of the binary activation variables that reconstruct the mean of the Gaussian piece. The Langevin dynamics for synthesis is driven by the reconstruction error, and the corresponding gradient descent dynamics converges to a local energy minimum that is auto-encoding. As for learning, we show that contrastive divergence learning tends to reconstruct the observed images. We also generalize the spatial generative ConvNet to model dynamic textures by adding the temporal dimension. The spatial-temporal generative ConvNet consists of multiple layers of spatial-temporal filters to capture the spatial-temporal patterns in dynamic textures. Finally, we show that the maximum likelihood learning algorithm can generate not only vivid natural images but also realistic dynamic textures. We lastly investigate a connection of the proposed models to auto-encoders. We show that the local modes of both the sparse FRAME model and the generative ConvNet are represented by auto-encoders, with explicit encoding of the data in terms of filtering operations, and explicit decoding that generates the data in terms of the basis functions that correspond to the filters. We call these auto-encoders the Hopfield auto-encoders because they describe the local energy minima of the models. We develop learning algorithms to learn those models by fitting Hopfield auto-encoders. We show that it is possible to select wavelets and estimate weight parameters for sparse FRAME models by fitting Hopfield auto-encoders. Moreover, meaningful dictionaries of filters can be obtained by learning hierarchical Hopfield auto-encoders for the generative ConvNet. Without MCMC, the Hopfield auto-encoder has the potential to tremendously accelerate learning, which is crucial for big data.
- Published
- 2016
28. Learning Inhomogeneous FRAME Models for Object Patterns
- Author
-
Xie, Jianwen
- Subjects
Statistics ,Computer science ,Generative models ,Markov random fields - Abstract
This research investigates an inhomogeneous version of the FRAME (Filters, Random field, And Maximum Entropy) model and applies it to modeling object patterns. The inhomogeneous FRAME is a non-stationary Markov random field model that reproduces the observed marginal distributions or statistics of filter responses at all the different locations, scales and orientations. The experiments show that the inhomogeneous FRAME model is capable of generating a wide variety of object patterns in natural images. It is useful for object detection, alignment, and clustering.
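For reference, the inhomogeneous FRAME model described above takes the form of an exponential tilting of a reference distribution by location-dependent potentials of filter responses (written here in a generic form that may differ in detail from the paper's notation):

```latex
p(\mathbf{I};\lambda) \;=\; \frac{1}{Z(\lambda)} \exp\Big\{ \sum_{k}\sum_{x} \lambda_{k,x}\big( [F_k * \mathbf{I}](x) \big) \Big\}\, q(\mathbf{I})
```

where the F_k are filters at different scales and orientations, the potential functions \lambda_{k,x} vary with location x (the inhomogeneity, in contrast to the stationary original FRAME), and q is a reference distribution.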
- Published
- 2014
29. ATTRACTORS IN SONG
- Author
-
KARL FRISTON and STEFAN KIEBEL
- Subjects
Predictive coding ,Quantitative Biology::Neurons and Cognition ,Dynamic ,Non-linear ,Physical Sciences and Mathematics ,Generative models ,Life Sciences ,Variational ,Birdsong ,Hierarchical - Abstract
This paper summarizes our recent attempts to integrate action and perception within a single optimization framework. We start with a statistical formulation of Helmholtz's ideas about neural energy to furnish a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. Using constructs from statistical physics it can be shown that the problems of inferring the causes of our sensory inputs and learning regularities in the sensorium can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory information is generated. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of the brain's organization and responses. We will demonstrate the brain-like dynamics that this scheme entails by using models of birdsongs that are based on chaotic attractors with autonomous dynamics. This provides a nice example of how non-linear dynamics can be exploited by the brain to represent and predict dynamics in the environment.
- Published
- 2009