12 results for "Generative models"
Search Results
2. Content-aware preserving image generation
- Author
- Le, Giang H., Nguyen, Anh Q., Kang, Byeongkeun, and Lee, Yeejin
- Published
- 2025
- Full Text
- View/download PDF
3. An analysis of pre-trained stable diffusion models through a semantic lens.
- Author
- Bonechi, Simone, Andreini, Paolo, Corradini, Barbara Toniella, and Scarselli, Franco
- Subjects
- *STABLE Diffusion, *RESEARCH personnel, *SEMANTICS, *GENERALIZATION, *REALISM, *PROBABILISTIC generative models
- Abstract
Recently, generative models for images have garnered remarkable attention, due to their effective generalization ability and their capability to generate highly detailed and realistic content. Indeed, the success of generative networks (e.g., BigGAN, StyleGAN, Diffusion Models) has driven researchers to develop increasingly powerful models. As a result, we have observed an unprecedented improvement in terms of both image resolution and realism, making generated images indistinguishable from real ones. In this work, we focus on a family of generative models known as Stable Diffusion Models (SDMs), which have recently emerged due to their ability to generate images in a multimodal setup (i.e., from a textual prompt) and have outperformed adversarial networks by learning to reverse a diffusion process. Given the complexity of these models, which makes it hard to retrain them, researchers have started to exploit pre-trained SDMs to perform downstream tasks (e.g., classification and segmentation), where semantics plays a fundamental role. In this context, understanding how well the model preserves semantic information may be crucial to improving its performance. This paper presents an approach aimed at providing insights into the properties of a pre-trained SDM through a semantic lens. In particular, we analyze the features extracted by the U-Net within an SDM to explore whether and how the semantic information of an image is preserved in its internal representation. For this purpose, different distance measures are compared, and an ablation study is performed to select the layer (or combination of layers) of the U-Net that best preserves the semantic information. We also seek to understand whether semantics are preserved when the image undergoes simple transformations (e.g., rotation, flip, scale, padding, crop, and shift) and for a different number of diffusion denoising steps. To evaluate these properties, we consider popular benchmarks for semantic segmentation tasks (e.g., COCO and Pascal-VOC). Our experiments suggest that the first encoder layer at 16 × 16 resolution effectively preserves semantic information. However, increasing inference steps (even for a minimal amount of noise) and applying various image transformations can affect the diffusion U-Net's internal feature representation. Additionally, we propose some examples taken from a video benchmark (the DAVIS dataset), where we investigate whether an object instance within a video preserves its internal representation even after several frames. Our findings suggest that the internal object representation remains consistent across multiple frames in a video, as long as the configuration changes are not excessive. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
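For context on the kind of analysis the abstract above describes, the snippet below is a minimal, hypothetical sketch of comparing U-Net encoder features of an image and a transformed copy with a distance measure. It assumes the features have already been extracted elsewhere (e.g., via forward hooks on a pre-trained diffusion U-Net), uses random tensors as stand-ins, and is not the authors' code; the 320-channel, 16 x 16 shape is only a placeholder.

# Hedged sketch: compare two (C, H, W) feature maps taken from the same
# U-Net encoder layer, e.g. for an image and its horizontally flipped copy.
import torch
import torch.nn.functional as F

def cosine_map(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Per-location cosine similarity between two (C, H, W) feature maps."""
    sim = F.cosine_similarity(feat_a.flatten(1).T, feat_b.flatten(1).T, dim=1)
    return sim.reshape(feat_a.shape[1:])

# Hypothetical encoder features for an image and its flipped version.
torch.manual_seed(0)
feat_orig = torch.randn(320, 16, 16)
feat_flip = torch.flip(feat_orig, dims=[-1]) + 0.05 * torch.randn(320, 16, 16)

# Undo the flip before comparing, so that spatial locations correspond.
sim = cosine_map(feat_orig, torch.flip(feat_flip, dims=[-1]))
print(f"mean cosine similarity: {sim.mean():.3f}")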
4. Patch-DFD: Patch-based end-to-end DeepFake discriminator.
- Author
- Yu, Miaomiao, Ju, Sigang, Zhang, Jun, Li, Shuohao, Lei, Jun, and Li, Xiaofei
- Subjects
- *FORGERY, *FACE, *DETECTORS, *VOTING
- Abstract
• DeepFake detection is usually regarded as a binary classification task.
• It is important to pay attention to the local discriminative features of tampered images.
• For patch-based solutions, resizing each patch before feeding it into the CNN may result in the loss or alteration of its original details.
• The Facial Patch Mapping strategy maintains the spoof patterns of each patch to the greatest extent, while also improving training and inference speed.
• The local voting scheme improves detection accuracy.
Facial forgery by DeepFake has recently attracted more public attention. Face images contain sensitive personal information, and abuse of such technology could grow into a menace. Since the difference between real and fake faces is usually subtle and local, the general detection framework of applying a backbone network to capture the global features of the entire face and then feeding them into a binary classifier is not optimal. In addition, patch-based schemes are widely used in various computer vision tasks, including image classification. However, how to extract features for location-specific, arbitrarily shaped patches while preserving their original information and spoof patterns as much as possible requires further exploration. In this paper, a novel deep forgery detector called Patch-DFD is proposed, which applies a patch-based solution, Facial Patch Mapping (FPM), to obtain several part-based feature maps, preserving the original details of each facial patch to the greatest extent. Besides, the BM-pooling module fixes the size of the feature maps while reducing quantization errors. A local voting strategy is finally used to fuse the results of the part detectors, so as to more accurately identify fake faces generated by deep generative models. Compared to a typical patch-wise framework that takes patch inputs, our scheme is more efficient due to the absence of repeated convolution operations. Moreover, extensive experiments conducted on publicly available face forensics datasets have proved the effectiveness of our framework. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
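As a rough illustration of the local-voting idea mentioned in the Patch-DFD abstract above (not the paper's implementation), the sketch below fuses per-patch real/fake scores by majority vote; the patch scores are random stand-ins for part-detector outputs, and the patch count and threshold are arbitrary.

import torch

def local_vote(patch_logits: torch.Tensor, threshold: float = 0.5) -> bool:
    """patch_logits: (num_patches,) raw scores; returns True if judged fake."""
    patch_probs = torch.sigmoid(patch_logits)   # per-patch fake probability
    votes = (patch_probs > threshold).float()   # hard vote from each part detector
    return votes.mean().item() > 0.5            # majority of patches decide

torch.manual_seed(0)
logits = torch.randn(9)                         # e.g. 9 facial patches (eyes, nose, mouth, ...)
print("fake" if local_vote(logits) else "real")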
5. RealSinger: Ultra-realistic singing voice generation via stochastic differential equations.
- Author
- Shi, Ziqiang and Wu, Shoule
- Subjects
- *SINGING, *SPEECH synthesis, *MUSIC scores, *WHITE noise, *STOCHASTIC processes
- Abstract
Synthesizing high-quality singing voice from a music score is a challenging problem in music generation and has many practical applications. Samples generated by existing singing voice synthesis (SVS) systems can roughly reflect the lyrics, pitch, and duration in a given score, but they fail to contain the necessary details. In this paper, based on stochastic differential equations (SDEs), we propose RealSinger to generate 22.05 kHz ultra-realistic singing voice conditioned on a music score. RealSinger learns to find the stochastic process path from a source of white noise to the target singing-voice manifold under the conditioning music score, allowing it to sing the score while maintaining the local voice details of the target singer. During training, the model learns to accurately predict the direction of movement in the ambient Euclidean space onto the low-dimensional singing-voice manifold. RealSinger's framework is very flexible: it can either generate intermediate feature representations of the singing voice, such as the mel-spectrogram, or directly generate the final waveform in an end-to-end style, which rectifies the defects and accumulated errors introduced by two-stage singing synthesis systems. Extensive subjective and objective tests on benchmark datasets show significant gains in perceptual quality using RealSinger. The mean opinion scores (MOS) obtained with RealSinger are closer to those of the human singer's original high-fidelity singing voice than those obtained with any state-of-the-art method. Audio samples are available at https://realsinger.github.io/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
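The abstract above builds on score-based generative modelling with SDEs. As a toy, self-contained illustration of that general mechanism (not RealSinger itself), the sketch below reverses a one-dimensional diffusion SDE with Euler-Maruyama steps; the learned, score-conditioned network is replaced by the analytic score of a Gaussian target so the example stays exact, and the schedule and step counts are arbitrary.

import numpy as np

BETA = 8.0  # constant noise schedule for this toy variance-preserving SDE

def analytic_score(x, t, mean=3.0, std=0.5):
    """Score of the marginal p_t(x) when the data distribution is N(mean, std^2)."""
    a = np.exp(-0.5 * BETA * t)                  # signal decay at time t
    var = (std * a) ** 2 + 1.0 - a ** 2          # marginal variance at time t
    return -(x - mean * a) / var

def reverse_sde_sample(n_samples=5000, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = rng.standard_normal(n_samples)           # start from white noise at t = 1
    for i in range(n_steps, 0, -1):
        t = i * dt
        drift = -0.5 * BETA * x - BETA * analytic_score(x, t)
        x = x - drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n_samples)
    return x

samples = reverse_sde_sample()
print(f"sample mean {samples.mean():.2f}, std {samples.std():.2f}  (target: 3.00, 0.50)")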
6. Learning hierarchical discrete prior for co-speech gesture generation.
- Author
- Zhang, Jian and Yoshie, Osamu
- Subjects
- *GESTURE, *VANILLA
- Abstract
In the context of Co-Speech Gesture Generation, Vector-Quantized Variational Autoencoder (VQ-VAE) based methods have shown promising results by separating the generation process into two stages: learning discrete gesture priors via pretraining for gesture reconstruction, which encodes gestures into a discrete codebook, followed by learning the mapping between speech audio and gesture codebook indices. This design leverages pretraining of a motion VQ-VAE on the motion reconstruction task to improve the quality of generated gestures. However, the vanilla VQ-VAE's codebook often fails to encode both low-level and high-level gesture features adequately, resulting in limited reconstruction quality and generation performance. To address this, we propose the Hierarchical Discrete Audio-to-Gesture (HD-A2G) model, which (i) introduces a two-stage hierarchical codebook structure for capturing high-level and low-level gesture priors, enabling the reconstruction of gesture details; (ii) integrates high-level and low-level features using an AdaIN layer, effectively enhancing the learning of gesture rhythm and content; and (iii) explicitly maps text and audio onset features to the appropriate levels of the codebook, ensuring that accurate hierarchical associations are learned for the generation stage. Experimental results on the BEAT and Trinity datasets demonstrate that HD-A2G outperforms the baseline method in both pretrained gesture reconstruction and audio-conditioned gesture generation by a clear margin, achieving state-of-the-art performance qualitatively and quantitatively.
• Proposes HD-A2G with a two-stage hierarchical codebook for capturing high-level and low-level gesture priors.
• Integrates high-level and low-level features using an AdaIN layer to enhance gesture rhythm and content.
• Explicitly maps text and audio onset features to the appropriate codebook levels for accurate hierarchical associations.
• Outperforms the baseline on the BEAT and Trinity datasets for gesture reconstruction and audio-conditioned generation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
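The abstract above relies on vector quantization against a learned codebook. As a minimal sketch of that core operation (with random placeholder tensors, not the paper's hierarchical codebook), the snippet below snaps each continuous frame feature to its nearest codebook entry and returns the discrete indices that a second-stage audio-to-gesture model would learn to predict.

import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor):
    """features: (N, D), codebook: (K, D) -> (quantized (N, D), indices (N,))."""
    dists = torch.cdist(features, codebook)      # pairwise L2 distances, (N, K)
    indices = dists.argmin(dim=1)                # nearest code per feature vector
    return codebook[indices], indices

torch.manual_seed(0)
codebook = torch.randn(512, 64)                  # hypothetical 512-entry codebook
frame_feats = torch.randn(30, 64)                # 30 encoded gesture frames
quantized, idx = quantize(frame_feats, codebook)
print(idx[:10])                                  # discrete indices for the mapping stage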
7. Semi-supervised image attribute editing using generative adversarial networks.
- Author
- Dogan, Yahya and Keles, Hacer Yalim
- Subjects
- *DIESEL electric power-plants, *CONVOLUTIONAL neural networks, *INVERSE functions, *IMAGE representation
- Abstract
Image attribute editing is a challenging problem that has recently been studied by many researchers using generative networks. The challenge lies in manipulating selected attributes of images while preserving the other details. The way to achieve this goal is to find an accurate latent vector representation of an image and a direction corresponding to the attribute. Almost all works in the literature use labeled datasets in a supervised setting for this purpose. In this study, we introduce an architecture called Cyclic Reverse Generator (CRG), which allows learning the inverse function of the generator accurately via an encoder in an unsupervised setting by utilizing cyclic cost minimization. Attribute editing is then performed using the CRG models to find the desired attribute representations in the latent space. In this work, we use two arbitrary reference images, with and without the desired attribute, to compute an attribute direction for editing. We show that the proposed approach performs better in terms of image reconstruction than existing end-to-end generative models, both quantitatively and qualitatively. We demonstrate state-of-the-art results on both real and generated images in the CelebA dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
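To make the "cyclic cost minimization" idea in the abstract above concrete, here is a heavily simplified sketch: a frozen generator G, an encoder E trained so that E(G(z)) is close to z and G(E(x)) is close to x, and an attribute direction computed from two reference latents. The tiny MLPs, dimensions, and the 0.8 step size are arbitrary placeholders, not the paper's architecture or settings.

import torch
import torch.nn as nn

latent_dim, img_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, img_dim))
E = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
for p in G.parameters():
    p.requires_grad_(False)                      # the generator stays fixed

opt = torch.optim.Adam(E.parameters(), lr=1e-3)
for step in range(200):
    z = torch.randn(32, latent_dim)
    x = G(z)
    loss = ((E(x) - z) ** 2).mean() + ((G(E(x)) - x) ** 2).mean()  # cyclic costs
    opt.zero_grad()
    loss.backward()
    opt.step()

# Attribute editing with two reference images (with / without the attribute).
x_with, x_without = G(torch.randn(1, latent_dim)), G(torch.randn(1, latent_dim))
direction = E(x_with) - E(x_without)
edited = G(E(x_without) + 0.8 * direction)       # move along the attribute direction
print(edited.shape)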
8. Deep-FS: A feature selection algorithm for Deep Boltzmann Machines.
- Author
- Taherkhani, Aboozar, Cosma, Georgina, and McGinnity, T. M.
- Subjects
- *BOLTZMANN machine, *ARTIFICIAL neural networks, *DEEP learning, *FEATURE selection, *ALGORITHMS
- Abstract
A Deep Boltzmann Machine is a model of a Deep Neural Network formed from multiple layers of neurons with nonlinear activation functions. The structure of a Deep Boltzmann Machine enables it to learn very complex relationships between features and facilitates advanced performance in learning high-level representations of features, compared to conventional Artificial Neural Networks. Feature selection at the input level of Deep Neural Networks has not been well studied, despite its importance in reducing the input features processed by the deep learning model, which facilitates understanding of the data. This paper proposes a novel algorithm, Deep Feature Selection (Deep-FS), which is capable of removing irrelevant features from large datasets in order to reduce the number of inputs which are modelled during the learning process. The proposed Deep-FS algorithm utilizes a Deep Boltzmann Machine and uses knowledge acquired during training to remove features at the beginning of the learning process. Reducing inputs is important because it prevents the network from learning associations between irrelevant features, which negatively impact the network's acquired knowledge about the overall distribution of the data. The Deep-FS method embeds feature selection in a Restricted Boltzmann Machine which is used for training a Deep Boltzmann Machine. The generative property of the Restricted Boltzmann Machine is used to reconstruct eliminated features and calculate reconstruction errors, in order to evaluate the impact of eliminating features. The performance of the proposed approach was evaluated with experiments conducted on the MNIST, MIR-Flickr, GISETTE, MADELON and PANCAN datasets. The results revealed that the proposed Deep-FS method enables improved feature selection without loss of accuracy on the MIR-Flickr dataset, where Deep-FS reduced the number of input features by removing 775 features without reduction in performance. With regard to the MNIST dataset, Deep-FS reduced the number of input features by more than 45%; it reduced the network error from 0.97% to 0.90%, and also reduced processing and classification time by more than 5.5%. Additionally, when compared to classical feature selection methods, Deep-FS returned higher accuracy. The experimental results on GISETTE, MADELON and PANCAN showed that Deep-FS reduced the number of input features by 81%, 57% and 77%, respectively. Moreover, the proposed feature selection method reduced the classifier training time by 82%, 70% and 85% on the GISETTE, MADELON and PANCAN datasets, respectively. Experiments with various datasets, comprising a large number of features and samples, revealed that the proposed Deep-FS algorithm overcomes the main limitations of classical feature selection algorithms. More specifically, most classical methods require a pre-specified number of features to retain as a prerequisite, whereas in Deep-FS this number is identified automatically. Deep-FS performs the feature selection task faster than classical feature selection algorithms, which makes it suitable for deep learning tasks. In addition, Deep-FS is suitable for finding features in large and big datasets which are normally stored in data batches for faster and more efficient processing. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
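As a simplified reading of the reconstruction-error idea in the abstract above (not the published Deep-FS algorithm), the sketch below hides one input feature at a time, reconstructs it through an RBM-style visible-hidden-visible pass, and keeps the features whose values are hardest to recover from the rest; the weights are untrained placeholders and the data is random.

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, n_samples = 20, 50, 200
W = rng.normal(0, 0.1, (n_visible, n_hidden))     # placeholder RBM weights
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
X = (rng.random((n_samples, n_visible)) < 0.3).astype(float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct(v):
    h = sigmoid(v @ W + b_h)                       # visible -> hidden probabilities
    return sigmoid(h @ W.T + b_v)                  # hidden -> visible reconstruction

scores = []
for j in range(n_visible):
    X_masked = X.copy()
    X_masked[:, j] = 0.0                           # eliminate feature j
    recon = reconstruct(X_masked)
    scores.append(np.mean((recon[:, j] - X[:, j]) ** 2))

# Features whose values are hardest to recover from the rest are kept.
keep = np.argsort(scores)[::-1][: n_visible // 2]
print("kept features:", sorted(keep.tolist()))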
9. Exploiting local structure in Boltzmann machines
- Author
- Schulz, Hannes, Müller, Andreas, and Behnke, Sven
- Subjects
- *ARTIFICIAL neural networks, *STOCHASTIC processes, *COMPUTATIONAL complexity, *COMPUTER vision, *MACHINE learning, *COMPUTER science
- Abstract
Restricted Boltzmann machines (RBMs) are well-studied generative models. For image data, however, standard RBMs are suboptimal, since they do not exploit the local nature of image statistics. We modify RBMs to focus on local structure by restricting visible–hidden interactions. We model long-range dependencies using direct or indirect lateral interaction between hidden variables. While learning in our model is much faster, it retains the generative and discriminative properties of RBMs of similar complexity. [Copyright © Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
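A small, hypothetical sketch of the "restricted visible–hidden interactions" described above: a dense RBM weight matrix is multiplied by a binary mask so that each hidden unit connects only to one local image patch. The image and patch sizes are illustrative, and the lateral interactions between hidden units mentioned in the abstract are not modelled here.

import numpy as np

img_side, patch, stride = 12, 4, 4
n_visible = img_side * img_side
positions = range(0, img_side - patch + 1, stride)
hidden_fields = [(r, c) for r in positions for c in positions]
n_hidden = len(hidden_fields)

mask = np.zeros((n_visible, n_hidden))
for h, (r, c) in enumerate(hidden_fields):
    for dr in range(patch):
        for dc in range(patch):
            mask[(r + dr) * img_side + (c + dc), h] = 1.0   # local connection only

rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (n_visible, n_hidden)) * mask       # masked RBM weights
print(f"{int(mask.sum())} of {mask.size} possible connections kept")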
10. Learning principal directions: Integrated-squared-error minimization
- Author
- Ahn, Jong-Hoon, Oh, Jong-Hoon, and Choi, Seungjin
- Subjects
- *PRINCIPAL components analysis, *STATISTICAL correlation, *FACTOR analysis, *STATISTICS, *LINEAR statistical models, *DATA, *ALGORITHMS
- Abstract
A common derivation of principal component analysis (PCA) is based on the minimization of the squared error between centered data and a linear model, corresponding to the reconstruction error. In fact, minimizing the squared error leads to principal subspace analysis, where scaled and rotated principal axes of a set of observed data are estimated. In this paper, we introduce and investigate an alternative error measure, the integrated-squared-error (ISE), the minimization of which determines the exact principal axes (without rotational ambiguity) of a set of observed data. We show that exact principal directions emerge from the minimization of ISE. We present a simple EM algorithm, 'EM-ePCA', which is similar to EM-PCA [S. T. Roweis, EM algorithms for PCA and SPCA, in: Advances in Neural Information Processing Systems, vol. 10, MIT Press, Cambridge, 1998, pp. 626–632] but finds exact principal directions without rotational ambiguity. In addition, we revisit the generalized Hebbian algorithm (GHA) and show that it emerges from ISE minimization in a single-layer linear feedforward neural network. [Copyright © Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
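For orientation, the sketch below implements the standard EM-PCA iteration from the Roweis reference cited in the abstract above, which recovers the principal subspace only up to a rotation; the paper's EM-ePCA changes the objective to the integrated-squared-error so that the exact principal directions emerge, and that refinement is not reproduced here. The synthetic data is an arbitrary stand-in.

import numpy as np

def em_pca(Y, k, n_iters=100, seed=0):
    """Y: (d, n) centered data; returns a (d, k) basis of the principal subspace."""
    rng = np.random.default_rng(seed)
    d, _ = Y.shape
    W = rng.normal(size=(d, k))
    for _ in range(n_iters):
        X = np.linalg.solve(W.T @ W, W.T @ Y)          # E-step: latent coordinates
        W = Y @ X.T @ np.linalg.inv(X @ X.T)           # M-step: update the basis
    return W

rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 500)) * np.array([3.0, 2.0, 0.5, 0.3, 0.1])[:, None]
Y -= Y.mean(axis=1, keepdims=True)
W = em_pca(Y, k=2)

# Compare the spanned subspace with the top-2 SVD directions.
U = np.linalg.svd(Y, full_matrices=False)[0][:, :2]
Q = np.linalg.qr(W)[0]
print(np.round(np.abs(U.T @ Q), 3))                    # close to a 2x2 rotation => same subspace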
11. A two-layer temporal generative model of natural video exhibits complex-cell-like pooling of simple cell outputs
- Author
- Hurri, Jarmo and Hyvärinen, Aapo
- Subjects
- *VISUAL cortex, *NEUROSCIENCES
- Abstract
We present a two-layer dynamic generative model of the statistical structure of natural image sequences. The second layer of the model is a linear mapping from simple-cell outputs to pixel values, as in most work on natural image statistics. The first layer models the dependencies of the activity levels (amplitudes or variances) of the simple cells, using a multivariate autoregressive model. The second layer shows the emergence of basis vectors that are localized, oriented and have different scales, as in previous work. But our new model enables the first layer to learn connections between the simple cells that are similar to complex-cell pooling: connections are strong among cells with similar location, frequency and orientation. In contrast to previous work, in which one of the layers needed to be fixed in advance, the dynamic model enables us to estimate both layers simultaneously from natural data. [Copyright © Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
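A toy version of the first layer described in the abstract above: the amplitudes of some "simple cell" outputs are modelled with a multivariate first-order autoregressive map, fitted here by least squares on synthetic stand-in data rather than on natural video; the cell count, frame count, and noise levels are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n_cells, n_frames = 8, 2000

# Synthetic non-negative amplitude envelopes with AR(1) temporal structure.
true_M = 0.9 * np.eye(n_cells) + 0.02 * rng.random((n_cells, n_cells))
amps = np.zeros((n_frames, n_cells))
amps[0] = rng.random(n_cells)
for t in range(1, n_frames):
    amps[t] = amps[t - 1] @ true_M.T + 0.1 * rng.random(n_cells)

# Least-squares fit of the AR(1) map on mean-centered amplitudes;
# with enough frames the estimate approaches the generating matrix.
centered = amps - amps.mean(axis=0)
past, future = centered[:-1], centered[1:]
M_hat = np.linalg.lstsq(past, future, rcond=None)[0].T
print("max estimation error:", np.abs(M_hat - true_M).max().round(3))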
12. Latent variable models for the topographic organisation of discrete and strictly positive data
- Author
- Girolami, Mark
- Subjects
- *TOPOGRAPHIC maps, *FACTORIZATION
- Abstract
This paper is concerned with learning dense low-dimensional representations of high-dimensional positive data. The positive data may be continuous, discrete binary or count based. In addition to the low-dimensional data model, a topographic ordering of the representation is desired. The primary motivation for this work is the requirement for a low-dimensional interpretation of sparse vector-space models of text documents, which may take the form of binary, count-based or real multivariate data. The generative topographic mapping (GTM) was developed and introduced as a principled alternative to the self-organising map, principally for visualising high-dimensional continuous data. The GTM is one method by which a topographically organised low-dimensional data representation may be realised. There are many cases where the observed data is discrete and the application of methods developed specifically for continuous data is inappropriate. Based on the continuous GTM data model, a non-linear latent variable model for modelling high-dimensional binary data is presented. The non-negative factorisation of a positive matrix which ensures a topographic ordering of the constituent factors is also presented as a principled yet non-probabilistic alternative to the GTM model. Experimental demonstrations of both methods are provided, based on representing binary-coded handwritten digits and the topographic organisation and visualisation of a collection of text documents. [Copyright © Elsevier]
- Published
- 2002
- Full Text
- View/download PDF
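As a baseline illustration of the "non-negative factorisation of a positive matrix" mentioned in the abstract above, the sketch below runs plain Lee-Seung multiplicative updates on a random positive matrix; the paper's method additionally enforces a topographic ordering of the factors, which this sketch does not attempt.

import numpy as np

def nmf(V, k, n_iters=500, seed=0, eps=1e-9):
    """Factorise a non-negative matrix V (m, n) as W (m, k) @ H (k, n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(n_iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # multiplicative update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)       # multiplicative update for W
    return W, H

rng = np.random.default_rng(1)
V = rng.random((40, 8)) @ rng.random((8, 200))      # positive matrix of rank ~8
W, H = nmf(V, k=8)
print("relative reconstruction error:",
      round(np.linalg.norm(V - W @ H) / np.linalg.norm(V), 4))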