Author: "Aila, Timo" / Publisher: arxiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Aila, Timo"' showing total 7 results

Start Over Author "Aila, Timo" Publisher arxiv

7 results on '"Aila, Timo"'

1. StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Author: Sauer, Axel, Karras, Tero, Laine, Samuli, Geiger, Andreas, and Aila, Timo
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models - the previous state-of-the-art in fast text-to-image synthesis - in terms of sample quality and speed., Comment: Project page: https://sites.google.com/view/stylegan-t/
Published: 2023
Full Text: View/download PDF

2. Generating Long Videos of Dynamic Scenes

Author: Brooks, Tim, Hellsten, Janne, Aittala, Miika, Wang, Ting-Chun, Aila, Timo, Lehtinen, Jaakko, Liu, Ming-Yu, Efros, Alexei A., and Karras, Tero
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence. A common failure case is for content to never change due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates content for the entire video. On the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we leverage a two-phase training strategy, where we separately train using longer videos at a low resolution and shorter videos at a high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with explicit focus on long-term temporal dynamics.
Published: 2022
Full Text: View/download PDF

3. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Author: Balaji, Yogesh, Nah, Seungjun, Huang, Xun, Vahdat, Arash, Song, Jiaming, Zhang, Qinsheng, Kreis, Karsten, Aittala, Miika, Aila, Timo, Laine, Samuli, Catanzaro, Bryan, Karras, Tero, and Liu, Ming-Yu
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability. A user can select the word in the input text and paint it in a canvas to control the output, which is very handy for crafting the desired image in mind. The project page is available at https://deepimagination.cc/eDiff-I/
Published: 2022
Full Text: View/download PDF

4. Training Generative Adversarial Networks with Limited Data

Author: Karras, Tero, Aittala, Miika, Hellsten, Janne, Laine, Samuli, Lehtinen, Jaakko, and Aila, Timo
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.
Published: 2020
Full Text: View/download PDF

5. Semi-supervised semantic segmentation needs strong, varied perturbations

Author: French, Geoff, Laine, Samuli, Aila, Timo, Mackiewicz, Michal, and Finlayson, Graham
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Consistency regularization describes a class of approaches that have yielded ground breaking results in semi-supervised classification problems. Prior work has established the cluster assumption - under which the data distribution consists of uniform class clusters of samples separated by low density regions - as important to its success. We analyze the problem of semantic segmentation and find that its' distribution does not exhibit low density regions separating classes and offer this as an explanation for why semi-supervised segmentation is a challenging problem, with only a few reports of success. We then identify choice of augmentation as key to obtaining reliable performance without such low-density regions. We find that adapted variants of the recently proposed CutOut and CutMix augmentation techniques yield state-of-the-art semi-supervised semantic segmentation results in standard datasets. Furthermore, given its challenging nature we propose that semantic segmentation acts as an effective acid test for evaluating semi-supervised regularizers. Implementation at: https://github.com/Britefury/cutmix-semisup-seg., Comment: 21 pages, 7 figures, accepted to BMVC 2020
Published: 2019
Full Text: View/download PDF

6. Noise2Noise: Learning Image Restoration without Clean Data

Author: Lehtinen, Jaakko, Munkberg, Jacob, Hasselgren, Jon, Laine, Samuli, Karras, Tero, Aittala, Miika, and Aila, Timo
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Science::Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance at and sometimes exceeding training using clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising synthetic Monte Carlo images, and reconstruction of undersampled MRI scans -- all corrupted by different processes -- based on noisy data only., Comment: Added link to official implementation and updated MRI results to match it
Published: 2018
Full Text: View/download PDF

7. Progressive Growing of GANs for Improved Quality, Stability, and Variation

Author: Karras, Tero, Aila, Timo, Laine, Samuli, and Lehtinen, Jaakko
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset., Comment: Final ICLR 2018 version
Published: 2017
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

7 results on '"Aila, Timo"'

1. StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

2. Generating Long Videos of Dynamic Scenes

3. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

4. Training Generative Adversarial Networks with Limited Data

5. Semi-supervised semantic segmentation needs strong, varied perturbations

6. Noise2Noise: Learning Image Restoration without Clean Data

7. Progressive Growing of GANs for Improved Quality, Stability, and Variation

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

7 results on '"Aila, Timo"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources