Author: "Li, Chongxuan" / Topic: computer science - artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Li, Chongxuan" returned 13 results.


1. Scaling up Masked Diffusion Models on Text

Author: Nie, Shen, Zhu, Fengqi, Du, Chao, Pang, Tianyu, Liu, Qian, Zeng, Guangtao, Lin, Min, and Li, Chongxuan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Masked diffusion models (MDMs) have shown promise in language modeling, yet their scalability and effectiveness in core language tasks, such as text generation and language understanding, remain underexplored. This paper establishes the first scaling law for MDMs, demonstrating a scaling rate comparable to autoregressive models (ARMs) and a relatively small compute gap. Motivated by their scalability, we train a family of MDMs with up to 1.1 billion (B) parameters to systematically evaluate their performance against ARMs of comparable or larger sizes. Fully leveraging the probabilistic formulation of MDMs, we propose a simple yet effective unsupervised classifier-free guidance that effectively exploits large-scale unpaired data, boosting performance for conditional inference. In language understanding, the 1.1B MDM outperforms the 1.1B TinyLlama model trained on the same data on four of eight zero-shot benchmarks. Notably, it achieves math reasoning ability competitive with the 7B Llama-2 model on the GSM8K dataset. In text generation, MDMs provide a flexible trade-off compared to ARMs utilizing KV-cache: MDMs either match the performance of ARMs while being 1.4 times faster or achieve higher quality than ARMs at a higher computational cost. Moreover, MDMs address challenging tasks for ARMs by effectively handling bidirectional reasoning and adapting to temporal shifts in data. Notably, a 1.1B MDM breaks the reverse curse encountered by much larger ARMs trained with significantly more data and computation, such as 13B Llama-2 and 175B GPT-3. Our code is available at https://github.com/ML-GSAI/SMDM.
Published: 2024
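The guidance step mentioned in the abstract above can be illustrated with the standard classifier-free guidance rule applied to per-token logits at one masked position. This is a minimal sketch assuming the common form (1 + w) * conditional - w * unconditional; the exact unsupervised variant used in the paper, the function names, and the toy logit values are assumptions, not the released SMDM code.

```python
import math


def guided_logits(cond, uncond, w):
    """Classifier-free guidance on logits (assumed standard form).

    w = 0 recovers the conditional logits; larger w pushes the prediction
    further away from the unconditional one.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over a three-token vocabulary for one masked position.
cond = [2.0, 1.0, 0.0]
uncond = [2.0, 0.0, 0.0]
print(softmax(guided_logits(cond, uncond, w=1.0)))
```

With w = 1, the token that the condition favors relative to the unconditional model (index 1) gains probability mass compared with the plain conditional distribution.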

2. Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

Author: Zhao, Juntu, Deng, Junyu, Ye, Yixin, Li, Chongxuan, Deng, Zhijie, and Wang, Dequan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Advances in text-to-image diffusion models have enabled a wide range of practical downstream applications, but such models often suffer from misalignment between text and image. Taking the generation of a combination of two disentangled concepts as an example, given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke usually co-occurs with a glass cup rather than a tea cup during model training. The root of such misalignment is attributed to confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, substantially reducing LC-Mis errors and enhancing the robustness and versatility of text-to-image diffusion models. The code and dataset are available at https://github.com/RossoneriZhao/iced_coke. Comment: Accepted by the 18th European Conference on Computer Vision (ECCV 2024).
Published: 2024

3. Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

Author: Zhao, Min, Zhu, Hongzhou, Xiang, Chendong, Zheng, Kaiwen, Li, Chongxuan, and Zhu, Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Diffusion models have made substantial progress in image-to-video generation. However, in this paper, we find that these models tend to generate videos with less motion than expected. We attribute this to an issue we call conditional image leakage, where image-to-video diffusion models (I2V-DMs) tend to over-rely on the conditional image at large time steps. We address this challenge from both the inference and the training side. First, we propose to start the generation process from an earlier time step to avoid the unreliable large time steps of I2V-DMs, together with an initial noise distribution with an optimal analytic expression (Analytic-Init), obtained by minimizing the KL divergence between it and the actual marginal distribution, to bridge the training-inference gap. Second, we design a time-dependent noise distribution (TimeNoise) for the conditional image during training, applying higher noise levels at larger time steps to disrupt it and reduce the model's dependency on it. We validate these general strategies on various I2V-DMs using our collected open-domain image benchmark and the UCF101 dataset. Extensive results show that our methods outperform baselines by producing higher motion scores with lower errors while maintaining image alignment and temporal consistency, thereby yielding superior overall performance and enabling more accurate motion control. The project page is https://cond-image-leak.github.io/. Comment: NeurIPS 2024.
Published: 2024

4. PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

Author: Zhong, Yong, Zhao, Min, You, Zebin, Yu, Xiaofeng, Zhang, Changwang, and Li, Chongxuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all latent variables for generation. Then, we insert the corresponding training pose into the target pose sequences to enhance faithfulness through a trained temporal attention module. Furthermore, to alleviate the face and hand degradation resulting from discrepancies between poses of training videos and inference poses, we implement simple latent editing through an affine transformation matrix involving facial and hand landmarks. Extensive experiments on several datasets demonstrate that PoseCrafter achieves superior results to baselines pre-trained on a vast collection of videos under 8 commonly used metrics. Besides, PoseCrafter can follow poses from different individuals or artificial edits and simultaneously retain the human identity in an open-domain training video. Our project page is available at https://ml-gsai.github.io/PoseCrafter-demo/.
Published: 2024

5. Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Author: Xue, Kaiwen, Zhou, Yuhao, Nie, Shen, Min, Xu, Zhang, Xiaolu, Zhou, Jun, and Li, Chongxuan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Bayesian flow networks (BFNs) iteratively refine the parameters of distributions at various noise levels through Bayesian inference, rather than the samples as in diffusion models (DMs). Owing to their differentiable nature, BFNs are promising for modeling both continuous and discrete data while maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoising score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes for fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves a 5 to 20 times speedup for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver. Comment: Published as a conference paper at ICML 2024.
Published: 2024

6. Graph Diffusion Policy Optimization

Author: Liu, Yijing, Du, Chao, Pang, Tianyu, Li, Chongxuan, Lin, Min, and Chen, Wei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: Recent research has made significant progress in optimizing diffusion models for downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these models to graphs presents challenges, resulting in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to optimize graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored for graph diffusion models, developed through meticulous analysis and promising improved performance. Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO. Comment: NeurIPS 2024.
Published: 2024

7. Gaussian Mixture Solvers for Diffusion Models

Author: Guo, Hanzhong, Lu, Cheng, Bao, Fan, Pang, Tianyu, Yan, Shuicheng, Du, Chao, and Li, Chongxuan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, diffusion models have achieved great success in generative tasks. Sampling from diffusion models is equivalent to solving the reverse diffusion stochastic differential equations (SDEs) or the corresponding probability flow ordinary differential equations (ODEs). Compared with ODE-based solvers, SDE-based solvers can generate samples of higher quality and are suited for image translation tasks like stroke-based synthesis. During inference, however, existing SDE-based solvers are severely constrained by the efficiency-effectiveness dilemma. Our investigation suggests that this is because the Gaussian assumption in the reverse transition kernel is frequently violated (even in the case of simple mixture data) given a limited number of discretization steps. To overcome this limitation, we introduce a novel class of SDE-based solvers called Gaussian Mixture Solvers (GMS) for diffusion models. Our solver estimates the first three moments and optimizes the parameters of a Gaussian mixture transition kernel using the generalized method of moments at each sampling step. Empirically, our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis across various diffusion models, which validates the motivation and effectiveness of GMS. Our code is available at https://github.com/Guohanzhong/GMS. Comment: NeurIPS 2023.
Published: 2023
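Why a Gaussian transition kernel can fall short is easy to see from moments: a single Gaussian always has zero third central moment, while even a two-component mixture generally does not. The sketch below computes the first three raw moments of a 1-D two-component mixture with a shared variance and then its third central moment. It only illustrates the quantities GMS matches; it does not implement the paper's generalized-method-of-moments fitting, and the shared-variance parameterization and function names are assumptions.

```python
def mixture_raw_moments(w, mu1, mu2, sigma):
    """First three raw moments of w*N(mu1, sigma^2) + (1 - w)*N(mu2, sigma^2)."""

    def gauss_moments(mu):
        # E[X], E[X^2], E[X^3] for a single Gaussian N(mu, sigma^2).
        return (mu, mu**2 + sigma**2, mu**3 + 3 * mu * sigma**2)

    a, b = gauss_moments(mu1), gauss_moments(mu2)
    # Raw moments of a mixture are the weighted sums of component moments.
    return tuple(w * ai + (1 - w) * bi for ai, bi in zip(a, b))


def third_central_moment(m1, m2, m3):
    """E[(X - E[X])^3] from raw moments; zero for any single Gaussian."""
    return m3 - 3 * m1 * m2 + 2 * m1**3


# A symmetric mixture has no skew; an unbalanced one does.
print(third_central_moment(*mixture_raw_moments(0.5, -1.0, 1.0, 0.5)))
print(third_central_moment(*mixture_raw_moments(0.3, 0.0, 2.0, 1.0)))
```

Because a Gaussian kernel is fully described by its first two moments, a nonzero third central moment is exactly the kind of structure it cannot represent, which is the gap a mixture kernel is meant to fill.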

8. On Memorization in Diffusion Models

Author: Gu, Xiangming, Du, Chao, Pang, Tianyu, Li, Chongxuan, Lin, Min, and Wang, Ye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching, has a closed-form optimal solution that can only generate samples replicating the training data. This indicates that memorization behavior is theoretically expected, which contradicts the generalization ability commonly observed in state-of-the-art diffusion models and thus calls for a deeper understanding. Looking into this, we first observe that memorization behaviors tend to occur on smaller datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum. Then, we quantify the impact of influential factors on these memorization behaviors in terms of EMM, focusing primarily on data distribution, model configuration, and training procedure. Besides comprehensive empirical results identifying the influential factors, we surprisingly find that conditioning training data on uninformative random labels can significantly trigger memorization in diffusion models. Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models. Code is available at https://github.com/sail-sg/DiffMemorize.
Published: 2023
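The closed-form optimum mentioned in the abstract above is, for an empirical training set under Gaussian noise, a softmax-weighted average of the training points. The 1-D sketch below writes out that standard result for illustration (it is not taken from the DiffMemorize code, and the function name is hypothetical) and shows why it can only reproduce training data at low noise levels.

```python
import math


def optimal_denoiser(x, data, sigma):
    """Optimal denoiser for an empirical dataset of 1-D points.

    Returns a weighted average of training points with weights proportional
    to exp(-(x - y_i)^2 / (2 * sigma^2)).
    """
    logits = [-((x - y) ** 2) / (2 * sigma**2) for y in data]
    m = max(logits)  # log-sum-exp shift keeps exp() from underflowing to all zeros
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, data)) / total


data = [0.0, 10.0]
print(optimal_denoiser(1.0, data, sigma=0.5))     # snaps to the nearest training point
print(optimal_denoiser(1.0, data, sigma=1000.0))  # close to the data mean
```

At small sigma the weights collapse onto the nearest training example, so a sampler that follows this denoiser all the way down ends on a copy of the training data.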

9. Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels

Author: You, Zebin, Zhong, Yong, Bao, Fan, Sun, Jiacheng, Li, Chongxuan, and Zhu, Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: To further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models. DPT operates in three stages: training a classifier on partially labeled data to predict pseudo-labels; training a conditional generative model using these pseudo-labels to generate pseudo images; and retraining the classifier with a mix of real and pseudo images. Empirically, DPT consistently achieves state-of-the-art (SOTA) performance in semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52, respectively, on ImageNet 256x256. Besides, DPT substantially outperforms competitive semi-supervised baselines on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively. Notably, our results demonstrate that diffusion models can generate realistic images with only a few labels (e.g., <0.1%) and that generative augmentation remains viable for semi-supervised classification. Our code is available at https://github.com/ML-GSAI/DPT. Comment: Accepted to NeurIPS 2023.
Published: 2023
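The three DPT stages listed in the abstract above can be traced end to end on a toy problem. In this sketch a nearest-centroid rule stands in for the strong semi-supervised learner and a per-class Gaussian stands in for the conditional diffusion model; both substitutions, the 1-D data, and all function names are hypothetical and only show how the stages feed into one another.

```python
import random
from statistics import mean, pstdev


def fit_centroids(xs, ys):
    """Toy classifier (stand-in for a real learner): one mean per class label."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def predict(centroids, x):
    # Assign x to the class whose centroid is nearest.
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def dual_pseudo_training(labeled, unlabeled, n_fake=50, seed=0):
    rng = random.Random(seed)
    xs, ys = zip(*labeled)
    # Stage 1: train a classifier on labeled data, then pseudo-label the rest.
    clf = fit_centroids(xs, ys)
    pseudo = [(x, predict(clf, x)) for x in unlabeled]
    # Stage 2: fit a class-conditional Gaussian "generator" (stand-in for a
    # conditional diffusion model) on the pseudo-labels and sample pseudo data.
    fake = []
    for c, centre in clf.items():
        members = [x for x, y in pseudo if y == c] or [centre]
        mu, sd = mean(members), pstdev(members) or 1.0
        fake += [(rng.gauss(mu, sd), c) for _ in range(n_fake)]
    # Stage 3: retrain the classifier on real labeled data plus generated data.
    all_x, all_y = zip(*(list(labeled) + fake))
    return fit_centroids(all_x, all_y)


labeled = [(-2.0, 0), (2.0, 1)]
unlabeled = [-3.0, -2.5, -1.0, 1.0, 2.5, 3.0]
clf = dual_pseudo_training(labeled, unlabeled)
print(predict(clf, -1.5), predict(clf, 1.5))
```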

10. Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

Author: Zheng, Chenyu, Wu, Guoqiang, Bao, Fan, Cao, Yue, Li, Chongxuan, and Zhu, Jun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes the parameters of the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the choice of classifier in linear evaluation beyond the default logistic regression. Inspired by the statistical efficiency of naive Bayes, the paper revisits the classical topic of discriminative vs. generative classifiers. Theoretically, the paper considers the surrogate loss instead of the zero-one loss in its analyses and generalizes the classical results from binary cases to multiclass ones. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish this, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss, which are of independent interest. Simulation results on a mixture of Gaussians validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases. Besides, naive Bayes shows promise in few-shot cases, and we observe the "two regimes" phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers. Comment: Accepted by ICML 2023, 58 pages.
Published: 2023
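In the linear-evaluation setting described above, a generative classifier on frozen features can be fit in closed form, with no iterative optimization. This is a sketch of Gaussian naive Bayes with per-class diagonal variances; that variant is an assumption, since the abstract does not say which naive Bayes model the paper analyzes, and the small feature vectors stand in for hypothetical pre-trained features.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-6):
    """Closed-form fit: log class prior plus per-feature mean and variance per class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        d = len(rows[0])
        mu = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # eps keeps variances positive when a feature is constant within a class.
        var = [sum((r[j] - mu[j]) ** 2 for r in rows) / len(rows) + eps for j in range(d)]
        model[c] = (math.log(len(rows) / len(X)), mu, var)
    return model


def predict_nb(model, x):
    """Return the class maximizing log prior + diagonal Gaussian log-likelihood."""

    def log_joint(c):
        log_prior, mu, var = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, mu, var)
        )

    return max(model, key=log_joint)


# Hypothetical frozen features for two classes.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [0.5, 0.5]), predict_nb(model, [5.5, 5.5]))
```

Logistic regression on the same features would instead fit its weights by iterative optimization; the point of the comparison in the paper is how many samples each needs to approach its asymptotic error, which this toy example does not measure.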

11. Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection

Author: Liu, Luping, Ren, Yi, Cheng, Xize, Huang, Rongjie, Li, Chongxuan, and Zhao, Zhou
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Out-of-distribution (OOD) detection is a crucial task for ensuring the reliability and safety of deep learning. Currently, discriminator models outperform other methods in this regard. However, the feature extraction process used by discriminator models suffers from the loss of critical information, leaving room for bad cases and malicious attacks. In this paper, we introduce a new perceptron bias assumption, which suggests that discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem. To address this issue, we propose a novel framework that combines discriminator and generative models and integrates diffusion models (DMs) into OOD detection. We demonstrate that the diffusion denoising process (DDP) of DMs serves as a novel form of asymmetric interpolation, which is well suited to enhancing the input and mitigating the overconfidence problem. The discriminator model features of OOD data exhibit sharp changes under DDP, and we use the norm of this change as the indicator score. Our experiments on CIFAR10, CIFAR100, and ImageNet show that our method outperforms SOTA approaches. Notably, for the challenging setting with ImageNet as the in-distribution (InD) dataset and species datasets as OOD data, our method achieves an AUROC of 85.7, surpassing the previous SOTA method's score of 77.4. Our implementation is available at https://github.com/luping-liu/DiffOOD.
Published: 2022
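The indicator score described in the abstract above reduces to a distance in feature space before and after the denoising pass. A minimal sketch, assuming a Euclidean norm and treating both the feature extractor and the diffusion denoising process as hypothetical callables; the DiffOOD repository's actual interfaces are not reproduced here.

```python
import math


def ddp_ood_score(x, extract, denoise):
    """Norm of the change in classifier features caused by a denoising pass.

    `extract` maps an input to a feature vector; `denoise` stands in for the
    diffusion denoising process. Larger scores suggest OOD inputs.
    """
    before = extract(x)
    after = extract(denoise(x))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(before, after)))


# Hypothetical stand-ins: identity features and a "denoiser" that pulls the
# input halfway toward an in-distribution prototype at the origin.
def identity(v):
    return list(v)


def shrink(v):
    return [0.5 * t for t in v]


print(ddp_ood_score([0.1, 0.1], identity, shrink))  # near the prototype: small change
print(ddp_ood_score([3.0, 4.0], identity, shrink))  # far from it: large change
```

Thresholding this score (for example, at a value chosen on held-out in-distribution data) would turn it into an OOD detector; how the paper picks the operating point is not stated in the abstract.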

12. All are Worth Words: A ViT Backbone for Diffusion Models

Author: Bao, Fan, Nie, Shen, Xue, Kaiwen, Cao, Yue, Li, Chongxuan, Su, Hang, and Zhu, Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Vision transformers (ViT) have shown promise in various vision tasks, while the U-Net based on a convolutional neural network (CNN) remains dominant in diffusion models. We design a simple and general ViT-based architecture (named U-ViT) for image generation with diffusion models. U-ViT is characterized by treating all inputs, including the time, condition, and noisy image patches, as tokens and by employing long skip connections between shallow and deep layers. We evaluate U-ViT in unconditional and class-conditional image generation, as well as text-to-image generation tasks, where U-ViT is comparable if not superior to a CNN-based U-Net of a similar size. In particular, latent diffusion models with U-ViT achieve record-breaking FID scores of 2.29 in class-conditional image generation on ImageNet 256x256, and 5.48 in text-to-image generation on MS-COCO, among methods without access to large external datasets during the training of generative models. Our results suggest that, for diffusion-based image modeling, the long skip connection is crucial while the down-sampling and up-sampling operators in CNN-based U-Net are not always necessary. We believe that U-ViT can provide insights for future research on backbones in diffusion models and benefit generative modeling on large-scale cross-modality datasets. Comment: Accepted to CVPR 2023.
Published: 2022
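The token layout and long skip connections described in the abstract above can be counted out in a few lines. This sketch assumes one time token, at most one condition token (the class-conditional case; text conditioning would typically add more than one), non-overlapping square patches, and an odd number of transformer blocks arranged symmetrically around one middle block. These are illustrative assumptions, not a reading of the released U-ViT code.

```python
def num_tokens(image_size, patch_size, has_condition=True):
    """Sequence length when time, condition, and image patches are all tokens."""
    assert image_size % patch_size == 0, "patch size must divide image size"
    patches = (image_size // patch_size) ** 2
    # 1 time token + optional single condition token + one token per patch.
    return 1 + int(has_condition) + patches


def long_skip_pairs(depth):
    """Pair shallow and deep blocks symmetrically around a single middle block.

    Blocks are indexed 0..depth-1 with depth odd: blocks 0..n-1 come before
    the middle block n, and blocks n+1..2n come after it. The first block
    after the middle receives the skip from the last block before it.
    """
    assert depth % 2 == 1, "this sketch assumes exactly one middle block"
    n = depth // 2
    return [(n - 1 - k, n + 1 + k) for k in range(n)]


print(num_tokens(32, 2))    # 256 patch tokens plus time and condition tokens
print(long_skip_pairs(5))   # (shallow, deep) block index pairs
```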

13. ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

Author: Wang, Liyuan, Yang, Kuo, Li, Chongxuan, Hong, Lanqing, Li, Zhenguo, and Zhu, Jun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Continual learning usually assumes the incoming data are fully labeled, which might not be applicable in real applications. In this work, we consider semi-supervised continual learning (SSCL) that incrementally learns from partially labeled data. Observing that existing continual learning methods lack the ability to continually exploit the unlabeled data, we propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN), which continually passes the learned data distribution to the classifier. In particular, ORDisCo replays data sampled from the conditional generator to the classifier in an online manner, exploiting unlabeled data in a time- and storage-efficient way. Further, to explicitly overcome the catastrophic forgetting of unlabeled data, we selectively stabilize parameters of the discriminator that are important for discriminating the pairs of old unlabeled data and their pseudo-labels predicted by the classifier. We extensively evaluate ORDisCo on various semi-supervised learning benchmark datasets for SSCL, and show that ORDisCo achieves significant performance improvement on SVHN, CIFAR10 and Tiny-ImageNet, compared to strong baselines.
Published: 2021
