Author: "Yu, Yaodong" - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Yu, Yaodong": 175 results

1. M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

Author: Ren, Sucheng, Yu, Yaodong, Ruiz, Nataniel, Wang, Feng, Yuille, Alan, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: There exists recent work in computer vision, named VAR, that proposes a new autoregressive paradigm for image generation. Diverging from the vanilla next-token prediction, VAR structurally reformulates the image generation into a coarse-to-fine next-scale prediction. In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into \textit{intra-scale modeling}, which captures local spatial dependencies within each scale, and \textit{inter-scale modeling}, which models cross-scale relationships progressively from coarse-to-fine scales. This decoupling structure allows us to rebuild VAR in a more computationally efficient manner. Specifically, for intra-scale modeling -- crucial for generating high-fidelity images -- we retain the original bidirectional self-attention design to ensure comprehensive modeling; for inter-scale modeling, which semantically connects different scales but is computationally intensive, we apply linear-complexity mechanisms like Mamba to substantially reduce computational overhead. We term this new framework M-VAR. Extensive experiments demonstrate that our method outperforms existing models in both image quality and generation speed. For example, our 1.5B model, with fewer parameters and faster inference speed, outperforms the largest VAR-d30-2B. Moreover, our largest model M-VAR-d32 impressively registers 1.78 FID on ImageNet 256$\times$256 and outperforms the prior-art autoregressive models LlamaGen/VAR by 0.4/0.19 and popular diffusion models LDM/DiT by 1.82/0.49, respectively. Code is available at \url{https://github.com/OliverRensu/MVAR}.
Published: 2024

2. Causal Image Modeling for Efficient Visual Understanding

Author: Wang, Feng, Yang, Timing, Yu, Yaodong, Ren, Sucheng, Wei, Guoyizhe, Wang, Angtian, Shao, Wei, Zhou, Yuyin, Yuille, Alan, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations. This modeling paradigm allows us to process images in a recurrent formulation with linear complexity relative to the sequence length, which can effectively address the memory and computation explosion issues posed by high-resolution and fine-grained images. In detail, we introduce two simple designs that seamlessly integrate image inputs into the causal inference framework: a global pooling token placed at the beginning of the sequence and a flipping operation between every two layers. Extensive empirical studies demonstrate the significant efficiency and effectiveness of this causal image modeling paradigm. For example, our base-sized Adventurer model attains a competitive test accuracy of 84.0% on the standard ImageNet-1k benchmark with 216 images/s training throughput, which is 5.3 times more efficient than vision transformers to achieve the same result.
Published: 2024

3. Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

Author: Sanyal, Amartya, Hu, Yaxi, Yu, Yaodong, Ma, Yian, Wang, Yixin, and Schölkopf, Bernhard
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: "Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations. But when does this useful relationship break down? In this work, we explore its robustness. The key observation is that noisy data and the presence of nuisance features can be sufficient to shatter the Accuracy-on-the-line phenomenon. In these cases, ID and OOD accuracy can become negatively correlated, leading to "Accuracy-on-the-wrong-line". This phenomenon can also occur in the presence of spurious (shortcut) features, which tend to overshadow the more complex signal (core, non-spurious) features, resulting in a large nuisance feature space. Moreover, scaling to larger datasets does not mitigate this undesirable behavior and may even exacerbate it. We formally prove a lower bound on Out-of-distribution (OOD) error in a linear classification model, characterizing the conditions on the noise and nuisance features for a large OOD error. We finally demonstrate this phenomenon across both synthetic and real datasets with noisy data and nuisance features.
Published: 2024

4. A Global Geometric Analysis of Maximal Coding Rate Reduction

Author: Wang, Peng, Liu, Huikang, Pai, Druv, Yu, Yaodong, Zhu, Zhihui, Qu, Qing, and Ma, Yi
Subjects: Computer Science - Machine Learning
Abstract: The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR$^2$ problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR$^2$ a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets., Comment: 43 pages, 9 figures. This work has been accepted for publication in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)
Published: 2024
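
As a hedged illustration of the objective this abstract refers to, the rate reduction is commonly written as below. The notation is an assumption drawn from the standard MCR$^2$ formulation rather than from this record: $Z \in \mathbb{R}^{d \times m}$ holds $m$ features of dimension $d$, $\Pi_j$ is the diagonal membership matrix of class $j$, and $\epsilon$ is the allowed distortion. The paper itself may analyze a regularized variant.

```latex
\Delta R(Z, \Pi, \epsilon)
  = \underbrace{\frac{1}{2}\log\det\!\Big(I + \frac{d}{m\,\epsilon^{2}}\, Z Z^{\top}\Big)}_{R(Z,\,\epsilon)}
  \;-\;
  \underbrace{\sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m}
     \log\det\!\Big(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\Big)}_{R_c(Z,\,\epsilon \mid \Pi)}
```

Maximizing $\Delta R$ expands the coding rate of all features together while compressing the rate of each class, which matches the "diverse and discriminative" description in the abstract.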

5. Scaling White-Box Transformers for Vision

Author: Yang, Jinrui, Li, Xianhang, Pai, Druv, Zhou, Yuyin, Ma, Yi, Yu, Yaodong, and Xie, Cihang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability. Despite extensive investigations into the scaling behaviors of language and vision transformers, the scalability of CRATE remains an open question which this paper aims to address. Specifically, we propose CRATE-$\alpha$, featuring strategic yet minimal modifications to the sparse coding block in the CRATE architecture design, and a light training recipe designed to improve the scalability of CRATE. Through extensive experiments, we demonstrate that CRATE-$\alpha$ can effectively scale with larger model sizes and datasets. For example, our CRATE-$\alpha$-B substantially outperforms the prior best CRATE-B model accuracy on ImageNet classification by 3.7%, achieving an accuracy of 83.2%. Meanwhile, when scaling further, our CRATE-$\alpha$-L obtains an ImageNet classification accuracy of 85.1%. More notably, these model performance improvements are achieved while preserving, and potentially even enhancing, the interpretability of learned CRATE models, as we demonstrate by showing that the learned token representations of increasingly larger trained CRATE-$\alpha$ models yield increasingly higher-quality unsupervised object segmentation of images. The project page is https://rayjryang.github.io/CRATE-alpha/.
Published: 2024

6. Masked Completion via Structured Diffusion with White-Box Transformers

Author: Pai, Druv, Wu, Ziyang, Buchanan, Sam, Yu, Yaodong, and Ma, Yi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Modern learning frameworks often train deep neural networks with massive amounts of unlabeled data to learn representations by solving simple pretext tasks, then use the representations as foundations for downstream tasks. These networks are empirically designed; as such, they are usually not interpretable, their representations are not structured, and their designs are potentially redundant. White-box deep networks, in which each layer explicitly identifies and transforms structures in the data, present a promising alternative. However, existing white-box architectures have only been shown to work at scale in supervised settings with labeled data, such as classification. In this work, we provide the first instantiation of the white-box design paradigm that can be applied to large-scale unsupervised representation learning. We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: they transform the data distribution to and from a structured representation. Extensive empirical evaluations confirm our analytical insights. CRATE-MAE demonstrates highly promising performance on large-scale imagery datasets while using only ~30% of the parameters compared to the standard masked autoencoder with the same model configuration. The representations learned by CRATE-MAE have explicit structure and also contain semantic meaning. Code is available at https://github.com/Ma-Lab-Berkeley/CRATE ., Comment: To be published at ICLR 2024; 44 pages. arXiv admin note: substantial text overlap with arXiv:2311.13110
Published: 2024

7. Differentially Private Representation Learning via Image Captioning

Author: Sander, Tom, Yu, Yaodong, Sanjabi, Maziar, Durmus, Alain, Ma, Yi, Chaudhuri, Kamalika, and Guo, Chuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtaining unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving the previous SOTA of $56.5\%$., Comment: Accepted and presented at ICML 2024
Published: 2024

8. A Study on the Calibration of In-context Learning

Author: Zhang, Hanlin, Zhang, Yi-Fan, Yu, Yaodong, Madeka, Dhruv, Foster, Dean, Xing, Eric, Lakkaraju, Himabindu, and Kakade, Sham
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a prevalent method for adapting static LMs through tailored prompts, and examine the balance between performance and calibration across a broad spectrum of natural language understanding and reasoning tasks. Through comprehensive experiments, we observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration and miscalibration tends to arise in low-shot settings. Moreover, we find that methods aimed at improving usability, such as fine-tuning and chain-of-thought (CoT) prompting, can lead to miscalibration and unreliable natural language explanations. Furthermore, we explore recalibration techniques and find that a scaling-binning calibrator can reduce calibration errors consistently., Comment: NAACL 2024
Published: 2023
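
The calibration errors discussed in this abstract are often summarized with the expected calibration error (ECE). The sketch below is a minimal, assumed illustration of the usual equal-width binned ECE; the function name, the bin count, and the input format are hypothetical and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    conf_sum = [0.0] * n_bins   # summed confidence per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin
    for c, hit in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if hit else 0.0
        count[idx] += 1
    # (count/n) * |hits/count - conf/count| simplifies to |hits - conf| / n
    return sum(abs(h - c) / n for h, c, k in zip(hit_sum, conf_sum, count) if k)
```

A model that reports 0.9 confidence and is right 9 times out of 10 gets an ECE near zero; one that reports 1.0 and is always wrong gets an ECE of 1.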

9. White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

Author: Yu, Yaodong, Buchanan, Sam, Pai, Druv, Chu, Tianzhe, Wu, Ziyang, Tong, Shengbang, Bai, Hao, Zhai, Yuexiang, Haeffele, Benjamin D., and Ma, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we contend that a natural objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a low-dimensional Gaussian mixture supported on incoherent subspaces. The goodness of such a representation can be evaluated by a principled measure, called sparse rate reduction, that simultaneously maximizes the intrinsic information gain and extrinsic sparsity of the learned representation. From this perspective, popular deep network architectures, including transformers, can be viewed as realizing iterative schemes to optimize this measure. Particularly, we derive a transformer block from alternating optimization on parts of this objective: the multi-head self-attention operator compresses the representation by implementing an approximate gradient descent step on the coding rate of the features, and the subsequent multi-layer perceptron sparsifies the features. This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable. We show, by way of a novel connection between denoising and compression, that the inverse to the aforementioned compressive encoding can be realized by the same class of CRATE architectures. Thus, the so-derived white-box architectures are universal to both encoders and decoders. Experiments show that these networks, despite their simplicity, indeed learn to compress and sparsify representations of large-scale real-world image and text datasets, and achieve performance very close to highly engineered transformer-based models: ViT, MAE, DINO, BERT, and GPT2. We believe the proposed computational framework demonstrates great potential in bridging the gap between theory and practice of deep learning, from a unified perspective of data compression. Code is available at: https://ma-lab-berkeley.github.io/CRATE ., Comment: Accepted at Journal of Machine Learning Research. This paper integrates the works arXiv:2306.01129 and arXiv:2308.16271 into a complete story. In this paper, we improve the writing and organization, and also add conceptual, empirical, and theoretical improvements over the previous work. V2: small typo fixes/formatting improvements. V3: improvements from journal revisions. V4: fix figures
Published: 2023

10. Emergence of Segmentation with Minimalistic White-Box Transformers

Author: Yu, Yaodong, Chu, Tianzhe, Tong, Shengbang, Wu, Ziyang, Pai, Druv, Buchanan, Sam, and Ma, Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentation emerges in transformer-based models solely as a result of intricate self-supervised learning mechanisms, or if the same emergence can be achieved under much broader conditions through proper design of the model architecture. Through extensive experimental results, we demonstrate that when employing a white-box transformer-like architecture known as CRATE, whose design explicitly models and pursues low-dimensional structures in the data distribution, segmentation properties, at both the whole and parts levels, already emerge with a minimalistic supervised training recipe. Layer-wise finer-grained analysis reveals that the emergent properties strongly corroborate the designed mathematical functions of the white-box network. Our results suggest a path to design white-box foundation models that are simultaneously highly performant and mathematically fully interpretable. Code is at \url{https://github.com/Ma-Lab-Berkeley/CRATE}., Comment: Code: https://github.com/Ma-Lab-Berkeley/CRATE
Published: 2023

11. Scaff-PD: Communication Efficient Fair and Robust Federated Learning

Author: Yu, Yaodong, Karimireddy, Sai Praneeth, Ma, Yi, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control, Statistics - Machine Learning, 68W40, 68W15, 90C25, 90C06, G.1.6, F.2.1, E.4
Abstract: We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives, and design an accelerated primal dual (APD) algorithm which uses bias corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate Scaff-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that Scaff-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings.
Published: 2023

12. ViP: A Differentially Private Foundation Model for Computer Vision

Author: Yu, Yaodong, Sanjabi, Maziar, Ma, Yi, Chaudhuri, Kamalika, and Guo, Chuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}., Comment: Code: https://github.com/facebookresearch/ViP-MAE. V2 adds a GitHub link to the code
Published: 2023

13. White-Box Transformers via Sparse Rate Reduction

Author: Yu, Yaodong, Buchanan, Sam, Pai, Druv, Chu, Tianzhe, Wu, Ziyang, Tong, Shengbang, Haeffele, Benjamin D., and Ma, Yi
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we contend that the objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction. From this perspective, popular deep networks such as transformers can be naturally viewed as realizing iterative schemes to optimize this objective incrementally. Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens. This leads to a family of white-box transformer-like deep network architectures which are mathematically fully interpretable. Despite their simplicity, experiments show that these networks indeed learn to optimize the designed objective: they compress and sparsify representations of large-scale real-world vision datasets such as ImageNet, and achieve performance very close to thoroughly engineered transformers such as ViT. Code is at \url{https://github.com/Ma-Lab-Berkeley/CRATE}., Comment: 33 pages, 11 figures
Published: 2023

14. Federated Conformal Predictors for Distributed Uncertainty Quantification

Author: Lu, Charles, Yu, Yaodong, Karimireddy, Sai Praneeth, Jordan, Michael I., and Raskar, Ramesh
Subjects: Computer Science - Machine Learning
Abstract: Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models. In this paper, we extend conformal prediction to the federated learning setting. The main challenge we face is data heterogeneity across the clients - this violates the fundamental tenet of exchangeability required for conformal prediction. We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees and excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. We provide code used in our experiments https://github.com/clu5/federated-conformal., Comment: 23 pages, 18 figures, accepted to International Conference on Machine Learning (ICML 2023)
Published: 2023
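
For context on the post-processing step this abstract builds on, here is a minimal sketch of standard (centralized) split conformal prediction for classification. It is not the federated FCP method of the paper; the function names and the choice of nonconformity score (one minus the predicted probability) are assumptions made for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score (split conformal)."""
    n = len(cal_scores)
    if n == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need calibration scores and alpha in (0, 1)")
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label must be included.
        return math.inf
    return sorted(cal_scores)[rank - 1]

def prediction_set(class_probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return [label for label, p in enumerate(class_probs) if 1.0 - p <= threshold]
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha; the abstract's point is that client heterogeneity breaks this assumption in the federated setting.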

15. TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

Author: Yu, Yaodong, Wei, Alexander, Karimireddy, Sai Praneeth, Ma, Yi, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control, Statistics - Machine Learning, 68W40, 68W15, 90C25, 90C06, G.1.6, F.2.1, E.4
Abstract: State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is simultaneously performant for all clients, current federated optimization methods fail to converge to a comparable solution. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. Our technique yields accuracy improvements of up to +36% on FMNIST and +37% on CIFAR10 when clients have dissimilar data., Comment: Accepted at Neural Information Processing Systems (NeurIPS) 2022. V2 releases code
Published: 2022

16. Robust Calibration with Multi-domain Temperature Scaling

Author: Yu, Yaodong, Bates, Stephen, Ma, Yi, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains. Uncertainty quantification is all the more challenging when the training distribution and test distribution are different, even when the distribution shifts are mild. Despite the ubiquity of distribution shifts in real-world applications, existing uncertainty quantification approaches mainly study the in-distribution setting where the train and test distributions are the same. In this paper, we develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains. Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity in the domains to improve calibration robustness under distribution shift. Through experiments on three benchmark data sets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets.
Published: 2022
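
To make the baseline behind this abstract concrete, the sketch below shows ordinary single-temperature scaling: logits are divided by a scalar T chosen to minimize negative log-likelihood on held-out data. This is the standard single-domain technique, not the multi-domain method proposed in the paper; the function names and the grid search over T are assumptions chosen to keep the example dependency-free.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # hypothetical grid: 0.25 .. 10.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

For an overconfident model, the fitted T exceeds 1 and flattens the predicted probabilities. A multi-domain variant would presumably fit temperatures using data from several domains, as the abstract describes, but that step is not reproduced here.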

17. Conditional Supervised Contrastive Learning for Fair Text Classification

Author: Chi, Jianfeng, Shand, William, Yu, Yaodong, Chang, Kai-Wei, Zhao, Han, and Tian, Yuan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data. However, the learned representations could potentially lead to performance disparities in downstream tasks, such as increased silencing of underrepresented groups in toxicity comment classification. In light of this challenge, in this work, we study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning. Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives, and then propose to use conditional supervised contrastive objectives to learn fair representations for text classification. We conduct experiments on two text datasets to demonstrate the effectiveness of our approaches in balancing the trade-offs between task performance and bias mitigation among existing baselines for text classification. Furthermore, we also show that the proposed methods are stable in different hyperparameter settings., Comment: Findings of EMNLP 2022
Published: 2022

18. Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Author: Lin, Tianyi, Pacchiano, Aldo, Yu, Yaodong, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning
Abstract: Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. In contrast to previous works on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach is the notion of $(\alpha, \beta)$-regret and the extension of the generic convex relaxation model from~\citet{El-2020-Optimal}, the analysis of which is of independent interest. We conduct and showcase several simulation studies to demonstrate the efficacy of our algorithms., Comment: Accepted by ICML 2022; The first three authors contributed equally to this work; 36 pages, 9 figures
Published: 2022

19. What You See is What You Get: Principled Deep Learning via Distributional Generalization

Author: Kulynych, Bogdan, Yang, Yao-Yuan, Yu, Yaodong, Błasiok, Jarosław, and Nakkiran, Preetum
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Having similar behavior at training time and test time $-$ what we call a "What You See Is What You Get" (WYSIWYG) property $-$ is desirable in machine learning. Models trained with standard stochastic gradient descent (SGD), however, do not necessarily have this property, as their complex behaviors such as robustness or subgroup performance can differ drastically between training and test time. In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization. Applying this connection, we introduce new conceptual tools for designing deep-learning methods by reducing generalization concerns to optimization ones: to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the training data. By applying this novel design principle, which bypasses "pathologies" of SGD, we construct simple algorithms that are competitive with SOTA in several distributional-robustness applications, significantly improve the privacy vs. disparate impact trade-off of DP-SGD, and mitigate robust overfitting in adversarial training. Finally, we also improve on theoretical bounds relating DP, stability, and distributional generalization., Comment: First two authors contributed equally. To appear in NeurIPS 2022
Published: 2022

20. Oxidative reconstructed Ru-based nanoclusters forming heterostructures with lanthanide oxides for acidic water oxidation

Author: Xu, Wenxia, Liu, Ziyi, Yu, Yaodong, Shi, Yue, Li, Hongdong, Chi, Jingqi, Bagliuk, G.A., Lai, Jianping, and Wang, Lei
Published: 2025

21. Hydrated molten salt electrolytes effectively promote electrocatalytic NH3 synthesis at low temperatures

Author: Cai, Wenwen, Sun, Yuyao, Zhang, Yanyun, Li, Hongdong, Yu, Yaodong, Lai, Jianping, and Wang, Lei
Published: 2025

22. Predicting Out-of-Distribution Error with the Projection Norm

Author: Yu, Yaodong, Yang, Zitong, Wei, Alexander, Ma, Yi, and Steinhardt, Jacob
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a metric -- Projection Norm -- to predict a model's performance on out-of-distribution (OOD) data without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model's parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples. Our code is available at https://github.com/yaodongyu/ProjNorm.
Published: 2022

23. The Effect of Model Size on Worst-Group Generalization

Author: Pham, Alan, Chan, Eunice, Srivatsa, Vikranth, Ghosh, Dhruba, Yang, Yaoqing, Yu, Yaodong, Zhong, Ruiqi, Gonzalez, Joseph E., and Steinhardt, Jacob
Subjects: Computer Science - Machine Learning
Abstract: Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architectures (ResNet, VGG, or BERT), 2) domains (vision or natural language processing), 3) model size (width or depth), and 4) initialization (with pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when subgroup labels are unknown., Comment: The first four authors contributed equally to the work
Published: 2021

24. Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

Author: Dai, Xili, Tong, Shengbang, Li, Mingyang, Wu, Ziyang, Psenka, Michael, Chan, Kwan Ho Ryan, Zhai, Pengyuan, Yu, Yaodong, Yuan, Xiaojun, Shum, Heung Yeung, and Ma, Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and often better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so learned features of the multiple classes are structured: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space. Source code can be found at https://github.com/Delay-Xili/LDR., Comment: 41 pages
Published: 2021
Full Text: View/download PDF

25. Mechanism conversion and accelerated deprotonation of RuO2-GO catalyst to promote acidic oxygen evolution reaction

Author: Zhang, Xin, Shi, Yue, Liu, Pengfei, Li, Hongdong, Yu, Yaodong, Liu, Yan, Xu, Wenxia, Zhan, Tianrong, Lai, Jianping, and Wang, Lei
Published: 2024
Full Text: View/download PDF

26. CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction

Author: Dai, Xili, Tong, Shengbang, Li, Mingyang, Wu, Ziyang, Psenka, Michael, Chan, Kwan Ho Ryan, Zhai, Pengyuan, Yu, Yaodong, Yuan, Xiaojun, Shum, Heung-Yeung, and Ma, Yi
Subjects: closed-loop transcription, linear discriminative representation, rate reduction, minimax game, Mathematical Sciences, Physical Sciences, Fluids & Plasmas
Abstract: This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcription between a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as a two-player minimax game between the encoder and decoder for the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and arguably better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so-learned features of the multiple classes are structured instead of hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace.
Published: 2022

27. On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

Author: Li, Chris Junchi, Yu, Yaodong, Loizou, Nicolas, Gidel, Gauthier, Ma, Yi, Roux, Nicolas Le, and Jordan, Michael I.
Subjects: Mathematics - Optimization and Control, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting., Comment: Camera-ready version appeared at AISTATS 2022; short version appeared at NeurIPS OPT 2021 Workshop
Published: 2021
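
As a rough illustration of the idea summarized in the abstract of entry 27, the sketch below runs same-sample stochastic extragradient with a constant step size on the toy bilinear game min_x max_y x*y, whose Nash equilibrium is (0, 0), and compares the last iterate with the running average of the iterates. The toy game, the additive Gaussian noise model, the function name `seg_with_averaging`, and all parameter values are assumptions made for illustration; they are not taken from the paper, and the scheduled restarting procedure is omitted.

```python
import math
import random


def seg_with_averaging(steps=5000, eta=0.1, sigma=0.1, seed=0):
    """Same-sample stochastic extragradient on min_x max_y x*y.

    Hypothetical toy setup: gradients are perturbed by additive Gaussian
    noise with standard deviation sigma. Returns the distance to the
    equilibrium (0, 0) of the last iterate and of the averaged iterate.
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    sum_x, sum_y = 0.0, 0.0
    for _ in range(steps):
        # One noise sample per iteration, reused in both steps ("same-sample").
        nx, ny = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        # Extrapolation step: descent in x, ascent in y.
        x_half = x - eta * (y + nx)
        y_half = y + eta * (x + ny)
        # Update step using noisy gradients evaluated at the extrapolated point.
        x, y = x - eta * (y_half + nx), y + eta * (x_half + ny)
        sum_x += x
        sum_y += y
    last_dist = math.hypot(x, y)
    avg_dist = math.hypot(sum_x / steps, sum_y / steps)
    return last_dist, avg_dist


if __name__ == "__main__":
    last_dist, avg_dist = seg_with_averaging()
    print(f"last iterate distance: {last_dist:.4f}")
    print(f"averaged iterate distance: {avg_dist:.4f}")
```

In this toy setting the last iterate typically keeps fluctuating at a distance on the order of the noise level, while the averaged iterate ends up much closer to the equilibrium, which is the qualitative behavior the abstract describes.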

28. ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction

Author: Chan, Kwan Ho Ryan, Yu, Yaodong, You, Chong, Qi, Haozhi, Wright, John, and Ma, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Statistics - Machine Learning
Abstract: This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics of modern deep networks. The deep layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer via forward propagation, although they are amenable to fine-tuning via back propagation. All components of so-obtained "white-box" network have precise optimization, statistical, and geometric interpretation. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolution network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at \url{https://github.com/Ma-Lab-Berkeley}., Comment: This paper integrates previous two manuscripts: arXiv:2006.08558 and arXiv:2010.14765, with significantly improved organization, presentation, and new results; V2 polishes writing and adds citation; V3 polishes writing, adds citation and experiments
Published: 2021

29. Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization

Author: Yu, Yaodong, Lin, Tianyi, Mazumdar, Eric, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most popular DRSL frameworks based around robustness to perturbations in the Wasserstein distance -- have serious limitations that limit their use in large-scale problems -- in particular they involve solving complex subproblems and they fail to make use of stochastic gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms which provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data when compared to existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest., Comment: Accepted by AISTATS 2022; The first three authors contributed equally to this work; 43 pages, 28 figures
Published: 2021

30. Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition

Author: Yu, Yaodong, Yang, Zitong, Dobriban, Edgar, Steinhardt, Jacob, and Ma, Yi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Adversarially trained models exhibit a large generalization gap: they can interpolate the training set even for large perturbation radii, but at the cost of large test error on clean samples. To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$). We find that the bias increases monotonically with $\varepsilon$ and is the dominant term in the risk. Meanwhile, the variance is unimodal as a function of $\varepsilon$, peaking near the interpolation threshold for the training set. This characteristic behavior occurs robustly across different datasets and also for other robust training procedures such as randomized smoothing. It thus provides a test for proposed explanations of the generalization gap. We find that some existing explanations fail this test--for instance, by predicting a monotonically increasing variance curve. This underscores the power of bias-variance decompositions in modern settings--by providing two measurements instead of one, they can rule out more explanations than test accuracy alone. We also show that bias and variance can provide useful guidance for scalably reducing the generalization gap, highlighting pre-training and unlabeled data as promising routes., Comment: V2 adds new results and improves organization and presentation
Published: 2021

31. Deep Networks from the Principle of Rate Reduction

Author: Chan, Kwan Ho Ryan, Yu, Yaodong, You, Chong, Qi, Haozhi, Wright, John, and Ma, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Information Theory, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretation. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in early stage of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that so constructed deep network can already learn a good discriminative representation even without any back propagation training.
Published: 2020

32. Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients

Author: Huang, Yifei, Yu, Yaodong, Zhang, Hongyang, Ma, Yi, and Yao, Yuan
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper we introduce a provably stable architecture for Neural Ordinary Differential Equations (ODEs) which achieves non-trivial adversarial robustness under white-box adversarial attacks even when the network is trained naturally. For most existing defense methods withstanding strong white-box attacks, to improve robustness of neural networks, they need to be trained adversarially, hence have to strike a trade-off between natural accuracy and adversarial robustness. Inspired by dynamical system theory, we design a stabilized neural ODE network named SONet whose ODE blocks are skew-symmetric and proved to be input-output stable. With natural training, SONet can achieve comparable robustness with the state-of-the-art adversarial defense methods, without sacrificing natural accuracy. Even replacing only the first layer of a ResNet by such an ODE block can exhibit further improvement in robustness, e.g., under PGD-20 ($\ell_\infty=0.031$) attack on CIFAR-10 dataset, it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart architecture of ResNet trained with TRADES achieves natural and robust accuracy 76.29\% and 45.24\%, respectively. To understand possible reasons behind this surprisingly good result, we further explore the possible mechanism underlying such an adversarial robustness. We show that the adaptive stepsize numerical ODE solver, DOPRI5, has a gradient masking effect that fails the PGD attacks which are sensitive to gradient information of training loss; on the other hand, it cannot fool the CW attack of robust gradients and the SPSA attack that is gradient-free. This provides a new explanation that the adversarial robustness of ODE-based networks mainly comes from the obfuscated gradients in numerical ODE solvers., Comment: 16 pages
Published: 2020

33. Boundary thickness and robustness in learning models

Author: Yang, Yaoqing, Khanna, Rajiv, Yu, Yaodong, Gholami, Amir, Keutzer, Kurt, Gonzalez, Joseph E., Ramchandran, Kannan, and Mahoney, Michael W.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Robustness of machine learning models to various adversarial and non-adversarial corruptions continues to be of interest. In this paper, we introduce the notion of the boundary thickness of a classifier, and we describe its connection with and usefulness for model robustness. Thick decision boundaries lead to improved performance, while thin decision boundaries lead to overfitting (e.g., measured by the robust generalization gap between training and testing) and lower robustness. We show that a thicker boundary helps improve robustness against adversarial examples (e.g., improving the robust test accuracy of adversarial training) as well as so-called out-of-distribution (OOD) transforms, and we show that many commonly-used regularization and data augmentation procedures can increase boundary thickness. On the theoretical side, we establish that maximizing boundary thickness during training is akin to the so-called mixup training. Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms. We can also show that the performance improvement in several lines of recent work happens in conjunction with a thicker boundary.
Published: 2020

34. Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Author: Yu, Yaodong, Chan, Kwan Ho Ryan, You, Chong, Song, Chaobing, and Ma, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Statistics - Machine Learning
Abstract: To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions and can learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those using cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.
Published: 2020
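
For readers of entry 34, the "coding rate difference between the whole dataset and the sum of each individual class" can be written out as follows. This is a sketch from memory of the standard formulation, not a quotation from the record: the symbols are assumptions introduced here, with $Z \in \mathbb{R}^{d \times m}$ the matrix of $m$ learned features of dimension $d$, $\epsilon$ the coding precision, and $\Pi = \{\Pi_j\}_{j=1}^{k}$ diagonal matrices encoding the membership of the samples in $k$ classes.

```latex
% Coding rate of the whole set of features
R(Z, \epsilon) = \frac{1}{2} \log\det\!\left( I + \frac{d}{m\,\epsilon^{2}}\, Z Z^{\top} \right)

% Coding rate summed over the individual classes
R_c(Z, \epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m}
  \log\det\!\left( I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top} \right)

% Rate reduction: the quantity to be maximized
\Delta R(Z, \Pi, \epsilon) = R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi)
```

Maximizing $\Delta R$ pushes the features of the whole dataset to span a large volume while keeping each class compact, which matches the "diverse and discriminative" wording of the title.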

35. Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Author: Yang, Zitong, Yu, Yaodong, You, Chong, Steinhardt, Jacob, and Ma, Yi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
Published: 2020
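
The statement in entry 35 that the risk curve is the sum of the bias and variance curves corresponds, for squared loss, to the textbook decomposition below. The notation is an assumption introduced here for illustration: $f_{\mathcal{T}}$ denotes the model trained on a random training set $\mathcal{T}$, $\bar{f}(x) = \mathbb{E}_{\mathcal{T}}[f_{\mathcal{T}}(x)]$ is its average prediction, and $(x, y)$ is a test point with the label $y$ held fixed.

```latex
\mathbb{E}_{\mathcal{T}}\!\left[ \big( f_{\mathcal{T}}(x) - y \big)^{2} \right]
  = \underbrace{\big( \bar{f}(x) - y \big)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}_{\mathcal{T}}\!\left[ \big( f_{\mathcal{T}}(x) - \bar{f}(x) \big)^{2} \right]}_{\text{variance}}
```

The cross term vanishes because $\mathbb{E}_{\mathcal{T}}[f_{\mathcal{T}}(x) - \bar{f}(x)] = 0$; averaging both sides over test points gives the risk as the sum of the two curves.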

36. Reliable Representation Learning: Theory and Practice

Author: Yu, Yaodong
Subjects: Computer science, Machine Learning, Representation Learning
Abstract: Machine learning models trained on vast amounts of data have achieved remarkable success across various applications. However, they also pose new challenges and risks for deployment in real-world high-stakes domains. Decisions made by deep learning models are often difficult to interpret, and the underlying mechanisms remain poorly understood, and large-scale foundational models can memorize and leak private personal information. Given that deep learning models operate as black-boxes, it is challenging to understand, let alone resolve, various types of failures in current machine learning systems. In this dissertation, we present research towards building reliable machine learning systems through the lens of representation learning. The first part focuses on transparent representation learning. We first propose a principled and effective objective function, called coding rate reduction, for measuring the goodness of representations, and present a white-box approach to understanding transformer models. We then show how to derive a family of mathematically interpretable transformer-like deep network architectures by maximizing the information gain of the learned representations. The second part focuses on privacy-preserving representation learning. We first present our investigation on understanding the effectiveness of learned representations using federated optimization methods, and present our approach for overcoming data heterogeneity when training deep, non-convex models in the federated setting. Next, we describe our work on training the first set of vision foundation models with rigorous differential privacy guarantees, and demonstrate the promise of high-utility differentially private representation learning.
Published: 2024

37. Theoretically Principled Trade-off between Robustness and Accuracy

Author: Zhang, Hongyang, Yu, Yaodong, Jiao, Jiantao, Xing, Eric P., Ghaoui, Laurent El, and Jordan, Michael I.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance., Comment: Appeared in ICML 2019; the winning methodology of the NeurIPS 2018 Adversarial Vision Challenge
Published: 2019

38. Scheduling a multi-agent flow shop with two scenarios and release dates.

Author: Wang, Xinyue, Ren, Tao, Bai, Danyu, Chu, Feng, Yu, Yaodong, Meng, Fanchun, and Wu, Chin-Chia
Subjects: FLOW shop scheduling, BRANCH & bound algorithms, TECHNOLOGICAL innovations, CLOUD computing, COMPUTER systems
Abstract: Cloud computing is widely applied in modern industrial areas due to its technological advancement, cost reduction, and applicability. Packets (tasks) belonging to different applications (agents) compete to share the common cloud resource through a series of edge nodes (processors) in pursuit of fast transmission. This paper abstracts the cloud computing system as a multi-agent flow-shop scheduling (MAFS) problem. The objective is to minimise the total completion time of several agents with the restriction that the maximum lateness cannot exceed a given bound. Given the complexity of the considered problem, a branch and bound algorithm combined with several pruning rules and lower bounds is proposed to obtain optimal solutions. Furthermore, the considered problem is generalised to a bi-scenario version, and a bi-population cooperative co-evolutionary (BCCE) algorithm is proposed to solve it. A reinforcement learning-based method is presented to generate the initial population. Several problem-specific intensification strategies are constructed to explore promising solutions. Comprehensive experiments verified the effectiveness of the proposed algorithms. The industrial data from the China Earthquake Network Centre further confirmed the superiority of the BCCE algorithm. Overall, the MAFS model and the proposed algorithms effectively enhance the user experience and reasonably guarantee revenue. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. ·H effectively enhance electrocatalytic nitrogen fixation

Author: Sun, Yuyao, Yu, Yaodong, Xu, Wenxia, Wu, Di, Wei, Yingying, Lai, Jianping, and Wang, Lei
Published: 2023
Full Text: View/download PDF

40. Learning One-hidden-layer ReLU Networks via Gradient Descent

Author: Zhang, Xiao, Yu, Yaodong, Wang, Lingxiao, and Gu, Quanquan
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such kind of neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings., Comment: 26 pages, 2 figures
Published: 2018

41. PtIrM (M = Ni, Co) jagged nanowires for efficient methanol oxidation electrocatalysis

Author: Zhu, Rongying, Yu, Yaodong, Yu, Renqin, Lai, Jianping, Chung-Yen Jung, Joey, Zhang, Shiming, Zhao, Yufeng, Zhang, Jiujun, and Xia, Zhonghong
Published: 2022
Full Text: View/download PDF

42. The self-complementary effect through strong orbital coupling in ultrathin high-entropy alloy nanowires boosting pH-universal multifunctional electrocatalysis

Author: Li, Hongdong, Sun, Mingzi, Pan, Yue, Xiong, Juan, Du, Haoyang, Yu, Yaodong, Feng, Shouhua, Li, Zhenjiang, Lai, Jianping, Huang, Bolong, and Wang, Lei
Published: 2022
Full Text: View/download PDF

43. Coordination engineering of cobalt phthalocyanine by functionalized carbon nanotube for efficient and highly stable carbon dioxide reduction at high current density

Author: Li, Hongdong, Pan, Yue, Wang, Zuochao, Yu, Yaodong, Xiong, Juan, Du, Haoyang, Lai, Jianping, Wang, Lei, and Feng, Shouhua
Published: 2022
Full Text: View/download PDF

44. Protecting the state of Cu clusters and nanoconfinement engineering over hollow mesoporous carbon spheres for electrocatalytical C-C coupling

Author: Pan, Yue, Li, Hongdong, Xiong, Juan, Yu, Yaodong, Du, Haoyang, Li, Shaoxiang, Wu, Zhanchao, Li, Suping, Lai, Jianping, and Wang, Lei
Published: 2022
Full Text: View/download PDF

45. Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

Author: Yu, Yaodong, Xu, Pan, and Gu, Quanquan
Subjects: Mathematics - Optimization and Control, Computer Science - Learning
Abstract: We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specifically, the proposed algorithm only needs $\tilde{O}(\epsilon^{-10/3})$ stochastic gradient evaluations to converge to an approximate local minimum $\mathbf{x}$, which satisfies $\|\nabla f(\mathbf{x})\|_2\leq\epsilon$ and $\lambda_{\min}(\nabla^2 f(\mathbf{x}))\geq -\sqrt{\epsilon}$ in the general stochastic optimization setting, where $\tilde{O}(\cdot)$ hides logarithm polynomial terms and constants. This improves upon the $\tilde{O}(\epsilon^{-7/2})$ gradient complexity achieved by the state-of-the-art stochastic local minima finding algorithms by a factor of $\tilde{O}(\epsilon^{-1/6})$. For nonconvex finite-sum optimization, our algorithm also outperforms the best known algorithms in a certain regime., Comment: 25 pages
Published: 2017

46. Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently

Author: Yu, Yaodong, Zou, Difan, and Gu, Quanquan
Subjects: Computer Science - Learning, Mathematics - Optimization and Control
Abstract: We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity. At the core of our algorithms is the division of the entire domain of the objective function into small and large gradient regions: our algorithms only perform gradient descent based procedure in the large gradient region, and only perform negative curvature descent in the small gradient region. Our novel analysis shows that the proposed algorithms can escape the small gradient region in only one negative curvature descent step whenever they enter it, and thus they only need to perform at most $N_{\epsilon}$ negative curvature direction computations, where $N_{\epsilon}$ is the number of times the algorithms enter small gradient regions. For both deterministic and stochastic settings, we show that the proposed algorithms can potentially beat the state-of-the-art local minima finding algorithms. For the finite-sum setting, our algorithm can also outperform the best algorithm in a certain regime., Comment: 31 pages, 1 table
Published: 2017

47. Integrating Ozone Pollutant Elimination in N2 Electrolysis to Produce Nitrate with Reduced Reaction Steps.

Author: Yu, Yaodong, Liu, Jiao, Sun, Mingzi, Han, Jiani, Chi, Jingqi, Huang, Bolong, Lai, Jianping, and Wang, Lei
Published: 2024
Full Text: View/download PDF

48. Adversarial Vision Challenge

Author: Brendel, Wieland, Rauber, Jonas, Kurakin, Alexey, Papernot, Nicolas, Veliqi, Behar, Mohanty, Sharada P., Laurent, Florian, Salathé, Marcel, Bethge, Matthias, Yu, Yaodong, Zhang, Hongyang, Xu, Susu, Zhang, Hongbao, Xie, Pengtao, Xing, Eric P., Brunner, Thomas, Diehl, Frederik, Rony, Jérôme, Hafemann, Luiz Gustavo, Cheng, Shuyu, Dong, Yinpeng, Ning, Xuefei, Li, Wenshuo, Wang, Yu, Escalante, Hugo Jair, Series Editor, Guyon, Isabelle, Series Editor, Escalera, Sergio, Series Editor, and Herbrich, Ralf, editor
Published: 2020
Full Text: View/download PDF

49. Achieving efficient urea electrosynthesis through improving the coverage of a crucial intermediate across a broad range of nitrate concentrations.

Author: Yu, Yaodong, Sun, Yuyao, Han, Jiani, Guan, Yujia, Li, Hongdong, Wang, Lei, and Lai, Jianping
Published: 2024
Full Text: View/download PDF

50. Activation of Ga liquid catalyst with continuously exposed active sites for electrocatalytic C‐N coupling

Author: Yu, Yaodong, Lv, Zheng, Liu, Ziyi, Sun, Yuyao, Wei, Yingying, Ji, Xiang, Li, Yanyan, Li, Hongdong, Wang, Lei, and Lai, Jianping
Published: 2024
Full Text: View/download PDF
