Search

Your search for "Zou, Difan" returned 172 results.

Search Results

1. How Does Critical Batch Size Scale in Pre-training?

2. Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

3. Towards a Theoretical Understanding of Memorization in Diffusion Models

4. How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

5. Extracting Training Data from Unconditional Diffusion Models

6. Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations

7. The Implicit Bias of Adam on Separable Data

8. Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

9. Slight Corruption in Pre-training Data Makes Better Diffusion Models

10. A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models

11. Faster Sampling via Stochastic Gradient Proximal Sampler

12. Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

13. Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference

14. The Dog Walking Theory: Rethinking Convergence in Federated Learning

15. What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

16. On the Benefits of Over-parameterization for Out-of-Distribution Generalization

17. Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems

18. An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling

19. Towards Robust Graph Incremental Learning on Evolving Graphs

20. PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks

21. Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo

22. Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

23. How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

24. Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

25. Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

26. The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

27. Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

28. The Benefits of Mixup for Feature Learning

29. Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

30. On the Limitation and Experience Replay for GNNs in Continual Learning

31. Multiple models for outbreak decision support in the face of uncertainty

32. The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

33. Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

34. Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

35. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States

36. Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

37. The Benefits of Implicit Regularization from SGD in Least Squares Problems

38. Self-training Converts Weak Learners to Strong Learners in Mixture Models

39. Provable Robustness of Adversarial Training for Learning Halfspaces with Noise

40. Benign Overfitting of Constant-Stepsize SGD for Linear Regression

41. Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate

42. Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

43. On the Global Convergence of Training Deep Linear ResNets

44. How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

45. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks

46. Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo

47. An Improved Analysis of Training Over-parameterized Deep Neural Networks

48. Two Dimension Intensity Distribution of Ultraviolet Scattering Communication

49. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

50. Stochastic Variance-Reduced Hamilton Monte Carlo Methods
