Search

Your search for Author: "Song, Zhao" returned 3,661 results.

Search Results

1. On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

2. Toward Infinite-Long Prefix in Transformer

3. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

4. Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

5. Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

6. Binary Hypothesis Testing for Softmax Models and Leverage Score Models

7. Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

8. Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

9. How to Inverting the Leverage Score Distribution?

10. Attention is Naturally Sparse with Gaussian Distributed Input

11. Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

12. Quantum Speedup for Spectral Approximation of Kronecker Products

13. On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

14. The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

15. Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

16. One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

17. A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

18. Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

21. The Expressibility of Polynomial based Attention Scheme

22. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

23. Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

24. Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

25. An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

26. How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

27. Fine-tune Language Models to Approximate Unbiased In-context Learning

28. A Unified Scheme of ResNet and Softmax

29. Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

30. A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

31. Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

32. Online Adaptive Mahalanobis Distance Estimation

33. Solving Attention Kernel Regression Problem via Pre-conditioner

34. How to Protect Copyright Data in Optimization of Large Language Models?

35. GradientCoin: A Peer-to-Peer Decentralized Large Language Models

36. Clustered Linear Contextual Bandits with Knapsacks

37. Convergence of Two-Layer Regression with Nonlinear Units

39. Zero-th Order Algorithm for Softmax Attention Optimization

40. Fast Quantum Algorithm for Attention Computation

41. Faster Algorithms for Structured Linear and Kernel Support Vector Machines

42. Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

43. In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

44. H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

45. Efficient Algorithm for Solving Hyperbolic Programs

46. InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

47. Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

48. Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis

49. Sparse Convolution for Approximate Sparse Instance

50. A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
