Your search for "Song, Zhao" returned a total of 3,645 results.

Search Constraints

Author: "Song, Zhao"
Publication Year Range: Last 50 years

Search Results

1. Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

2. A Tighter Complexity Analysis of SparseGPT

3. Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method

4. Fast John Ellipsoid Computation with Differential Privacy Optimization

5. Differential Privacy of Cross-Attention with Provable Guarantee

6. Differential Privacy Mechanisms in Neural Tangent Kernel Regression

7. On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

8. Toward Infinite-Long Prefix in Transformer

9. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

10. Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

11. Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

12. Binary Hypothesis Testing for Softmax Models and Leverage Score Models

13. Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

14. Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

15. How to Inverting the Leverage Score Distribution?

16. Attention is Naturally Sparse with Gaussian Distributed Input

17. Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

18. Quantum Speedup for Spectral Approximation of Kronecker Products

19. On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

20. The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

21. Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

24. One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

25. A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

26. Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

27. The Expressibility of Polynomial based Attention Scheme

28. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

29. Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

30. Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

31. An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

32. How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

33. Fine-tune Language Models to Approximate Unbiased In-context Learning

34. A Unified Scheme of ResNet and Softmax

35. Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

36. A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

37. Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

38. Online Adaptive Mahalanobis Distance Estimation

40. Solving Attention Kernel Regression Problem via Pre-conditioner

41. How to Protect Copyright Data in Optimization of Large Language Models?

42. GradientCoin: A Peer-to-Peer Decentralized Large Language Models

43. Clustered Linear Contextual Bandits with Knapsacks

44. Convergence of Two-Layer Regression with Nonlinear Units

45. Zero-th Order Algorithm for Softmax Attention Optimization

46. Fast Quantum Algorithm for Attention Computation

47. Faster Algorithms for Structured Linear and Kernel Support Vector Machines

48. Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

49. In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

50. H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
