Search results: Author "Song, Zhao" (Database: arXiv), 193 results

1. Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

2. A Tighter Complexity Analysis of SparseGPT

3. Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method

4. Fast John Ellipsoid Computation with Differential Privacy Optimization

5. Differential Privacy of Cross-Attention with Provable Guarantee

6. Differential Privacy Mechanisms in Neural Tangent Kernel Regression

7. On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

8. Toward Infinite-Long Prefix in Transformer

9. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

10. Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

11. Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

12. Binary Hypothesis Testing for Softmax Models and Leverage Score Models

13. Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

14. Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

15. How to Inverting the Leverage Score Distribution?

16. Attention is Naturally Sparse with Gaussian Distributed Input

17. Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

18. Quantum Speedup for Spectral Approximation of Kronecker Products

19. On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

20. The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

21. Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

22. One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

23. A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

24. Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

25. The Expressibility of Polynomial based Attention Scheme

26. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

27. Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

28. Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

29. An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

30. How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

31. Fine-tune Language Models to Approximate Unbiased In-context Learning

32. A Unified Scheme of ResNet and Softmax

33. Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

34. A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

35. Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

36. Online Adaptive Mahalanobis Distance Estimation

37. Solving Attention Kernel Regression Problem via Pre-conditioner

38. How to Protect Copyright Data in Optimization of Large Language Models?

39. GradientCoin: A Peer-to-Peer Decentralized Large Language Models

40. Clustered Linear Contextual Bandits with Knapsacks

41. Convergence of Two-Layer Regression with Nonlinear Units

42. Zero-th Order Algorithm for Softmax Attention Optimization

43. Fast Quantum Algorithm for Attention Computation

44. Faster Algorithms for Structured Linear and Kernel Support Vector Machines

45. Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

46. In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

47. H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

48. Efficient Algorithm for Solving Hyperbolic Programs

49. InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding

50. Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
