Search

Your search for the keyword '"Song, Zhao"' returned 1,790 results.

Search Constraints

Author: "Song, Zhao"
Search Limiters: Full Text

Search Results

1. Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

2. A Tighter Complexity Analysis of SparseGPT

3. Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method

4. Fast John Ellipsoid Computation with Differential Privacy Optimization

5. Differential Privacy of Cross-Attention with Provable Guarantee

6. Differential Privacy Mechanisms in Neural Tangent Kernel Regression

7. On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)

8. Toward Infinite-Long Prefix in Transformer

9. Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

10. Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

11. Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

12. Binary Hypothesis Testing for Softmax Models and Leverage Score Models

13. Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

14. Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

15. How to Inverting the Leverage Score Distribution?

16. Attention is Naturally Sparse with Gaussian Distributed Input

17. Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

18. Quantum Speedup for Spectral Approximation of Kronecker Products

19. On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

20. The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

21. Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

23. One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

24. A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

25. Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

26. The Expressibility of Polynomial based Attention Scheme

27. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

28. Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

29. Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention

30. An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

31. How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

32. Fine-tune Language Models to Approximate Unbiased In-context Learning

33. A Unified Scheme of ResNet and Softmax

34. Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

35. A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

36. Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

37. Online Adaptive Mahalanobis Distance Estimation

38. Solving Attention Kernel Regression Problem via Pre-conditioner

39. How to Protect Copyright Data in Optimization of Large Language Models?

40. GradientCoin: A Peer-to-Peer Decentralized Large Language Models

41. Clustered Linear Contextual Bandits with Knapsacks

42. Convergence of Two-Layer Regression with Nonlinear Units

43. Zero-th Order Algorithm for Softmax Attention Optimization

44. Fast Quantum Algorithm for Attention Computation

45. Faster Algorithms for Structured Linear and Kernel Support Vector Machines

46. Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification

47. In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

48. H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

49. Efficient Algorithm for Solving Hyperbolic Programs

50. InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
