Search

Your search for author "Suzuki, Taiji" returned 455 results.

Search Results

1. Transformers Provably Solve Parity Efficiently with Chain of Thought

2. On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

3. Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization

4. Transformers are Minimax Optimal Nonparametric In-Context Learners

5. Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

6. Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples

7. High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

8. Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

9. Flow matching achieves almost minimax optimal convergence

10. State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness

11. State-Free Inference of State-Space Models: The Transfer Function Approach

12. Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

13. Mechanistic Design and Scaling of Hybrid Architectures

14. Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective

15. How do Transformers perform In-Context Autoregressive Learning?

16. Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

18. Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

19. Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories

21. Gradient-Based Feature Learning under Structured Data

22. Learning Green's Function Efficiently Using Low-Rank Approximations

23. Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective

24. Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

25. Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input

26. Tight and fast generalization error bound of graph embedding in metric space

27. Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

28. Diffusion Models are Minimax Optimal Distribution Estimators

29. Koopman-based generalization bound: New aspect for full-rank weights

30. DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning

31. Dimensionality-Induced Information Loss of Outliers in Deep Neural Networks

32. Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs

33. Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning

35. Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

36. High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

37. Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

38. Convergence Error Analysis of Reflected Gradient Langevin Dynamics for Globally Optimizing Non-Convex Constrained Problems

39. Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

40. Convex Analysis of the Mean Field Langevin Dynamics

41. Neural Network Module Decomposition and Recomposition

42. A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

43. Layer-wise Adaptive Graph Convolution Networks Using Generalized Pagerank

44. AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network

45. On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

46. Deep Two-Way Matrix Reordering for Relational Data Analysis

47. A Goodness-of-fit Test on the Number of Biclusters in a Relational Data Matrix

48. Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

49. Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

50. Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
