Your search for "Jaggi, Martin" returned 39 results

Search Constraints

You searched for: Author "Jaggi, Martin"; Publisher: arxiv

Search Results

1. Provably Personalized and Robust Federated Learning

2. Rotational Optimizers: Simple & Robust DNN Training

3. Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

4. Landmark Attention: Random-Access Infinite Context Length for Transformers

5. Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

6. Layerwise Linear Mode Connectivity

7. Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

8. Collaborative Learning via Prediction Consensus

9. Beyond spectral gap (extended): The role of the topology in decentralized learning

10. Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

11. Byzantine-Robust Decentralized Learning via ClippedGossip

12. Scalable Collaborative Learning via Representation Sharing

13. Special Properties of Gradient Descent with Large Learning Rates

14. On Second-order Optimization Methods for Federated Learning

15. Linear Speedup in Personalized Collaborative Learning

16. Lightweight Cross-Lingual Sentence Representation Learning

17. Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

18. Exact Optimization of Conformal Predictors via Incremental and Decremental Learning

19. A Field Guide to Federated Optimization

20. Masked Training of Neural Networks with Partial Gradients

21. RelaySum for Decentralized Deep Learning on Heterogeneous Data

22. Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation

23. Multi-Head Attention: Collaborate Instead of Concatenate

24. Dynamic Model Pruning with Feedback

25. Sparse Communication for Training Deep Networks

26. Robust Cross-lingual Embeddings from Parallel Sentences

27. Decentralized Deep Learning with Arbitrary Communication Compression

28. MLSys: The New Frontier of Machine Learning Systems

29. Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

30. Don't Use Large Mini-Batches, Use Local SGD

31. COLA: Decentralized Linear Learning

32. Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

33. Training DNNs with Hybrid Block Floating Point

34. Approximate Steepest Coordinate Descent

35. Distributed Optimization with Arbitrary Local Solvers

36. L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

37. On the Global Linear Convergence of Frank-Wolfe Optimization Variants

38. Convex Optimization without Projection Steps

39. A Combinatorial Algorithm to Compute Regularization Paths
