Search

Your search for author "Tang, Yunhao" returned 257 results

Search Results

1. On scalable oversight with weak LLMs judging strong LLMs

2. A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

3. Offline Regularised Reinforcement Learning for Large Language Models Alignment

4. Understanding the performance gap between online and offline alignment algorithms

5. Human Alignment of Large Language Models through Online Preference Optimisation

6. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

7. A Distributional Analogue to the Successor Representation

8. Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

9. Off-policy Distributional Q($\lambda$): Distributional RL without Importance Sampling

10. Generalized Preference Optimization: A Unified Approach to Offline Alignment

11. Learning Uncertainty-Aware Temporally-Extended Actions

12. Gemini: A Family of Highly Capable Multimodal Models

13. Nash Learning from Human Feedback

16. DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

17. Towards a Better Understanding of Representation Dynamics under TD-learning

18. VA-learning as a more efficient alternative to Q-learning

19. The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

20. Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

21. Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

22. Fast Rates for Maximum Entropy Exploration

23. The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

24. An Analysis of Quantile Temporal-Difference Learning

25. Understanding Self-Predictive Learning for Reinforcement Learning

27. The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

28. BYOL-Explore: Exploration by Bootstrapped Prediction

29. KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

30. From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

31. Marginalized Operators for Off-policy Reinforcement Learning

33. Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

35. Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

36. Taylor Expansion of Discount Factors

37. Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

38. Unlocking Pixels for Reinforcement Learning via Implicit Attention

39. ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

40. Monte-Carlo Tree Search as Regularized Policy Optimization

41. Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

42. Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

43. Self-Imitation Learning via Generalized Lower Bound Q-learning

44. Taylor Expansion Policy Optimization

45. Discrete Action On-Policy Learning with Action-Value Critic

46. ES-MAML: Simple Hessian-Free Meta Learning

47. Reinforcement Learning with Chromatic Networks for Compact Architecture Search

48. Reinforcement Learning for Integer Programming: Learning to Cut

49. Learning to Score Behaviors for Guided Policy Optimization

50. Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes
