Search

Your search for author "Wang, Zhaoran" returned 670 results

Search Results

1. Just say what you want: only-prompting self-rewarding online preference optimization

2. Safe MPC Alignment with Human Directional Feedback

3. Toward Optimal LLM Alignments Using Two-Player Games

4. Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

5. Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

6. A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

7. Advancing Object Goal Navigation Through LLM-enhanced Object Affinities Transfer

8. Can Large Language Models Play Games? A Case Study of A Self-Play Approach

9. How Can LLM Guide RL? A Value-Based Approach

10. Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

11. Sparse PCA with Oracle Property

12. Empowering Autonomous Driving with Large Language Models: A Safety Perspective

13. Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks

14. A Principled Framework for Knowledge-enhanced Large Language Model

17. Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

18. Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

19. Learning Regularized Graphon Mean-Field Games with Unknown Graphons

20. Learning Regularized Monotone Graphon Mean-Field Games

21. Let Models Speak Ciphers: Multiagent Debate through Embeddings

22. Sample-Efficient Multi-Agent RL: An Optimization Perspective

23. Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

24. Streamed gaming

25. Contextual Dynamic Pricing with Strategic Buyers

26. A General Framework for Sequential Decision-Making under Adaptivity Constraints

27. Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

28. What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization

29. Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

30. Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

32. Dynamic Datasets and Market Environments for Financial Reinforcement Learning

33. Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice

34. Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

35. A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations

36. Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning

37. Differentiable Arbitrating in Zero-sum Markov Games

38. Achieving Hierarchy-Free Approximation for Bilevel Programs With Equilibrium Constraints

39. An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

40. Offline Policy Optimization in RL with Variance Regularization

41. Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

42. Policy learning 'without' overlap: Pessimism and generalized empirical Bernstein's inequality

43. Latent Variable Representation for Reinforcement Learning

44. FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning

45. GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

46. A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

47. Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

48. Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

49. Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

50. Differentiable Bilevel Programming for Stackelberg Congestion Games
