Author: "Cui, Qiwen" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cui, Qiwen"' showing total 21 results

Start Over Author "Cui, Qiwen"

21 results on '"Cui, Qiwen"'

1. BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

Author: Wang, Xuwu, Cui, Qiwen, Tao, Yunzhe, Wang, Yiran, Chai, Ziwei, Han, Xiaotian, Liu, Boyi, Yuan, Jianbo, Su, Jing, Wang, Guoyin, Liu, Tingkai, Chen, Liyu, Liu, Tianyi, Sun, Tao, Zhang, Yufeng, Zheng, Sirui, You, Quanzeng, Yang, Yang, and Yang, Hongxia
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse data handling scenarios. In response, we introduce BabelBench, an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution. BabelBench incorporates a dataset comprising 247 meticulously curated problems that challenge the models with tasks in perception, commonsense reasoning, logical reasoning, and so on. Besides the basic capabilities of multimodal understanding, structured data processing as well as code generation, these tasks demand advanced capabilities in exploration, planning, reasoning and debugging. Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement. The insights derived from our comprehensive analysis offer valuable guidance for future research within the community. The benchmark data can be found at https://github.com/FFD8FFE/babelbench.
Published: 2024

2. Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Author: Zhang, Natalia, Wang, Xinqi, Cui, Qiwen, Zhou, Runlong, Kakade, Sham M., and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals. Our theory establishes the upper complexity bounds for Nash Equilibrium in effective MARLHF, demonstrating that single-policy coverage is inadequate and highlighting the importance of unilateral dataset coverage. These theoretical insights are verified through comprehensive experiments. To enhance the practical performance, we further introduce two algorithmic techniques. (1) We propose a Mean Squared Error (MSE) regularization along the time axis to achieve a more uniform reward distribution and improve reward learning outcomes. (2) We utilize imitation learning to approximate the reference policy, ensuring stability and effectiveness in training. Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
Published: 2024

3. $\mathbf{(N,K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

Author: Zhang, Yufeng, Chen, Liyu, Liu, Boyi, Yang, Yingxiang, Cui, Qiwen, Tao, Yunzhe, and Yang, Hongxia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale. Yet, there is a noticeable absence of a cost-effective and standardized testbed tailored to evaluating and comparing these algorithms. To bridge this gap, we present a generalized version of the 24-Puzzle: the $(N,K)$-Puzzle, which challenges language models to reach a target value $K$ with $N$ integers. We evaluate the effectiveness of established RL algorithms such as Proximal Policy Optimization (PPO), alongside novel approaches like Identity Policy Optimization (IPO) and Direct Policy Optimization (DPO)., Comment: 8 pages
Published: 2024

4. Learning Optimal Tax Design in Nonatomic Congestion Games

Author: Cui, Qiwen, Fazel, Maryam, and Du, Simon S.
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games. It is known that self-interested behavior among the players can damage the system's efficiency. Tax mechanisms is a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step for learning the optimal tax that can minimize the social cost with \emph{equilibrium feedback}, i.e., the tax designer can only observe the equilibrium state under the enforced tax. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, our algorithm leverages several novel components: (1) piece-wise linear tax to approximate the optimal tax; (2) an extra linear term to guarantee a strongly convex potential function; (3) efficient subroutine to find the ``boundary'' tax. The algorithm can find an $\epsilon$-optimal tax with $O(\beta F^2/\epsilon)$ sample complexity, where $\beta$ is the smoothness of the cost function and $F$ is the number of facilities., Comment: 19 pages
Published: 2024

5. Refined Sample Complexity for Markov Games with Independent Linear Function Approximation

Author: Dai, Yan, Cui, Qiwen, and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning
Abstract: Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ -- which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing *data-dependent* (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel *action-dependent bonuses* to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously., Comment: Accepted for presentation at the Conference on Learning Theory (COLT) 2024
Published: 2024

6. Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Author: Zhou, Zhaoyi, Zhu, Chuning, Zhou, Runlong, Cui, Qiwen, Gupta, Abhishek, and Du, Simon Shaolei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems. In the presence of function approximation, however, these techniques often diverge due to the absence of Bellman completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman completeness while a constant layer width is enough for RCSL. These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We propose both theoretical analysis and experimental evaluation to back these claims, outperforming state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.
Published: 2023

7. A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Author: Jiang, Haozhe, Cui, Qiwen, Xiong, Zhihan, Fazel, Maryam, and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve $\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on number of agents from the oracles. As a side contribution that may be independent of interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria., Comment: 26 Pages, 2 figures
Published: 2023

8. Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

Author: Cui, Qiwen, Zhang, Kaiqing, and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in Daskalakis et al. 2022, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters., Comment: 51 pages. Update: Accepted for presentation at the Conference on Learning Theory (COLT) 2023
Published: 2023

9. Offline congestion games: How feedback type affects data coverage requirement

Author: Jiang, Haozhe, Cui, Qiwen, Xiong, Zhihan, Fazel, Maryam, and Du, Simon S.
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback., Comment: 20 pages, 3 figures
Published: 2022

10. Learning in Congestion Games with Bandit Feedback

Author: Cui, Qiwen, Xiong, Zhihan, Fazel, Maryam, and Du, Simon S.
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: In this paper, we investigate Nash-regret minimization in congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. Then we propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not the size of the action set, which can be exponentially large in terms of the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model the non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set., Comment: 34 pages, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
Published: 2022

11. On Gap-dependent Bounds for Offline Reinforcement Learning

Author: Wang, Xinqi, Cui, Qiwen, and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior work showed when the density ratio between an optimal policy and the behavior policy is upper bounded (the optimal policy coverage assumption), then the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show under the optimal policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a positive sub-optimality gap in the optimal $Q$-function. Furthermore, we show when the visitation probabilities of the behavior policy are uniformly lower bounded for states where an optimal policy's visitation probabilities are positive (the uniform optimal policy coverage assumption), the sample complexity of identifying an optimal policy is independent of $\frac{1}{\epsilon}$. Lastly, we present nearly-matching lower bounds to complement our gap-dependent upper bounds., Comment: 33 pages, 1 figure, submitted to NeurIPS 2022
Published: 2022

12. Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

Author: Cui, Qiwen and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than the prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales $\sum_{i=1}^mA_i$ where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space $\Pi_{i=1}^m A_i$ due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity only scales with $\log |\Pi|$ instead of $\sum_{i=1}^mA_i$., Comment: 34 pages; accepted by NeurIPS 2022
Published: 2022

13. When is Offline Two-Player Zero-Sum Markov Game Solvable?

Author: Cui, Qiwen and Du, Simon S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning
Abstract: We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step towards understanding offline multi-agent reinforcement learning., Comment: 30 pages; accepted by NeurIPS 2022
Published: 2022

14. Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Author: Ishfaq, Haque, Cui, Qiwen, Nguyen, Viet, Ayoub, Alex, Yang, Zhuoran, Wang, Zhaoran, Precup, Doina, and Yang, Lin F.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\widetilde{O}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the $\textit{eluder dimension}$ of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\widetilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks., Comment: 32 page, 5 figures, in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
Published: 2021

15. NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

Author: Yang, Minghan, Xu, Dong, Cui, Qiwen, Wen, Zaiwen, and Xu, Pengxiang
Subjects: Mathematics - Optimization and Control, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, a novel second-order method called NG+ is proposed. By following the rule ``the shape of the gradient equals the shape of the parameter", we define a generalized fisher information matrix (GFIM) using the products of gradients in the matrix form rather than the traditional vectorization. Then, our generalized natural gradient direction is simply the inverse of the GFIM multiplies the gradient in the matrix form. Moreover, the GFIM and its inverse keeps the same for multiple steps so that the computational cost can be controlled and is comparable with the first-order methods. A global convergence is established under some mild conditions and a regret bound is also given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with Schnet, neural machine translation with Transformer and recommendation system with DLRM illustrate that GN+ is competitive with the state-of-the-art methods.
Published: 2021

16. Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

Author: Xiong, Zhihan, Shen, Ruoqi, Cui, Qiwen, Fazel, Maryam, and Du, Simon S.
Subjects: Computer Science - Machine Learning
Abstract: We study algorithms using randomized value functions for exploration in reinforcement learning. This type of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\widetilde{O}\left(H\sqrt{SAT}\right)$ regret bound for episodic time-inhomogeneous Markov Decision Process where $S$ is the size of state space, $A$ is the size of action space, $H$ is the planning horizon and $T$ is the number of interactions. This bound polynomially improves all existing bounds for algorithms based on randomized value functions, and for the first time, matches the $\Omega\left(H\sqrt{SAT}\right)$ lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret., Comment: 41 pages, 3 figures, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
Published: 2021

17. Minimax Sample Complexity for Turn-based Stochastic Game

Author: Cui, Qiwen and Yang, Lin F.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning
Abstract: The empirical success of Multi-agent reinforcement learning is encouraging, while few theoretical guarantees have been revealed. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic game (TBSG). Specifically, we plan in an empirical TBSG by utilizing a `simulator' that allows sampling from arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bound. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is artificially introducing a suboptimality gap in TBSG and then the Nash equilibrium strategy lies in a finite set., Comment: 15 pages
Published: 2020

18. Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Author: Cui, Qiwen and Yang, Lin F.
Subjects: Computer Science - Machine Learning
Abstract: It is believed that a model-based approach for reinforcement learning (RL) is the key to reduce sample complexity. However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an $\epsilon$-optimal policy in a $\gamma$-discounted MDP is $O(K/(1-\gamma)^3\epsilon^2)$, which only depends on the dimensionality $K$ of the feature space and has no dependence on the state or action space. We further extend our results to a relaxed setting where anchor-states may not exist and show that a plug-in approach can be sample efficient as well, providing a flexible approach to design model-based algorithms for RL., Comment: 30 pages, to appear in NeurIPS2020
Published: 2020

19. Clinical decision support model for tooth extraction therapy derived from electronic dental records

Author: Cui, Qiwen, Chen, Qingxiao, Liu, Pufan, Liu, Debin, and Wen, Zaiwen
Published: 2021
Full Text: View/download PDF

20. An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Author: Yang, Minghan, primary, Xu, Dong, additional, Cui, Qiwen, additional, Wen, Zaiwen, additional, and Xu, Pengxiang, additional
Published: 2022
Full Text: View/download PDF

21. An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Author: Yang, Minghan, Xu, Dong, Cui, Qiwen, Wen, Zaiwen, and Xu, Pengxiang
Abstract: Although the shapes of the parameters are not crucial for designing first-order optimization methods in large scale empirical risk minimization problems, they have important impact on the size of the matrix to be inverted when developing second-order type methods. In this article, we propose an efficient and novel second-order method based on the parameters in the real matrix space $\mathbb {R}^{m\times n}$Rm×n and a matrix-product approximate Fisher matrix (MatFisher) by using the products of gradients. The size of the matrix to be inverted is much smaller than that of the Fisher information matrix in the real vector space $\mathbb {R}^{d}$Rd. Moreover, by utilizing the matrix delayed update and the block diagonal approximation techniques, the computational cost can be controlled and is comparable with first-order methods. A global convergence and a superlinear local convergence analysis are established under mild conditions. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, and data-driven partial differential equations solution with PINN illustrate that our method is quite competitive to the state-of-the-art methods.
Published: 2023
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

21 results on '"Cui, Qiwen"'

1. BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

2. Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

3. $\mathbf{(N,K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

4. Learning Optimal Tax Design in Nonatomic Congestion Games

5. Refined Sample Complexity for Markov Games with Independent Linear Function Approximation

6. Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

7. A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

8. Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

9. Offline congestion games: How feedback type affects data coverage requirement

10. Learning in Congestion Games with Bandit Feedback

11. On Gap-dependent Bounds for Offline Reinforcement Learning

12. Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

13. When is Offline Two-Player Zero-Sum Markov Game Solvable?

14. Randomized Exploration for Reinforcement Learning with General Value Function Approximation

15. NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

16. Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

17. Minimax Sample Complexity for Turn-based Stochastic Game

18. Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

19. Clinical decision support model for tooth extraction therapy derived from electronic dental records

20. An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

21. An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

21 results on '"Cui, Qiwen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources