Author: "Yang, Kaige" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yang, Kaige"' showing total 6 results

Start Over Author "Yang, Kaige" Database arXiv

6 results on '"Yang, Kaige"'

1. Learn Dynamic-Aware State Embedding for Transfer Learning

Author: Yang, Kaige
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Transfer reinforcement learning aims to improve the sample efficiency of solving unseen new tasks by leveraging experiences obtained from previous tasks. We consider the setting where all tasks (MDPs) share the same environment dynamic except reward function. In this setting, the MDP dynamic is a good knowledge to transfer, which can be inferred by uniformly random policy. However, trajectories generated by uniform random policy are not useful for policy improvement, which impairs the sample efficiency severely. Instead, we observe that the binary MDP dynamic can be inferred from trajectories of any policy which avoids the need of uniform random policy. As the binary MDP dynamic contains the state structure shared over all tasks we believe it is suitable to transfer. Built on this observation, we introduce a method to infer the binary MDP dynamic on-line and at the same time utilize it to guide state embedding learning, which is then transferred to new tasks. We keep state embedding learning and policy learning separately. As a result, the learned state embedding is task and policy agnostic which makes it ideal for transfer learning. In addition, to facilitate the exploration over the state space, we propose a novel intrinsic reward based on the inferred binary MDP dynamic. Our method can be used out-of-box in combination with model-free RL algorithms. We show two instances on the basis of \algo{DQN} and \algo{A2C}. Empirical results of intensive experiments show the advantage of our proposed method in various transfer learning tasks., Comment: 11 pages
Published: 2021

2. Differentiable Linear Bandit Algorithm

Author: Yang, Kaige and Toni, Laura
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal exploration-exploitation if these bounds are not properly set. In the literature, confidence bounds are typically derived from concentration inequalities based on assumptions on the reward distribution, e.g., sub-Gaussianity. The validity of these assumptions however is unknown in practice. In this work, we aim at learning the confidence bound in a data-driven fashion, making it adaptive to the actual problem structure. Specifically, noting that existing UCB-typed algorithms are not differentiable with respect to confidence bound, we first propose a novel differentiable linear bandit algorithm. Then, we introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent. Theoretically, we show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound of $T$-round regret, where $d$ is the dimension of arm features and $\hat{\beta}$ is the learned size of confidence bound. Empirical results show that $\hat{\beta}$ is significantly smaller than its theoretical upper bound and proposed algorithms outperforms baseline ones on both simulated and real-world datasets., Comment: 16 pages
Published: 2020

3. Laplacian-regularized graph bandits: Algorithms and theoretical analysis

Author: Yang, Kaige, Dong, Xiaowen, and Toni, Laura
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the graph. We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon $T$, feature dimensionality $d$, and the scalar parameter $\Psi \in (0,1)$ that depends on the graph connectivity. This is an improvement over $\tilde{\mathcal{O}}(d \sqrt{T})$ in \algo{LinUCB}~\Ccite{li2010contextual}, where user relationship is not taken into account. In terms of network regret (sum of cumulative regret over $n$ users), the proposed algorithm leads to a scaling as $\tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, which is a significant improvement over $\tilde{\mathcal{O}}(nd\sqrt{T})$ in the state-of-the-art algorithm \algo{Gob.Lin} \Ccite{cesa2013gang}. To improve scalability, we further propose a simplified algorithm with a linear computational complexity with respect to the number of users, while maintaining the same regret. Finally, we present a finite-time analysis on the proposed algorithms, and demonstrate their advantage in comparison with state-of-the-art graph-based bandit algorithms on both synthetic and real-world data.
Published: 2019

4. Error Analysis on Graph Laplacian Regularized Estimator

Author: Yang, Kaige, Dong, Xiaowen, and Toni, Laura
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We provide a theoretical analysis of the representation learning problem aimed at learning the latent variables (design matrix) $\Theta$ of observations $Y$ with the knowledge of the coefficient matrix $X$. The design matrix is learned under the assumption that the latent variables $\Theta$ are smooth with respect to a (known) topological structure $\mathcal{G}$. To learn such latent variables, we study a graph Laplacian regularized estimator, which is the penalized least squares estimator with penalty term proportional to a Laplacian quadratic form. This type of estimators has recently received considerable attention due to its capability in incorporating underlying topological graph structure of variables into the learning process. While the estimation problem can be solved efficiently by state-of-the-art optimization techniques, its statistical consistency properties have been largely overlooked. In this work, we develop a non-asymptotic bound of estimation error under the classical statistical setting, where sample size is larger than the ambient dimension of the latent variables. This bound illustrates theoretically the impact of the alignment between the data and the graph structure as well as the graph spectrum on the estimation accuracy. It also provides theoretical evidence of the advantage, in terms of convergence rate, of the graph Laplacian regularized estimator over classical ones (that ignore the graph structure) in case of a smoothness prior. Finally, we provide empirical results of the estimation error to corroborate the theoretical analysis.
Published: 2019

5. Data Driven Chiller Plant Energy Optimization with Domain Knowledge

Author: Vu, Hoang Dung, Chai, Kok Soon, Keating, Bryan, Tursynbek, Nurislam, Xu, Boyan, Yang, Kaige, Yang, Xiaoyan, and Zhang, Zhenjie
Subjects: Computer Science - Systems and Control, Computer Science - Machine Learning
Abstract: Refrigeration and chiller optimization is an important and well studied topic in mechanical engineering, mostly taking advantage of physical models, designed on top of over-simplified assumptions, over the equipments. Conventional optimization techniques using physical models make decisions of online parameter tuning, based on very limited information of hardware specifications and external conditions, e.g., outdoor weather. In recent years, new generation of sensors is becoming essential part of new chiller plants, for the first time allowing the system administrators to continuously monitor the running status of all equipments in a timely and accurate way. The explosive growth of data flowing to databases, driven by the increasing analytical power by machine learning and data mining, unveils new possibilities of data-driven approaches for real-time chiller plant optimization. This paper presents our research and industrial experience on the adoption of data models and optimizations on chiller plant and discusses the lessons learnt from our practice on real world plants. Instead of employing complex machine learning models, we emphasize the incorporation of appropriate domain knowledge into data analysis tools, which turns out to be the key performance improver over state-of-the-art deep learning techniques by a significant margin. Our empirical evaluation on a real world chiller plant achieves savings by more than 7% on daily power consumption., Comment: CIKM2017. Proceedings of the 26th ACM International Conference on Information and Knowledge Management. 2017
Published: 2018
Full Text: View/download PDF

6. Graph-Based Recommendation System

Author: Yang, Kaige and Toni, Laura
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this work, we study recommendation systems modelled as contextual multi-armed bandit (MAB) problems. We propose a graph-based recommendation system that learns and exploits the geometry of the user space to create meaningful clusters in the user domain. This reduces the dimensionality of the recommendation problem while preserving the accuracy of MAB. We then study the effect of graph sparsity and clusters size on the MAB performance and provide exhaustive simulation results both in synthetic and in real-case datasets. Simulation results show improvements with respect to state-of-the-art MAB algorithms.
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

6 results on '"Yang, Kaige"'

1. Learn Dynamic-Aware State Embedding for Transfer Learning

2. Differentiable Linear Bandit Algorithm

3. Laplacian-regularized graph bandits: Algorithms and theoretical analysis

4. Error Analysis on Graph Laplacian Regularized Estimator

5. Data Driven Chiller Plant Energy Optimization with Domain Knowledge

6. Graph-Based Recommendation System

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Publication Type

Database

6 results on '"Yang, Kaige"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources