Author: "Lee Donghwan" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

1. Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach

Author: Jeong, Narim and Lee, Donghwan
Abstract: Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems where an agent aims to maximize the entropy regularized value function. Despite its empirical success, there have been limited theoretical studies of soft Q-learning to date. This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms. We focus on two types of soft Q-learning algorithms: one utilizing the log-sum-exp operator and the other employing the Boltzmann operator. By using dynamical switching system models, we derive novel finite-time error bounds for both soft Q-learning algorithms. We hope that our analysis will deepen the current understanding of soft Q-learning by establishing connections with switching system models and may even pave the way for new frameworks in the finite-time analysis of other reinforcement learning algorithms.
Comment: 15 pages
Published: 2024

2. Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation

Author: Lee, Donghwan
Abstract: This paper analyzes multi-step TD-learning algorithms within the "deadly triad" scenario, characterized by linear function approximation, off-policy learning, and bootstrapping. In particular, we prove that n-step TD-learning algorithms converge to a solution as the sampling horizon n increases sufficiently. The paper is divided into two parts. In the first part, we comprehensively examine the fundamental properties of their model-based deterministic counterparts, including projected value iteration, gradient descent algorithms, and the control theoretic approach, which can be viewed as prototype deterministic algorithms whose analysis plays a pivotal role in understanding and developing their model-free reinforcement learning counterparts. In particular, we prove that these algorithms converge to meaningful solutions when n is sufficiently large. Based on these findings, two n-step TD-learning algorithms are proposed and analyzed, which can be seen as the model-free reinforcement learning counterparts of the gradient and control theoretic algorithms.
Published: 2024

3. Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model

Author: Lim, Han-Dong, Lee, HyeAnn, and Lee, Donghwan
Abstract: Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.
Published: 2024

4. Harnessing Membership Function Dynamics for Stability Analysis of T-S Fuzzy Systems

Author: Lee, Donghwan and Kim, Do-Wan
Abstract: The main goal of this paper is to develop a new linear matrix inequality (LMI) condition for the asymptotic stability of continuous-time Takagi-Sugeno (T-S) fuzzy systems. A key advantage of this new condition is its independence from the bounds on the time-derivatives of the membership functions, a requirement present in the existing approaches. This is achieved by introducing a novel fuzzy Lyapunov function that incorporates an augmented state vector. Notably, this augmented state vector encompasses the membership functions, allowing the dynamics of these functions to be integrated into the proposed condition. This inclusion of additional information about the membership function serves to reduce the conservativeness of the suggested stability condition. To demonstrate the effectiveness of the proposed method, examples are provided.
Comment: arXiv admin note: substantial text overlap with arXiv:2309.06841
Published: 2024

5. A finite time analysis of distributed Q-learning

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision making problem without access to the central reward function which is an average of the local rewards. In particular, we study finite-time analysis of a distributed Q-learning algorithm, and provide a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{\frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4 } ,\frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)$ under tabular lookup
Published: 2024

6. Unified ODE Analysis of Smooth Q-Learning Algorithms

Author: Lee, Donghwan
Abstract: Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of the asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it hard to easily generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.
Published: 2024

7. Backstepping Temporal Difference Learning

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence issues when the off-policy scheme is used together with linear function approximation. To overcome the divergent behavior, several off-policy TD-learning algorithms, including gradient-TD learning (GTD) and TD-learning with correction (TDC), have been developed. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective, and propose a new convergent algorithm. Our method relies on the backstepping technique, which is widely used in nonlinear control theory. Finally, convergence of the proposed algorithm is experimentally verified in environments where the standard TD-learning is known to be unstable.
Published: 2023

8. Demystifying Disagreement-on-the-Line in High Dimensions

Author: Lee, Donghwan, Moniri, Behrad, Huang, Xinmeng, Dobriban, Edgar, and Hassani, Hamed
Abstract: Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain. Recent work suggests that the notion of disagreement, the degree to which two models trained with different randomness differ on the same input, is a key to tackle this problem. Experimentally, disagreement and prediction error have been shown to be strongly connected, which has been used to estimate model performance. Experiments have led to the discovery of the disagreement-on-the-line phenomenon, whereby the classification error under the target domain is often a linear function of the classification error under the source domain; and whenever this property holds, disagreement under the source and target domain follow the same linear relation. In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression; and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
Published: 2023

9. On Some Geometric Behavior of Value Iteration on the Orthant: Switching System Perspective

Author: Lee, Donghwan
Abstract: In this paper, the primary goal is to offer additional insights into the value iteration through the lens of switching system models in the control community. These models establish a connection between value iteration and switching system theory and reveal additional geometric behaviors of value iteration in solving discounted Markov decision problems. Specifically, the main contributions of this paper are twofold: 1) We provide a switching system model of value iteration and, based on it, offer a different proof for the contraction property of the value iteration. 2) Furthermore, from the additional insights, new geometric behaviors of value iteration are proven when the initial iterate lies in a special region. We anticipate that the proposed perspectives might have the potential to be a useful tool, applicable in various settings. Therefore, further development of these methods could be a valuable avenue for future research.
Published: 2023

10. TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Author: Choi, Jaehoon, Jung, Dongki, Lee, Taejae, Kim, Sangwook, Jung, Youngdong, Manocha, Dinesh, and Lee, Donghwan
Abstract: We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-quality mesh and develops a new training process for applying a regularization provided by classical multi-view stereo methods. Moreover, we apply a differentiable rendering to fine-tune incomplete texture maps and generate textures which are perceptually closer to the original scene. Our pipeline can be applied to any common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results of captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture mapping methods.
Comment: Accepted to CVPR23. Project Page: https://jh-choi.github.io/TMO
Published: 2023

11. A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

Author: Moniri, Behrad, Lee, Donghwan, Hassani, Hamed, and Dobriban, Edgar
Abstract: Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the training and test errors, we demonstrate that these non-linear features can enhance learning.
Published: 2023

12. Suppressing Overestimation in Q-Learning through Adversarial Behaviors

Author: Lee, HyeAnn and Lee, Donghwan
Abstract: The goal of this paper is to propose a new Q-learning algorithm with a dummy adversarial player, which is called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the dummy player, the learning can be formulated as a two-player zero-sum game. The proposed DAQ unifies several Q-learning variations to control overestimation biases, such as maxmin Q-learning and minmax Q-learning (proposed in this paper) in a single framework. The proposed DAQ is a simple but effective way to suppress the overestimation bias through dummy adversarial behaviors and can be easily applied to off-the-shelf reinforcement learning algorithms to improve performance. The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting an adversarial Q-learning. The performance of the suggested DAQ is empirically demonstrated under various benchmark environments.
Published: 2023

13. A primal-dual perspective for distributed TD-learning

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of the primal-dual ODE dynamics subject to null-space constraints, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
Published: 2023

14. Relaxed Conditions for Parameterized Linear Matrix Inequality in the Form of Nested Fuzzy Summations

Author: Kim, Do Wan and Lee, Donghwan
Abstract: The aim of this study is to investigate less conservative conditions for parameterized linear matrix inequalities (PLMIs) that are formulated as nested fuzzy summations. Such PLMIs are commonly encountered in stability analysis and control design problems for Takagi-Sugeno (T-S) fuzzy systems. Utilizing the weighted inequality of arithmetic and geometric means (AM-GM inequality), we develop new, less conservative linear matrix inequalities for the PLMIs. This methodology enables us to efficiently handle the product of membership functions that have intersecting indices. Through empirical case studies, we demonstrate that our proposed conditions produce less conservative results compared to existing approaches in the literature.
Comment: This work has been submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems for possible publication
Published: 2023

15. On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

Author: Lee, Donghwan and Kim, Do Wan
Abstract: The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions.
Published: 2023

16. Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes

Author: Lee, Donghwan, Lim, Han-Dong, and Kim, Do Wan
Abstract: The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards, lacking insights into the rewards of other agents. Moreover, each agent has the ability to share its parameters with neighboring agents through a communication network, represented by a graph. We first introduce a novel distributed DP, inspired by the distributed optimization method of Wang and Elia. Next, a new distributed DP is introduced through a decoupling process. The convergence of the DP algorithms is proved through systems and control perspectives. The study in this paper sets the stage for new distributed temporal difference learning algorithms.
Published: 2023

17. Temporal Difference Learning with Experience Replay

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, it has only been recently that researchers have begun to actively study its finite time behavior, including the finite time bound on mean squared error and sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that for both the averaged iterate and final iterate cases, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and the mini-batch sampled from the experience replay buffer.
Published: 2023

18. Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

Author: Lee, Donghwan
Abstract: The objective of this paper is to investigate the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that the introduction of these additional insights has the potential to uncover novel connections and foster collaboration between concepts in the fields of control theory and reinforcement learning communities.
Comment: arXiv admin note: text overlap with arXiv:2205.05455
Published: 2023

19. Optimal Heterogeneous Collaborative Linear Regression and Contextual Bandits

Author: Huang, Xinmeng, Xu, Kan, Lee, Donghwan, Hassani, Hamed, Bastani, Hamsa, and Dobriban, Edgar
Abstract: Large and complex datasets are often collected from several, possibly heterogeneous sources. Collaborative learning methods improve efficiency by leveraging commonalities across datasets while accounting for possible differences among them. Here we study collaborative linear regression and contextual bandits, where each instance's associated parameters are equal to a global parameter plus a sparse instance-specific term. We propose a novel two-stage estimator called MOLAR that leverages this structure by first constructing an entry-wise median of the instances' linear regression estimates, and then shrinking the instance-specific estimates towards the median. MOLAR improves the dependence of the estimation error on the data dimension, compared to independent least squares estimates. We then apply MOLAR to develop methods for sparsely heterogeneous collaborative contextual bandits, which lead to improved regret guarantees compared to independent bandit methods. We further show that our methods are minimax optimal by providing a number of lower bounds. Finally, we support the efficiency of our methods by performing experiments on both synthetic data and the PISA dataset on student educational outcomes from heterogeneous countries.
Published: 2023

20. Block Double-Submission Attack: Block Withholding Can Be Self-Destructive

Author: Lee, Suhyeon, Lee, Donghwan, and Kim, Seungjoo
Abstract: Proof-of-Work (PoW) is a Sybil control mechanism adopted in blockchain-based cryptocurrencies. It prevents the attempt of malicious actors to manipulate distributed ledgers. Bitcoin has successfully suppressed double-spending by accepting the longest PoW chain. Nevertheless, PoW encountered several major security issues surrounding mining competition. One of them is a Block WithHolding (BWH) attack that can exploit a widespread and cooperative environment called a mining pool. This attack takes advantage of untrustworthy relationships between mining pools and participating agents. Moreover, detecting or responding to attacks is challenging due to the nature of mining pools. In this paper, however, we suggest that BWH attacks also have a comparable trust problem. Because a BWH attacker cannot have complete control over BWH agents, they can betray the belonging mining pool and seek further benefits by trading with victims. We prove that this betrayal is not only valid in all attack parameters but also provides double benefits; finally, it is the best strategy for BWH agents. Furthermore, our study implies that BWH attacks may encounter self-destruction of their own revenue, contrary to their intention.
Comment: This paper is an extended version of a paper accepted to ACM Advances in Financial Technologies - AFT 2022
Published: 2022

21. Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: Q-learning has long been one of the most popular reinforcement learning algorithms, and theoretical analysis of Q-learning has been an active research topic for decades. Although research on asymptotic convergence analysis of Q-learning has a long tradition, non-asymptotic convergence has only recently come under active study. The main goal of this paper is to investigate a new finite-time analysis of asynchronous Q-learning under Markovian observation models via a control system viewpoint. In particular, we introduce a discrete-time time-varying switching system model of Q-learning with diminishing step-sizes for our analysis, which significantly improves recent development of the switching system analysis with constant step-sizes, and leads to an $\mathcal{O}\left( \sqrt{\frac{\log k}{k}} \right)$ convergence rate that is comparable to or better than most of the state-of-the-art results in the literature. Meanwhile, a technique using the similarity transformation is newly applied to avoid the difficulty in the analysis posed by diminishing step-sizes. The proposed analysis brings in additional insights, covers different scenarios, and provides new simplified templates for analysis to deepen our understanding of Q-learning via its unique connection to discrete-time switching systems.
Published: 2022

22. Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective

Author: Lee, Donghwan and Kim, Do Wan
Abstract: TD-learning is a fundamental algorithm in the field of reinforcement learning (RL), that is employed to evaluate a given policy by estimating the corresponding value function for a Markov decision process. While significant progress has been made in the theoretical analysis of TD-learning, recent research has uncovered guarantees concerning its statistical efficiency by developing finite-time error bounds. This paper aims to contribute to the existing body of knowledge by presenting a novel finite-time analysis of tabular temporal difference (TD) learning, which makes direct and effective use of discrete-time stochastic linear system models and leverages Schur matrix properties. The proposed analysis can cover both on-policy and off-policy settings in a unified manner. By adopting this approach, we hope to offer new and straightforward templates that not only shed further light on the analysis of TD-learning and related RL algorithms but also provide valuable insights for future research in this domain.
Comment: arXiv admin note: text overlap with arXiv:2112.14417
Published: 2022

23. A Single Correspondence Is Enough: Robust Global Registration to Avoid Degeneracy in Urban Environments

Author: Lim, Hyungtae, Yeon, Suyong, Ryu, Soohyun, Lee, Yonghan, Kim, Youngji, Yun, Jaeseong, Jung, Euigon, Lee, Donghwan, and Myung, Hyun
Abstract: Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or manage loop-closing situations. In recent years, numerous researchers have proposed global registration methods to address a large number of outlier correspondences. Unfortunately, the degeneracy problem, which represents the phenomenon in which the number of estimated inliers becomes lower than three, is still potentially inevitable. To tackle the problem, a degeneracy-robust decoupling-based global registration method is proposed, called Quatro. In particular, our method employs quasi-SO(3) estimation by leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation. Thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified in indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available: https://github.com/url-kaist/quatro
Comment: 8 pages. Accepted by ICRA 2022
Published: 2022

24. SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Author: Choi, Jaehoon, Jung, Dongki, Lee, Yonghan, Kim, Deokhwa, Manocha, Dinesh, and Lee, Donghwan
Abstract: Monocular depth estimation in the wild inherently predicts depth up to an unknown scale. To resolve the scale ambiguity issue, we present a learning algorithm that leverages monocular simultaneous localization and mapping (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation. Our approach is based on a teacher-student formulation which guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on EuRoC, OpenLORIS, and ScanNet datasets.
Published: 2022

25. T-Cal: An optimal test for the calibration of predictive models

Author: Lee, Donghwan, Huang, Xinmeng, Hassani, Hamed, and Dobriban, Edgar
Abstract: The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. The null hypothesis is that the predictive model is calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large. We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions. When the conditional class probabilities are Hölder continuous, we propose T-Cal, a minimax optimal test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE). We further propose Adaptive T-Cal, a version that is adaptive to unknown smoothness. We verify our theoretical findings with a broad range of experiments, including with several popular deep neural net architectures and several standard post-hoc calibration methods. T-Cal is a practical general-purpose tool, which -- combined with classical tests for discrete-valued predictors -- can be used to test the calibration of virtually any probabilistic classification method.
Comment: The implementation of T-Cal is available at https://github.com/dh7401/T-Cal
Published: 2022

26. Regularized Q-learning

Author: Lim, Han-Dong and Lee, Donghwan
Abstract: Q-learning is a widely used algorithm in the reinforcement learning community. Under the lookup table setting, its convergence is well established. However, its behavior is known to be unstable under linear function approximation. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm. We prove its stability using a recent analysis tool based on switching system models. Moreover, we experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution where the algorithm converges.
Published: 2022

27. Collaborative Learning of Discrete Distributions under Heterogeneity and Communication Constraints

Author: Huang, Xinmeng, Lee, Donghwan, Dobriban, Edgar, and Hassani, Hamed
Abstract: In modern machine learning, users often have to collaborate to learn the distribution of the data. Communication can be a significant bottleneck. Prior work has studied homogeneous users -- i.e., whose data follow the same discrete distribution -- and has provided optimal communication-efficient methods for estimating that distribution. However, these methods rely heavily on homogeneity, and are less applicable in the common case when users' discrete distributions are heterogeneous. Here we consider a natural and tractable model of heterogeneity, where users' discrete distributions only vary sparsely, on a small number of entries. We propose a novel two-stage method named SHIFT: First, the users collaborate by communicating with the server to learn a central distribution, relying on methods from robust statistics. Then, the learned central distribution is fine-tuned to estimate each user's individual distribution. We show that SHIFT is minimax optimal in our model of heterogeneity and under communication constraints. Further, we provide experimental results using both synthetic data and $n$-gram frequency estimation in the text domain, which corroborate its efficiency.
Published: 2022

28. Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark

Author: Humenberger, Martin, Cabon, Yohann, Pion, Noé, Weinzaepfel, Philippe, Lee, Donghwan, Guérin, Nicolas, Sattler, Torsten, and Csurka, Gabriela
Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both of them. These algorithms are often trained for the goal of retrieving the same landmark under a large range of viewpoint changes, which often differs from the requirements of visual localization. In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as metric. Second, we investigate several definitions of "ground truth" for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates only for some but not all paradigms to localization performance. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization. Comment: International Journal of Computer Vision (2022). arXiv admin note: text overlap with arXiv:2011.11946
Published: 2022
Full Text: View/download PDF

29. Control Theoretic Analysis of Temporal Difference Learning

Author: Lee, Donghwan and Kim, Do Wan
Abstract: The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the value function associated with a given policy in a Markov Decision Process. Despite several existing works that have contributed to the theoretical understanding of TD-learning, it is only in recent years that researchers have been able to establish concrete guarantees on its statistical efficiency. In this paper, we introduce a finite-time, control-theoretic framework for analyzing TD-learning, leveraging established concepts from the field of linear systems control. Consequently, this paper provides additional insights into the mechanics of TD-learning and the broader landscape of reinforcement learning, all while employing straightforward analytical tools derived from control theory. Comment: The contents of this paper have some overlaps with some other arxiv paper we have submitted. Therefore, this paper is redundant in my opinion
Published: 2021

30. New Versions of Gradient Temporal Difference Learning

Author: Lee, Donghwan, Lim, Han-Dong, Park, Jihoon, and Choi, Okyong
Abstract: Sutton, Szepesvári and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this paper is (a) to propose some variants of GTDs with extensive comparative analysis and (b) to establish new theoretical analysis frameworks for the GTDs. These variants are based on convex-concave saddle-point interpretations of GTDs, which effectively unify all the GTDs into a single framework, and provide simple stability analysis based on recent results on primal-dual gradient dynamics. Finally, numerical comparative analysis is given to evaluate these approaches.
Published: 2021

31. On the Semidefinite Duality of Finite-Horizon LQG Problem

Author: Lee, Donghwan
Abstract: In this paper, our goal is to study the foundations of linear quadratic Gaussian (LQG) control problems for stochastic linear time-invariant systems via Lagrangian duality of semidefinite programming (SDP) problems. In particular, we derive an SDP formulation of the finite-horizon LQG problem, and its Lagrangian duality. Moreover, we prove that the Riccati equation for LQG can be derived from the KKT optimality condition of the corresponding SDP problem. Besides, the proposed primal problem efficiently decouples the system matrices and the gain matrix. This allows us to develop new convex relaxations of non-convex structured control design problems such as the decentralized control problem. We expect that this work will provide new insights on the LQG problem and may potentially facilitate developments of new formulations of various optimal control problems. Numerical examples are given to demonstrate the effectiveness of the proposed methods. Comment: arXiv admin note: substantial text overlap with arXiv:2108.01457
Published: 2021

32. DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Author: Jung, Dongki, Choi, Jaehoon, Lee, Yonghan, Kim, Deokhwa, Kim, Changick, Manocha, Dinesh, and Lee, Donghwan
Abstract: We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring depths produced from depth sensing devices. Our network leverages RGB images and sparse depth maps generated from traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without tracking their motion explicitly. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes.
Published: 2021

33. Convergence of Dynamic Programming on the Semidefinite Cone

Author: Lee, Donghwan
Abstract: The goal of this paper is to investigate a new and simple convergence analysis of dynamic programming for the linear quadratic regulator problem of discrete-time linear time-invariant systems. In particular, bounds on errors are given in terms of both matrix inequalities and matrix norm. Under a mild assumption on the initial parameter, we prove that the Q-value iteration exponentially converges to the optimal solution. Moreover, a global asymptotic convergence is also presented. These results are then extended to the policy iteration. We prove that in contrast to the Q-value iteration, the policy iteration always converges exponentially fast. An example is given to illustrate the results.
Published: 2021

34. Data-Driven Control Design with LMIs and Dynamic Programming

Author: Lee, Donghwan and Kim, Do Wan
Abstract: The goal of this paper is to develop data-driven control design and evaluation strategies based on linear matrix inequalities (LMIs) and dynamic programming. We consider deterministic discrete-time LTI systems, where the system model is unknown. We propose efficient data collection schemes from the state-input trajectories together with data-driven LMIs to design state-feedback controllers for stabilization and linear quadratic regulation (LQR) problems. In addition, we investigate theoretically guaranteed exploration schemes to acquire valid data from the trajectories under different scenarios. In particular, we prove that as more and more data is accumulated, the collected data becomes valid for the proposed algorithms with higher probability. Finally, data-driven dynamic programming algorithms with convergence guarantees are discussed.
Published: 2021

35. Multi-Objective LQG Design with Primal-Dual Method

Author: Lee, Donghwan and Kim, Do Wan
Abstract: The goal of this paper is to study a multi-objective linear quadratic Gaussian (LQG) control problem. In particular, we consider an optimal control problem minimizing a quadratic cost over a finite time horizon for linear stochastic systems subject to control energy constraints. To solve the problem, we suggest a bisection line search algorithm which is computationally efficient compared to other approaches such as semidefinite programming. The main idea is to use the Lagrangian function and Karush-Kuhn-Tucker (KKT) optimality conditions to solve the constrained optimization problem. The Lagrange multiplier is searched using the bisection line search. Numerical examples are given to demonstrate the effectiveness of the proposed methods.
Published: 2021

36. Large-scale Localization Datasets in Crowded Indoor Spaces

Author: Lee, Donghwan, Ryu, Soohyun, Yeon, Suyong, Lee, Yonghan, Kim, Deokhwa, Han, Cheolho, Cabon, Yohann, Weinzaepfel, Philippe, Guérin, Nicolas, Csurka, Gabriela, and Humenberger, Martin
Abstract: Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality or robot navigation. This is particularly useful in indoor environments where other localization technologies, such as GNSS, fail. Indoor spaces impose interesting challenges on visual localization algorithms: occlusions due to people, textureless surfaces, large viewpoint changes, low light, repetitive textures, etc. Existing indoor datasets are either comparatively small or cover only a subset of the mentioned challenges. In this paper, we introduce 5 new indoor datasets for visual localization in challenging real-world environments. They were captured in a large shopping mall and a large metro station in Seoul, South Korea, using a dedicated mapping platform consisting of 10 cameras and 2 laser scanners. In order to obtain accurate ground truth camera poses, we developed a robust LiDAR SLAM which provides initial poses that are then refined using a novel structure-from-motion based optimization. We present a benchmark of modern visual localization algorithms on these challenging datasets showing superior performance of structure-based methods using robust image features. The datasets are available at: https://naverlabs.com/datasets
Published: 2021

37. Simulation Studies on Deep Reinforcement Learning for Building Control with Human Interaction

Author: Lee, Donghwan, He, Niao, Lee, Seungjae, Karava, Panagiota, and Hu, Jianghai
Abstract: The building sector consumes the largest share of energy in the world, and there has been considerable research interest in energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper aims at assessing the potential of RL in building climate control problems with occupant interaction. We apply a recent RL approach, called DDPG (deep deterministic policy gradient), to continuous building control tasks and assess its performance with simulation studies in terms of its ability to handle (a) the partial state observability due to sensor limitations; (b) complex stochastic systems with high-dimensional state-spaces, which are jointly continuous and discrete; (c) uncertainties due to ambient weather conditions, occupants' behavior, and comfort feelings. In particular, the partial observability and uncertainty due to the occupant interaction significantly complicate the control problem. Through simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.
Published: 2021

38. A Discrete-Time Switching System Analysis of Q-learning

Author: Lee, Donghwan, Hu, Jianghai, and He, Niao
Abstract: This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is over- and underestimated by trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant stepsize is used. Our analysis also sheds light on the overestimation phenomenon of Q-learning. We further illustrate and validate the analysis through numerical simulations.
Published: 2021

39. HARMer: Cyber-attacks Automation and Evaluation

Author: Enoch, Simon Yusuf, Huang, Zhibin, Moon, Chun Yong, Lee, Donghwan, Ahn, Myung Kil, and Kim, Dong Seong
Abstract: With the increasing growth of cyber-attack incidents, it is important to develop innovative and effective techniques to assess and defend networked systems against cyber attacks. One of the well-known techniques for this is penetration testing, which is carried out by a group of security professionals (i.e., a red team). Penetration testing is also known to be effective in finding existing and new vulnerabilities; however, the quality of the security assessment can depend on the quality of the red team members and their time and devotion to the penetration testing. In this paper, we propose a novel automation framework for cyber-attack generation named `HARMer' to address the challenges with respect to manual attack execution by the red team. Our proposed framework, design, and implementation are based on a scalable graphical security model called Hierarchical Attack Representation Model (HARM). (1) We propose the requirements and the key phases for the automation framework. (2) We propose security metrics-based attack planning strategies along with their algorithms. (3) We conduct experiments in a real enterprise network and Amazon Web Services. The results show how the different phases of the framework interact to model the attackers' operations. This framework will allow security administrators to automatically assess the impact of various threats and attacks. Comment: 19 pages, journal
Published: 2020
Full Text: View/download PDF

40. Periodic Q-Learning

Author: Lee, Donghwan and He, Niao
Abstract: The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates - the online estimate and target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to the standard Q-learning, PQ-learning enjoys a simple finite time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.
Published: 2020

41. SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

Author: Choi, Jaehoon, Jung, Dongki, Lee, Yonghan, Kim, Deokhwa, Manocha, Dinesh, and Lee, Donghwan
Abstract: We present a novel algorithm for self-supervised monocular depth completion. Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels. Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, longer and diverse depth ranges and scenes captured by complex ego-motions. Our novel architecture leverages both deep stacks of sparse convolution blocks to extract sparse depth features and pixel-adaptive convolutions to fuse image and depth features. We compare with existing approaches in NYUv2, KITTI, and NAVERLABS indoor datasets, and observe 5-34% improvements in root-mean-square error (RMSE) reduction.
Published: 2020

42. SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction

Author: Choi, Jaehoon, Jung, Dongki, Lee, Donghwan, and Kim, Changick
Abstract: Self-supervised monocular depth estimation has emerged as a promising method because it does not require ground-truth depth maps during training. As an alternative to the ground-truth depth map, the photometric loss provides self-supervision on depth prediction by matching the input image frames. However, the photometric loss causes various problems, resulting in less accurate depth values compared with supervised approaches. In this paper, we propose SAFENet, which is designed to leverage semantic information to overcome the limitations of the photometric loss. Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge. Therefore, we introduce multi-task learning schemes to incorporate semantic-awareness into the representation of depth features. Experiments on the KITTI dataset demonstrate that our methods compete with or even outperform the state-of-the-art methods. Furthermore, extensive experiments on different datasets show its better generalization ability and robustness to various conditions, such as low-light or adverse weather.
Published: 2020

43. DEP domain-containing mTOR-interacting protein suppresses lipogenesis and ameliorates hepatic steatosis and acute-on-chronic liver injury in alcoholic liver disease

Author: Massachusetts Institute of Technology. Department of Biology, Chen, Hanqing, Shen, Feng, Sherban, Alex, Nocon, Allison, Li, Yu, Wang, Hua, Xu, Ming-Jiang, Rui, Xianliang, Han, Jinyan, Jiang, Bingbing, Lee, Donghwan, Li, Na, Keyhani-Nejad, Farnaz, Fan, Jian-gao, Liu, Feng, Kamat, Amrita, Musi, Nicolas, Guarente, Leonard Pershing, Pacher, Pal, Gao, Bin, and Zang, Mengwei
Abstract: Alcoholic liver disease (ALD) is characterized by lipid accumulation and liver injury. However, how chronic alcohol consumption causes hepatic lipid accumulation remains elusive. The present study demonstrates that activation of the mechanistic target of rapamycin complex 1 (mTORC1) plays a causal role in alcoholic steatosis, inflammation, and liver injury. Chronic-plus-binge ethanol feeding led to hyperactivation of mTORC1, as evidenced by increased phosphorylation of mTOR and its downstream kinase S6 kinase 1 (S6K1) in hepatocytes. Aberrant activation of mTORC1 was likely attributed to the defects of the DEP domain-containing mTOR-interacting protein (DEPTOR) and the nicotinamide adenine dinucleotide–dependent deacetylase sirtuin 1 (SIRT1) in the liver of chronic-plus-binge ethanol-fed mice and in the liver of patients with ALD. Conversely, adenoviral overexpression of hepatic DEPTOR suppressed mTORC1 signaling and ameliorated alcoholic hepatosteatosis, inflammation, and acute-on-chronic liver injury. Mechanistically, the lipid-lowering effect of hepatic DEPTOR was attributable to decreased proteolytic processing, nuclear translocation, and transcriptional activity of the lipogenic transcription factor sterol regulatory element-binding protein-1 (SREBP-1). DEPTOR-dependent inhibition of mTORC1 also attenuated alcohol-induced cytoplasmic accumulation of the lipogenic regulator lipin 1 and prevented alcohol-mediated inhibition of fatty acid oxidation. Pharmacological intervention with rapamycin alleviated the ability of alcohol to up-regulate lipogenesis, to down-regulate fatty acid oxidation, and to induce steatogenic phenotypes. Chronic-plus-binge ethanol feeding led to activation of SREBP-1 and lipin 1 through S6K1-dependent and independent mechanisms. Furthermore, hepatocyte-specific deletion of SIRT1 disrupted DEPTOR function, enhanced mTORC1 activity, and exacerbated alcoholic fatty liver, inflammation, and liver injury in mice. 
Conclusion: The dysregulation o
Published: 2020

44. A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms

Author: Lee, Donghwan and He, Niao
Abstract: In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation. We show that the nonlinear ODE models associated with these Q-learning algorithms can be formulated as switched linear systems, and analyze their asymptotic stability by leveraging existing switching system theories. Our approach provides the first O.D.E. analysis of the asymptotic convergence of various Q-learning algorithms, including asynchronous Q-learning and averaging Q-learning. We also extend the approach to analyze Q-learning with linear function approximation and derive a new sufficient condition for its convergence. Comment: This paper has been accepted in NeurIPS2020
Published: 2019

45. Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents

Author: Lee, Donghwan, He, Niao, Kamalaruban, Parameswaran, and Cevher, Volkan
Abstract: This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate. We provide an overview of this emerging field, with an emphasis on the decentralized setting under different coordination protocols. We highlight the evolution of reinforcement learning algorithms from single-agent to multi-agent systems, from a distributed optimization perspective, and conclude with future directions and challenges, in the hope to catalyze the growing synergy among distributed optimization, signal processing, and reinforcement learning communities.
Published: 2019
Full Text: View/download PDF

46. Target-Based Temporal Difference Learning

Author: Lee, Donghwan and He, Niao
Abstract: The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide theoretical analysis of their convergence. In contrast to the standard TD-learning, target-based TD algorithms maintain two separate learning parameters: the target variable and the online variable. In particular, we introduce three members of the family, called the averaging TD, double TD, and periodic TD, where the target variable is updated in an averaging, symmetric, or periodic fashion, mirroring those techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning. While this work focuses on linear function approximation and the policy evaluation setting, we consider this a meaningful step towards the theoretical understanding of deep Q-learning variants with target networks.
Published: 2019

47. Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems

Author: Yoon, Hyung-Jin, Chen, Huaiyu, Long, Kehan, Zhang, Heling, Gahlawat, Aditya, Lee, Donghwan, and Hovakimyan, Naira
Abstract: We present a machine learning framework for multi-agent systems to learn both the optimal policy for maximizing the rewards and the encoding of the high dimensional visual observation. The encoding is useful for sharing local visual observations with other agents under communication resource constraints. The actor-encoder encodes the raw images and chooses an action based on local observations and messages sent by the other agents. The machine learning agent generates not only an actuator command to the physical device, but also a communication message to the other agents. We formulate a reinforcement learning problem, which extends the action space to consider the communication action as well. The feasibility of the reinforcement learning framework is demonstrated using a 3D simulation environment with two collaborating agents. The environment provides realistic visual observations to be used and shared between the two agents. Comment: AIAA SciTech 2019
Published: 2018

48. Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

Author: Yoon, Hyung-Jin, Lee, Donghwan, and Hovakimyan, Naira
Abstract: The objective is to study an on-line Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm to estimate both the POMDP parameter and Q function concurrently. Also, we show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and the Q function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
Published: 2018

49. One CNV Discordance in NRXN1 Observed Upon Genome-wide Screening in 38 Pairs of Adult Healthy Monozygotic Twins

Author: Magnusson, Patrik K. E., Lee, Donghwan, Chen, Xu, Szatkiewicz, Jin, Pramana, Setia, Teo, Shumei, Sullivan, Patrick F., Feuk, Lars, and Pawitan, Yudi
Abstract: Monozygotic (MZ) twins stem from the same single fertilized egg and therefore share all their inherited genetic variation. This is one of the unequivocal facts on which genetic epidemiology and twin studies are based. To what extent this also implies that MZ twins share genotypes in adult tissues is not precisely established, but a common pragmatic assumption is that MZ twins are 100% genetically identical also in adult tissues. During the past decade, this view has been challenged by several reports, with observations of differences in post-zygotic copy number variations (CNVs) between members of the same MZ pair. In this study, we performed a systematic search for differences of CNVs within 38 adult MZ pairs who had been misclassified as dizygotic (DZ) twins by questionnaire-based assessment. Initial scoring by PennCNV suggested a total of 967 CNV discordances. The within-pair correlation in number of CNVs detected was strongly dependent on confidence score filtering and reached a plateau of r = 0.8 when restricting to CNVs detected with confidence score larger than 50. The top-ranked discordances were subsequently selected for validation by quantitative polymerase chain reaction (qPCR), from which one single ~120 kb deletion in NRXN1 on chromosome 2 (bp 51017111-51136802) was validated. Despite involving an exon, no sign of cognitive/mental consequences was apparent in the affected twin pair, potentially reflecting limited or lack of expression of the transcripts containing this exon in nerve/brain.
Published: 2016
Full Text: View/download PDF

50. Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies

Author: Ganna, Andrea, Lee, Donghwan, Ingelsson, Erik, and Pawitan, Yudi
Abstract: It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
Published: 2015
Full Text: View/download PDF
