4 results
Search Results
2. Thompson Sampling for Stochastic Control: The Continuous Parameter Case.
- Author
- Banjevic, Dragan and Kim, Michael Jong
- Subjects
- COST control, UNCERTAIN systems, MARKOV processes, SAMPLING errors
- Abstract
Recently, Thompson sampling has been shown to achieve good theoretical performance guarantees for stochastic control problems with parameter uncertainty when the state, control, and parameter spaces are all finite. Much less is known, however, about the performance of Thompson sampling when applied to continuous or more general spaces, which constitute an important class of problems in practice. In this paper, we study Thompson sampling applied to a broad class of average-cost stochastic control problems where the state, control, and parameter spaces are all general measurable spaces. The main contributions of our paper are theoretical performance guarantees for Thompson sampling as measured by, first, expected posterior sampling error and, second, average per-period regret. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
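The core loop behind this abstract — sample a parameter from the posterior, act optimally as if the sample were true, then update the posterior on what is observed — can be illustrated in the simplest finite special case. Below is a minimal sketch on a two-armed Bernoulli bandit with Beta(1, 1) priors; this is a toy instance for intuition only, not the paper's general measurable-space, average-cost setting, and all names and numbers are illustrative:

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit: keep a Beta posterior
    per arm, draw one sample per posterior, act as if the samples were
    the true parameters, then update on the observed reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms   # Beta(1, 1) priors: successes + 1
    beta = [1] * n_arms    # failures + 1
    total_reward = 0
    for _ in range(horizon):
        # 1. Posterior sampling: one parameter draw per arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # 2. Act optimally under the sampled parameters.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # 3. Observe a reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

total, alpha, beta = thompson_sampling([0.3, 0.7], horizon=2000)
```

Because the posterior of the better arm concentrates, the sampled parameters (and hence the actions) settle on it; the abstract's two performance measures quantify how fast this happens in the general setting.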
3. Informational Cascades With Nonmyopic Agents.
- Author
- Bistritz, Ilai, Heydaribeni, Nasimeh, and Anastasopoulos, Achilleas
- Subjects
- INFORMATION storage & retrieval systems, PRODUCT quality, EQUILIBRIUM
- Abstract
We consider an environment where players need to decide whether or not to buy a certain product (or adopt a technology). The product is either good or bad, but its true value is unknown to the players. Instead, each player has her own private information on its quality. Each player can observe the previous actions of other players and estimate the quality of the product. A classic result in the literature shows that in similar settings, informational cascades occur, where learning stops for the whole network and players repeat the actions of their predecessors. In contrast to this literature, in this paper, players get more than one opportunity to act. In each turn, a player is chosen uniformly at random from all the players and can decide to buy the product and leave the market or wait. Her utility is the total expected discounted reward, and thus, myopic strategies may not constitute equilibria. We provide a characterization of perfect Bayesian equilibria (PBEs) with forward-looking strategies through a fixed-point equation whose dimensionality grows only quadratically with the number of players. Using this tractable fixed-point equation, we show the existence of a PBE and characterize PBEs with threshold strategies. Based on this characterization, we study informational cascades in two regimes. First, we show that for a discount factor $\delta$ strictly smaller than 1, informational cascades happen with high probability as the number of players $N$ increases. Furthermore, only a small portion of the total information in the system is revealed before a cascade occurs. Second, and more surprisingly, we show that for a fixed $N$, and for a sufficiently large $\delta < 1$, when the product is bad, there exists an equilibrium where an informational cascade can happen only after at least half of the players have revealed their private information, and consequently, the probability of a "bad cascade" where all the players buy the product vanishes exponentially with $N$.
Finally, when $\delta =1$ and the product is bad, there exists an equilibrium where informational cascades do not happen at all. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
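For contrast, the classic myopic benchmark this abstract departs from can be simulated directly: each player acts once, sees the full action history plus one private binary signal, and once the public evidence outweighs any single signal, everyone imitates and learning stops. A minimal sketch under a symmetric prior and the usual follow-your-own-signal tie-break (illustrative only; this is the classic herding model, not the paper's nonmyopic buy-or-wait PBE model):

```python
import random

def simulate_cascade(n_players, signal_accuracy=0.7, good=True, seed=1):
    """Classic sequential herding model: binary adopt/reject actions,
    one private signal each, full observation of past actions.  With a
    symmetric prior and follow-your-own-signal tie-breaking, the myopic
    Bayesian rule reduces to counting the net number of inferred 'good'
    signals in the public history."""
    rng = random.Random(seed)
    net = 0                     # net inferred 'good' signals so far
    actions, cascade_at = [], None
    for t in range(n_players):
        if abs(net) >= 2:       # public evidence outweighs any one signal
            if cascade_at is None:
                cascade_at = t  # learning stops here: a cascade begins
            actions.append(net > 0)   # imitate the herd; signal ignored
        else:
            p_good_signal = signal_accuracy if good else 1 - signal_accuracy
            signal = rng.random() < p_good_signal
            actions.append(signal)    # pre-cascade, the action reveals the signal
            net += 1 if signal else -1
    return actions, cascade_at

# A bad product: with signal accuracy 0.7 a reject cascade is more
# likely, but a wrong 'adopt' cascade can still occur.
actions, cascade_at = simulate_cascade(50, good=False)
```

Note how few private signals are revealed before the cascade locks in; the paper's forward-looking players, who may profitably wait, are exactly what breaks this dynamic for large $\delta$.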
4. Efficient Learning for Selecting Important Nodes in Random Network.
- Author
- Li, Haidong, Xu, Xiaoyun, Peng, Yijie, and Chen, Chun-Hung
- Subjects
- MARKOV processes, TAYLOR'S series, STOCHASTIC processes, PROBABILITY theory
- Abstract
In this article, we consider the problem of selecting important nodes in a random network, where the nodes connect to each other randomly with certain transition probabilities. The node importance is characterized by the stationary probabilities of the corresponding nodes in a Markov chain defined over the network, as in Google's PageRank. Unlike a deterministic network, the transition probabilities in a random network are unknown but can be estimated by sampling. Under a Bayesian learning framework, we apply the first-order Taylor expansion and normal approximation to provide a computationally efficient posterior approximation of the stationary probabilities. In order to maximize the probability of correct selection, we propose a dynamic sampling procedure, which uses not only posterior means and variances of certain interaction parameters between different nodes, but also the sensitivities of the stationary probabilities with respect to each interaction parameter. Numerical experiment results demonstrate the superiority of the proposed sampling procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
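The quantity being learned in this abstract — the stationary distribution of a Markov chain whose transition probabilities are only estimated from samples — can be sketched directly. The toy below estimates each transition row by its Dirichlet(1, ..., 1) posterior mean and recovers the stationary probabilities by power iteration; it is illustrative only and implements neither the paper's Taylor-expansion/normal posterior approximation nor its dynamic sampling allocation, and the 3-node network is hypothetical:

```python
import random

def stationary(P, iters=500):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def estimate_transitions(true_P, samples_per_row, seed=2):
    """Posterior-mean estimate of each transition row from sampled jumps,
    using a Dirichlet(1, ..., 1) prior (add-one smoothing)."""
    rng = random.Random(seed)
    n = len(true_P)
    counts = [[1] * n for _ in range(n)]      # prior pseudo-counts
    for i in range(n):
        for _ in range(samples_per_row):
            u, acc = rng.random(), 0.0
            for j, p in enumerate(true_P[i]): # sample next state j from row i
                acc += p
                if u < acc:
                    break
            counts[i][j] += 1
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical 3-node random network; 'importance' = stationary probability,
# as in PageRank.
true_P = [[0.1, 0.9, 0.0],
          [0.5, 0.0, 0.5],
          [0.0, 0.3, 0.7]]
P_hat = estimate_transitions(true_P, samples_per_row=500)
pi_hat = stationary(P_hat)
best = max(range(len(pi_hat)), key=lambda j: pi_hat[j])
```

The exact stationary distribution of this `true_P` is roughly (0.172, 0.310, 0.517), so node 2 is the important one. The paper's contribution is doing this selection efficiently: approximating the posterior of the stationary probabilities and steering samples toward the transitions that most affect the probability of correct selection, rather than sampling every row equally as above.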
Discovery Service for Jio Institute Digital Library