Author: "Laroche, Romain" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Laroche, Romain"' showing total 131 results

1. Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

Author: Zang, Hongyu, Li, Xin, Zhang, Leiji, Liu, Yang, Sun, Baigui, Islam, Riashat, Combes, Remi Tachet des, and Laroche, Romain
Subjects: Computer Science - Machine Learning
Abstract: While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, they have even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at https://github.com/zanghyu/Offline_Bisimulation.
Comment: NeurIPS 2023
Published: 2023
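
The expectile operator named in the abstract of entry 1 can be illustrated with a minimal sketch. This shows only the generic expectile statistic (an asymmetric least-squares estimate), not the authors' representation-learning objective; the function names `expectile_loss` and `expectile` are hypothetical, and `tau` is assumed to lie strictly between 0 and 1.

```python
def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss: weight tau on positive residuals, 1 - tau otherwise."""
    total = 0.0
    for u in residuals:
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(residuals)


def expectile(samples, tau, iters=100):
    """Approximate the tau-expectile of samples by iteratively reweighted averaging.

    The first-order condition of the loss is sum(w_i * (x_i - m)) = 0,
    so each step recomputes m as a weighted mean with the current weights.
    """
    m = sum(samples) / len(samples)  # start from the plain mean
    for _ in range(iters):
        weights = [tau if x > m else 1.0 - tau for x in samples]
        m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return m
```

With `tau = 0.5` the expectile reduces to the ordinary mean; larger values of `tau` shift the estimate toward the upper end of the sample.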

2. Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Author: Hong, Zhang-Wei, Kumar, Aviral, Karnik, Sathwik, Bhandwaldar, Abhishek, Srivastava, Akash, Pajarinen, Joni, Laroche, Romain, Gupta, Abhishek, and Agrawal, Pulkit
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to "good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
Comment: Accepted NeurIPS 2023
Published: 2023

3. Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Author: Zhao, Mingde, Alver, Safa, van Seijen, Harm, Laroche, Romain, Precup, Doina, and Bengio, Yoshua
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.
Comment: ICLR 2024 Camera-Ready
Published: 2023

4. Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Author: Hong, Zhang-Wei, Agrawal, Pulkit, Combes, Rémi Tachet des, and Laroche, Romain
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and minor high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by low-return trajectories and fail to exploit high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy may be combined with any offline RL algorithm. We further analyze that the opportunity for performance improvement over the behavior policy correlates with the positive-sided variance of the returns of the trajectories in the dataset. We empirically show that while CQL, IQL, and TD3+BC achieve only a part of this potential policy improvement, these same algorithms combined with our reweighted sampling strategy fully exploit the dataset. Furthermore, we empirically demonstrate that, despite its theoretical limitation, the approach may still be efficient in stochastic environments. The code is available at https://github.com/Improbable-AI/harness-offline-rl.
Published: 2023
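
The re-weighted sampling idea described in entry 4 can be sketched as follows. The abstract does not state the weighting function, so the softmax over min-max-normalized returns used here is an assumption for illustration only, as are the names `return_weights` and `sample_trajectories` and the `temperature` parameter.

```python
import math
import random


def return_weights(returns, temperature=1.0):
    """Map trajectory returns to sampling probabilities (assumed softmax weighting)."""
    lo, hi = min(returns), max(returns)
    span = (hi - lo) or 1.0  # avoid division by zero when all returns are equal
    logits = [(g - lo) / span / temperature for g in returns]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectories(trajectories, returns, k, temperature=0.1, seed=0):
    """Draw k trajectories, favoring high-return ones instead of sampling uniformly."""
    rng = random.Random(seed)
    probs = return_weights(returns, temperature)
    return rng.choices(trajectories, weights=probs, k=k)
```

A lower `temperature` concentrates sampling on the highest-return trajectories, while a very high one approaches uniform sampling; the resulting batches could then be fed to any offline RL learner.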

5. Think Before You Act: Decision Transformers with Working Memory

Author: Kang, Jikun, Laroche, Romain, Yuan, Xingdi, Trischler, Adam, Liu, Xue, and Fu, Jie
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Inspired by this, we propose a working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Comment: Accepted at ICML 2024
Published: 2023

6. Behavior Prior Representation learning for Offline Reinforcement Learning

Author: Zang, Hongyu, Li, Xin, Yu, Jie, Liu, Chen, Islam, Riashat, Combes, Remi Tachet Des, and Laroche, Romain
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR carries out performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.
Comment: ICLR 2023
Published: 2022

7. Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Author: Islam, Riashat, Zang, Hongyu, Goyal, Anirudh, Lamb, Alex, Kawaguchi, Kenji, Li, Xin, Laroche, Romain, Bengio, Yoshua, and Combes, Remi Tachet Des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training as well as generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy and high-dimensional sensory inputs poses a challenge for training goal-conditioned agents, or even for generalization to novel goals. We propose to address this by learning factorial representations of goals and processing the resulting representation via a discretization bottleneck, for coarser goal specification, through an approach we call DGRL. We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation. Additionally, we prove a theorem lower-bounding the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive combinatorial structure.
Comment: NeurIPS 2022
Published: 2022

8. Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication

Author: Karch, Tristan, Lemesle, Yoann, Laroche, Romain, Moulin-Frier, Clément, and Oudeyer, Pierre-Yves
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, given the delivered message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving the GREG but also enables agents to self-organize a language that generalizes to feature compositions never seen during training. In addition to evaluating the communication performance of our approach, we also explore the structure of the emerging language. Specifically, we show that the resulting language forms a coherent lexicon shared between agents and that basic compositional rules on the graphical productions could not explain the compositional generalization.
Published: 2022

9. Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

Author: Lu, Yuchen, Liu, Zhen, Baratin, Aristide, Laroche, Romain, Courville, Aaron, and Sordoni, Alessandro
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured in terms of the performance of a KNN classifier trained to predict labels obtained by clustering the representations with K-means. We thus combine CL and ID into a single predictor -- CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID better correlates with in-distribution model performance than other competing recent evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several visual classification tasks, yielding improvements with respect to the competing baselines.
Published: 2022

10. Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

Author: Brandfonbrener, David, Combes, Remi Tachet des, and Laroche, Romain
Subjects: Computer Science - Machine Learning
Abstract: Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIBB that extends the SPIBB family of algorithms to environments with larger state and action spaces. We use recent innovations in uncertainty estimation from the deep learning community to get more scalable uncertainty estimates to plug into deep-SPIBB. While these uncertainty estimates do not allow for the same theoretical guarantees as in the tabular case, we argue that the SPIBB mechanism for incorporating uncertainty is more robust and flexible than pessimistic approaches that incorporate the uncertainty as a value function penalty. We bear this out empirically, showing that deep-SPIBB outperforms pessimism based approaches with access to the same uncertainty estimates and performs at least on par with a variety of other strong baselines across several environments and datasets.
Published: 2022

11. When does return-conditioned supervised learning work for offline reinforcement learning?

Author: Brandfonbrener, David, Bietti, Alberto, Buckman, Jacob, Laroche, Romain, and Bruna, Joan
Subjects: Computer Science - Machine Learning
Abstract: Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something which is crucially missing in previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms. We provide specific examples of MDPs and datasets that illustrate the necessity of these assumptions and the limits of RCSL. Finally, we present empirical evidence that these limitations will also cause issues in practice by providing illustrative experiments in simple point-mass environments and on datasets from the D4RL benchmark.
Published: 2022
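
The RCSL recipe summarized in entry 11 (learn the action distribution conditioned on state and return, then act by conditioning on a high return) can be sketched in a tabular setting. This is a toy illustration under assumed conventions, not the paper's algorithm: trajectories are lists of `(state, action, reward)` tuples, the conditioning variable is the undiscounted total return of the trajectory, and `fit_rcsl` and `act` are hypothetical names.

```python
from collections import Counter, defaultdict


def fit_rcsl(trajectories):
    """Count how often each action appears for every (state, trajectory return) pair."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        total_return = sum(reward for _, _, reward in traj)
        for state, action, _ in traj:
            counts[(state, total_return)][action] += 1
    return counts


def act(counts, state, target_return):
    """Pick the most frequent action seen in `state` on trajectories with `target_return`.

    Returns None when that (state, return) pair never occurs in the data.
    """
    seen = counts.get((state, target_return))
    if not seen:
        return None
    return seen.most_common(1)[0][0]
```

The `None` case hints at the coverage issue the abstract raises: conditioning on a return that the dataset never achieved from a given state leaves the policy undefined in this toy version.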

12. Non-Markovian policies occupancy measures

Author: Laroche, Romain, Combes, Remi Tachet des, and Buckman, Jacob
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: A central object of study in Reinforcement Learning (RL) is the Markovian policy, in which an agent's actions are chosen from a memoryless probability distribution, conditioned only on its current state. The family of Markovian policies is broad enough to be interesting, yet simple enough to be amenable to analysis. However, RL often involves more complex policies: ensembles of policies, policies over options, policies updated online, etc. Our main contribution is to prove that the occupancy measure of any non-Markovian policy, i.e., the distribution of transition samples collected with it, can be equivalently generated by a Markovian policy. This result allows theorems about the Markovian policy class to be directly extended to its non-Markovian counterpart, greatly simplifying proofs, in particular those involving replay buffers and datasets. We provide various examples of such applications to the field of Reinforcement Learning.
Comment: 9p+sup. mat
Published: 2022

13. One-Shot Learning from a Demonstration with Hierarchical Latent Language

Author: Weir, Nathaniel, Yuan, Xingdi, Côté, Marc-Alexandre, Hausknecht, Matthew, Laroche, Romain, Momennejad, Ida, Van Seijen, Harm, and Van Durme, Benjamin
Subjects: Computer Science - Computation and Language
Abstract: Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration. They are able to describe unseen task-performing procedures and generalize their execution to other contexts. In this work, we introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents, where tasks are linguistically and procedurally composed of elementary concepts. The agent observes a single task demonstration in a Minecraft-like grid world, and is then asked to carry out the same task in a new map. To enable such a level of generalization, we propose a neural agent infused with hierarchical latent language--both at the level of task inference and subtask planning. Our agent first generates a textual description of the demonstrated unseen task, then leverages this description to replicate it. Through multiple evaluation scenarios and a suite of generalization tests, we find that agents that perform text-based inference are better equipped for the challenge under a random split of tasks.
Published: 2022

14. Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms

Author: Laroche, Romain and Tachet, Remi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: In Reinforcement Learning, the optimal action at a given state is dependent on policy decisions at subsequent states. As a consequence, the learning targets evolve with time and the policy optimization process must be efficient at unlearning what it previously learnt. In this paper, we discover that the policy gradient theorem prescribes policy updates that are slow to unlearn because of their structural symmetry with respect to the value target. To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing $q$, but find that such updates may lead to a decrease in value. Consequently, we introduce a modified policy update devoid of that flaw, and prove its guarantees of convergence to global optimality in $\mathcal{O}(t^{-1})$ under classic assumptions. Further, we assess standard policy updates and our cross-entropy policy updates along six analytical dimensions. Finally, we empirically validate our theoretical findings.
Comment: 9p+appendix, accepted to AISTATS 2022
Published: 2022

15. On the Convergence of SARSA with Linear Function Approximation

Author: Zhang, Shangtong, Tachet, Remi, and Laroche, Romain
Subjects: Computer Science - Machine Learning
Abstract: SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime.
Comment: ICML 2023
Published: 2022

16. Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Author: Zhang, Shangtong, Tachet, Remi, and Laroche, Romain
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters while we use an approximate and stochastic update step. Our update step is not a gradient update because we do not use a density ratio to correct the state distribution, which aligns well with what practitioners do. Our update is approximate because we use a learned critic instead of the true value function. Our update is stochastic because at each step the update is done for only the current state action pair. Moreover, we remove several restrictive assumptions from existing works in our analysis. Central to our work is the finite sample analysis of a generic stochastic approximation algorithm with time-inhomogeneous update operators on time-inhomogeneous Markov chains, based on its uniform contraction properties.
Comment: Journal of Machine Learning Research 2022
Published: 2021

17. Batched Bandits with Crowd Externalities

Author: Laroche, Romain, Safsafi, Othmane, Feraud, Raphael, and Broutin, Nicolas
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting asserts a maximum number of allowed policy updates and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm, but instead the amount of data received during each batch, called the crowd, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters that we prove to have a regret in $\mathcal{O}(\sqrt{\frac{\ln x}{x}}+\epsilon)$ where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\mathcal{O}\left(\max(K\ln T,\sqrt{T\ln T})\right)$, where $K$ is the number of arms and $T$ is the horizon.
Comment: 31 pages
Published: 2021

18. Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates

Author: Laroche, Romain and Tachet, Remi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The policy gradient theorem states that the policy should only be updated in states that are visited by the current policy, which leads to insufficient planning in the off-policy states, and thus to convergence to suboptimal policies. We tackle this planning issue by extending the policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we show convergence to optimality under a necessary and sufficient condition on the updates' state densities, and thereby solve the aforementioned planning issue. We also prove asymptotic convergence rates that significantly improve those in the policy gradient literature. To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores. JH's independent policies make it possible to record two separate replay buffers, one on-policy (Dr Jekyll's) and one off-policy (Mr Hyde's), and therefore to update JH's models with a mixture of on-policy and off-policy updates. More than an algorithm, JH defines principles for actor-critic algorithms to satisfy the requirements we identify in our analysis. We extensively test on finite MDPs where JH demonstrates a superior ability to recover from converging to a suboptimal policy without impairing its speed of convergence. We also implement a deep version of the algorithm and test it on a simple problem where it shows promising results.
Comment: accepted to NeurIPS as a poster
Published: 2021

19. The Emergence of the Shape Bias Results from Communicative Efficiency

Author: Portelance, Eva, Frank, Michael C., Jurafsky, Dan, Sordoni, Alessandro, and Laroche, Romain
Subjects: Computer Science - Computation and Language, Computer Science - Information Theory, Computer Science - Neural and Evolutionary Computing
Abstract: By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias. They are thought to learn this bias by observing that their caregiver's language is biased towards shape based categories. This presents a chicken and egg problem: if the shape bias must be present in the language in order for children to learn it, how did it arise in language in the first place? In this paper, we propose that communicative efficiency explains both how the shape bias emerged and why it persists across generations. We model this process with neural emergent language agents that learn to communicate about raw pixelated images. First, we show that the shape bias emerges as a result of efficient communication strategies employed by agents. Second, we show that pressure brought on by communicative need is also necessary for it to persist across generations; simply having a shape bias in an agent's input language is insufficient. These results suggest that, over and above the operation of other learning strategies, the shape bias in human learners may emerge and be sustained by communicative pressures.
Comment: Accepted at CoNLL 2021
Published: 2021

20. Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Author: Satija, Harsh, Thomas, Philip S., Pineau, Joelle, and Laroche, Romain
Subjects: Computer Science - Machine Learning
Abstract: We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Iteration with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task as well as in a real-world critical care context to learn a policy for the administration of IV fluids and vasopressors to treat sepsis.
Published: 2021

21. Massive multi-player multi-armed bandits for IoT networks: An application on LoRa networks

Author: Dakdouk, Hiba, Féraud, Raphaël, Varsier, Nadège, Maillé, Patrick, and Laroche, Romain
Published: 2023

22. A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

Author: Zhang, Shangtong, Laroche, Romain, van Seijen, Harm, Whiteson, Shimon, and Combes, Remi Tachet des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective $(\gamma = 1)$ where $\gamma^t$ disappears naturally $(1^t = 1)$. We then propose to interpret the discounting in critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
Comment: AAMAS 2022
Published: 2020
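
The mismatch described in entry 22 can be written out explicitly. The notation below ($J$, $\pi_\theta$, $q_{\pi_\theta}$, $g_{\text{practice}}$) is assumed for illustration and is not taken from the catalog record; the first line is the standard discounted policy gradient, the second the commonly implemented update that drops the $\gamma^t$ factor while keeping a discounted critic.

```latex
% Discounted objective: the actor update carries a \gamma^t factor.
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
      q_{\pi_\theta}(S_t, A_t)\, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \right]

% Common practice: same discounted critic q, but no \gamma^t in the actor update.
g_{\text{practice}}(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{\infty}
      q_{\pi_\theta}(S_t, A_t)\, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \right]
```

For $\gamma < 1$ the second expression is in general not the gradient of the discounted objective, which is the gap the abstract sets out to interpret.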

23. Reinforcement Learning Framework for Deep Brain Stimulation Study

Author: Krylov, Dmitrii, Tachet, Remi, Laroche, Romain, Rosenblum, Michael, and Dylov, Dmitry V.
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: Malfunctioning neurons in the brain sometimes operate synchronously, reportedly causing many neurological diseases, e.g. Parkinson's. Suppression and control of this collective synchronous activity are therefore of great importance for neuroscience, and can only rely on limited engineering trials due to the need to experiment with live human brains. We present the first Reinforcement Learning gym framework that emulates this collective behavior of neurons and allows us to find suppression parameters for the environment of synthetic degenerate models of neurons. We successfully suppress synchrony via RL for three pathological signaling regimes, characterize the framework's stability to noise, and further remove the unwanted oscillations by engaging multiple PPO agents.
Comment: 7 pages + 1 references, 7 figures. arXiv admin note: text overlap with arXiv:1909.12154
Published: 2020
Full Text: View/download PDF

24. Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Author: Adhikari, Ashutosh, Yuan, Xingdi, Côté, Marc-Alexandre, Zelinka, Mikuláš, Rondeau, Marc-Antoine, Laroche, Romain, Poupart, Pascal, Tang, Jian, Trischler, Adam, and Hamilton, William L.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text. We propose a novel graph-aided transformer agent (GATA) that infers and updates latent belief graphs during planning to enable effective action selection by capturing the underlying game dynamics. GATA is trained using a combination of reinforcement and self-supervised learning. Our work demonstrates that the learned graph-based representations help agents converge to better policies than their text-only counterparts and facilitate effective generalization across game configurations. Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%., Comment: Bug fixed in Table 1
Published: 2020

25. Building Dynamic Knowledge Graphs from Text-based Games

Author: Zelinka, Mikuláš, Yuan, Xingdi, Côté, Marc-Alexandre, Laroche, Romain, and Trischler, Adam
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We are interested in learning how to update Knowledge Graphs (KG) from text. In this preliminary work, we propose a novel Sequence-to-Sequence (Seq2Seq) architecture to generate elementary KG operations. Furthermore, we introduce a new dataset for KG extraction built upon text-based game transitions (over 300k data points). We conduct experiments and discuss the results., Comment: NeurIPS 2019, Graph Representation Learning (GRL) Workshop
Published: 2019

26. Safe Policy Improvement with an Estimated Baseline Policy

Author: Simão, Thiago D., Laroche, Romain, and Combes, Rémi Tachet des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as dialogue systems, pharmaceutical tests or crop management, data is collected under human supervision and the baseline remains unknown. In this paper, we apply SPIBB algorithms with a baseline estimate built from the data. We formally show safe policy improvement guarantees over the true baseline even without direct access to it. Our empirical experiments on finite and continuous states tasks support the theoretical findings. It shows little loss of performance in comparison with SPIBB when the baseline policy is given, and more importantly, drastically and significantly outperforms competing algorithms both in safe policy improvement, and in average performance., Comment: Published at AAMAS 2020
Published: 2019

27. Safe Policy Improvement with Soft Baseline Bootstrapping

Author: Nadjahi, Kimia, Laroche, Romain, and Combes, Rémi Tachet des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance as compared to using the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and improve more precisely the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of binarily classifying the state-action pairs into two sets (the \textit{uncertain} and the \textit{safe-to-train-on} ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions all the while remaining provably-safe, and is therefore less conservative than the state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms both on finite MDPs and on infinite MDPs with a neural network function approximation., Comment: Accepted paper at ECML-PKDD2019
Published: 2019

28. Budgeted Reinforcement Learning in Continuous State Space

Author: Carrara, Nicolas, Leurent, Edouard, Laroche, Romain, Urvoy, Tanguy, Maillard, Odalric-Ambrym, and Pietquin, Olivier
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an - adjustable - threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving., Comment: N. Carrara and E. Leurent have equally contributed
Published: 2019

29. Decentralized Exploration in Multi-Armed Bandits -- Extended version

Author: Féraud, Raphaël, Alami, Réda, and Laroche, Romain
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. The objective is to ensure privacy in the best arm identification problem between asynchronous, collaborative, and thrifty players. In the context of a digital service, we advocate that this decentralized approach allows a good balance between the interests of users and those of service providers: the providers optimize their services, while protecting the privacy of the users and saving resources. We define the privacy level as the amount of information an adversary could infer by intercepting the messages concerning a single user. We provide a generic algorithm Decentralized Elimination, which uses any best arm identification algorithm as a subroutine. We prove that this algorithm ensures privacy, with a low communication cost, and that in comparison to the lower bound of the best arm identification problem, its sample complexity suffers from a penalty depending on the inverse of the probability of the most frequent players. Then, thanks to the genericity of the approach, we extend the proposed algorithm to non-stationary bandits. Finally, experiments illustrate and complete the analysis.
Published: 2018

30. Counting to Explore and Generalize in Text-based Games

Author: Yuan, Xingdi, Côté, Marc-Alexandre, Sordoni, Alessandro, Laroche, Romain, Combes, Remi Tachet des, Hausknecht, Matthew, and Trischler, Adam
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that generalize to unseen games of greater difficulty.
Published: 2018

31. Safe Policy Improvement with Baseline Bootstrapping

Author: Laroche, Romain, Trichelair, Paul, and Combes, Rémi Tachet des
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstraps the trained policy with the baseline when the uncertainty is high. Our first algorithm, $\Pi_b$-SPIBB, comes with SPI theoretical guarantees. We also implement a variant, $\Pi_{\leq b}$-SPIBB, that is even more efficient in practice. We apply our algorithms to a motivational stochastic gridworld domain and further demonstrate on randomly generated MDPs the superiority of SPIBB with respect to existing algorithms, not only in safety but also in mean performance. Finally, we implement a model-free version of SPIBB and show its benefits on a navigation task with deep RL implementation called SPIBB-DQN, which is, to the best of our knowledge, the first RL algorithm relying on a neural network representation able to train efficiently and reliably from batch data, without any interaction with the environment., Comment: accepted as a long oral at ICML2019
Published: 2017
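
The bootstrapping idea in this abstract (fall back on the baseline wherever uncertainty is high) can be sketched for a single state as follows. This is a minimal illustration and not the authors' implementation: measuring uncertainty with a count threshold `n_wedge`, and the function name `spibb_greedy_step`, are assumptions made here for the example.

```python
def spibb_greedy_step(q_values, baseline_probs, counts, n_wedge):
    """One greedy policy-improvement step at a single state.

    q_values[a]       : estimated value of action a in this state
    baseline_probs[a] : probability the baseline policy assigns to a
    counts[a]         : number of times (state, a) appears in the dataset
    n_wedge           : hypothetical count threshold below which a pair
                        is treated as uncertain
    """
    n_actions = len(q_values)
    new_probs = [0.0] * n_actions

    # Uncertain actions: copy the baseline probability unchanged.
    for a in range(n_actions):
        if counts[a] < n_wedge:
            new_probs[a] = baseline_probs[a]

    # Well-observed actions: move all of their baseline mass onto the
    # one with the highest estimated value.
    trusted = [a for a in range(n_actions) if counts[a] >= n_wedge]
    if trusted:
        free_mass = sum(baseline_probs[a] for a in trusted)
        best = max(trusted, key=lambda a: q_values[a])
        new_probs[best] = free_mass
    return new_probs
```

If no action in the state is well observed, the step returns the baseline policy unchanged, which is the conservative behaviour the abstract describes.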

32. The Complex Negotiation Dialogue Game

Author: Laroche, Romain
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: This position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for the benchmark of optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning or others., Comment: Position paper for Sigdial/Semdial 2017 special session on negotiation dialogue
Published: 2017

33. Hybrid Reward Architecture for Reinforcement Learning

Author: van Seijen, Harm, Fatemi, Mehdi, Romoff, Joshua, Laroche, Romain, Barnes, Tavian, and Tsang, Jeffrey
Subjects: Computer Science - Learning
Abstract: One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
Published: 2017
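
As a rough illustration of acting from several component value functions, the sketch below combines per-component action values at one state and picks the greedy action. Summing the heads is an assumption made for this example (the abstract only says that a separate value function is learned per component reward), and `hra_greedy_action` is a hypothetical name.

```python
def hra_greedy_action(component_q_values):
    """Pick an action from per-component value estimates at one state.

    component_q_values[k][a] is the value that component k assigns to
    action a. Aggregation by summation is an assumption of this sketch.
    """
    n_actions = len(component_q_values[0])
    aggregated = [
        sum(head[a] for head in component_q_values)
        for a in range(n_actions)
    ]
    best_action = max(range(n_actions), key=lambda a: aggregated[a])
    return best_action, aggregated
```

For example, two heads `[1, 0, 2]` and `[0, 3, 0]` each prefer a different action, and the aggregate `[1, 3, 2]` selects action 1.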

34. Multi-Advisor Reinforcement Learning

Author: Laroche, Romain, Fatemi, Mehdi, Romoff, Joshua, and van Seijen, Harm
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: We consider tackling a single-agent RL problem by distributing it to $n$ learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task., Comment: Submitted at ICLR2018
Published: 2017

35. Reinforcement Learning Algorithm Selection

Author: Laroche, Romain and Feraud, Raphael
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning, Mathematics - Optimization and Control
Abstract: This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically evaluated on a dialogue task where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting where algorithms update their policies after each transition, which we call SSBAS. SSBAS is evaluated on a fruit collection task where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.
Published: 2017
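
The epoch structure described in this abstract (policies frozen within an epoch, a freshly reset bandit choosing which algorithm controls each episode) can be sketched as below. This is a simplified illustration, not the published algorithm: the use of UCB1, the doubling epoch lengths, and the callback names `run_episode` and `update_policies` are all assumptions of the example.

```python
import math


def ucb1_select(counts, sums, t):
    """Standard UCB1 choice: try every arm once, then maximise the index."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def esbas_sketch(run_episode, update_policies, n_algorithms, n_epochs):
    """Epoch-based algorithm selection.

    run_episode(k)    : hypothetical callback; runs one episode under
                        algorithm k's frozen policy and returns its return
    update_policies() : hypothetical callback; called once per epoch, so
                        policies stay frozen in between
    Epoch e lasts 2**e episodes (an assumption of this sketch).
    """
    choices = []
    for epoch in range(n_epochs):
        update_policies()
        # "Rebooted" bandit: statistics are reset at every epoch.
        counts = [0] * n_algorithms
        sums = [0.0] * n_algorithms
        for t in range(1, 2 ** epoch + 1):
            k = ucb1_select(counts, sums, t)
            episode_return = run_episode(k)
            counts[k] += 1
            sums[k] += episode_return
            choices.append(k)
    return choices
```

Resetting the bandit at each epoch matters because the algorithms' policies change between epochs, so returns observed under earlier policies no longer describe the arms being compared.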

36. Separation of Concerns in Reinforcement Learning

Author: van Seijen, Harm, Fatemi, Mehdi, Romoff, Joshua, and Laroche, Romain
Subjects: Computer Science - Learning, Computer Science - Artificial Intelligence
Abstract: In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for training specialized agents on different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents. Our framework generalizes the traditional hierarchical decomposition, in which, at any moment in time, a single agent has control until it has solved its particular subtask. We illustrate our framework with empirical experiments on two domains.
Published: 2016

37. Safe Policy Improvement with Soft Baseline Bootstrapping

Author: Nadjahi, Kimia, Laroche, Romain, Tachet des Combes, Rémi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Brefeld, Ulf, editor, Fromont, Elisa, editor, Hotho, Andreas, editor, Knobbe, Arno, editor, Maathuis, Marloes, editor, and Robardet, Céline, editor
Published: 2020
Full Text: View/download PDF

38. Safe Policy Improvement with Soft Baseline Bootstrapping

Author: Nadjahi, Kimia, primary, Laroche, Romain, additional, and Tachet des Combes, Rémi, additional
Published: 2020
Full Text: View/download PDF

39. The Negotiation Dialogue Game

Author: Laroche, Romain, Genevay, Aude, Jokinen, Kristiina, editor, and Wilcock, Graham, editor
Published: 2017
Full Text: View/download PDF

40. Incremental Human-Machine Dialogue Simulation

Author: Khouzaimi, Hatim, Laroche, Romain, Lefèvre, Fabrice, Jokinen, Kristiina, editor, and Wilcock, Graham, editor
Published: 2017
Full Text: View/download PDF

41. Compact and Interpretable Dialogue State Representation with Genetic Sparse Distributed Memory

Author: Asri, Layla El, Laroche, Romain, Pietquin, Olivier, Jokinen, Kristiina, editor, and Wilcock, Graham, editor
Published: 2017
Full Text: View/download PDF

42. A methodology for turn-taking capabilities enhancement in Spoken Dialogue Systems using Reinforcement Learning

Author: Khouzaimi, Hatim, Laroche, Romain, and Lefèvre, Fabrice
Published: 2018
Full Text: View/download PDF

43. Dialogue Efficiency Evaluation of Turn-Taking Phenomena in a Multi-layer Incremental Simulated Environment

Author: Khouzaimi, Hatim, Laroche, Romain, Lefèvre, Fabrice, and Stephanidis, Constantine, editor
Published: 2015
Full Text: View/download PDF

44. Think Before You Act: Decision Transformers with Internal Working Memory

Author: Kang, Jikun, Laroche, Romain, Yuan, Xindi, Trischler, Adam, Liu, Xue, and Fu, Jie
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Published: 2023
Full Text: View/download PDF

45. Batched Bandits with Crowd Externalities

Author: Laroche, Romain, Safsafi, Othmane, Feraud, Raphael, and Broutin, Nicolas
Subjects: Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, [MATH.MATH-PR] Mathematics [math]/Probability [math.PR], Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting asserts a maximum number of allowed policy updates and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm, but instead the amount of data received during each batch, called \textit{crowd}, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters that we prove to have a regret in $\mathcal{O}(\sqrt{\frac{\ln x}{x}}+\epsilon)$ where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\mathcal{O}\left(\max(K\ln T,\sqrt{T\ln T})\right)$, where $K$ is the number of arms and $T$ is the horizon., Comment: 31 pages
Published: 2023

46. Emergence of Shared Sensory-motor Graphical Language from Visual Input

Author: Karch, Tristan, Lemesle, Yoann, Laroche, Romain, Moulin-Frier, Clément, Oudeyer, Pierre-Yves, Flowing Epigenetic Robots and Systems (Flowers), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), and Microsoft Research
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: The framework of Language Games studies the emergence of languages in populations of agents. Recent contributions relying on deep learning methods focused on agents communicating via an idealized communication channel, where utterances produced by a speaker are directly perceived by a listener. This contrasts with human communication, which instead relies on a sensory-motor channel, where motor commands produced by the speaker (e.g. vocal or gestural articulators) result in sensory effects perceived by the listener (e.g. audio or visual). Here, we investigate whether agents can evolve a shared language when they are equipped with a continuous sensory-motor system to produce and perceive signs, e.g. drawings. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object consisting of combinations of MNIST digits while a listener has to select the corresponding object among distractor referents, given the produced message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We then present a set of experiments and metrics based on a systematic compositional dataset to evaluate the resulting language. We show that our method allows the emergence of a shared, graphical language with compositional properties.
Published: 2022

47. Contextual Bandit for Active Learning: Active Thompson Sampling

Author: Bouneffouf, Djallel, Laroche, Romain, Urvoy, Tanguy, Feraud, Raphael, Allesiardo, Robin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Loo, Chu Kiong, editor, Yap, Keem Siah, editor, Wong, Kok Wai, editor, Teoh, Andrew, editor, and Huang, Kaizhu, editor
Published: 2014
Full Text: View/download PDF

48. Massive Multi-Player Multi-Armed Bandits for IoT Networks: An Application on LoRa Networks

Author: Dakdouk, Hiba, Feraud, Raphaël, Varsier, Nadège, Maillé, Patrick, Laroche, Romain, Orange Labs, Département Systèmes Réseaux, Cybersécurité et Droit du numérique (IMT Atlantique - SRCD), IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT), Dependability Interoperability and perfOrmance aNalYsiS Of networkS (DIONYSOS), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES (IRISA-D2), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR), Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA), Université de Bretagne Sud (UBS), École normale supérieure - Rennes (ENS Rennes), CentraleSupélec, Centre National de la Recherche Scientifique (CNRS), Microsoft Research, Université de Rennes 1 (UR1), and Université de Rennes (UNIV-RENNES)
Subjects: [INFO]Computer Science [cs]
Abstract: More and more manufacturers, as part of the transition toward Industry 4.0, are using Internet of Things (IoT) networks for more efficient production. The wide and extensive expansion of IoT devices and the variety of applications generate different challenges, mainly in terms of reliability and energy efficiency. In this paper, we propose an approach to optimize the performance of IoT networks by making the IoT devices intelligent using machine learning techniques. We formulate the optimization problem as a massive multi-player multi-armed bandit and introduce two novel policies: Decreasing-Order-Reward-Greedy (DORG) focuses on the number of successful transmissions, while Decreasing-Order-Fair-Greedy (DOFG) also guarantees some measure of fairness between the devices. We then present an efficient way to manage the trade-off between energy consumption and packet losses in Long-Range (LoRa) networks using our algorithms, by which LoRa nodes adjust their emission parameters (Spreading Factor and transmitting power). We implement our algorithms on a LoRa network simulator and show that such learning techniques largely outperform the Adaptive Data Rate (ADR) algorithm currently implemented in LoRa devices, in terms of both energy consumption and packet losses.
Published: 2022

49. Reward Shaping for Statistical Optimisation of Dialogue Management

Author: El Asri, Layla, Laroche, Romain, Pietquin, Olivier, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Dediu, Adrian-Horia, editor, Martín-Vide, Carlos, editor, Mitkov, Ruslan, editor, and Truthe, Bianca, editor
Published: 2013
Full Text: View/download PDF

50. Clinical interest of quantitative bone SPECT-CT in the preoperative assessment of knee osteoarthritis

Author: De Laroche, Romain, Simon, Erwan, Suignard, Nicolas, Williams, Thomas, Henry, Marc-Pierre, Robin, Philippe, Abgral, Ronan, Bourhis, David, Salaun, Pierre-Yves, Dubrana, Frédéric, and Querellou, Solène
Published: 2018
Full Text: View/download PDF
