40 results for "Laroche, Romain"
Search Results
2. A methodology for turn-taking capabilities enhancement in Spoken Dialogue Systems using Reinforcement Learning
- Author
- Khouzaimi, Hatim, Laroche, Romain, and Lefèvre, Fabrice
- Published
- 2018
- Full Text
- View/download PDF
3. Batched Bandits with Crowd Externalities
- Author
- Laroche, Romain, Safsafi, Othmane, Feraud, Raphael, and Broutin, Nicolas
- Subjects
Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, [MATH.MATH-PR] Mathematics [math]/Probability [math.PR], Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting asserts a maximum number of allowed policy updates and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm; instead, the amount of data received during each batch, called the \textit{crowd}, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters that we prove to have a regret in $\mathcal{O}(\sqrt{\frac{\ln x}{x}}+\epsilon)$ where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\mathcal{O}\left(\max(K\ln T,\sqrt{T\ln T})\right)$, where $K$ is the number of arms and $T$ is the horizon., Comment: 31 pages
- Published
- 2023
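As an illustration of the UCB machinery this abstract builds on, here is a minimal UCB1 sketch (the generic textbook algorithm, not the paper's batched, crowd-dependent variant; the arm probabilities, horizon, and exploration constant are illustrative):

```python
import math
import random

def ucb_index(mean, count, t, c=2.0):
    """UCB1 index: empirical mean plus a confidence bonus that shrinks with pulls."""
    return mean + math.sqrt(c * math.log(t) / count)

def run_ucb(arm_probs, horizon, seed=0):
    """Play a K-armed Bernoulli bandit with UCB1; return total reward and pull counts."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        # play each arm once, then pick the arm with the highest UCB index
        a = t - 1 if t <= k else max(range(k), key=lambda i: ucb_index(means[i], counts[i], t))
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return total, counts
```

In the batched setting of the paper, the index computation would only be refreshed at batch boundaries rather than at every step.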
4. Emergence of Shared Sensory-motor Graphical Language from Visual Input
- Author
- Karch, Tristan, Lemesle, Yoann, Laroche, Romain, Moulin-Frier, Clément, Oudeyer, Pierre-Yves, Flowing Epigenetic Robots and Systems (Flowers), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), and Microsoft Research
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
- Abstract
The framework of Language Games studies the emergence of languages in populations of agents. Recent contributions relying on deep learning methods focused on agents communicating via an idealized communication channel, where utterances produced by a speaker are directly perceived by a listener. This comes in contrast with human communication, which instead relies on a sensory-motor channel, where motor commands produced by the speaker (e.g. vocal or gestural articulators) result in sensory effects perceived by the listener (e.g. audio or visual). Here, we investigate whether agents can evolve a shared language when they are equipped with a continuous sensory-motor system to produce and perceive signs, e.g. drawings. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object consisting of combinations of MNIST digits while a listener has to select the corresponding object among distractor referents, given the produced message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We then present a set of experiments and metrics based on a systematic compositional dataset to evaluate the resulting language. We show that our method allows the emergence of a shared, graphical language with compositional properties.
- Published
- 2022
5. Massive Multi-Player Multi-Armed Bandits for IoT Networks: An Application on LoRa Networks
- Author
-
Dakdouk, Hiba, Feraud, Raphaël, Varsier, Nadège, Maillé, Patrick, Laroche, Romain, Orange Labs, Département Systèmes Réseaux, Cybersécurité et Droit du numérique (IMT Atlantique - SRCD), IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Dependability Interoperability and perfOrmance aNalYsiS Of networkS (DIONYSOS), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES (IRISA-D2), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Microsoft Research, Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National 
des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes 1 (UR1), and Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique)
- Subjects
[INFO] Computer Science [cs]
- Abstract
More and more manufacturers, as part of the transition toward Industry 4.0, are using Internet of Things (IoT) networks for more efficient production. The wide and extensive expansion of IoT devices and the variety of applications generate different challenges, mainly in terms of reliability and energy efficiency. In this paper, we propose an approach to optimize the performance of IoT networks by making the IoT devices intelligent using machine learning techniques. We formulate the optimization problem as a massive multi-player multi-armed bandit and introduce two novel policies: Decreasing-Order-Reward-Greedy (DORG) focuses on the number of successful transmissions, while Decreasing-Order-Fair-Greedy (DOFG) also guarantees some measure of fairness between the devices. We then present an efficient way to manage the trade-off between energy consumption and packet losses in Long-Range (LoRa) networks using our algorithms, by which LoRa nodes adjust their emission parameters (Spreading Factor and transmitting power). We implement our algorithms on a LoRa network simulator and show that such learning techniques largely outperform the Adaptive Data Rate (ADR) algorithm currently implemented in LoRa devices, in terms of both energy consumption and packet losses.
- Published
- 2022
6. Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication
- Author
- Karch, Tristan, Lemesle, Yoann, Laroche, Romain, Moulin-Frier, Clément, and Oudeyer, Pierre-Yves
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, given the delivered message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving the GREG but also enables agents to self-organize a language that generalizes to feature compositions never seen during training. In addition to evaluating the communication performance of our approach, we also explore the structure of the emerging language. Specifically, we show that the resulting language forms a coherent lexicon shared between agents and that basic compositional rules on the graphical productions could not explain the compositional generalization.
- Published
- 2022
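The contrastive alignment idea in the abstract above can be illustrated with a generic InfoNCE-style loss over referent and utterance embeddings (a minimal sketch, not the paper's CURVES mechanism; the temperature value and unit-embedding setup are illustrative):

```python
import numpy as np

def infonce_loss(ref_emb, utt_emb, temperature=0.1):
    """InfoNCE-style contrastive loss: each referent should align best with its
    own utterance; matched pairs sit on the diagonal of the similarity matrix."""
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    utt = utt_emb / np.linalg.norm(utt_emb, axis=1, keepdims=True)
    logits = ref @ utt.T / temperature                   # pairwise cosine "energies"
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # matched pairs on the diagonal
```

The loss is near zero when every referent's highest-energy utterance is its own, and grows when the pairing is scrambled.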
7. Clinical interest of quantitative bone SPECT-CT in the preoperative assessment of knee osteoarthritis
- Author
- De Laroche, Romain, Simon, Erwan, Suignard, Nicolas, Williams, Thomas, Henry, Marc-Pierre, Robin, Philippe, Abgral, Ronan, Bourhis, David, Salaun, Pierre-Yves, Dubrana, Frédéric, and Querellou, Solène
- Published
- 2018
- Full Text
- View/download PDF
8. Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
- Author
- Brandfonbrener, David, Combes, Remi Tachet des, and Laroche, Romain
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIBB that extends the SPIBB family of algorithms to environments with larger state and action spaces. We use recent innovations in uncertainty estimation from the deep learning community to get more scalable uncertainty estimates to plug into deep-SPIBB. While these uncertainty estimates do not allow for the same theoretical guarantees as in the tabular case, we argue that the SPIBB mechanism for incorporating uncertainty is more robust and flexible than pessimistic approaches that incorporate the uncertainty as a value function penalty. We bear this out empirically, showing that deep-SPIBB outperforms pessimism based approaches with access to the same uncertainty estimates and performs at least on par with a variety of other strong baselines across several environments and datasets.
- Published
- 2022
9. When does return-conditioned supervised learning work for offline reinforcement learning?
- Author
- Brandfonbrener, David, Bietti, Alberto, Buckman, Jacob, Laroche, Romain, and Bruna, Joan
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous study of the capabilities and limitations of RCSL, something which is crucially missing in previous work. We find that RCSL returns the optimal policy under a set of assumptions that are stronger than those needed for the more traditional dynamic programming-based algorithms. We provide specific examples of MDPs and datasets that illustrate the necessity of these assumptions and the limits of RCSL. Finally, we present empirical evidence that these limitations will also cause issues in practice by providing illustrative experiments in simple point-mass environments and on datasets from the D4RL benchmark.
- Published
- 2022
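The RCSL recipe described above (learn the action distribution conditioned on state and trajectory return, then condition on high return) can be sketched in a tabular form (a toy illustration of the idea, not the paper's implementation; the state/action names are illustrative):

```python
from collections import defaultdict

def fit_rcsl(trajectories):
    """Tabular RCSL: count actions conditioned on (state, trajectory return).
    Each trajectory is a list of (state, action, reward) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        ret = sum(r for _, _, r in traj)          # return of the whole trajectory
        for s, a, _ in traj:
            counts[(s, ret)][a] += 1
    return counts

def rcsl_policy(counts, state, target_return):
    """Pick the most frequent action seen under the desired (state, return) condition."""
    dist = counts.get((state, target_return))
    if not dist:
        return None                               # condition never seen in the data
    return max(dist, key=dist.get)
```

The paper's point is visible even here: the policy is only as good as the coverage of high-return conditions in the dataset.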
10. One-Shot Learning from a Demonstration with Hierarchical Latent Language
- Author
- Weir, Nathaniel, Yuan, Xingdi, Côté, Marc-Alexandre, Hausknecht, Matthew, Laroche, Romain, Momennejad, Ida, Van Seijen, Harm, and Van Durme, Benjamin
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
- Abstract
Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration. They are able to describe unseen task-performing procedures and generalize their execution to other contexts. In this work, we introduce DescribeWorld, an environment designed to test this sort of generalization skill in grounded agents, where tasks are linguistically and procedurally composed of elementary concepts. The agent observes a single task demonstration in a Minecraft-like grid world, and is then asked to carry out the same task in a new map. To enable such a level of generalization, we propose a neural agent infused with hierarchical latent language--both at the level of task inference and subtask planning. Our agent first generates a textual description of the demonstrated unseen task, then leverages this description to replicate it. Through multiple evaluation scenarios and a suite of generalization tests, we find that agents that perform text-based inference are better equipped for the challenge under a random split of tasks.
- Published
- 2022
11. On the Convergence of SARSA with Linear Function Approximation
- Author
- Zhang, Shangtong, Tachet, Remi, and Laroche, Romain
- Subjects
FOS: Computer and information sciences, Computer Science::Machine Learning, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region. However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. Importantly, the region is much smaller than the region that we project into, provided that the magnitude of the reward is not too large. Existing works regarding the convergence of linear SARSA to a fixed point all require the Lipschitz constant of SARSA's policy improvement operator to be sufficiently small; our analysis instead applies to arbitrary Lipschitz constants and thus characterizes the behavior of linear SARSA for a new regime., ICML 2023
- Published
- 2022
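The object of study above, linear SARSA with projection, amounts to a TD update on a weight vector followed by projection onto a norm ball (a minimal sketch of one such step under illustrative step size, discount, and radius; not the paper's analysis):

```python
import numpy as np

def sarsa_update(w, phi_sa, r, phi_next_sa, alpha=0.1, gamma=0.9, radius=10.0):
    """One projected linear-SARSA step: Q(s,a) = w . phi(s,a); TD update on the
    weights, then projection onto a ball of the given radius to keep iterates bounded."""
    td_error = r + gamma * w @ phi_next_sa - w @ phi_sa
    w = w + alpha * td_error * phi_sa
    norm = np.linalg.norm(w)
    if norm > radius:                # projection step: scale back onto the ball
        w = w * (radius / norm)
    return w
```

The paper's result concerns how small a region such projected iterates eventually oscillate in, relative to the projection radius.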
12. Collaborative Exploration and Exploitation in massively Multi-Player Bandits
- Author
-
Dakdouk, Hiba, Féraud, Raphaël, Laroche, Romain, Varsier, Nadège, Maillé, Patrick, Orange Labs, Microsoft Research, Département Systèmes Réseaux, Cybersécurité et Droit du numérique (IMT Atlantique - SRCD), IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Dependability Interoperability and perfOrmance aNalYsiS Of networkS (DIONYSOS), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES (IRISA-D2), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES (IRISA-D2), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA 
Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Rennes 1 (UR1), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Inria Rennes – Bretagne Atlantique, and Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
[INFO] Computer Science [cs]
- Abstract
In this paper, we propose an approach to optimize the performance of Internet of Things (IoT) networks. We formulate the optimization problem as a massive multi-player multi-armed bandit problem, where the devices are the players and the radio channels are the arms, with collisions possibly preventing message reception. For handling a realistic IoT network, we do not assume that sensing information is available (i.e. that collisions are observed) or that the number of players is smaller than the number of arms. As the optimization problem is intractable, we propose two greedy policies: the first focuses on the number of successful communications, while the second also takes into account fairness between players. In order to implement an approximation of the targeted policies, we propose an explore-then-exploit approach, and establish a regret lower bound in $\Omega\left(T^{2/3}\log T\,(N + K^{3/2})\right)$. For estimating the mean reward of arms, we propose a decentralized exploration algorithm with controlled information exchanges between players. Then we show that the regret of the estimated target policy is optimal with respect to the time horizon $T$. Finally, we provide experimental evidence that the proposed algorithms outperform several baselines.
- Published
- 2021
13. A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
- Author
- Zhang, Shangtong, Laroche, Romain, van Seijen, Harm, Whiteson, Shimon, and Combes, Remi Tachet des
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, ComputerApplications_GENERAL, Machine Learning (cs.LG)
- Abstract
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective $(\gamma = 1)$ where $\gamma^t$ disappears naturally $(1^t = 1)$. We then propose to interpret the discounting in the critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results., Comment: AAMAS 2022
- Published
- 2020
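The mismatch discussed in the abstract is easy to state concretely: the discounted policy-gradient theorem weights each step's gradient by $\gamma^t G_t$, while common implementations use $G_t$ alone. A small sketch of the two weightings (illustrative, not the paper's experiments):

```python
def actor_gradient_weights(rewards, gamma, discount_actor=True):
    """Per-step weights multiplying grad log pi(a_t|s_t) in a policy-gradient update.
    The theoretical discounted objective carries an extra gamma**t factor that
    implementations commonly drop (discount_actor=False)."""
    T = len(rewards)
    returns, g = [0.0] * T, 0.0
    for t in reversed(range(T)):      # discounted return-to-go G_t
        g = rewards[t] + gamma * g
        returns[t] = g
    if discount_actor:
        return [(gamma ** t) * returns[t] for t in range(T)]
    return returns
```

Dropping the $\gamma^t$ factor gives later transitions far more weight, which is precisely the gap the paper interprets.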
14. Budgeted Reinforcement Learning in Continuous State Space
- Author
-
Carrara, Nicolas, Leurent, Edouard, Laroche, Romain, Urvoy, Tanguy, Maillard, Odalric-Ambrym, Pietquin, Olivier, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), RENAULT, Microsoft Research Canada, Microsoft Research, Canada, Orange Labs [Lannion], France Télécom, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), and ANR-16-CE40-0002,BADASS,BANDITS MANCHOTS POUR SIGNAUX NON-STATIONNAIRES ET STRUCTURES(2016)
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving., N. Carrara and E. Leurent have contributed equally
- Published
- 2019
15. Safe Policy Improvement with Soft Baseline Bootstrapping
- Author
- Nadjahi, Kimia, Laroche, Romain, and Combes, Rémi Tachet des
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance as compared to using the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and more precisely improve the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of a binary classification of the state-action pairs into two sets (the \textit{uncertain} and the \textit{safe-to-train-on} ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions while remaining provably safe, and is therefore less conservative than the state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms both on finite MDPs and on infinite MDPs with neural network function approximation., Accepted paper at ECML-PKDD2019
- Published
- 2019
16. Safe transfer learning for dialogue applications
- Author
-
Carrara, Nicolas, Laroche, Romain, Bouraoui, Jean-Léon, Urvoy, Tanguy, Pietquin, Olivier, Orange Labs [Lannion], France Télécom, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Maluuba, Google Inc, and Research at Google
- Subjects
[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO] Computer Science [cs], Transfer Learning, Dialogue, Safety, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
- Abstract
In this paper, we formulate the hypothesis that the first dialogues with a new user should be handled in a very conservative way, for two reasons: to avoid user dropout, and to gather more successful dialogues so as to speed up the learning of the asymptotic strategy. To this end, we propose to transfer a safe strategy to initiate the first dialogues.
- Published
- 2018
17. A Fitted-Q Algorithm for Budgeted MDPs
- Author
-
Carrara, Nicolas, Laroche, Romain, Bouraoui, Jean-Léon, Urvoy, Tanguy, Pietquin, Olivier, Orange Labs [Lannion], France Télécom, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Maluuba, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), and Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Fitted-Q, Reinforcement Learning, Budgeted-MDP, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
- Abstract
We address the problem of budgeted reinforcement learning, in continuous state space, using a batch of transitions. To this end, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). Benchmarks show that BFTQ performs as well as a regular Fitted-Q algorithm in a continuous 2-D world, while also allowing one to choose the amount of budget that fits a given task without the need to engineer the rewards. We believe that the general principles used to design BFTQ can be applied to extend other classical reinforcement learning algorithms to budget-oriented applications.
- Published
- 2018
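The regular Fitted-Q baseline that BFTQ extends iterates a regression of Bellman targets computed from a fixed batch of transitions. A minimal tabular sketch (the "regressor" here just averages targets per state-action pair; the two-state example is illustrative, not the paper's benchmark):

```python
def fitted_q_iteration(transitions, n_actions, gamma=0.95, n_iters=50):
    """Batch Fitted-Q on discrete states/actions. `transitions` is a list of
    (s, a, r, s_next) tuples; each iteration regresses fresh Bellman targets."""
    states = {s for s, _, _, _ in transitions} | {s2 for _, _, _, s2 in transitions}
    q = {(s, a): 0.0 for s in states for a in range(n_actions)}
    for _ in range(n_iters):
        targets = {}
        for s, a, r, s2 in transitions:
            y = r + gamma * max(q[(s2, b)] for b in range(n_actions))  # Bellman target
            targets.setdefault((s, a), []).append(y)
        for sa, ys in targets.items():
            q[sa] = sum(ys) / len(ys)   # tabular "regression": average the targets
    return q
```

BFTQ additionally carries a cost signal and a budget through the same fitted iteration, which this sketch omits.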
18. Training Dialogue Systems With Human Advice
- Author
-
Barlier, Merwan, Laroche, Romain, Pietquin, Olivier, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Microsoft Research, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), and European Project: 687831,H2020,H2020-ICT-2015,BabyRobot(2016)
- Subjects
Human-robot/agent interaction, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Learning agent capabilities (agent models, communication, observation), [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
- Abstract
One major drawback of Reinforcement Learning (RL) Spoken Dialogue Systems is that they inherit the general exploration requirements of RL, which makes them hard to deploy from an industry perspective. On the other hand, industrial systems rely on human expertise and hand-written rules so as to prevent irrelevant behavior and maintain an acceptable experience from the user's point of view. In this paper, we attempt to bridge the gap between those two worlds by providing an easy way to incorporate all kinds of human expertise in the training phase of a Reinforcement Learning Dialogue System. Our approach, based on the TAMER framework, enables safe and efficient policy learning by combining the traditional Reinforcement Learning reward signal with an additional reward encoding expert advice. Experimental results show that our method leads to substantial improvements over more traditional Reinforcement Learning methods.
- Published
- 2018
19. Counting to Explore and Generalize in Text-based Games
- Author
- Yuan, Xingdi, Côté, Marc-Alexandre, Sordoni, Alessandro, Laroche, Romain, Combes, Remi Tachet des, Hausknecht, Matthew, and Trischler, Adam
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty, where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that generalize to unseen games of greater difficulty.
- Published
- 2018
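An episodic count-based exploration bonus of the kind the abstract mentions can be sketched as follows (a generic count bonus reset at episode boundaries; the `beta` scaling and `1/sqrt(n)` shape are common illustrative choices, not necessarily the paper's exact mechanism):

```python
import math
from collections import Counter

class EpisodicCountBonus:
    """Episodic exploration bonus: reward observations by how rarely they were
    seen within the current episode; counts reset when a new episode starts."""
    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = Counter()

    def reset(self):
        """Call at the start of each episode."""
        self.counts.clear()

    def bonus(self, obs):
        """Increment the episodic count for `obs` and return beta / sqrt(count)."""
        self.counts[obs] += 1
        return self.beta / math.sqrt(self.counts[obs])
```

Because the counts reset every episode, revisiting a room is discouraged within an episode but not across episodes, which suits the coin-collection chains described above.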
20. Safe Policy Improvement with Baseline Bootstrapping
- Author
- Laroche, Romain, Trichelair, Paul, and Combes, Rémi Tachet des
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstraps the trained policy with the baseline when the uncertainty is high. Our first algorithm, $\Pi_b$-SPIBB, comes with SPI theoretical guarantees. We also implement a variant, $\Pi_{\leq b}$-SPIBB, that is even more efficient in practice. We apply our algorithms to a motivational stochastic gridworld domain and further demonstrate on randomly generated MDPs the superiority of SPIBB with respect to existing algorithms, not only in safety but also in mean performance. Finally, we implement a model-free version of SPIBB and show its benefits on a navigation task with a deep RL implementation called SPIBB-DQN, which is, to the best of our knowledge, the first RL algorithm relying on a neural network representation able to train efficiently and reliably from batch data, without any interaction with the environment., Comment: accepted as a long oral at ICML2019
- Published
- 2017
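The bootstrapping rule at the heart of $\Pi_b$-SPIBB can be sketched in a few lines (a hypothetical tabular illustration with my own naming, not the authors' released code): for state-action pairs observed fewer than `n_wedge` times in the batch, the trained policy copies the baseline's probabilities, and the remaining probability mass in each state goes to the best sufficiently-observed action under the estimated Q.

```python
import numpy as np

def pi_b_spibb(q, baseline, counts, n_wedge):
    """One policy-improvement step of Pi_b-SPIBB (sketch): keep the
    baseline's probability on under-explored (state, action) pairs,
    and give the remaining mass in each state to the best
    sufficiently-observed action under the estimated Q."""
    n_states, _ = q.shape
    pi = np.zeros_like(baseline)
    for s in range(n_states):
        uncertain = counts[s] < n_wedge              # bootstrapped pairs
        pi[s, uncertain] = baseline[s, uncertain]    # keep baseline there
        free_mass = 1.0 - pi[s].sum()
        if (~uncertain).any():
            safe = np.where(~uncertain)[0]
            best = safe[np.argmax(q[s, safe])]
            pi[s, best] += free_mass
        else:
            pi[s] = baseline[s]                      # nothing observed enough
    return pi
```

With enough data everywhere the rule reduces to the greedy policy; with no data anywhere it returns the baseline unchanged.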
21. Transfer Reinforcement Learning with Shared Dynamics
- Author
-
Laroche, Romain, Barlier, Merwan, Orange Labs [Issy les Moulineaux], France Télécom, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), and Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,General Medicine ,ComputingMilieux_MISCELLANEOUS ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] - Abstract
This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas. The first is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and, for each sample, the reward entry is replaced by its reward estimate. The second is to adopt the optimism-in-the-face-of-uncertainty principle and to use upper-bound reward estimates. Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.
- Published
- 2017
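The first idea, reusing shared-dynamics transitions by relabelling rewards with optimistic estimates, can be sketched as follows (a minimal sketch under assumed data structures; the function name, the `c / sqrt(n)` bonus form, and the flat transition lists are my illustrative choices, not the paper's exact estimator):

```python
import math
from collections import defaultdict

def optimistic_reward_relabel(source_transitions, target_rewards, c=1.0):
    """Sketch of the transfer idea: dynamics are shared across tasks, so
    transitions (s, a, s') collected on source tasks are reused on the
    target task by replacing each reward with an optimistic upper-bound
    estimate (empirical mean + c / sqrt(n)) learnt from target-task
    reward samples ((s, a), r)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (s, a), r in target_rewards:
        sums[(s, a)] += r
        counts[(s, a)] += 1
    relabeled = []
    for s, a, s2 in source_transitions:
        n = counts[(s, a)]
        r_hat = sums[(s, a)] / n + c / math.sqrt(n) if n else c  # optimism
        relabeled.append((s, a, r_hat, s2))
    return relabeled
```

Any off-policy batch RL algorithm can then be run on the relabelled transitions as if they had been collected on the target task.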
22. Score-based Inverse Reinforcement Learning
- Author
-
El Asri, Layla, Piot, Bilal, Geist, Matthieu, Laroche, Romain, Pietquin, Olivier, Orange Labs [Issy les Moulineaux], France Télécom, Georgia Tech Lorraine [Metz], Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Georgia Institute of Technology [Atlanta]-CentraleSupélec-Ecole Nationale Supérieure des Arts et Metiers Metz-Centre National de la Recherche Scientifique (CNRS), Université de Lille, Sciences et Technologies, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), MAchine Learning and Interactive Systems (MALIS), SUPELEC-Campus Metz, Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec, Institut Universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Ecole Nationale Supérieure des Arts et Metiers Metz-Georgia Institute of Technology [Atlanta]-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
- Subjects
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Markov Decision Processes ,Learning from Demonstration ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Reinforcement Learning ,Inverse Reinforcement Learning ,Spoken Dialogue Systems - Abstract
International audience; This paper reports theoretical and empirical results obtained for the score-based Inverse Reinforcement Learning (IRL) algorithm. It relies on a non-standard setting for IRL consisting of learning a reward from a set of globally scored trajectories. This allows using any type of policy (optimal or not) to generate trajectories without prior knowledge during data collection. This way, any existing database (like logs of systems in use) can be scored a posteriori by an expert and used to learn a reward function. Thanks to this reward function, it is shown that a near-optimal policy can be computed. Being related to least-squares regression, the algorithm (called SBIRL) comes with theoretical guarantees that are proven in this paper. SBIRL is compared to standard IRL algorithms on synthetic data, showing that annotations do help under conditions on the quality of the trajectories. It is also shown to be suitable for real-world applications such as the optimisation of a spoken dialogue system.
- Published
- 2016
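The least-squares core of this setting can be sketched in a few lines (assuming, for illustration, a reward linear in state features and noiseless scores; `sbirl_fit` is a hypothetical name, not the paper's implementation): each trajectory's discounted feature count is regressed against its global score, recovering the reward weights.

```python
import numpy as np

def sbirl_fit(trajectories, scores, gamma=0.95):
    """Sketch of score-based IRL: assume each global trajectory score is
    (approximately) the discounted sum of a linear reward over state
    features, and recover the reward weights by least squares."""
    X = []
    for traj in trajectories:                 # traj: list of feature vectors
        X.append(sum((gamma ** t) * phi for t, phi in enumerate(traj)))
    X = np.array(X)
    theta, *_ = np.linalg.lstsq(X, np.array(scores), rcond=None)
    return theta                              # reward(s) ~ theta @ phi(s)
```

With the recovered `theta`, any standard RL algorithm can then be run on the induced reward to compute a near-optimal policy.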
23. Compact and Interpretable Dialogue State Representation with Genetic Sparse Distributed Memory
- Author
-
El Asri, Layla, Laroche, Romain, Pietquin, Olivier, Orange Labs [Issy les Moulineaux], France Télécom, Georgia Tech Lorraine [Metz], Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Georgia Institute of Technology [Atlanta]-CentraleSupélec-Ecole Nationale Supérieure des Arts et Metiers Metz-Centre National de la Recherche Scientifique (CNRS), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Institut Universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), Université de Lille, Sciences et Technologies, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Ecole Nationale Supérieure des Arts et Metiers Metz-Georgia Institute of Technology [Atlanta]-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
- Subjects
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] - Abstract
International audience; User satisfaction is often considered as the objective that should be achieved by spoken dialogue systems. This is why the reward function of Spoken Dialogue Systems (SDS) trained by Reinforcement Learning (RL) is often designed to reflect user satisfaction. To do so, the state space representation should be based on features capturing user satisfaction characteristics, such as the mean speech recognition confidence score for instance. On the other hand, for deployment in industrial systems, there is a need for state representations that are understandable by system engineers. In this paper, we propose to represent the state space using a Genetic Sparse Distributed Memory. This is a state aggregation method computing state prototypes which are selected so as to lead to the best linear representation of the value function in RL. To do so, previous work on Genetic Sparse Distributed Memory for classification is adapted to the Reinforcement Learning task, and a new way of building the prototypes is proposed. The approach is tested on a corpus of dialogues collected with an appointment scheduling system. The results are compared to a grid-based linear parametrisation. It is shown that learning is accelerated and made more memory efficient. It is also shown that the framework is scalable in that it is possible to include many dialogue features in the representation, interpret the resulting policy and identify the most important dialogue features.
- Published
- 2016
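The Sparse Distributed Memory side of the representation can be sketched as follows (a minimal illustration with assumed names and an L1 activation radius; the genetic selection of prototypes is omitted): a state activates every prototype within a radius, giving a sparse binary feature vector on which a linear value function can be learnt.

```python
import numpy as np

def sdm_features(state, prototypes, radius):
    """Sparse Distributed Memory style aggregation: activate every
    prototype whose L1 distance to the state is within the radius."""
    d = np.abs(prototypes - state).sum(axis=1)
    return (d <= radius).astype(float)

def linear_value(state, prototypes, radius, weights):
    """Linear value estimate over the sparse prototype activations."""
    return float(sdm_features(state, prototypes, radius) @ weights)
```

Interpretability comes from the prototypes themselves: each weight is attached to a readable prototype state rather than an opaque basis function.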
24. Task Completion Transfer Learning for Reward Inference
- Author
-
El Asri, Layla, Laroche, Romain, Pietquin, Olivier, Georgia Tech Lorraine [Metz], Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Georgia Institute of Technology [Atlanta]-CentraleSupélec-Ecole Nationale Supérieure des Arts et Metiers Metz-Centre National de la Recherche Scientifique (CNRS), Orange Labs [Issy les Moulineaux], France Télécom, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Ecole Nationale Supérieure des Arts et Metiers Metz-Georgia Institute of Technology [Atlanta]-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC), and Pietquin, Olivier
- Subjects
[SPI]Engineering Sciences [physics] ,[SPI] Engineering Sciences [physics] ,[INFO]Computer Science [cs] ,[INFO] Computer Science [cs] - Abstract
International audience; Reinforcement learning-based spoken dialogue systems aim to compute an optimal strategy for dialogue management from interactions with users. They compare their different management strategies on the basis of a numerical reward function. Reward inference consists of learning a reward function from dialogues scored by users. A major issue for reward inference algorithms is that important parameters influence user evaluations and cannot be computed online. This is the case of task completion. This paper introduces Task Completion Transfer Learning (TCTL): a method to exploit the exact knowledge of task completion on a corpus of dialogues scored by users in order to optimise online learning. Compared to previously proposed reward inference techniques, TCTL returns a reward function enhanced with the possibility to manage the online non-observability of task completion. A reward function is learnt with TCTL on dialogues with a restaurant-seeking system. It is shown that the reward function returned by TCTL is a better estimator of dialogue performance than the one returned by reward inference.
- Published
- 2014
25. NASTIA : Negotiating Appointment Setting Interface
- Author
-
El Asri, Layla, Lemonnier, Rémi, Laroche, Romain, Khouzaimi, Hatim, Pietquin, Olivier, Georgia Tech Lorraine [Metz], Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Georgia Institute of Technology [Atlanta]-CentraleSupélec-Ecole Nationale Supérieure des Arts et Metiers Metz-Centre National de la Recherche Scientifique (CNRS), Orange Labs [Issy les Moulineaux], France Télécom, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Pietquin, Olivier, Ecole Nationale Supérieure des Arts et Metiers Metz-Georgia Institute of Technology [Atlanta]-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
- Subjects
[SPI]Engineering Sciences [physics] ,[SPI] Engineering Sciences [physics] ,[INFO]Computer Science [cs] ,[INFO] Computer Science [cs] - Abstract
International audience; This paper describes a French Spoken Dialogue System (SDS) named NASTIA (Negotiating Appointment SeTting InterfAce). Appointment scheduling is a hybrid task halfway between slot-filling and negotiation. NASTIA implements three different negotiation strategies. These strategies were tested on 1734 dialogues with 385 users who interacted at most 5 times with the SDS and gave a rating on a scale of 1 to 10 for each dialogue. Previous appointment scheduling systems were evaluated with the same experimental protocol. NASTIA is different from these systems in that it can adapt its strategy during the dialogue. The highest system task completion rate with these systems was 81% whereas NASTIA had an 88% average and its best performing strategy even reached 92%. This strategy also significantly outperformed previous systems in terms of overall user rating with an average of 8.28 against 7.40. The experiment also enabled highlighting global recommendations for building spoken dialogue systems.
- Published
- 2014
26. DINASTI : Dialogues with a Negotiating Appointment Setting Interface
- Author
-
El Asri, Layla, Laroche, Romain, Pietquin, Olivier, Georgia Tech Lorraine [Metz], Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Georgia Institute of Technology [Atlanta]-CentraleSupélec-Ecole Nationale Supérieure des Arts et Metiers Metz-Centre National de la Recherche Scientifique (CNRS), Orange Labs [Issy les Moulineaux], France Télécom, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Pietquin, Olivier, Ecole Nationale Supérieure des Arts et Metiers Metz-Georgia Institute of Technology [Atlanta]-Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
- Subjects
[SPI]Engineering Sciences [physics] ,[SPI] Engineering Sciences [physics] ,[INFO]Computer Science [cs] ,[INFO] Computer Science [cs] - Abstract
International audience; This paper describes the DINASTI (DIalogues with a Negotiating Appointment SeTting Interface) corpus, which is composed of 1734 dialogues with the French spoken dialogue system NASTIA (Negotiating Appointment SeTting InterfAce). NASTIA is a reinforcement learning-based system. The DINASTI corpus was collected while the system was following a uniform policy. Each entry of the corpus is a system-user exchange annotated with 120 automatically computable features. The corpus contains a total of 21587 entries, with 385 testers. Each tester performed at most five scenario-based interactions with NASTIA. The dialogues last an average of 10.82 dialogue turns, with 4.45 reinforcement learning decisions. The testers filled an evaluation questionnaire after each dialogue. The questionnaire includes three questions to measure task completion. In addition, it comprises 7 Likert-scaled items evaluating several aspects of the interaction, a numerical overall evaluation on a scale of 1 to 10, and a free text entry. Answers to this questionnaire are provided with DINASTI. This corpus is meant for research on reinforcement learning modelling for dialogue management.
- Published
- 2014
27. The Negotiation Dialogue Game.
- Author
-
Laroche, Romain and Genevay, Aude
- Published
- 2017
- Full Text
- View/download PDF
28. Incremental Human-Machine Dialogue Simulation.
- Author
-
Khouzaimi, Hatim, Laroche, Romain, and Lefèvre, Fabrice
- Published
- 2017
- Full Text
- View/download PDF
29. Compact and Interpretable Dialogue State Representation with Genetic Sparse Distributed Memory.
- Author
-
Asri, Layla El, Laroche, Romain, and Pietquin, Olivier
- Published
- 2017
- Full Text
- View/download PDF
30. Reward Function Learning for Dialogue Management
- Author
-
El Asri, Layla, Laroche, Romain, Pietquin, Olivier, IMS : Information, Multimodalité & Signal, SUPELEC-Campus Metz, Ecole Supérieure d'Electricité - SUPELEC (FRANCE)-Ecole Supérieure d'Electricité - SUPELEC (FRANCE), Orange Labs [Issy les Moulineaux], France Télécom, and K. Kersting and M. Toussaint
- Subjects
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] - Abstract
International audience; This paper addresses the problem of defining, from data, a reward function in a Reinforcement Learning (RL) problem. This issue is applied to the case of Spoken Dialogue Systems (SDS), which are interfaces enabling users to interact in natural language. A new methodology is suggested which, from system evaluation, apportions rewards over the system's state space. A corpus of dialogues is collected on-line and then evaluated by experts, who assign a numerical performance score to each dialogue according to the quality of dialogue management. The approach described in this paper infers, from these scores, a locally distributed reward function which can be used on-line. Two algorithms achieving this goal are proposed. These algorithms are tested on an SDS and it is shown that in both cases the resulting numerical rewards are close to the performance scores and, thus, that it is possible to extract relevant information from performance evaluation to optimise on-line learning.
- Published
- 2012
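One of the simplest ways to apportion a global dialogue score over visited states (a toy sketch of the general idea, not either of the paper's two algorithms) is to spread each score uniformly over the dialogue's states and average per state:

```python
from collections import defaultdict

def apportion_rewards(dialogues, scores):
    """Sketch of reward apportioning: spread each dialogue's expert
    performance score uniformly over the states it visited, then
    average per state to obtain a locally distributed reward."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, score in zip(dialogues, scores):
        share = score / len(states)
        for s in states:
            totals[s] += share
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}
```

The resulting per-state rewards can then be plugged into an on-line RL learner in place of a hand-crafted reward function.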
31. Uncertainty Management in Dialogue Systems
- Author
-
Laroche, Romain, Bouchon-Meunier, Bernadette, Bretier, P., Machine Learning and Information Retrieval (MALIRE), Laboratoire d'Informatique de Paris 6 (LIP6), Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), and Publications, Lip6
- Subjects
[INFO]Computer Science [cs] ,[INFO] Computer Science [cs] - Abstract
International audience; Dialogue systems find more and more applications, but scientific improvements are difficult to transfer into dialogue technologies. France Telecom R&D is designing a new architecture based on uncertainty management in dialogue. Dialogue constraints lead to the definition of a Logical Framework for Probabilistic Reasoning (LFPR), whose results are close to those of Evidence Theory and the Assumption-based Truth Maintenance System (ATMS). The paper finally compares it to the Theory of Hints of Kohlas and Monney.
- Published
- 2008
32. Incremental diagnostic utility of systematic double-bed SPECT/CT for bone scintigraphy in initial staging of cancer patients.
- Author
-
Guezennec, Catherine, Keromnes, Nathalie, Robin, Philippe, Abgral, Ronan, Bourhis, David, Querellou, Solène, de Laroche, Romain, Le Duc-Pennec, Alexandra, Salaün, Pierre-Yves, and Le Roux, Pierre-Yves
- Published
- 2017
- Full Text
- View/download PDF
33. Content finder AssistanT.
- Author
-
Laroche, Romain
- Published
- 2015
- Full Text
- View/download PDF
34. Dialogue Efficiency Evaluation of Turn-Taking Phenomena in a Multi-layer Incremental Simulated Environment.
- Author
-
Khouzaimi, Hatim, Laroche, Romain, and Lefèvre, Fabrice
- Published
- 2015
- Full Text
- View/download PDF
35. Contextual Bandit for Active Learning: Active Thompson Sampling.
- Author
-
Bouneffouf, Djallel, Laroche, Romain, Urvoy, Tanguy, Feraud, Raphael, and Allesiardo, Robin
- Published
- 2014
- Full Text
- View/download PDF
36. Ordinal regression for interaction quality prediction.
- Author
-
Asri, Layla El, Khouzaimi, Hatim, Laroche, Romain, and Pietquin, Olivier
- Published
- 2014
- Full Text
- View/download PDF
37. Pulmonary Scintigraphy for the Diagnosis of Acute Pulmonary Embolism: A Survey of Current Practices in Australia, Canada, and France.
- Author
-
Le Roux, Pierre-Yves, Pelletier-Galarneau, Matthieu, De Laroche, Romain, Hofman, Michael S., Zuckier, Lionel S., Roach, Paul, Vuillez, Jean-Philippe, Hicks, Rodney J., Le Gal, Grégoire, and Salaun, Pierre-Yves
- Published
- 2015
- Full Text
- View/download PDF
38. Reward Shaping for Statistical Optimisation of Dialogue Management.
- Author
-
El Asri, Layla, Laroche, Romain, and Pietquin, Olivier
- Published
- 2013
- Full Text
- View/download PDF
39. Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.
- Author
-
Shangtong Zhang, des Combes, Remi Tachet, and Laroche, Romain
- Subjects
MARKOV operators, STOCHASTIC analysis, MARKOV processes, CRITICS, APPROXIMATION algorithms, MARKOV chain Monte Carlo, STOCHASTIC approximation - Abstract
In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using a density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters, while we use an approximate and stochastic update step. Our update step is not a gradient update because we do not use a density ratio to correct the state distribution, which aligns well with what practitioners do. Our update is approximate because we use a learned critic instead of the true value function. Our update is stochastic because at each step the update is done only for the current state-action pair. Moreover, we remove several restrictive assumptions from existing works in our analysis. Central to our work is the finite sample analysis of a generic stochastic approximation algorithm with time-inhomogeneous update operators on time-inhomogeneous Markov chains, based on its uniform contraction properties.
- Published
- 2022
40. Feasibility Study and Preliminary Results of Prognostic Value of Bone SPECT-CT Quantitative Indices for the Response Assessment of Bone Metastatic Prostate Carcinoma to Abiraterone.
- Author
-
de Laroche R, Bourhis D, Robin P, Delcroix O, Querellou S, Malhaire JP, Schlurmann F, Bourbonne V, Salaün PY, Schick U, and Abgral R
- Abstract
Objective: We assessed the prognostic value of quantitative indices extracted from bone SPECT-CT to evaluate the response of bone metastatic castrate-resistant prostate cancer (BmCRPC) to abiraterone. Methods: Consecutive patients with BmCRPC initiating treatment with abiraterone from March 2014 to March 2015 were prospectively included. Three double-bed SPECT-CT scans (at baseline [M0], after 3 months [M3], and after 6 months [M6] of treatment) were planned (Symbia Intevo®, Siemens). SPECT data were reconstructed using an Ordered Subset Conjugate Gradient Minimization (OSCGM) algorithm allowing SUV quantification. SUVmax and SUVpeak of the highest uptake lesion were measured in each SPECT-CT. Total Neoplastic Osteoblastic Metabolic Volume (NOMV) was assessed. PSA level was recorded at baseline, M3, and M6 of treatment. Overall survival (OS), progression-free survival (PFS), and disease-specific survival (DSS) were calculated. Results: Nineteen patients aged 71.1 ± 7.7 years were included. Low M0 SUVmax was significantly predictive of longer OS (p = 0.04). Low NOMV at M0 was significantly predictive of longer PFS (p = 0.02). Patients with an increase of at least 12.5% of the SUVpeak of the highest uptake lesion between M0 and M3 (ΔSUVpeakM0M3) had a significantly longer OS (p = 0.03). Patients with an increase (or a decrease of less than 25%) of ΔSUVpeakM0M3 had a significantly longer DSS (p = 0.01). Patients with an increase of NOMV of at least 45% between M0 and M6 had a significantly shorter PFS (p < 0.001). Variations of NOMV between M0 and M6 were significantly correlated with PSA variations between M0 and M6 (rs = 0.73, p = 0.02). Conclusions: Quantitative bone SPECT-CT appears to be a promising tool for BmCRPC assessment. An early flare-up phenomenon seems to predict longer OS. (Copyright © 2020 de Laroche, Bourhis, Robin, Delcroix, Querellou, Malhaire, Schlurmann, Bourbonne, Salaün, Schick and Abgral.)
- Published
- 2020
- Full Text
- View/download PDF