Agent learning for automated bilateral negotiations

Authors :: Bagga, Pallavi
Publication Year :: 2021
Publisher :: Royal Holloway, University of London, 2021.
Abstract: Automated negotiating agents have high potential because they play a prominent part in various domains, such as economics, behavioural psychology, and commerce systems. However, most negotiating agents in the literature use fixed or heuristic strategies, which raises scalability issues: such agents may perform well in one domain but not in another. Hence, endowing negotiating agents with a learning ability has recently gained a great deal of attention in the automated negotiation community, as a way to help obtain beneficial agreements in a variety of negotiation situations. In this thesis, we explore the idea of using a Deep Reinforcement Learning (DRL) approach to develop learnable strategies for self-interested agents in the domain of automated bilateral negotiations. There are various forms of negotiation, each of which requires a strategy. This thesis starts with the setting in which an agent learns a strategy while negotiating with many agents concurrently, where each individual negotiation takes place bilaterally over a single issue, such as the price of an item. In this setting, we propose ANEGMA, a novel agent model that uses an existing actor-critic DRL architecture to estimate the agent's negotiation strategy. The strategy also benefits from supervised training on synthetic negotiation data generated by "teacher" strategies, which decreases the exploration time required for learning during negotiation. As a result, we have built an automated agent that can adapt to different negotiation domains without being pre-programmed. Experimental results show that the learned strategy outperforms the state-of-the-art "teacher" strategies in a range of settings for single-issue bilateral negotiation. We further extend our approach to deal with one-to-one non-concurrent negotiations over multiple issues such as the size, color, and price of an item.
In this setting, we propose an extended model, called ANESIA, that relies upon interpretable "strategy templates" representing negotiation tactics or heuristics with learnable parameters. ANESIA uses a meta-heuristic approach offline to learn the best combination of these tactics so that they can be employed during negotiation. In addition, ANESIA assumes that the agent has only partial information about the preferences of the user and does not know the opponent agent's preferences. To handle user preference uncertainties, ANESIA uses a stochastic search to best approximate the real user preferences. ANESIA also combines multi-objective optimization and multi-criteria decision-making techniques to generate (near) Pareto-optimal bids during negotiation. A revised model, called DLST-ANESIA, is also developed to learn the combination of tactics online, using DRL. Both models, ANESIA and DLST-ANESIA, are experimentally evaluated, and the experiments show how these models increase the number of "win-win" outcomes. Since ANESIA agents attempt to approximate the real preferences of both negotiating parties, there is uncertainty in their estimated preferences. To address this uncertainty when proposing bids to the opponent party, we further extend the model by introducing an additional fuzzy component and name the resulting model fuzzyANESIA. This model has a two-phase bid generation step that uses fuzzy multi-objective optimization and fuzzy multi-criteria decision-making methods. The experimental evaluation shows that our proposed negotiation model outperforms the state-of-the-art agents (used in previous years' negotiation competition) in most of the settings. In short, this thesis focuses on bilateral negotiations (i.e., negotiations between two agents), in which the agents exchange offers in turns.
It primarily contributes to the learning ability of a negotiating agent in settings where concurrency control is required for one or more issues. During the negotiation, the domain is known to both negotiating agents, but their preferences and behaviour are private information. Our negotiating agent seeks to reach a "win-win" outcome under various time constraints (such as a deadline or a discount factor), while modelling the preferences of the user as well as those of opponent agents.
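
The two notions the abstract leans on, a "win-win" (Pareto-optimal) outcome and a time constraint expressed as a discount factor, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the example utility pairs, and the discounting rule `u * discount ** t` (a convention common in automated negotiation benchmarks, with normalised time `t` in [0, 1]) are all assumptions made for illustration.

```python
def discounted(utility, t, discount):
    """Utility of an agreement reached at normalised time t in [0, 1].

    Assumption: utility decays as utility * discount ** t, so a discount
    of 1.0 means no time pressure.
    """
    return utility * discount ** t


def pareto_front(points):
    """Return the outcomes that no other outcome dominates.

    Each point is a hypothetical (own_utility, opponent_utility) pair.
    A point is dominated if another point is at least as good for both
    agents and strictly better for at least one of them.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Illustrative estimated utilities for four candidate bids.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_front(candidates))
print(discounted(1.0, 1.0, 0.5))
```

In this toy example the bid worth (0.4, 0.4) is dropped because (0.5, 0.5) is better for both parties. In the setting described above the opponent's utilities are private, so any such filtering would have to run on estimated values.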